- Jul 19, 2022
Johan Alfvén authored
Revert "MLBEDSW-6635: Update to TensorFlow 2.9". TensorFlow 2.9 contains a bug for int16x8 without biases. This reverts commit 93f492ba.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I366d201ce4134a877d333be2aade546dfcb5d6d7
- Jul 15, 2022
Fredrik Svedberg authored
Added the SHAPE operator to the supported operators report and updated the constraints for the QUANTIZE and SHAPE operators. Also fixed RESHAPE consuming a statically optimised shape.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I1d964d602d3f361a0f16dae8133197280dd84c48
- Jul 14, 2022
Erik Andersson authored
Update the flatbuffers generated code to comply with TensorFlow 2.9.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I6bf506ffb85da2d4a57a32198b471513deeaca73
- Jul 13, 2022
Fredrik Svedberg authored
Added a check to see if additional stripe data is needed from the producer op when cascading, to make sure the stripes do not overwrite data that is still being used. Also changed the scheduler to make sure ResizeBilinear always runs with an even stripe height.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
Fredrik Svedberg authored
Fixed static optimisation of the Quantize operator by running unsupported formats on the CPU. Also added support for int16 and corrected the calculation.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I861c712aa6258dba53fcf4d5dae45d1d416e6141
- Jul 12, 2022
oliper01 authored
The Hardswish activation function gets converted to a LUT in the graph optimizer. The separate case for it was removed, as it was never called.
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com> Change-Id: I376e8d7b81489c06b66d4e49f59b207600c0ccce
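For context on the LUT conversion mentioned above, here is a minimal sketch of how an int8 Hardswish lookup table could be built from hardswish(x) = x * relu6(x + 3) / 6. It is illustrative only; the quantisation parameters are made up and this is not Vela's implementation.

```python
import numpy as np

def hardswish_int8_lut(in_scale, in_zp, out_scale, out_zp):
    # Map every possible int8 input value through hardswish in the float
    # domain, then requantise the result to the output parameters.
    q_in = np.arange(-128, 128, dtype=np.int32)
    x = (q_in - in_zp) * in_scale              # dequantise
    y = x * np.clip(x + 3.0, 0.0, 6.0) / 6.0   # hardswish(x) = x * relu6(x + 3) / 6
    q_out = np.round(y / out_scale) + out_zp   # requantise
    return np.clip(q_out, -128, 127).astype(np.int8)

# Hypothetical quantisation parameters, for illustration only
lut = hardswish_int8_lut(in_scale=0.1, in_zp=0, out_scale=0.1, out_zp=0)
print(lut[:5], lut[-5:])
```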
- Jul 11, 2022
Erik Andersson authored
Enabled elementwise cascading for binary/single-variable IFM operators.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
- Jun 29, 2022
Ayaan Masood authored
- The Quantise op becomes constant if its input is known at compile time
- Quantised values are calculated if the input of the op is const and float
- Const inputs to the quant op that are int are requantised
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com> Change-Id: Ic94a72a392af709fe6a640d7dacbb5dc2334f16f
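A minimal sketch of the constant folding described above, assuming standard affine quantisation (q = round(x / scale) + zero_point); this is illustrative and not Vela's actual implementation.

```python
import numpy as np

def quantise_const(values, scale, zero_point, dtype=np.int8):
    # Affine quantisation of a constant float tensor known at compile time.
    # Illustrative only; the real folding also requantises int inputs.
    info = np.iinfo(dtype)
    q = np.round(np.asarray(values, dtype=np.float32) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

# Example: fold a constant float input (hypothetical quantisation parameters)
print(quantise_const([0.0, 0.5, 1.0], scale=0.00784, zero_point=-128))
```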
Ayaan Masood authored
- The Shape op's value is available at compile time, hence it can be optimised
- Disconnected the Shape op from its parent tensor at compile time
- Transformed the Shape op's output tensor into a constant
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com> Change-Id: I0a024269e2b592c6146dd72e62d7a41951fb727a
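Likewise, folding a Shape op whose input shape is static amounts to emitting a constant int32 vector in its place; a minimal illustrative sketch, not Vela's code.

```python
import numpy as np

def fold_shape_op(ifm_shape):
    # Replace a Shape op's output with a constant tensor holding the
    # statically known input shape (TFLite SHAPE yields an int32/int64 vector).
    return np.array(ifm_shape, dtype=np.int32)

print(fold_shape_op((1, 224, 224, 3)))   # -> [  1 224 224   3]
```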
- Jun 27, 2022
Johan Alfvén authored
- The fast storage allocator is supposed to add all feature maps that do not fit in SRAM to an evicted list. However, when conflicting tensors were handled, the list was not updated
- This patch makes sure the list is updated correctly
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibeb3b4e4927f22a8206784a478f1ac38bd7f5a87
- Jun 20, 2022
Johan Alfvén authored
- The fast storage allocator only looked at tensor size, giving priority to larger tensors. The problem with this method is that it does not consider the actual read/write access of the tensor, so a smaller tensor can cause more memory transactions than a bigger one
- The solution is to calculate the read/write access of the tensor and add that score to the decision when deciding where to place the tensors
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I59eb9bd3a44a0238b576cfd8f09ff27012b99070
- Jun 17, 2022
Fredrik Svedberg authored
Improved block size selection by favouring larger block sizes for elementwise operations.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I5b30b358d84fcd672935b863c2154bd8f4ccd928
- Jun 08, 2022
Rickard Bolin authored
Vela was not able to parse config file paths entered with forward slashes. This patch makes it possible to use both forward slashes and backslashes when specifying paths.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I0f4cfc16bde5738c73059af6216d2bdc3821c68b
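One way to accept either separator is to normalise the user-supplied path before it is parsed; the sketch below is illustrative only (the example path is hypothetical), not necessarily how Vela does it.

```python
import os

def normalise_config_path(path: str) -> str:
    # Accept both forward slashes and backslashes by converting everything
    # to the platform's native separator before splitting the path.
    return os.path.normpath(path.replace("\\", "/").replace("/", os.sep))

print(normalise_config_path("config_files/Arm/vela.ini"))
print(normalise_config_path("config_files\\Arm\\vela.ini"))
```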
- May 24, 2022
- Updated release notes and setup.py tag for 3.4
- Regenerated supported ops information
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I4ec88544b84cab168cb3e5cbc6bc392b6b3d8a39
Rickard Bolin authored
One-level-deep relative paths (i.e. ./vela.ini) were treated as if config_files contained a folder named ".". They are now treated as relative paths. The warning message shown when using an absolute path has also been moved into the error message for a better user experience.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
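The underlying behaviour is standard Python path splitting, where a one-level-deep relative path yields "." as its directory component; a small illustration.

```python
import os

# Why "./vela.ini" can look like a folder named "." when the leading
# directory component is interpreted as a config_files subdirectory:
print(os.path.split("./vela.ini"))    # ('.', 'vela.ini')
print(os.path.split("Arm/vela.ini"))  # ('Arm', 'vela.ini')
```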
Rickard Bolin authored
The argument to the lstrip function is a set of characters to be stripped from the beginning of the string, in any order, not a prefix. To remove the actual prefix, check whether the string starts with the prefix and then remove that number of characters. The function "removeprefix", added in Python 3.9, does exactly this, but it is not yet available to Vela since Vela supports Python 3.7.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
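A small illustration of the difference, using a hypothetical path.

```python
# str.lstrip() strips a *set* of characters, not a prefix, so it can eat too much.
path = "config_files/config/vela.ini"
prefix = "config_files/"

print(path.lstrip(prefix))   # 'vela.ini' - the 'config/' directory was stripped too

# Prefix removal that works on Python 3.7 (str.removeprefix() only exists from 3.9):
if path.startswith(prefix):
    path = path[len(prefix):]
print(path)                  # 'config/vela.ini'
```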
- May 23, 2022
Tim Hall authored
- The latest numpy versions require Python 3.8
- This can cause issues if Python 3.7 is installed, which is the version that Vela is tested against
- The fix is to limit the numpy version to those that support Python 3.7
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I3a388976d5aa76395ca93202e496640c8de9f6f4
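One common way to express such a limit is an environment marker in setup.py. The snippet below is illustrative only and the exact pin Vela uses may differ; it relies on the fact that numpy 1.22 was the first release to require Python >= 3.8.

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical package, not Vela's setup.py
    install_requires=[
        # Keep numpy installable on Python 3.7 while allowing newer numpy elsewhere
        "numpy<1.22 ; python_version<'3.8'",
        "numpy ; python_version>='3.8'",
    ],
)
```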
- May 19, 2022
- For allocations that have a hard memory limit, the Hill Climb allocator should be given more attempts to find a solution that fits
- The fix is to use a memory limit when there is a hard constraint, and a minimum iteration count, reset on every improvement, when there is a soft constraint
- Added a CLI option for the maximum number of iterations
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
- The problem is due to a divide by zero
- The fix is simply to detect this and assign zero. This could also affect improvement_sram
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
- The print_performance function that is called when using the --verbose-performance option crashed with a KeyError when no SRAM was used
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
- The NPU cycles are not correctly calculated when only one weight buffer is used, since weights cannot be fetched in parallel
- Added a new calculation for the single buffer case
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
Johan Alfvén authored
Update to the "Vela splitting network into two ethos operators" patch, allowing the CPU pass to be moved last in the pass_list.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I2e8a299101e5d65e963327bed7c8d891fff6523e
- May 18, 2022
- Due to how the graph is traversed, the final pass list contained unnecessary multiple Ethos-U operators. Functionality-wise this is not a problem, but it adds extra context switching between CPU and NPU
- By applying sorting rules to the pass list, it is possible to create a more optimal pass list that reduces the number of Ethos-U operators
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ib556f902e1f321b5c50238fada7aa92b9810b27a
Add a directory structure to support third party config files. Config files should now be placed in an appropriately named directory under the config_files directory, but can also be accessed by providing their absolute path to vela --config.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I2fcf52e7b2ddd2c4491dc370c85c0b3937d18062
- May 17, 2022
Tim Hall authored
- Added support to print per-operator SRAM usage and performance information
- Added a new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
Johan Alfvén authored
Allow a schedule to be used when the calculation shows zero total improvement but does show a DRAM improvement. When testing on a real target, total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
- May 16, 2022
Rickard Bolin authored
Uses separate tensors for the individual weight buffers in the case of weight double buffering. Each weight buffer tensor gets its own individual live range. This patch is a clone of a previously reverted patch, but with some additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
- May 12, 2022
Johan Alfvén authored
- Because bigger weight buffer sizes are being used, there are use cases where feature maps are evicted from SRAM, causing the total performance to drop
- A way to improve this is to limit the memory for those weight buffer ops, to get the feature maps back into SRAM, and see if total performance is improved
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
- May 11, 2022
Johan Alfvén authored
Removed the constraint for negative alpha values in ReLU for int8 and uint8.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Id7a3a30bf5d1f0a591f990bd04cd0dbbad5819c6
Dwight Lidman authored
This commit downgrades the required Python version to 3.7 from 3.8.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com> Change-Id: I07057908b97bcd94663f001474d877ba41411ae1
- Added the offset address to the command stream disassembly
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I55c6ef59878c90c21d41051c076da6c1f0fa4201
This reverts commit d2b55106.
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ia3043bc9c27fe2f72f3ab2f6f7341b3a9adb4231
- May 09, 2022
Johan Alfvén authored
- Cascading a slice operator with read offsets is not supported by the rolling buffer mechanism, causing the address to go out of range
- The fix is to prevent ops from being cascaded if they have read offsets
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Iea7f054ac4b5a7dadf905bbe947033247284c27e
- May 04, 2022
Tim Hall authored
This reverts commit cc5f4de1.
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
- Apr 27, 2022
Rickard Bolin authored
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
Rickard Bolin authored
Update the flatbuffers generated code to comply with TensorFlow 2.8.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
- Apr 21, 2022
Ayaan Masood authored
- Added a generic function which checks if the underlying shape of a FullyConnected operation is 2D and performs shape reduction
- FullyConnected operations with more than 2 dimensions now run on the NPU if the above case is satisfied
- Refactored constraint_fc_output_2d and rewrite_fully_connected_input
- Added a unit test to confirm this functionality
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com> Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
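A minimal sketch of the kind of 2D shape reduction described above; this is illustrative, not Vela's actual helper. Leading batch-like dimensions are multiplied together so that, for example, (1, 1, 4, 16) reduces to (4, 16).

```python
import numpy as np

def reduce_fc_shape(shape):
    # Collapse a >2D FullyConnected input shape to an equivalent 2D shape by
    # folding all leading dimensions into one batch dimension.
    *batch_dims, features = shape
    return (int(np.prod(batch_dims, dtype=np.int64)), features)

print(reduce_fc_shape((1, 1, 4, 16)))  # (4, 16)
print(reduce_fc_shape((2, 8, 32)))     # (16, 32)
```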
- Apr 20, 2022
Tim Hall authored
- This is due to calling range() on a non-integer value, which in turn is due to a change in the behaviour of round() on numpy.float64 values
- The fix is to always force the output of round() to be an integer and thereby stop whole-number floating point values propagating into the kernel dimensions, which later feed into range()
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
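The defensive pattern described by the fix looks roughly like this; illustrative only, and whether round() returns a Python int or a numpy.float64 depends on the numpy version.

```python
import numpy as np

k = np.float64(8.0) / 2   # a whole-number float64, e.g. a derived kernel dimension

# Depending on the numpy version, round(k) may not return a Python int,
# and range() rejects non-integers. Forcing the conversion is always safe:
h = int(round(k))
for i in range(h):
    pass
```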
Rickard Bolin authored
- Modify the operator clone function to also clone the resampling mode attribute. A previous patch changed the IFM resampling mode to be an attribute of an operator rather than of a tensor, but did not modify the operator clone function to clone the new attribute
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
- Apr 08, 2022
Johan Alfvén authored
Corrected the calculation of the used buffering depth. Before this change there were scenarios where it was set to a smaller size than needed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I162859ade78487e848510c6a605685e4568c7068