- Jul 19, 2022
Johan Alfvén authored
Revert "MLBEDSW-6635: Update to TensorFlow 2.9". TensorFlow 2.9 contains a bug for int16x8 without biases. This reverts commit 93f492ba.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I366d201ce4134a877d333be2aade546dfcb5d6d7
- Jul 15, 2022
Fredrik Svedberg authored
Added the SHAPE operator to the supported operators report and updated the constraints for the QUANTIZE and SHAPE operators. Also fixed RESHAPE consuming a statically optimised shape.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I1d964d602d3f361a0f16dae8133197280dd84c48
- Jul 14, 2022
Erik Andersson authored
Update the flatbuffers generated code to comply with TensorFlow 2.9.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I6bf506ffb85da2d4a57a32198b471513deeaca73
- Jul 13, 2022
Fredrik Svedberg authored
Added a check to see if additional stripe data is needed from the producer op when cascading, to make sure the stripes do not overwrite data that is still being used. Also changed the scheduler to make sure ResizeBilinear always runs with an even stripe height.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: If7d723e6be29575c2b55c400eebbe8275a1aa328
Fredrik Svedberg authored
Fixed static optimisation of the Quantize operator by running unsupported formats on the CPU. Also added support for int16 and corrected the calculation.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I861c712aa6258dba53fcf4d5dae45d1d416e6141
- Jul 12, 2022
oliper01 authored
The Hardswish activation function gets converted to a LUT in the graph optimizer. The separate case for it was removed, as it was never called.
Signed-off-by: oliper01 <oliver.perssonbogdanovski@arm.com> Change-Id: I376e8d7b81489c06b66d4e49f59b207600c0ccce
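For context on the LUT conversion mentioned above, here is a minimal sketch of how an int8 Hardswish lookup table could be built from hardswish(x) = x * relu6(x + 3) / 6. It is illustrative only; the quantisation parameters are made up and this is not Vela's implementation.

```python
import numpy as np

def hardswish_int8_lut(in_scale, in_zp, out_scale, out_zp):
    # Map every possible int8 input value through hardswish in the float
    # domain, then requantise the result to the output parameters.
    q_in = np.arange(-128, 128, dtype=np.int32)
    x = (q_in - in_zp) * in_scale              # dequantise
    y = x * np.clip(x + 3.0, 0.0, 6.0) / 6.0   # hardswish(x) = x * relu6(x + 3) / 6
    q_out = np.round(y / out_scale) + out_zp   # requantise
    return np.clip(q_out, -128, 127).astype(np.int8)

# Hypothetical quantisation parameters, for illustration only
lut = hardswish_int8_lut(in_scale=0.1, in_zp=0, out_scale=0.1, out_zp=0)
print(lut[:5], lut[-5:])
```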
- Jul 11, 2022
Erik Andersson authored
Enabled elementwise cascading for binary/single-variable IFM operators.
Signed-off-by: erik.andersson@arm.com <erik.andersson@arm.com> Change-Id: I1c0867875fdc5c4980224fb570185c11e719d5cd
- Jun 29, 2022
Ayaan Masood authored
- The Quantise op becomes constant if its input is known at compile time
- Quantised values are calculated if the input of the op is const and float
- Const inputs to the quant op that are int are requantised
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com> Change-Id: Ic94a72a392af709fe6a640d7dacbb5dc2334f16f
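A minimal sketch of the constant folding described above, assuming standard affine quantisation (q = round(x / scale) + zero_point); this is illustrative and not Vela's actual implementation.

```python
import numpy as np

def quantise_const(values, scale, zero_point, dtype=np.int8):
    # Affine quantisation of a constant float tensor known at compile time.
    # Illustrative only; the real folding also requantises int inputs.
    info = np.iinfo(dtype)
    q = np.round(np.asarray(values, dtype=np.float32) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

# Example: fold a constant float input (hypothetical quantisation parameters)
print(quantise_const([0.0, 0.5, 1.0], scale=0.00784, zero_point=-128))
```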
Ayaan Masood authored
- The Shape op's value is available at compile time, hence it can be optimised
- Disconnected the Shape op from its parent tensor at compile time
- Transformed the Shape op's output tensor into a constant
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com> Change-Id: I0a024269e2b592c6146dd72e62d7a41951fb727a
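Likewise, folding a Shape op whose input shape is static amounts to emitting a constant int32 vector in its place; a minimal illustrative sketch, not Vela's code.

```python
import numpy as np

def fold_shape_op(ifm_shape):
    # Replace a Shape op's output with a constant tensor holding the
    # statically known input shape (TFLite SHAPE yields an int32/int64 vector).
    return np.array(ifm_shape, dtype=np.int32)

print(fold_shape_op((1, 224, 224, 3)))   # -> [  1 224 224   3]
```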
- Jun 27, 2022
Johan Alfvén authored
- The fast storage allocator is supposed to add all feature maps that do not fit in SRAM to an evicted list. However, when conflicting tensors were handled, the list was not updated
- This patch makes sure the list is updated correctly
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibeb3b4e4927f22a8206784a478f1ac38bd7f5a87
- Jun 20, 2022
Johan Alfvén authored
- The fast storage allocator only looked at tensor size, giving priority to larger tensors. The problem with this method is that it does not consider the actual read/write access of the tensor, so a smaller tensor can cause more memory transactions than a bigger one
- The solution is to calculate the read/write access of the tensor and add that score to the decision when deciding where to place the tensors
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I59eb9bd3a44a0238b576cfd8f09ff27012b99070
- Jun 17, 2022
Fredrik Svedberg authored
Improved block size selection by favouring larger block sizes for elementwise operations.
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com> Change-Id: I5b30b358d84fcd672935b863c2154bd8f4ccd928
- Jun 08, 2022
Rickard Bolin authored
Vela was not able to parse config file paths entered with forward slashes. This patch makes it possible to use both forward slashes and backslashes when specifying paths.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I0f4cfc16bde5738c73059af6216d2bdc3821c68b
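One way to accept either separator is to normalise the user-supplied path before it is parsed; the sketch below is illustrative only (the example path is hypothetical), not necessarily how Vela does it.

```python
import os

def normalise_config_path(path: str) -> str:
    # Accept both forward slashes and backslashes by converting everything
    # to the platform's native separator before splitting the path.
    return os.path.normpath(path.replace("\\", "/").replace("/", os.sep))

print(normalise_config_path("config_files/Arm/vela.ini"))
print(normalise_config_path("config_files\\Arm\\vela.ini"))
```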
- May 24, 2022
- Updated release notes and setup.py tag for 3.4
- Regenerated supported ops information
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I4ec88544b84cab168cb3e5cbc6bc392b6b3d8a39
Rickard Bolin authored
One-level-deep relative paths (i.e. ./vela.ini) were treated as if config_files contained a folder named ".". They are now treated as relative paths. The warning message shown when using an absolute path has also been moved into the error message for a better user experience.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7f7d4f904b9fbba97593e42203566057a2d36925
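The underlying behaviour is standard Python path splitting, where a one-level-deep relative path yields "." as its directory component; a small illustration.

```python
import os

# Why "./vela.ini" can look like a folder named "." when the leading
# directory component is interpreted as a config_files subdirectory:
print(os.path.split("./vela.ini"))    # ('.', 'vela.ini')
print(os.path.split("Arm/vela.ini"))  # ('Arm', 'vela.ini')
```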
Rickard Bolin authored
The argument to the lstrip function is a set of characters to be stripped from the beginning of the string, in any order, not a prefix. To remove the actual prefix, check whether the string starts with the prefix and then remove that number of characters. The function "removeprefix", added in Python 3.9, does exactly this, but it is not yet available to Vela since Vela supports Python 3.7.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ibc5a173c6d422cb5f55feb80caef6c5c30cf7d39
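A small illustration of the difference, using a hypothetical path.

```python
# str.lstrip() strips a *set* of characters, not a prefix, so it can eat too much.
path = "config_files/config/vela.ini"
prefix = "config_files/"

print(path.lstrip(prefix))   # 'vela.ini' - the 'config/' directory was stripped too

# Prefix removal that works on Python 3.7 (str.removeprefix() only exists from 3.9):
if path.startswith(prefix):
    path = path[len(prefix):]
print(path)                  # 'config/vela.ini'
```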
- May 23, 2022
Tim Hall authored
- The latest numpy versions require Python 3.8
- This can cause issues if Python 3.7 is installed, which is the version that Vela is tested against
- The fix is to limit the numpy version to those that support Python 3.7
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I3a388976d5aa76395ca93202e496640c8de9f6f4
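One common way to express such a limit is an environment marker in setup.py. The snippet below is illustrative only and the exact pin Vela uses may differ; it relies on the fact that numpy 1.22 was the first release to require Python >= 3.8.

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical package, not Vela's setup.py
    install_requires=[
        # Keep numpy installable on Python 3.7 while allowing newer numpy elsewhere
        "numpy<1.22 ; python_version<'3.8'",
        "numpy ; python_version>='3.8'",
    ],
)
```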
- May 19, 2022
- For allocations that have a hard memory limit, the Hill Climb allocator should be given more attempts to find a solution that fits
- The fix is to use a memory limit when there is a hard constraint, and a minimum iteration count, reset on every improvement, when there is a soft constraint
- Added a CLI option for the maximum number of iterations
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I19ff53a0b68412de280263626778a3102cbe52fa
- The problem is due to a divide by zero
- The fix is simply to detect this and assign zero. This could also affect improvement_sram
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I29a67710a17ef22656fb5ecfe9476953ffa5533d
- The print_performance function that is called when using the --verbose-performance option crashed with a KeyError when no SRAM was used
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ib6af3193e8f4f368cb28d51e65afa0751773628a
- The NPU cycles are not correctly calculated when only one weight buffer is used, since weights cannot be fetched in parallel
- Added a new calculation for the single buffer case
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I8568912d11d137a298225ab77b8b3272613c76f6
Johan Alfvén authored
Update to the "Vela splitting network into two ethos operators" patch, allowing the CPU pass to be moved last in the pass_list.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I2e8a299101e5d65e963327bed7c8d891fff6523e
- May 18, 2022
- Due to how the graph is traversed, the final pass list contained unnecessary multiple Ethos-U operators. Functionality-wise this is not a problem, but it adds extra context switching between CPU and NPU
- By applying sorting rules to the pass list, it is possible to create a more optimal pass list that reduces the number of Ethos-U operators
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ib556f902e1f321b5c50238fada7aa92b9810b27a
Add a directory structure to support third party config files. Config files should now be placed in an appropriately named directory under the config_files directory, but can also be accessed by providing their absolute path to vela --config.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I2fcf52e7b2ddd2c4491dc370c85c0b3937d18062
- May 17, 2022
Tim Hall authored
- Added support to print per-operator SRAM usage and performance information
- Added a new CLI option --verbose-performance to control this feature
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I368599b410e5d441d9804871fc51b7a1049d85b3
Johan Alfvén authored
Allow a schedule to be used when the calculation shows zero total improvement but does show a DRAM improvement. When testing on a real target, total performance is improved.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ib4f2a37710dc7954b72b48c38fce4817ccd7187b
- May 16, 2022
Rickard Bolin authored
Uses separate tensors for the individual weight buffers in the case of weight double buffering. Each weight buffer tensor gets its own individual live range. This patch is a clone of a previously reverted patch, but with some additional bug fixes applied.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I868c70d15821eb9f1399186f2da6e7345f6ee343
- May 12, 2022
Johan Alfvén authored
- Because bigger weight buffer sizes are being used, there are use cases where feature maps are evicted from SRAM, causing the total performance to drop
- A way to improve this is to limit the memory for those weight buffer ops, to get the feature maps back into SRAM, and see if total performance is improved
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Ibfaff330677185186af9f6362dfbe04824a329f6
- May 11, 2022
Johan Alfvén authored
Removed the constraint for negative alpha values in ReLU for int8 and uint8.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Id7a3a30bf5d1f0a591f990bd04cd0dbbad5819c6
Dwight Lidman authored
This commit downgrades the required Python version to 3.7 from 3.8.
Signed-off-by: Dwight Lidman <dwight.lidman@arm.com> Change-Id: I07057908b97bcd94663f001474d877ba41411ae1
- Added the offset address to the command stream disassembly
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I55c6ef59878c90c21d41051c076da6c1f0fa4201
This reverts commit d2b55106.
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ia3043bc9c27fe2f72f3ab2f6f7341b3a9adb4231
- May 09, 2022
Johan Alfvén authored
- Cascading a slice operator with read offsets is not supported by the rolling buffer mechanism, causing the address to go out of range
- The fix is to prevent ops from being cascaded if they have read offsets
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: Iea7f054ac4b5a7dadf905bbe947033247284c27e
- May 04, 2022
Tim Hall authored
This reverts commit cc5f4de1.
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: I0fa5babfe9ad9ec668720d04fe1c16d9a9092131
- Apr 27, 2022
Rickard Bolin authored
Generate flatbuffer files with relative imports.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Idd59bb2ebb829bc42677920577c1f8a04e23ca68
Rickard Bolin authored
Update the flatbuffers generated code to comply with TensorFlow 2.8.
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: Ia65325b88745e49dbafa803a38c0ea0e7d0478ba
- Apr 21, 2022
Ayaan Masood authored
- Added a generic function which checks if the underlying shape of a FullyConnected operation is 2D and performs shape reduction
- FullyConnected operations with more than 2 dimensions now run on the NPU if the above case is satisfied
- Refactored constraint_fc_output_2d and rewrite_fully_connected_input
- Added a unit test to confirm this functionality
Signed-off-by: Ayaan Masood <Ayaan.Masood@arm.com> Change-Id: I0e29c767e5b84841eb53bbc44464b36a454f7b38
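A minimal sketch of the kind of 2D shape reduction described above; this is illustrative, not Vela's actual helper. Leading batch-like dimensions are multiplied together so that, for example, (1, 1, 4, 16) reduces to (4, 16).

```python
import numpy as np

def reduce_fc_shape(shape):
    # Collapse a >2D FullyConnected input shape to an equivalent 2D shape by
    # folding all leading dimensions into one batch dimension.
    *batch_dims, features = shape
    return (int(np.prod(batch_dims, dtype=np.int64)), features)

print(reduce_fc_shape((1, 1, 4, 16)))  # (4, 16)
print(reduce_fc_shape((2, 8, 32)))     # (16, 32)
```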
- Apr 20, 2022
Tim Hall authored
- This is due to calling range() on a non-integer value, which in turn is due to a change in the behaviour of round() on numpy.float64 values
- The fix is to always force the output of round() to be an integer and thereby stop whole-number floating point values propagating into the kernel dimensions, which later feed into range()
Signed-off-by: Tim Hall <tim.hall@arm.com> Change-Id: Ic75cb6ba85a90c81c1d762067d89a10caaa13b92
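The defensive pattern described by the fix looks roughly like this; illustrative only, and whether round() returns a Python int or a numpy.float64 depends on the numpy version.

```python
import numpy as np

k = np.float64(8.0) / 2   # a whole-number float64, e.g. a derived kernel dimension

# Depending on the numpy version, round(k) may not return a Python int,
# and range() rejects non-integers. Forcing the conversion is always safe:
h = int(round(k))
for i in range(h):
    pass
```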
Rickard Bolin authored
- Modify the operator clone function to also clone the resampling mode attribute. A previous patch changed the IFM resampling mode to be an attribute of an operator rather than of a tensor, but did not modify the operator clone function to clone the new attribute
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com> Change-Id: I7a2f6103666a0997f657de20ad962e849976b904
- Apr 08, 2022
Johan Alfvén authored
Corrected the calculation of the used buffering depth. Before this change there were scenarios where it was set to a smaller size than needed.
Signed-off-by: Johan Alfven <johan.alfven@arm.com> Change-Id: I162859ade78487e848510c6a605685e4568c7068