Commits · e575dd7576e8869bad2c14bb21902ffde3e630fd · artificial-intelligence / ethos-u / Vela

Feb 07, 2025

MLBEDSW-9879 Update contribution documentation · e575dd75

Fredrik Svedberg authored Jan 13, 2025



Updated contribution documentation with information about forks.

Change-Id: I47c03c03aa2bf90013705941b6342b9dbb6207ef
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>

e575dd75

Add DCO check to CICD · ad32a54c

Fredrik Svedberg authored Feb 07, 2025



Added DCO check to CICD pipeline.

Change-Id: I9eeec783935cc01725b2656aee7bc57a8696b864
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>

ad32a54c

Feb 06, 2025

MLBEDSW-10098: Add scheduled op tracking to debugDB · ea6803e7

William Isaksson authored Dec 09, 2024 and

Fredrik Svedberg committed Feb 06, 2025



This patch adds trackability for scheduled ops.

Change-Id: I93181468b2459150026785f49ef128ec72998a5d
Signed-off-by: William Isaksson <william.isaksson@arm.com>

ea6803e7

MLBEDSW-10091: Add check to not add optimised ops to src table · 8a70b986

William Isaksson authored Oct 29, 2024

Adds a check that already optimised ops are not added as source ops in the GraphIR pass.

Change-Id: Ib966080f7ee3b5bf4e52ef9e601dcc37153dcdfe
Signed-off-by: William Isaksson <william.isaksson@arm.com>

8a70b986

Feb 05, 2025

MLBEDSW-10367: Fix assert in AreaFit · 84712f8e

Johan Alfvén authored Feb 04, 2025



 - AreaFit failed to find a shape for a small aspect ratio. The
adjustment ratio step was too small and the retry counter
caused the iteration to break before finding a solution
 - The fix is to increase the allowed number of iterations and
as a last resort change the scaling ratio

Change-Id: Ic07e1cc60beae592dc832c9e71706d87621b1219
Signed-off-by: Johan Alfvén <johan.alfven@arm.com>

84712f8e

MLBEDSW-10181: Infer OFM shape if it's missing · 11a0744c

Johan Gunnarsson authored Feb 04, 2025



There are cases where, for example, a QUANTIZE op has a constant
tensor input but a shapeless output. With this patch, the OFM will
inherit IFM's shape for certain ops.

Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
Change-Id: Ibcdc503ed56cbe0822e3d5d33dfde76f5620cd07

11a0744c

Feb 04, 2025

MLBEDSW-10231 Fix find block config for AvgPool · c9ed0bee

Johan Alfvén authored Feb 03, 2025



 - Find block config for an AvgPool failed and triggered an assert
 - Adding a final step which halves the depth until minimum granule
depth is reached solves the problem
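
The final fallback step described above can be sketched roughly as follows. This is only an illustration, not the Vela implementation: `shrink_depth`, `fits` and the granule value are hypothetical names, and `fits` stands in for whatever check decides that a block config of a given depth is acceptable.

```python
from typing import Callable


def shrink_depth(depth: int, granule: int, fits: Callable[[int], bool]) -> int:
    """Halve `depth` until `fits(depth)` holds or the minimum granule is reached.

    All names are hypothetical and only illustrate the idea in the commit message.
    """
    while depth > granule and not fits(depth):
        # Halve, keep the result a multiple of the granule, never go below it.
        halved = (depth // 2) // granule * granule
        depth = max(granule, halved)
    return depth
```

For example, `shrink_depth(128, 16, lambda d: d <= 24)` tries 64 and 32 before settling on 16, the assumed minimum granule depth.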

Change-Id: I81689fb26e20744b5f5b23226570ef8df499b8ea
Signed-off-by: Johan Alfvén <johan.alfven@arm.com>

c9ed0bee

Feb 03, 2025

MLBEDSW-10364: Don't run ConvertZeroPoint for Passthrough ops · 3f7624d3

Johan Gunnarsson authored Feb 03, 2025



This is a regression since af541e72.

Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
Change-Id: I53a5a555d738a8e2527a70e0c5f2ad7dcb6baf29

3f7624d3

MLBEDSW-9868: Adjust BufferReader asserts to always allow index 0 · 22f69adb

Johan Gunnarsson authored Jan 24, 2025 and

Fredrik Svedberg committed Feb 03, 2025



Index 0 is unstrided so we can always allow reading that index
regardless of stride. Also added a few more asserts.

Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
Change-Id: Icf7ac8c46665d4e651232d0c0ba66733dce99dc6

22f69adb

Jan 31, 2025

MLBEDSW-10313: Use "New" Python mode in pybind11 · 7b722079

Rickard Bolin authored Jan 29, 2025



Updating pybind11 to the newest version changed our build to use the
"Classic" pybind11 Python mode. Set PYBIND11_FINDPYTHON ON to use the
"New" Python mode, which was used before the update.

Change-Id: Ia54ef7363dba259778151de920e11959484580a0
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>

7b722079

Jan 29, 2025

MLBEDSW-9881 Update pipelines · f676ca8f

Fredrik Svedberg authored Jan 29, 2025



Updated workflow rules.

Change-Id: I5f05858bef52e08872092cac2e6fc2d3bf1711e3
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>

f676ca8f

CMake: silence policy CMP0117 warning · 18260a78

Mauricio Briceno authored Jan 24, 2025 and

Fredrik Svedberg committed Jan 29, 2025



- Warns when a Destination path is different from its normalized version
- This happens for the wheels' destination directory
- But this is benign as there's a trailing path separator that gets
  removed by the path normalization algorithm resulting in essentially
  the same original path

Change-Id: Ie6f6382941ea533d3b68b5acf275a0f7c07291d5
Signed-off-by: Mauricio Briceno <mauricio.briceno@arm.com>

18260a78

MLBEDSW-10293: TFLite buffer offset support · 8b69a3f4

Mauricio Briceno authored Jan 18, 2025



- Regenerated TFLite schema with mutable API
- TFLite reader: implement mechanism to load buffers at the end
  of the file as described in the schema
- Update vela.py to read via mmap
- TFlite writer: implement mechanism to write buffers at the end of the
  file as described in the schema

Change-Id: I169a5f0e512f1b038393145495ec7040be783969
Signed-off-by: Mauricio Briceno <mauricio.briceno@arm.com>

8b69a3f4

Jan 28, 2025

MLBEDSW-9881 Add CI/CD pipelines · 55b06cf5

Fredrik Svedberg authored Jan 09, 2025



Added initial CI/CD pipelines and Dockerfile for building the
runner Docker image.

Change-Id: I14dccd8c28e5e8c703210a3ca18a16ef673614c4
Signed-off-by: Fredrik Svedberg <fredrik.svedberg@arm.com>

55b06cf5

MLBEDSW-10106: Fix bitflags casting issue. · 3f5c9104

Philip Hall authored Jan 27, 2025 and

Fredrik Svedberg committed Jan 28, 2025



An obscure cast issue occurred during another implementation,
where attempting to cast the bitflags object caused it to
take the implicit boolean value rather than the explicit
unsigned cast.

 - This commit prevents such casts by removing the
   implicit boolean cast.
 - Added specific flag-test operator, resulting in
   an actual boolean value.
 - Allowed double-negation semantics (!!) to check
   for non-zero flags.

Signed-off-by: Philip Hall <philip.hall@arm.com>
Change-Id: Ide581e840a0c848be68bbc3249518ab901ce480b

3f5c9104

MLBEDSW-10319: Add single-consumer check before OFM-fusing · e131bf4f

Alexander Bengtsson authored Jan 27, 2025 and

Alexander Bengtsson committed Jan 28, 2025



- OFM-fusing of Rescales is only valid if the Rescale
  operation (to be fused) is the single consumer of
  the preceding operation's OFM.

Change-Id: Ie341d3e462cf7ce7ec4721f83b459d364542304c
Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>

e131bf4f

Jan 27, 2025

MLBEDSW-10299: Size types and allocator fixes for ordered_map · 6ee583fa

Philip Hall authored Jan 22, 2025



 - Normalise the sized types used by ordered_map to present
   an interface more consistent with the standard library.
 - Set initial allocation to zero such that declaring an
   empty map allocates no storage, and add tests for the
   same. Storage is now allocated on first-use.
 - Fix potential range issue with initial hashtable size
   being greater than the chosen indexer.
 - Fix issue where it was not possible to resize up to
   the maximum indexer limit.

Signed-off-by: Philip Hall <philip.hall@arm.com>
Change-Id: I12742431808d73625ac6bcdbd7b701b52f763834

6ee583fa

Jan 24, 2025

MLBEDSW-10274 Add TOSA Pad support for Ethos-U55/U65 · af5f7df9

Jacob Bohlin authored Jan 21, 2025



Change-Id: I91e1fdc69807b0a8702663932944b327f4728a1e
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>

af5f7df9

MLBEDSW-10158: Don't allocate fast memory for cross-CPU tensors · 2e0d490c

Johan Gunnarsson authored Jan 17, 2025 and

Johan Alfvén committed Jan 24, 2025



A cross-CPU tensor is a tensor with a live range that covers two
command streams, potentially with a CPU op in between them. In a
multi-threaded setup, the fast storage is not guaranteed to be
left unchanged between the execution of the command streams.

This patch can have a negative performance impact for networks
with multiple command streams.
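
A minimal sketch of the rule, assuming a tensor's live range is summarised as the set of command stream indices that access it. Both that representation and the function names are assumptions made for illustration, not Vela's data model.

```python
from typing import Iterable


def is_cross_cpu(command_stream_ids: Iterable[int]) -> bool:
    """True if the tensor is accessed from more than one command stream."""
    return len(set(command_stream_ids)) > 1


def may_use_fast_storage(command_stream_ids: Iterable[int]) -> bool:
    """Cross-CPU tensors are kept out of fast storage (hypothetical helper)."""
    return not is_cross_cpu(command_stream_ids)
```

A tensor used only within command stream 0 would remain eligible for fast storage, while one written in stream 0 and read in stream 1 would not.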

Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
Change-Id: I1cc8f51bf5c1b01e212dedc86c9cd2a415648ec0

2e0d490c

MLBEDSW-10249 Fixed scaling issue with out-of-range shifts · a570a6ba

Jacob Bohlin authored Jan 23, 2025 and

Fredrik Svedberg committed Jan 24, 2025



When the shift is above 63 it cannot be encoded and would cause an
assert. However, a shift larger than 63 would lead to all the bits being
shifted away and result in a 0 anyway so the same behaviour can be
achieved by setting the scale to 0.
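
The equivalence can be checked with a small sketch. It assumes a rounding right shift applied to a product of 32-bit operands; `encodable_scale`, `apply_scale` and `MAX_SHIFT` are hypothetical names, not the compiler's API.

```python
from typing import Tuple

MAX_SHIFT = 63  # largest encodable shift, per the commit message


def encodable_scale(scale: int, shift: int) -> Tuple[int, int]:
    """Return a (scale, shift) pair that fits the encodable shift range."""
    if shift > MAX_SHIFT:
        # Every bit would be shifted away, so a zero scale gives the same result.
        return 0, 0
    return scale, shift


def apply_scale(value: int, scale: int, shift: int) -> int:
    """Multiply, then apply a rounding right shift (assumed rounding mode)."""
    product = value * scale
    if shift == 0:
        return product
    return (product + (1 << (shift - 1))) >> shift
```

With 32-bit operands the product stays below 2^62 in magnitude, so a shift of 64 yields 0 for every input, which is also what a zero scale produces.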

Also addressed two minor integer type issues in the python graph
optimiser.

Change-Id: I5da2b4170f95c4e1cf161c93a932c92ad9c242ea
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>

a570a6ba

Jan 23, 2025

MLBEDSW-10310: MLCE: Wrong ofm shape for Prelu ops implemented by AvgPool · 0b5aafa8

Johan Alfvén authored Jan 23, 2025



 - An AvgPool with the wrong OFM shape caused an output difference
 - The wrong shape was caused by the removal of a Reshape op, which made
 the Prelu op's OFM shape differ from its tensor shape
 - The Prelu was later converted to an AvgPool with a faulty op OFM
 shape (the Reshape output shape)
 - The fix is to make sure the original Prelu output shape is used for
 the AvgPool

Change-Id: I878695cbac2e9c0a5323eb7f620047588454b138
Signed-off-by: Johan Alfvén <johan.alfven@arm.com>

0b5aafa8

MLBEDSW-10206: Add support for Resize when H or W is 1 · 1d6f2924

Rickard Bolin authored Dec 23, 2024



Allow either height or width dimension to be 1 when the other is
upscaled

Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: If423338b9f6aeacf2c02cc9437588c13c1949cd6

1d6f2924

MLBEDSW-10285: MLCE: Output diff caused by wrong ifm box · 5078129c

Johan Alfvén authored Jan 21, 2025



 - A slice op followed by a conv2d with stride 2 caused an output diff
 - The slice read is moved to the consumer (conv2d) but the problem in
this case was that the ifm box calculation was not correct when having
a stride greater than one
 - The issue is solved by backporting various fixes from Regor that
make sure the ifm and ofm boxes have correct offsets and sizes
 - Also fixed a hidden problem where read_shape in rewrite_split_ops was
calculated erroneously: the start and end offsets can have a rank lower
than 4, but the ifm shape is always rank 4, which gave a corrupt read_shape.
However, the read_shape height was not used before this commit, so the
corrupt value did not cause any problems

Change-Id: Ib71c13cfecf77b2cdc2b5aaf437938577c433bb5
Signed-off-by: Johan Alfvén <johan.alfven@arm.com>

5078129c

Jan 22, 2025

MLBEDSW-9229: Move more sliced MemoryCopy ops to consumer · 99993f89

Johan Gunnarsson authored Jan 15, 2025 and

Rickard Bolin committed Jan 22, 2025



It's possible to move slices to consumer if the shapes are equal
except for leading dimensions that are 1.
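
The shape condition can be sketched as below. The helper names are hypothetical, and the real check in the compiler may involve further constraints that the commit message does not mention.

```python
from typing import List, Sequence


def strip_leading_ones(shape: Sequence[int]) -> List[int]:
    """Drop leading dimensions of size 1, keeping at least one dimension."""
    dims = list(shape)
    while len(dims) > 1 and dims[0] == 1:
        dims.pop(0)
    return dims


def equal_ignoring_leading_ones(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if the shapes match once leading 1-dimensions are ignored."""
    return strip_leading_ones(a) == strip_leading_ones(b)
```

Under this sketch `[1, 1, 8, 16]` and `[8, 16]` compare equal, while `[8, 16, 1]` and `[8, 16]` do not, because only leading dimensions are ignored.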

Signed-off-by: Johan Gunnarsson <johan.gunnarsson@arm.com>
Change-Id: I5a3a4e8ba1f9eda2fc16b6d9959feac0b73be664

99993f89

MLBEDSW-8459: Add GATHER and SCATTER support checks · 9d7a89f4

Rickard Bolin authored Aug 28, 2024



Can only support constant index tensors where no indexes are duplicated.
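
As a rough illustration of the constraint, a support check might look like the sketch below. The function name and its arguments are hypothetical, and the index tensor is assumed to be available as a flat sequence of integers.

```python
from typing import Sequence


def indices_supported(indices: Sequence[int], is_constant: bool) -> bool:
    """Accept only constant index tensors without duplicated entries."""
    if not is_constant:
        return False
    return len(set(indices)) == len(indices)
```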

Change-Id: Iddf44bef8f0c6aca3bef1339aebea60507077540
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>

9d7a89f4

Jan 21, 2025

MLBEDSW-10088: Cast shapes and attributes from NumPy to Python types · b7867177

Rickard Bolin authored Dec 27, 2024



- When using NumPy 2.0 and above, calculations involving both a Python
and NumPy integer are no longer implicitly cast to an int64 data type,
which can result in overflows.
- Cast all shapes and attributes to Python integer as early as possible
  to avoid accidentally mixing NumPy and Python types in calculations.
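
One way to do such an early conversion is sketched below using only the standard library. `to_python_ints` is a hypothetical helper, not the function used in Vela. It relies on `operator.index`, which accepts any object implementing `__index__` (NumPy integer scalars do) and rejects floats.

```python
import operator
from typing import Iterable, List


def to_python_ints(values: Iterable) -> List[int]:
    """Convert integer-like values to plain Python ints.

    operator.index() raises TypeError for floats, so non-integer values
    are rejected instead of being silently truncated.
    """
    return [operator.index(v) for v in values]
```

Applying a helper like this when shapes and attributes are first read keeps later arithmetic in arbitrary-precision Python integers.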

Change-Id: I11502a58ada8361954af0cf7b1d8c3b5585291a0
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>

b7867177

MLBEDSW-10286: Missing memory library include · 63db5894

Rickard Bolin authored Jan 20, 2025



Was previously included indirectly from a third-party include in common.hpp.

Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>
Change-Id: If5384a085b95d46b14c4b08670e5dde4078601c1

63db5894

Jan 20, 2025

MLBEDSW-10227: Additional checks for rescale IFM-fusing · 14f2efca

Alexander Bengtsson authored Jan 13, 2025 and

Alexander Bengtsson committed Jan 20, 2025



- Add 3 missing checks when performing IFM rescale-fusing on binary
  elementwise operations.
  1. Pass both rescale in/out dtypes and operation in/out dtypes to
     SupportsFusedRescale. All 4 are required to determine whether
     an operation can be fused.
  2. When performing IFM-fusing, the fused tensor should not be in
     graph-outputs.
  3. When checking whether binary elementwise operations can IFM-fuse,
     the compiler must also check the following for the second input:
       * input/output unsigned attributes
       * that the fused tensor is not in graph-outputs.

Change-Id: I82ed1d07f14d48c70c8a94b9579be20200029f95
Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>

14f2efca

Jan 17, 2025

MLBEDSW-8588 Add support for ConvGroups in Regor · c52f1490

Jacob Bohlin authored Jan 15, 2025 and

Fredrik Svedberg committed Jan 17, 2025



Change-Id: Idace12b6fe663722b8c50cc8e3c475feca044ebd
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>

c52f1490

dependencies: updated thirdparties · 698142c3

Mauricio Briceno authored Jan 09, 2025



- To latest versions
- Solves some GCC warnings and static analysis issues
- Somewhat improves build time
- regor-namespaced all anonymous scoped enums
- Fixed unit tests for MSVC

Change-Id: Ibdbaee5d7ceb12327d640707257e0b0cb37404fe
Signed-off-by: Mauricio Briceno <mauricio.briceno@arm.com>

698142c3

Jan 15, 2025

MLBEDSW-8927 Extend PAD/MIRROR_PAD support to 3D tensors · 8b52522f

Jacob Bohlin authored Jan 06, 2025



Created utility functions for getting the padding values for different
axes which are not dependent on the input tensor being 4-dimensional.

Change-Id: If3398f7e665f5dae39b184efecbb8caed439f225
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>

8b52522f

Jan 14, 2025

MLBEDSW-10247: Fix non-8-bit transpose implementation · 019ea394

Philip Hall authored Jan 13, 2025



While Ethos-U55 does not support 32-bit transposing, it appears
that a previous approach that treats everything as 8-bit quantities
works for the most common transpose cases (NHWC, NHCW).

 - Restore the treat-as-8-bit transpose path for Ethos-U55.
 - Fixes bad slice address offsets when source tensor is unsliced.

Signed-off-by: Philip Hall <philip.hall@arm.com>
Change-Id: I2e28db217c50ce37b34e5265402acef318fdb873

019ea394

MLBEDSW-8927 Add Ethos-U55/U65 support for MIRROR_PAD in Regor · 119138f5

Jacob Bohlin authored Jan 09, 2025



* Updated the regor Ethos-U55/U65 backend to handle OFM reverse in H and W axes.
* Added architectural constraints for negative striding in order to
force linear format for OFM reversals for Ethos-U55/U65.
* Minor print-formatting change to high_level_command_stream.py.

Change-Id: I30b97b8c7fadf5306ca960bd4b45087513270864
Signed-off-by: Jacob Bohlin <jacob.bohlin@arm.com>

119138f5

MLBEDSW-10238: Fix LeakyRelu int16 · b571999c

Johan Alfvén authored Jan 13, 2025



 - Use correct lowering on Ethos-U55 and Ethos-U65
for int16 LeakyRelu

Change-Id: I984e66f7dcd83ce54f2a785bd27493dff2e63ed7
Signed-off-by: Johan Alfvén <johan.alfven@arm.com>

b571999c

Jan 13, 2025

MLBEDSW-10253: Normalise scales when ofm-fusing · 735ce225

Alexander Bengtsson authored Jan 13, 2025 and

Alexander Bengtsson committed Jan 13, 2025



- Unit scales were not normalized when performing OFM-fusing.
  This triggered asserts in RCS-gen for operations that can only handle
  unit scaling.

Change-Id: I8f06f070aa4bb308afa5223f15d2e1bb88465210
Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>

735ce225

MLBEDSW-10227: Fix Ethos-U55 rescale fusing for 32-bit operations · eb82844e

Alexander Bengtsson authored Jan 10, 2025 and

Alexander Bengtsson committed Jan 13, 2025



- Ethos-U55 rescales 32-bit operations with shift only;
  this needs to be accounted for when performing rescale fusing.

Change-Id: I71755c8aa7c6ac23e85858655c1659cb0899d3b2
Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>

eb82844e

Jan 10, 2025

MLBEDSW-9633: Extend RewriteRescale to other DataTypes · 3513dac0

Alexander Bengtsson authored Dec 03, 2024 and

Johan Alfvén committed Jan 10, 2025



- Extend RewriteRescale to other IFM precisions by adding a cast to int32.
  This adds support for rescales with input_unsigned attributes, where
  the unsigned datatype is not supported by pooling operations.
- Handle input_unsigned/output_unsigned on cast operations.
- Rewrite Rescales with 32-bit OFM to Cast + Mul on Ethos-U55

Change-Id: I1dcc1e4e3b4e11494ff322e67fd878ef0a6ea66a
Signed-off-by: Alexander Bengtsson <Alexander.Bengtsson@arm.com>

3513dac0

MLBEDSW-10199: Normalize quantization before checking OFM Rescale fusing · 48cd7c60

Rickard Bolin authored Dec 19, 2024 and

Johan Alfvén committed Jan 10, 2025

On Ethos-U55 and Ethos-U65, only Rescales with unit scale can be fused
to Minimum operators. However, the output quantization shift was not
normalized to zero before checking if the scale was unit or not.
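
The ordering problem can be illustrated with a small sketch. It assumes a quantization scale stored as an integer pair representing `scale * 2**-shift`. That representation and the function names are assumptions for the example, not the compiler's actual types.

```python
from typing import Tuple


def normalise(scale: int, shift: int) -> Tuple[int, int]:
    """Reduce scale * 2**-shift to the smallest shift that represents it exactly."""
    while shift > 0 and scale != 0 and scale % 2 == 0:
        scale //= 2
        shift -= 1
    return scale, shift


def is_unit_scale(scale: int, shift: int) -> bool:
    """Check for unit scale only after normalising the shift."""
    return normalise(scale, shift) == (1, 0)
```

Without normalisation, a pair such as `(1 << 30, 30)` would not compare equal to `(1, 0)` even though it represents the same unit scale.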

Change-Id: If8e04034045853e0c7b715547b2a68609c11cbe1
Signed-off-by: Rickard Bolin <rickard.bolin@arm.com>

48cd7c60

MLBEDSW-10244: Update HLCS source ID key to track via scheduler operations · 2bbb2f28

Philip Hall authored Dec 13, 2024 and

Johan Alfvén committed Jan 10, 2025



The existing mechanism that maps input operation to
output command skips over the scheduler operation,
connecting optimised graph ops to NPU commands.

 - This commit changes the source mapping to map via
   the scheduler operation that generated the HLCS
   operation, instead of skipping straight to the graph
   operation.
 - Fix alignment warning using deprecated std::aligned_storage

Signed-off-by: Philip Hall <philip.hall@arm.com>
Change-Id: Idbdb9da8aede418bc7e8c00ef0f0c11de34dde44

2bbb2f28

build: Fix toolchain files · 2fb464ef

Mauricio Briceno authored Jan 09, 2025 and

Johan Alfvén committed Jan 10, 2025



- Remove cross-compile 32b support (not working)
- clang: use clang system headers

Change-Id: If1da8ff9d01e2070ff99fc7f61a2625f75077cbd
Signed-off-by: Mauricio Briceno <mauricio.briceno@arm.com>

2fb464ef