Commits · v1.9.0-rc1 · Kleidi / KleidiAI

Jun 02, 2025

Bump version and update changelog · a15b6725

Emil Ohlsson authored Jun 02, 2025



Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>

a15b6725

May 30, 2025

Matmul Micro-kernels F32 <- QAI8DXP(LHS) x QSI8CXP(RHS) optimized for SME · 3d8217c2

Anitha Raj authored May 30, 2025 and

Felix Johnny Thomasmathibalan committed May 30, 2025



* Micro-kernels (1xN) to compute the matrix multiplication of dynamically quantized asymmetric 8-bit integer with per-channel quantization (QAI8DX) LHS matrix and quantized symmetric 8-bit integer with per-channel quantization (QSI8CX) RHS matrix and the accumulation of the result into a single-precision (F32) output, optimized for SME2 technology.
* Micro-kernels (MxN) to compute the matrix multiplication of dynamically quantized asymmetric 8-bit integer with per-channel quantization (QAI8DX) LHS matrix and quantized symmetric 8-bit integer with per-channel quantization (QSI8CX) RHS matrix and the accumulation of the result into a single-precision (F32) output, optimized for SME2 technology.
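
As a rough illustration of what such a micro-kernel computes, the following is a naive scalar reference, not the KleidiAI implementation. It assumes the common convention `real = scale * (q - zero_point)`, one scale and zero point per LHS row, one scale per RHS output channel, and unpacked row-major inputs. The function name and the layouts are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive reference (illustrative only):
//   dst[m][n] = lhs_scale[m] * rhs_scale[n] * sum_k (lhs[m][k] - lhs_zp[m]) * rhs[n][k]
// Assumed layouts: lhs is M x K row-major, rhs is N x K row-major, dst is M x N row-major.
std::vector<float> ref_matmul_f32_qai8dx_qsi8cx(
    std::size_t m, std::size_t n, std::size_t k,
    const std::vector<std::int8_t>& lhs, const std::vector<float>& lhs_scale,
    const std::vector<std::int32_t>& lhs_zp,
    const std::vector<std::int8_t>& rhs, const std::vector<float>& rhs_scale) {
    std::vector<float> dst(m * n, 0.0F);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            std::int32_t acc = 0;  // accumulate in integers, scale to F32 at the end
            for (std::size_t i = 0; i < k; ++i) {
                const std::int32_t a = static_cast<std::int32_t>(lhs[row * k + i]) - lhs_zp[row];
                const std::int32_t b = static_cast<std::int32_t>(rhs[col * k + i]);
                acc += a * b;
            }
            dst[row * n + col] = static_cast<float>(acc) * lhs_scale[row] * rhs_scale[col];
        }
    }
    return dst;
}
```

A reference of this shape is mainly useful for checking an optimized kernel's output within a tolerance.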

Signed-off-by: Anitha Raj <anitha.raj@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>

3d8217c2

May 28, 2025

Extend support for signed 4-bit integer inputs in kai_rhs_pack_nxk_qsi4cxps1s0_qsu4cxs1s0_neon · bf64dadd

Evie Wright authored May 28, 2025 and

Felix Johnny Thomasmathibalan committed May 28, 2025

Signed-off-by: Evie Wright <evie.wright@arm.com>

Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>

bf64dadd

May 27, 2025

Update build image components · 1bb52851

Anton Bondarenko authored May 27, 2025



Update missing or outdated components in build environment

Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

1bb52851

Address linter issues · c24de178

Emil Ohlsson authored May 27, 2025



Newer versions of the linter flag issues with parentheses in
expressions, as well as use of `size_t` without inclusion of `stddef.h`.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>

c24de178

Replace templated clamp finding with dynamic · 6ef4683c

Emil Ohlsson authored May 27, 2025



Given that there is a non-templated version for doing clamp testing,
there is no need to toggle on type in matmul_test.cpp.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Jens Elofsson <jens.elofsson@arm.com>

6ef4683c

May 26, 2025

Use new Buffer class for the entire test framework · 5cb98201

Viet-Hoa Do authored May 26, 2025 and

Anton Bondarenko committed May 26, 2025



* Replace `std::vector<uint8_t>` with the `Buffer` class.
* Update `Buffer` class:
  - Add support for initial value of the buffer.
  - Always initialize the buffer with 0 by default.
* Add `pad_matrix` reference function to support extending
  the data buffer.

Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

5cb98201

May 22, 2025

Fix clamping issue · 32546c02

Viet-Hoa Do authored May 22, 2025



* Numeric limits report the lowest and highest finite values
  of F16 and BF16 to be 0, which disables testing of all F16
  and BF16 kernels with clamping.
* Update numeric limits to have the correct limits.
* Update numeric limits to ensure a compilation error when
  a type is not supported.
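
One way this kind of bug can arise is that the primary `std::numeric_limits<T>` template returns `T()` for every query when `T` has no specialization. The sketch below shows that effect and one possible remedy: a limits trait whose primary template is left undefined, so an unsupported type fails to compile. `Half` and `test_limits` are hypothetical names, not the framework's actual types.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a half-precision type, holding raw IEEE 754 binary16 bits.
struct Half {
    std::uint16_t bits;
};

// std::numeric_limits<Half> is not specialized, so the primary template applies
// and lowest()/max() return Half(), i.e. all-zero bits.

// Primary template intentionally left undefined: using an unsupported type
// is a compilation error instead of a silent zero.
template <typename T>
struct test_limits;

template <>
struct test_limits<float> {
    static constexpr float lowest() { return std::numeric_limits<float>::lowest(); }
    static constexpr float highest() { return std::numeric_limits<float>::max(); }
};

template <>
struct test_limits<Half> {
    static constexpr Half lowest() { return Half{0xFBFF}; }   // -65504, lowest finite binary16
    static constexpr Half highest() { return Half{0x7BFF}; }  // +65504, highest finite binary16
};
```

With this shape, `test_limits<double>::lowest()` would not compile until a specialization is added, which surfaces the missing support at build time.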

Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

32546c02

Avoid FEAT_FP16 requirement unnecessarily · 695a717f

Viet-Hoa Do authored May 22, 2025



* The conversion between FP32 and FP16 is part of the base
  instruction set and does not require FEAT_FP16.
  The equivalent functions in the test framework need to change
  to avoid the need for FEAT_FP16.

Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>

Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

695a717f

May 13, 2025

Add imatmul documentation · 382c2b25

Emil Ohlsson authored May 13, 2025



This commit introduces documentation for the imatmul kernels, with extra
focus on the left hand side packing.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

382c2b25

May 08, 2025

Streamline test functionality · bfc48f79

Anton Bondarenko authored May 08, 2025



* Remove duplicate data types and structures
* Cleanup unused headers
* Move common CPU feature checkers to cpu_info

Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>

Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

bfc48f79

May 02, 2025

Smarter memory protection with Buffer class · 73d1e85d

Jakub Sujak authored May 02, 2025



Introduces a dedicated `Buffer` abstraction for managing blocks of memory. Buffer comes with protection mechanisms that can be enabled by setting the `KAI_TEST_BUFFER_POLICY` environment variable.

Example usage:

```
KAI_TEST_BUFFER_POLICY=PROTECT_OVERFLOW ./kleidiai_test
```

Available memory protection mechanisms:

- `KAI_TEST_BUFFER_POLICY=PROTECT_UNDERFLOW`
- `KAI_TEST_BUFFER_POLICY=PROTECT_OVERFLOW`

If `KAI_TEST_BUFFER_POLICY` is not set or is not one of the above values, then no memory protection mechanisms are enabled and Buffer performs naive malloc() allocation of memory.

When `KAI_TEST_BUFFER_POLICY` is set to one of the above values, the following protections are enabled:

- `PROTECT_UNDERFLOW`: Memory equal to the size of the user buffer rounded to the nearest whole page, plus adjacent guard pages, is allocated. The start of the user buffer is aligned to the end of the head guard page, so a buffer underflow is detected as soon as it occurs.
- `PROTECT_OVERFLOW`: Same as above, but the end of the user buffer is aligned to the start of the tail guard page, so a buffer overflow is detected as soon as it occurs.

Buffer is only intended to opaquely allocate and manage memory. The underlying memory resource can be requested using the familiar `Buffer::data()` method and interacted with using `kai::test::read_array<T>()` and `kai::test::write_array<T>()` utilities.
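
The guard-page idea can be sketched with POSIX `mmap` and `mprotect`. This is a minimal illustration of the overflow case only, with a single tail guard page; it is not the actual `Buffer` implementation, and `GuardedAlloc`, `alloc_protect_overflow`, and `free_guarded` are hypothetical names.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

// Hypothetical illustration of overflow protection; not the KleidiAI Buffer class.
struct GuardedAlloc {
    void* base = nullptr;          // start of the whole mapping
    std::size_t map_size = 0;      // size of the mapping, including the guard page
    std::uint8_t* data = nullptr;  // user buffer, ends exactly at the guard page
};

inline GuardedAlloc alloc_protect_overflow(std::size_t user_size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t user_pages = (user_size + page - 1) / page;  // whole pages for user data
    GuardedAlloc a;
    a.map_size = (user_pages + 1) * page;  // one extra page acts as the tail guard
    a.base = mmap(nullptr, a.map_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a.base == MAP_FAILED) {
        return GuardedAlloc{};
    }
    std::uint8_t* guard = static_cast<std::uint8_t*>(a.base) + user_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {  // any access to the guard page now faults
        munmap(a.base, a.map_size);
        return GuardedAlloc{};
    }
    a.data = guard - user_size;  // last user byte sits right before the guard page
    return a;
}

inline void free_guarded(const GuardedAlloc& a) {
    if (a.base != nullptr) {
        munmap(a.base, a.map_size);
    }
}
```

Writing to `data[user_size]` in this sketch would touch the protected page and raise a fault immediately, rather than silently corrupting adjacent memory.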

Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>

73d1e85d

Apr 29, 2025

Fix segmentation faults in benchmark tool · 49e0f869

Jakub Sujak authored Apr 29, 2025



* Fix incorrect calculation of LHS matrix stride value

For kernels that use the LHS matrix stride in their API, namely
`kai_matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla` and
`kai_matmul_clamp_f16_f16_f16p16x1biasf16_6x16x8_neon_mla` kernels, the
LHS stride value was calculated incorrectly by computing in terms of
bits, not bytes.
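
The difference between the two calculations can be shown with a small sketch. The helper names are hypothetical and the benchmark's actual code is not reproduced here; the point is only that a stride API expressed in bytes needs the element size in bytes.

```cpp
#include <cstddef>

// Row stride of a row-major matrix with `k` columns, in bytes.
constexpr std::size_t row_stride_bytes(std::size_t k, std::size_t element_size_bytes) {
    return k * element_size_bytes;
}

// The kind of mistake described above: using the element width in bits
// yields a stride that is 8 times too large for a byte-based API.
constexpr std::size_t row_stride_from_bits(std::size_t k, std::size_t element_size_bits) {
    return k * element_size_bits;
}
```

For an F32 matrix with 16 columns, the byte stride is 64, whereas the bit-based value is 512, so row accesses would run far past the allocated buffer.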

* Fix insufficient allocation of memory for SME kernels

For SME kernels, such as
`kai_matmul_clamp_f32_f32_f32p16vlx1b_1x16vl_sme2_mla`, the tensor sizes
are in terms of the streaming SVE vector length. Thus, when running SME
kernels we must scale the LHS/RHS/DST buffer sizes by the VL
appropriately.

The segmentation faults were discovered when running with address
sanitizer enabled.

Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>

Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

49e0f869

Apr 25, 2025

Fix incorrect comment in FP16 IGEMM RHS Packing Header · aa5bd2bf

Suhail M authored Apr 25, 2025 and

Anton Bondarenko committed Apr 25, 2025

Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>

Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

aa5bd2bf

Apr 24, 2025

Set version to 1.8.0 · cca02c2f

Jens Elofsson authored Apr 24, 2025



Update all version indicators to 1.8.0.

Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

cca02c2f

Allow matmul_clamp_f32_qai8dxp_qsi4c32p tests to run without bfloat16 CPU support · 8059c1b8

Matthew Bentham authored Apr 24, 2025 and

Anton Bondarenko committed Apr 24, 2025

Related to #9 - allows some tests to run when the CPU doesn't have bfloat16 support, by reducing need for bfloat16 support in the test framework.

Use of bfcvt in the test framework is replaced by simple truncation - this will effectively
round towards 0.
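
A minimal sketch of FP32 to BF16 conversion by truncation, assuming BF16 is stored as the upper 16 bits of the FP32 encoding. The function name is hypothetical and this is not the framework's code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: FP32 -> BF16 by dropping the low 16 bits of the FP32
// encoding. Discarding mantissa bits never increases the magnitude, so the
// result is rounded toward zero for both positive and negative inputs.
inline std::uint16_t float_to_bf16_truncate(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined way to read the encoding
    return static_cast<std::uint16_t>(bits >> 16);
}
```

For example, `1.00390625f` (encoding `0x3F808000`) truncates to `0x3F80`, which is exactly 1.0 in BF16.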

Add a few tests for negative numbers in the test framework BFloat16 code, and allow running the
unit tests for the framework code on CPUs without bfloat16 hardware support.

Signed-off-by: Matthew Bentham <Matthew.Bentham@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

8059c1b8

Add support for FP32 Indirect GEMM with SME · 1a450344

Suhail M authored Apr 24, 2025 and

Jakub Sujak committed Apr 24, 2025



- Adds packing and matmul kernels for FP32 SME Indirect GEMM
- Adds tests for Indirect Gemm with FP32 inputs/outputs.

Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>

Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Mohammed Suhail Munshi <mohammedsuhail.munshi@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

1a450344

Update the changelog for the 1.8.0 release. · a42ea653

Jens Elofsson authored Apr 24, 2025 and

Jakub Sujak committed Apr 24, 2025



Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>

a42ea653

Reduce noise from comparison · 66006416

Emil Ohlsson authored Apr 24, 2025



This change bundles some cleanup done while doing FP16 IGEMM work. The
main change is the reduced output from the comparison function: rather
than printing each differing value on a new line, it bundles them per
matrix row, with some block description. Also, the output is only
printed if there is a mismatch.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

66006416

Update the QAI8 matmul interface · 058f2fb4

Emil Ohlsson authored Apr 24, 2025



The QAI8 interface doesn't match the interface of the actual kernel.
This change aligns the interface with the kernel, and then enforces
the alignment using unit testing.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

058f2fb4

Apr 23, 2025

Add FP16 IGEMM support · d8642031

Emil Ohlsson authored Apr 23, 2025



This change introduces three new kernels
* `kai_lhs_imatmul_pack_x16p2vlx2_x16p_sme`
* `kai_rhs_imatmul_pack_kxn_x16p2vlx2b_x16_x16_sme`
* `kai_imatmul_clamp_f16_f16p2vlx2_f16p2vlx2_2vlx2vl_sme2_mopa`

These are used for indirect matmul, for 16-bit floating point data.

This change also adds unit testing for the FP16 imatmul kernels. The
code is written in a type-agnostic manner, so that testing other data
types requires very little effort. This required the addition of
non-templated `read`/`write` functions to allow runtime-generic
access.
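
To illustrate the idea of runtime-generic access, the sketch below selects the element type with a runtime tag instead of a template parameter. `ElemType` and `read_element` are hypothetical; the framework's own `read`/`write` signatures are not shown in this commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical runtime type tag; the real framework has its own data type enum.
enum class ElemType { F32, I32, I8 };

// Reads element `index` from a type-erased buffer and widens it to double.
// memcpy is used so the read is valid regardless of the buffer's alignment.
inline double read_element(const void* data, ElemType type, std::size_t index) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    switch (type) {
        case ElemType::F32: {
            float v = 0.0F;
            std::memcpy(&v, bytes + index * sizeof(v), sizeof(v));
            return static_cast<double>(v);
        }
        case ElemType::I32: {
            std::int32_t v = 0;
            std::memcpy(&v, bytes + index * sizeof(v), sizeof(v));
            return static_cast<double>(v);
        }
        case ElemType::I8: {
            std::int8_t v = 0;
            std::memcpy(&v, bytes + index * sizeof(v), sizeof(v));
            return static_cast<double>(v);
        }
    }
    return 0.0;  // not reached for valid tags
}
```

Because the type is an ordinary argument, a single test body can loop over several data types without being instantiated once per type.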

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

d8642031

QAI8 cleanup changes · e9d5f3b9

Emil Ohlsson authored Apr 23, 2025



There are a few cleanups for QAI8 that were discovered while doing FP16
work. This change bundles these cleanups.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

e9d5f3b9

Apr 22, 2025

Transition QAI8 tests to lazy initialization · 18c4b5cf

Emil Ohlsson authored Apr 22, 2025



This change moves the QAI8 static initializations to lazy, C++17
compliant, initializations. This change also makes use of kernel
interfaces, to make sure they're exercised.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Jens Elofsson <jens.elofsson@arm.com>

18c4b5cf

Apr 16, 2025

Disable link time optimization for microkernel library · 12beb006

Viet-Hoa Do authored Apr 16, 2025



* Our SME kernels are compiled with auto-vectorization disabled,
  but the linker is unable to respect this flag, which can
  cause miscompilation and result in illegal instruction errors.

Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>

Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

12beb006

Apr 15, 2025

add support for f16 with asymmetric int8 LHS, symmetric int8 RHS · 8120ad23

Evie Wright authored Apr 15, 2025 and

Jakub Sujak committed Apr 15, 2025

Signed-off-by: Evie Wright <evie.wright@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

8120ad23

Set version to 1.7.0 · 22053433

Jens Elofsson authored Apr 15, 2025



Update all version indicators to 1.7.0.

Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

22053433

Apr 14, 2025

Fix typo in interface file · 8c095880

Felix Johnny Thomasmathibalan authored Apr 14, 2025



The get_lhs_offset function pointer type
in f32_bf16p_bf16p is fixed to read get_lhs_packed_offset
as the LHS is packed as well.

Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

8c095880

Apr 11, 2025

Matmul Micro-kernels F16<-(QAI8DX) LHS x (QSI4CX) RHS · 315ed95c

Anitha Raj authored Apr 11, 2025 and

Viet-Hoa Do committed Apr 11, 2025

Micro-kernels to compute the matrix multiplication of dynamically quantized asymmetric signed 8-bit integer with per-channel quantization (QAI8DX) LHS matrix and quantized symmetric 4-bit signed integer with per-channel quantization (QSI4CX) RHS matrix and the accumulation of the result into a half-precision (F16) output:

- Matrix multiplication (MxN) Micro-kernels of QAI8DX LHS and QSI4CX RHS with F16 output, optimized for FEAT_I8MM and FEAT_DotProd.
- Matrix multiplication (1xN) Micro-kernels of QAI8DX LHS and QSI4CX RHS with F16 output, optimized for FEAT_DotProd.

Signed-off-by: Anitha Raj <anitha.raj@arm.com>

Signed-off-by: Evie Wright <evie.wright@arm.com>

Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>

315ed95c

Apr 10, 2025

Use lazy initialization in unit tests. · 8ddbdc0a

Jens Elofsson authored Apr 10, 2025



Static initialization has no guaranteed order, which may cause the test
listing to be initialized before the list of kernels.

This fixes unit tests
- matmul_clamp_f32_bf16p_bf16p_test
- matmul_clamp_f16_bf16p_bf16p_test

Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

8ddbdc0a

Use memcpy instead of union for BF16-FP32 conversion · 2d05e618

Viet-Hoa Do authored Apr 10, 2025



* Writing to one union member and reading the other union member
  is considered undefined behavior, therefore we need to avoid it.
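
A minimal sketch of the `memcpy` approach for BF16 to FP32 conversion, assuming BF16 occupies the upper 16 bits of the FP32 encoding. The function name is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen BF16 bits to FP32 by placing them in the upper
// half of a 32-bit word, then copy the bytes into a float with memcpy.
// Unlike reading an inactive union member, this is well-defined in C++.
inline float bf16_to_float(std::uint16_t bf16_bits) {
    const std::uint32_t bits = static_cast<std::uint32_t>(bf16_bits) << 16;
    float value = 0.0F;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Compilers typically lower a fixed-size `memcpy` like this to a plain register move, so the well-defined version is not expected to cost anything at runtime.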

Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>

Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

2d05e618

Apr 09, 2025

Matmul Micro-kernels F32/F16 <- (QSI8D32) LHS x (QAI4C32) RHS · b27875d9

Anitha Raj authored Apr 09, 2025 and

Anton Bondarenko committed Apr 09, 2025



Micro-kernels to compute the matrix multiplication of dynamically quantized symmetric signed 8-bit integer with per-block quantization (QSI8D32) LHS matrix and quantized asymmetric 4-bit signed integer with per-block quantization (QAI4C32) RHS matrix and the accumulation of the result into a single-precision (F32) and half-precision (F16) output:

- Matrix multiplication (MxN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F32 output, optimized for FEAT_I8MM.
- Matrix multiplication (1xN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F32 output, optimized for FEAT_DotProd.
- Matrix multiplication (MxN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F16 output, optimized for FEAT_I8MM.
- Matrix multiplication (1xN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F16 output, optimized for FEAT_DotProd.

Signed-off-by: Anitha Raj <anitha.raj@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

b27875d9

Add QAI8 IGEMM kernels · 22c47616

Emil Ohlsson authored Apr 09, 2025



This change introduces three new kernels:

* kai_imatmul_clamp_qai8_qai8p2vlx4_qsi8cxpsb2vlx4_2vlx2vl_sme2_mopa
* kai_lhs_imatmul_pack_x8p2vlx4_x8p_sme
* kai_rhs_imatmul_pack_kxn_qsi8cxp2vlx4sb_qs8cx_f32_i32_sme

These kernels are used for _indirect matmul_. The big difference between
these kernels and matmul kernels is that the LHS packing kernel takes an
indirection buffer where each pointer refers to a chunk in the K dimension.
The pointers are laid out in a packed manner: instead of being in
row-major order, a column of `get_m_step` chunk pointers is placed
linearly in the indirection buffer.
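
One reading of this layout can be sketched as follows. The function reorders a row-major table of chunk pointers into blocks of `m_step` rows, so that the `m_step` pointers for one chunk index are adjacent. The function name, the `nullptr` padding of a partial final block, and the exact ordering of blocks are assumptions made for illustration, not the documented KleidiAI format.

```cpp
#include <cstddef>
#include <vector>

// Input:  row_major[row * k_chunks + chunk], with m rows and k_chunks pointers per row.
// Output: packed[(block * k_chunks + chunk) * m_step + lane], where
//         block = row / m_step and lane = row % m_step.
// Rows past m in the last block are left as nullptr (illustrative choice).
// Assumes m_step > 0.
std::vector<const void*> pack_indirection(
    const std::vector<const void*>& row_major, std::size_t m, std::size_t k_chunks, std::size_t m_step) {
    const std::size_t m_blocks = (m + m_step - 1) / m_step;
    std::vector<const void*> packed(m_blocks * k_chunks * m_step, nullptr);
    for (std::size_t row = 0; row < m; ++row) {
        const std::size_t block = row / m_step;
        const std::size_t lane = row % m_step;
        for (std::size_t chunk = 0; chunk < k_chunks; ++chunk) {
            packed[(block * k_chunks + chunk) * m_step + lane] = row_major[row * k_chunks + chunk];
        }
    }
    return packed;
}
```

With `m = 2`, `k_chunks = 2`, and `m_step = 2`, the row-major order `{r0c0, r0c1, r1c0, r1c1}` becomes `{r0c0, r1c0, r0c1, r1c1}`.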

In addition to the kernels themselves,
`matmul_clamp_qai8_qai8p_qsi8cxp_test.cpp` is extended to perform
testing of these new kernels. The testing flow for these new kernels is
a bit different: the packing kernels themselves are not directly
tested; only the end-to-end flow is tested.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>

Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>

22c47616

Apr 08, 2025

Change kernel lists to use lazy initialization · f7a00947

Emil Ohlsson authored Apr 08, 2025



Static initializations have no guaranteed order, which can cause the
test listing to be initialized before the list of kernels. This can be
solved by lazily initializing kernel lists on first use.

This patch applies this fix to `matmul_test.cpp`.
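
A common way to get lazy initialization in C++ is a function-local static, which is constructed on first call. The sketch below uses hypothetical names and placeholder kernel entries; it is not the code from `matmul_test.cpp`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical kernel list. The function-local static is constructed the first
// time the function is called, so callers never observe it uninitialized.
inline const std::vector<std::string>& kernel_names() {
    static const std::vector<std::string> names = {"kernel_a", "kernel_b"};
    return names;
}

// Another static object can depend on the list safely, regardless of the
// order in which namespace-scope statics are initialized.
static const std::size_t kernel_count = kernel_names().size();
```

Since C++11, initialization of function-local statics is also thread-safe, so this pattern needs no extra locking.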

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>

Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>

f7a00947

Apr 03, 2025

Remove designated initializers for matmul_clamp_f32_qai8dxp_qsi4cxp_test · af905324

Jens Elofsson authored Apr 03, 2025



- Remove designated initializers for matmul_clamp_f32_qai8dxp_qsi4cxp_test
to comply with C++17 standard.

Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>

af905324

Apr 02, 2025

Remove FP16 and BF16 requirement to build and run tests · c3b6138a

Viet-Hoa Do authored Apr 02, 2025



* FP16 and BF16 classes are implemented in assembly, so the rest
  of the test framework doesn't need to be compiled with FP16
  and BF16 support anymore. This allows the tests to be run
  on systems with only the base architecture.
* Remove unnecessary feature guard in kernel header file.
  Users of our API should not need to compile their code
  with BF16 support.

Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

c3b6138a

Apr 01, 2025

Remove use of designated initializers in certain unit tests · b6c8989c

Jens Elofsson authored Apr 01, 2025



Removes designated initializers from
- matmul_test.cpp
- matmul_clamp_f16_bf16p_bf16p_test.cpp
- matmul_clamp_f32_bf16p_bf16p_test.cpp

Following changes are made to the test framework:
- Added default value to data_type in DataFormats constructor
- Initialize members of struct MatMulMethod
- Add '-Wpedantic' as a build flag to the affected unit tests
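
Designated initializers for aggregates were standardized in C++20; in C++17 mode, GCC and Clang accept them only as an extension and report them under `-Wpedantic`. A minimal sketch of the C++17-compliant alternative, using a hypothetical struct rather than the real `MatMulMethod`:

```cpp
#include <cstddef>

// Hypothetical struct with default member initializers, so every member has a
// defined value even when it is not set explicitly.
struct MatMulShapeExample {
    std::size_t m = 0;
    std::size_t n = 0;
    std::size_t k = 0;
};

// Not valid standard C++17 (flagged by -Wpedantic):
//   MatMulShapeExample s{.m = 4, .n = 8, .k = 16};

// C++17-compliant: start from the defaults, then assign members by name.
inline MatMulShapeExample make_shape(std::size_t m, std::size_t n, std::size_t k) {
    MatMulShapeExample s;
    s.m = m;
    s.n = n;
    s.k = k;
    return s;
}
```

Assigning by name keeps the readability of designated initializers while remaining portable to strict C++17 compilers.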

Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

b6c8989c

Replace check for Advanced SIMD · 7a5c20a7

Jakub Sujak authored Apr 01, 2025

Although `hw.optional.AdvSIMD` is the replacement for `hw.optional.neon`,
this parameter is not present in all versions of the OS. This may lead
to the test suite crashing or tests being erroneously skipped.

Instead, we check if the machine supports `hw.optional.arm64` and, if
true, we can assume Advanced SIMD support is always present.
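
A minimal sketch of such a check using `sysctlbyname`, assuming an Apple platform. The helper names are hypothetical, and on other platforms the sketch simply returns `false` rather than probing the CPU some other way.

```cpp
#include <cstddef>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: returns true only if the named sysctl parameter exists
// and holds a non-zero integer. A missing parameter is treated as "not set"
// instead of being an error.
inline bool sysctl_flag_is_set(const char* name) {
#if defined(__APPLE__)
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, nullptr, 0) != 0) {
        return false;  // parameter not present on this OS version
    }
    return value != 0;
#else
    (void)name;  // no sysctlbyname lookup outside Apple platforms in this sketch
    return false;
#endif
}

// Advanced SIMD is assumed to be present whenever the machine reports arm64.
inline bool has_advanced_simd() {
    return sysctl_flag_is_set("hw.optional.arm64");
}
```

Treating a failed lookup as `false` avoids crashing on OS versions where a parameter is absent, which is the failure mode described above.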

Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>

Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>

7a5c20a7

Mar 26, 2025

Set version to 1.6.0 · 9668db3d

Jens Elofsson authored Mar 26, 2025



Update all version indicators to 1.6.0.

Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

9668db3d

Update changelog with new changes. · 998c8683

Jens Elofsson authored Mar 26, 2025



Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>

Approved-by: Jakub Sujak <jakub.sujak@arm.com>

998c8683

Optimizes RHS packing qsu4c32s16s0->qsi4c32pscalef16 · 9186e07d

Dan Johansson authored Mar 26, 2025 and

Emil Ohlsson committed Mar 26, 2025



Optimizes this RHS packing by vectorizing the XOR operation. This is done
for segment lengths of 4 or 8 bytes. The unoptimized path is used for
any other segment length.
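
The general shape of this optimization can be sketched in portable C++. The sketch assumes the XOR in question flips the top bit of each 4-bit value (mask `0x88` per byte), which converts a nibble between unsigned-with-offset-8 and two's-complement signed form. It uses plain 32-bit and 64-bit words as a stand-in for vector instructions, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar path: XOR every byte (two packed nibbles) with 0x88.
inline void xor_nibbles_scalar(std::uint8_t* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] ^ 0x88U);
    }
}

// Wide path: process a whole 4-byte or 8-byte segment with a single XOR.
// Any other segment length falls back to the scalar loop.
inline void xor_nibbles_segment(std::uint8_t* data, std::size_t len) {
    if (len == 8) {
        std::uint64_t word = 0;
        std::memcpy(&word, data, sizeof(word));
        word ^= 0x8888888888888888ULL;
        std::memcpy(data, &word, sizeof(word));
    } else if (len == 4) {
        std::uint32_t word = 0;
        std::memcpy(&word, data, sizeof(word));
        word ^= 0x88888888U;
        std::memcpy(data, &word, sizeof(word));
    } else {
        xor_nibbles_scalar(data, len);
    }
}
```

Because XOR acts on each bit independently, the wide paths produce the same bytes as the scalar loop regardless of endianness.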

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>

9186e07d