- Apr 15, 2025
Jens Elofsson authored
Update all version indicators to 1.7.0.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
-
- Apr 14, 2025
Felix Johnny Thomasmathibalan authored
The `get_lhs_offset` function pointer type in `f32_bf16p_bf16p` is corrected to `get_lhs_packed_offset`, because the LHS is packed as well.
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
- Apr 11, 2025
Micro-kernels to compute the matrix multiplication of a dynamically quantized asymmetric signed 8-bit integer LHS matrix with per-channel quantization (QAI8DX) and a quantized symmetric signed 4-bit integer RHS matrix with per-channel quantization (QSI4CX), accumulating the result into half-precision (F16) output:
- Matrix multiplication (MxN) micro-kernels of QAI8DX LHS and QSI4CX RHS with F16 output, optimized for FEAT_I8MM and FEAT_DotProd.
- Matrix multiplication (1xN) micro-kernels of QAI8DX LHS and QSI4CX RHS with F16 output, optimized for FEAT_DotProd.
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Signed-off-by: Evie Wright <evie.wright@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
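To illustrate what "dynamically quantized asymmetric 8-bit with per-channel quantization" means for one LHS row, here is a minimal C++ sketch: each row gets its own scale and zero point computed at run time from its min/max. The struct and function names are hypothetical, and the rounding and zero-point choices are assumptions; the library's LHS packing micro-kernels may compute them differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical types and names for illustration; not KleidiAI API.
struct QuantizedRow {
    std::vector<int8_t> data;
    float scale;         // real value of one quantization step
    int32_t zero_point;  // int8 code that represents 0.0f
};

// Dynamic asymmetric int8 quantization of one row: scale and zero point are
// derived at run time from the row's own min/max values.
QuantizedRow quantize_row_qai8dx(const std::vector<float>& row) {
    // Extend the range to include 0.0f so that zero is exactly representable.
    float min_v = 0.0f, max_v = 0.0f;
    for (float v : row) {
        min_v = std::min(min_v, v);
        max_v = std::max(max_v, v);
    }

    const float qmin = -128.0f, qmax = 127.0f;
    float scale = (max_v - min_v) / (qmax - qmin);
    if (scale == 0.0f) scale = 1.0f;  // all-zero row: any scale works

    // Zero point chosen so that min_v maps to qmin.
    const int32_t zero_point = std::clamp<int32_t>(
        static_cast<int32_t>(std::lround(qmin - min_v / scale)), -128, 127);

    QuantizedRow out{{}, scale, zero_point};
    out.data.reserve(row.size());
    for (float v : row) {
        const long q = std::lround(v / scale) + zero_point;
        out.data.push_back(static_cast<int8_t>(std::clamp<long>(q, -128, 127)));
    }
    return out;
}

// Inverse mapping: real value = (q - zero_point) * scale.
float dequantize(int8_t q, float scale, int32_t zero_point) {
    return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```

Including 0.0f in the range is a common design choice: it guarantees that zero padding and exact zeros survive quantization without error.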
-
- Apr 10, 2025
Jens Elofsson authored
The order of static initialization across translation units is not guaranteed, so the test listing may be initialized before the list of kernels. This fixes the unit tests `matmul_clamp_f32_bf16p_bf16p_test` and `matmul_clamp_f16_bf16p_bf16p_test`.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
-
Viet-Hoa Do authored
* In C++, writing to one union member and then reading a different member is undefined behavior, so this pattern must be avoided.
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
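A common well-defined replacement for union type punning is to copy the object representation with `std::memcpy`. The sketch below shows this for float/uint32_t; the function names are hypothetical and not taken from the KleidiAI sources. In C++20, `std::bit_cast` expresses the same thing directly.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as uint32_t without union type punning.
// Copying the object representation with std::memcpy is well defined in
// C++, and compilers typically reduce it to a single register move.
uint32_t float_to_bits(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "unexpected float size");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The reverse direction works the same way.
float bits_to_float(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```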
-
- Apr 09, 2025
Micro-kernels to compute the matrix multiplication of a dynamically quantized symmetric signed 8-bit integer LHS matrix with per-block quantization (QSI8D32) and a quantized asymmetric signed 4-bit integer RHS matrix with per-block quantization (QAI4C32), accumulating the result into single-precision (F32) and half-precision (F16) output:
- Matrix multiplication (MxN) micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F32 output, optimized for FEAT_I8MM.
- Matrix multiplication (1xN) micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F32 output, optimized for FEAT_DotProd.
- Matrix multiplication (MxN) micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F16 output, optimized for FEAT_I8MM.
- Matrix multiplication (1xN) micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F16 output, optimized for FEAT_DotProd.
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
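For contrast with per-channel schemes, the sketch below shows symmetric per-block int8 quantization on the LHS side: every block of consecutive values shares one scale and there is no zero point. The names are hypothetical, and the default block length of 32 is an assumption based on the "32" in QSI8D32; this is a reference illustration, not the library's packing routine.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of symmetric per-block int8 quantization. Each block of
// `block_len` (must be non-zero) consecutive values shares one scale; there
// is no zero point because the mapping is symmetric around 0. The default of
// 32 assumes the "32" in QSI8D32 is the block length.
struct BlockQuantized {
    std::vector<int8_t> data;
    std::vector<float> scales;  // one scale per block
};

BlockQuantized quantize_qsi8_blocks(const std::vector<float>& x, std::size_t block_len = 32) {
    BlockQuantized out;
    out.data.resize(x.size());
    for (std::size_t start = 0; start < x.size(); start += block_len) {
        const std::size_t end = std::min(start + block_len, x.size());

        // The largest magnitude in the block maps to +/-127.
        float amax = 0.0f;
        for (std::size_t i = start; i < end; ++i) amax = std::max(amax, std::fabs(x[i]));
        const float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
        out.scales.push_back(scale);

        for (std::size_t i = start; i < end; ++i) {
            const long q = std::lround(x[i] / scale);
            out.data[i] = static_cast<int8_t>(std::clamp<long>(q, -127, 127));
        }
    }
    return out;
}
```

Using a range of -127..127 rather than -128..127 keeps the mapping exactly symmetric, which is why no zero point is needed.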
-
Emil Ohlsson authored
This change introduces three new kernels:
* kai_imatmul_clamp_qai8_qai8p2vlx4_qsi8cxpsb2vlx4_2vlx2vl_sme2_mopa
* kai_lhs_imatmul_pack_x8p2vlx4_x8p_sme
* kai_rhs_imatmul_pack_kxn_qsi8cxp2vlx4sb_qs8cx_f32_i32_sme
These kernels are used for _indirect matmul_. The main difference between these kernels and the matmul kernels is that the LHS packing kernel takes an indirection buffer in which each pointer refers to a chunk in the K dimension. The pointers are laid out in a packed manner: instead of row-major order, each column of `get_m_step` chunk pointers is placed contiguously in the indirection buffer.
In addition to the kernels themselves, `matmul_clamp_qai8_qai8p_qsi8cxp_test.cpp` is extended to test these new kernels. The testing flow for them is different: the packing kernels are not tested directly; only the end-to-end flow is tested.
Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
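The pointer layout described above can be sketched as a reordering of a row-major pointer table. This is only an illustration under stated assumptions: the function name is hypothetical, blocks of `m_step` rows are assumed to follow one another, and rows beyond `m` in the last block are assumed to be filled with a caller-supplied padding pointer. The exact layout expected by `kai_lhs_imatmul_pack_x8p2vlx4_x8p_sme` should be taken from its header.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: reorder a row-major table of chunk pointers
// (row r, K-chunk c -> row_major[r * k_chunks + c]) so that, within each
// block of m_step rows, the m_step pointers for one K chunk (a "column")
// are stored contiguously. Assumptions: blocks are emitted in order, and
// rows past `m` in the last block use the padding pointer `pad`.
template <typename T>
std::vector<const T*> pack_indirection(const std::vector<const T*>& row_major,
                                       std::size_t m, std::size_t k_chunks,
                                       std::size_t m_step, const T* pad) {
    std::vector<const T*> packed;
    packed.reserve(((m + m_step - 1) / m_step) * m_step * k_chunks);
    for (std::size_t m0 = 0; m0 < m; m0 += m_step) {              // each block of rows
        for (std::size_t c = 0; c < k_chunks; ++c) {              // each K chunk
            for (std::size_t r = m0; r < m0 + m_step; ++r) {      // one column of m_step pointers
                packed.push_back(r < m ? row_major[r * k_chunks + c] : pad);
            }
        }
    }
    return packed;
}
```

With m = 3, two K chunks and m_step = 2, the packed order is r0c0, r1c0, r0c1, r1c1, r2c0, pad, r2c1, pad.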
-
- Apr 08, 2025
Emil Ohlsson authored
Static initialization has no guaranteed order across translation units, so the test listing can be initialized before the list of kernels. This can be solved by lazily initializing the kernel lists on first use. This patch applies the fix to `matmul_test.cpp`.
Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
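Lazy initialization on first use is usually done with the "construct on first use" idiom: the list becomes a function-local static. The sketch below is a generic example; `KernelInfo`, `get_kernel_list` and the entries are hypothetical names, not the ones used in `matmul_test.cpp`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical kernel descriptor, for illustration only.
struct KernelInfo {
    std::string name;
};

// Construct on first use: the list is a function-local static, so it is
// built the first time get_kernel_list() is called rather than at an
// unspecified point during static initialization. Since C++11 this
// initialization is also thread-safe.
const std::vector<KernelInfo>& get_kernel_list() {
    static const std::vector<KernelInfo> kernels = {
        {"kernel_a"},  // hypothetical entries
        {"kernel_b"},
    };
    return kernels;
}

// A namespace-scope object, possibly in another translation unit, can now
// depend on the list safely, because it calls the function instead of
// reading a global that might not be constructed yet.
static const std::size_t registered_kernel_count = get_kernel_list().size();
```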
-
- Apr 03, 2025
Jens Elofsson authored
- Remove designated initializers from `matmul_clamp_f32_qai8dxp_qsi4cxp_test` to comply with the C++17 standard, which does not include them (they were standardized in C++20).
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
-
- Apr 02, 2025
Viet-Hoa Do authored
* The FP16 and BF16 classes are implemented in assembly, so the rest of the test framework no longer needs to be compiled with FP16 and BF16 support. This allows the tests to run on systems that implement only the base architecture.
* Remove an unnecessary feature guard from the kernel header file. Users of the API should not need to compile their code with BF16 support.
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
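Why BF16 handling needs no compiler BF16 support can be seen from a plain integer sketch: BF16 is the upper 16 bits of an IEEE-754 binary32 value. The code below is a portable software illustration with hypothetical function names; it is not the assembly implementation the commit refers to, and its NaN handling is simplified.

```cpp
#include <cstdint>
#include <cstring>

// Portable software sketch of float <-> BF16 conversion using only integer
// operations. Not the assembly implementation used by the test framework.
uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: truncate and force the quiet bit so the result stays a NaN.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    // Round to nearest, ties to even, on the 16 bits being dropped.
    const uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<uint16_t>(bits >> 16);
}

// Widening is exact: place the 16 BF16 bits in the top half of a float.
float bf16_to_float(uint16_t h) {
    const uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```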
-
- Apr 01, 2025
Jens Elofsson authored
Removes designated initializers from:
- matmul_test.cpp
- matmul_clamp_f16_bf16p_bf16p_test.cpp
- matmul_clamp_f32_bf16p_bf16p_test.cpp
The following changes are made to the test framework:
- Add a default value for data_type in the DataFormats constructor
- Initialize the members of struct MatMulMethod
- Add '-Wpedantic' as a build flag for the affected unit tests
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
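A minimal sketch of the C++17-compatible pattern: give every member a default initializer, then default-construct and assign fields by name instead of using `{.field = value}`. `MethodInfo` and `make_method` are hypothetical stand-ins, not the real `MatMulMethod`. GCC and Clang typically diagnose C++20 designated initializers under `-Wpedantic` when compiling as C++17, which is why that flag helps catch them.

```cpp
#include <string>

// Hypothetical struct standing in for a test-framework descriptor; the real
// MatMulMethod has different members. Default member initializers give
// every field a well-defined value even when a caller sets only some.
struct MethodInfo {
    std::string name{};
    int m_step = 1;
    bool has_bias = false;
};

MethodInfo make_method() {
    // C++20 only:  MethodInfo method{.name = "example", .m_step = 4};
    // C++17-compatible alternative: default-construct, then assign by name.
    MethodInfo method;
    method.name = "example";
    method.m_step = 4;
    return method;
}
```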
-
Jakub Sujak authored
Although `hw.optional.AdvSIMD` is the replacement for `hw.optional.neon`, this parameter is not present in every OS version. This may lead to the test suite crashing or to tests being erroneously skipped. Instead, we check whether the machine supports `hw.optional.arm64` and, if it does, assume that Advanced SIMD support is always present.
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
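The check described above can be sketched with `sysctlbyname()`, treating a missing key as "not supported" rather than as an error. The helper names are hypothetical and this is not the test suite's actual code; on non-Apple builds the sketch simply reports `false`.

```cpp
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif
#include <cstddef>

// Hypothetical helper: true if the integer sysctl `name` exists and is
// non-zero. A missing key (sysctlbyname() failing) is treated as "not
// supported" instead of as an error. Non-Apple builds always return false.
bool sysctl_flag(const char* name) {
#if defined(__APPLE__)
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, nullptr, 0) != 0) {
        return false;
    }
    return value != 0;
#else
    (void)name;
    return false;
#endif
}

// Following the commit: if the machine reports hw.optional.arm64, assume
// Advanced SIMD is available instead of querying hw.optional.AdvSIMD.
bool has_advanced_simd_macos() {
    return sysctl_flag("hw.optional.arm64");
}
```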
-
- Mar 26, 2025
Jens Elofsson authored
Update all version indicators to 1.6.0.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Jens Elofsson authored
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Optimizes this RHS packing by vectorizing the XOR operation for segment lengths of 4 or 8 bytes. Any other segment length uses the unoptimized path.
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
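The fast-path/fallback split can be illustrated with a scalar word-at-a-time (SWAR) sketch: 4- and 8-byte segments are XORed as one 32- or 64-bit word, and every other length falls back to a byte loop. This is a stand-in for the Neon vectorization in the actual patch, and `xor_segment` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: XOR every byte of a segment with `mask`. 4- and
// 8-byte segments are handled as a single 32-/64-bit word (a scalar stand-in
// for a vectorized fast path); other lengths use the byte-by-byte loop.
void xor_segment(uint8_t* seg, std::size_t len, uint8_t mask) {
    if (len == 8) {
        uint64_t w;
        std::memcpy(&w, seg, 8);
        w ^= 0x0101010101010101ull * mask;  // broadcast mask into all 8 bytes
        std::memcpy(seg, &w, 8);
    } else if (len == 4) {
        uint32_t w;
        std::memcpy(&w, seg, 4);
        w ^= 0x01010101u * mask;            // broadcast mask into all 4 bytes
        std::memcpy(seg, &w, 4);
    } else {
        for (std::size_t i = 0; i < len; ++i) seg[i] ^= mask;  // unoptimized path
    }
}
```

Because every byte is XORed with the same mask, the broadcast word gives the same result regardless of byte order.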
-
- Mar 21, 2025
Viet-Hoa Do authored
* A flag was set incorrectly, which disabled the activation function in the GEMV assembly kernel.
* The test is updated to drive the clamping parameters properly.
  - The clamping parameters are set to reduce the dynamic range of the output by 20%.
Resolves: KLEIDIAI-545
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
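One way to derive clamp bounds that "reduce the dynamic range of the output by 20%" is sketched below. The helper is hypothetical, and splitting the reduction evenly between both ends (10% each) is an assumption; the actual test may distribute it differently. The point is that some outputs must then actually be clamped, so a disabled activation would be caught.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: clamp bounds that shrink the observed output range by
// `reduction` (0.2f = 20%). Assumption: the reduction is split evenly
// between the two ends (10% each).
struct ClampBounds {
    float lower;
    float upper;
};

// `ref_output` must be non-empty (for example, a reference matmul result).
ClampBounds clamp_bounds_from_range(const std::vector<float>& ref_output, float reduction = 0.2f) {
    const auto [min_it, max_it] = std::minmax_element(ref_output.begin(), ref_output.end());
    const float range = *max_it - *min_it;
    const float margin = range * reduction / 2.0f;
    return {*min_it + margin, *max_it - margin};
}
```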
-
- Mar 14, 2025
Jakub Sujak authored
* Alias the KleidiAI target for consistent linking to KleidiAI with `KleidiAI::kleidiai`.
* Add install instructions for the KleidiAI library target and its public headers.
* Export the KleidiAI target so that projects may use the imported target with `find_package()`.
Usage:
Install KleidiAI:
```
cmake -S . -B build
cmake --build build
cmake --install build
```
Once installed, KleidiAI can be imported and linked to using `find_package()`.
```
find_package(KleidiAI CONFIG REQUIRED)
target_link_libraries(my_framework PRIVATE KleidiAI::kleidiai)
```
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
-
- Mar 12, 2025
Update all version indicators to 1.5.0.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Fix for reading LHS scale values in kai_matmul_clamp_f32_qsi8d32p1vlx4_qsi4c32p4vlx4_1vlx4vl_sme2_mopa
Fix the out-of-bounds read when loading the scale values from the packed LHS matrix in `kai_matmul_clamp_f32_qsi8d32p1vlx4_qsi4c32p4vlx4_1vlx4vl_sme2_mopa` by updating the predicate.
Resolves: KLEIDIAI-507
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
-
- Mar 11, 2025
Build system robustness is improved in several ways:
* Mark the standard 'build' folder as ignored. This helps when doing different builds from the same folder.
* Combine source files for assembler kernels into the same targets.
* Add sorting for new kernel lists.
* Relax the clean step in CI for faster builds.
Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Jens Elofsson authored
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
-
- Mar 07, 2025
Anton Bondarenko authored
Analyzing skipped tests without a proper report message is hard. Providing more details makes this easier.
Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Emil Ohlsson authored
A couple of cleanups were made while adding support for QAI8 GEMV; they have been moved out into this patch:
* Sort file lists in `CMakeLists.txt`
* Add additional test shapes
* Minor readability tweaks
Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
-
- Mar 05, 2025
Jens Elofsson authored
This flag had been removed from `CMakeLists.txt` but was accidentally left in `kai_defs.bzl`.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
-
- Feb 27, 2025
Jens Elofsson authored
Change the type of `rhs_zero_point` to `uint8_t` to match the data type in the `kai_rhs_pack_qs4cxs1s0_param` struct.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
-
- Feb 26, 2025
Jens Elofsson authored
`std::mt19937` is seeded with a 32-bit value (the engine's word size is 32 bits), but the supplied value (the variable `seed`) was `uint64_t`. It has been changed to `uint32_t`.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
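A small sketch of keeping the seed 32 bits wide so that any truncation of a wider value is explicit. `make_rng` and `make_rng_from_u64` are hypothetical helper names. Formally the constructor takes `std::mt19937::result_type` (`uint_fast32_t`, which may be wider than 32 bits on some platforms), but the engine only uses the seed modulo 2^32.

```cpp
#include <cstdint>
#include <random>

// Seed the engine with a value that already matches its 32-bit word size.
std::mt19937 make_rng(uint32_t seed) {
    return std::mt19937(seed);
}

// If a 64-bit seed comes from elsewhere, make the truncation to the low
// 32 bits explicit instead of relying on an implicit conversion.
std::mt19937 make_rng_from_u64(uint64_t seed64) {
    return make_rng(static_cast<uint32_t>(seed64));
}
```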
-
- Feb 24, 2025
* Refactor the benchmark tool to create a generic abstraction that allows for running matrix multiplication micro-kernels with different interfaces.
* Extend benchmark support to all matrix multiplication micro-kernels in the library.
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
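A generic abstraction over kernels with different interfaces can be sketched with type erasure: each kernel is wrapped in a lambda with one common signature, so a single benchmark loop drives all of them. Everything below (`MatMulShape`, `BenchmarkedKernel`, the stand-in kernels and their call counters) is hypothetical and only illustrates the idea, not the benchmark tool's actual design; timing is omitted.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct MatMulShape { std::size_t m, n, k; };

// Each entry adapts a kernel's native interface to one common signature.
struct BenchmarkedKernel {
    std::string name;
    std::function<void(const MatMulShape&)> run;
};

// Stand-ins for two kernels with different native interfaces; the counters
// only exist so the sketch can be checked.
static int calls_a = 0, calls_b = 0;
void kernel_a(std::size_t, std::size_t, std::size_t) { ++calls_a; }
void kernel_b(std::size_t, std::size_t, std::size_t, float /*clamp_max*/) { ++calls_b; }

std::vector<BenchmarkedKernel> registry() {
    return {
        {"kernel_a", [](const MatMulShape& s) { kernel_a(s.m, s.n, s.k); }},
        {"kernel_b", [](const MatMulShape& s) { kernel_b(s.m, s.n, s.k, 6.0f); }},
    };
}

// One loop for every kernel, regardless of its native signature.
void run_all(const MatMulShape& shape, int iterations) {
    for (const auto& kernel : registry()) {
        for (int i = 0; i < iterations; ++i) kernel.run(shape);
    }
}
```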
-
This flag is a stylistic option in GCC and does not add to security hardening.
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Feb 20, 2025
- Add a new assembly micro-kernel optimized with FEAT_I8MM for matrix multiplication with a 4x8 block size.
- Update the build script.
- Add it to the unit test.
Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Feb 18, 2025
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Jens Elofsson authored
Update all version indicators to 1.4.0.
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Signed-off-by: Jens Elofsson <jens.elofsson@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Felix Johnny Thomasmathibalan authored
The LHS stride argument is removed from `kai_run_matmul_clamp_qai8_qai8_qsi8cxp2vlx4sb_1x16vl_sme2_dot`.
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
- Feb 17, 2025
- Add a new assembly micro-kernel optimized with FEAT_DotProd for matrix multiplication with a 1x8 block size.
- Update the build script.
- Add it to the unit test.
Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Add support for a GEMV-like kernel producing QAI8 output from a QAI8 LHS and a QSI8CXP packed RHS. Update the unit tests to include the new kernel.
Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
When negative tests were added, a potential crash in a positive test got masked. Enforce an exit from the test script when a positive test crashes.
Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Add negative tests for CPU features to make sure that binaries using unsupported instructions actually crash with an illegal instruction signal (SIGILL).
Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
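The core check of such a negative test, "did the process die with SIGILL?", can be sketched with POSIX `fork()` and `waitpid()`. This is a hypothetical helper, not the project's test script: in a real negative test the callable would execute an instruction the CPU lacks, while here any function (for example one calling `std::raise(SIGILL)`) can stand in for it.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical POSIX sketch: run `fn` in a child process and report whether
// the child was terminated by SIGILL.
bool dies_with_sigill(void (*fn)()) {
    const pid_t pid = fork();
    if (pid < 0) return false;          // fork failed
    if (pid == 0) {                     // child process
        std::signal(SIGILL, SIG_DFL);   // make sure SIGILL is not caught
        fn();
        _exit(0);                       // no crash: exit normally
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGILL;
}
```

Running the probe in a child process keeps the test runner itself alive, so a missing crash (the feature guard failed to stop execution, or the instruction is actually supported) is reported as a failure rather than taking down the suite.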
-
- Feb 13, 2025
Testing with the bare minimum of CPU features makes it possible to verify that optional features are properly guarded by feature checks.
Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Feb 12, 2025
Anton Bondarenko authored
The number of parallel jobs suitable for the current CI runner configuration cannot always be retrieved with standard system utilities. The value needs to be hardcoded to avoid unexpected memory usage, and it can be aligned with the current CI resources and configuration.
Signed-off-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>
-
- Update build script
- Add to unit test
Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>
-