- Nov 28, 2024
-
-
This change fixes a minor copy-paste error in the kernel interfaces. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Nov 27, 2024
-
-
Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
- Nov 21, 2024
-
-
Add fp16 kernels for LHS packing, RHS packing, and matmul. Also add unit tests for these kernels, and extend the matmul unit tests to support calling fp16 kernels. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Nov 20, 2024
-
-
Emil Ohlsson authored
KleidiAI is intended to target certain build environments, which means that it should be buildable using CMake version 3.16. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Nov 19, 2024
-
-
* Round off the odd strides for the int4 RHS by padding with 0s Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
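The stride rounding described in the commit above can be illustrated with a minimal sketch. It assumes two 4-bit values are stored per byte with the even-indexed value in the low nibble; the helper names and the nibble order are hypothetical and are not taken from the KleidiAI sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes needed to hold `k` 4-bit values,
 * rounded up so that an odd `k` still occupies a whole byte. */
static size_t int4_row_stride_bytes(size_t k) {
    return (k + 1) / 2;
}

/* Hypothetical helper: packs `k` 4-bit values (one per source byte,
 * low nibble used) into `dst`. Assumed layout: even index -> low nibble,
 * odd index -> high nibble. When `k` is odd, the unused high nibble of
 * the last byte stays zero, which is the "padding with 0s". */
static void pack_int4_row(const uint8_t* src, size_t k, uint8_t* dst) {
    assert(src != NULL && dst != NULL);

    const size_t stride = int4_row_stride_bytes(k);
    memset(dst, 0, stride); /* zero first so padding nibbles are 0 */

    for (size_t i = 0; i < k; ++i) {
        const uint8_t nibble = (uint8_t)(src[i] & 0x0F);
        if (i % 2 == 0) {
            dst[i / 2] |= nibble;
        } else {
            dst[i / 2] |= (uint8_t)(nibble << 4);
        }
    }
}
```

With `k = 3`, the stride is 2 bytes and the second byte carries one value plus a zero nibble of padding.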
-
- Nov 18, 2024
-
-
Alias the existing nxk/kxn packing parameter structs to the new one, to keep a consistent interface for the packing function(s) within a microkernel folder. Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Michael Kozlov <michael.kozlov@arm.com> Approved-by:
Gian Marco Iodice <gianmarco.iodice@arm.com>
-
Move the data generation in the unit tests to after feature support has been confirmed. Signed-off-by:
Jens Elofsson <jens.elofsson@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Nov 14, 2024
-
-
One of the f32 kernel tests incorrectly uses matmul functionality for both main and RHS support functionality. This commit addresses the issue. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Add a generic transpose function and use it for the non-transposed (kxn) RHS packing tests. Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
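As an illustration of what a generic transpose for test data might look like, here is a minimal sketch that works on any element size by copying raw bytes. The function name and signature are hypothetical; the actual test helper in the repository may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: transposes a row-major `height` x `width` matrix
 * whose elements are `elem_size` bytes each. `dst` receives the
 * row-major `width` x `height` result and must not alias `src`. */
static void transpose_generic(const void* src, void* dst,
                              size_t height, size_t width, size_t elem_size) {
    assert(src != NULL && dst != NULL && src != dst);

    const unsigned char* s = (const unsigned char*)src;
    unsigned char* d = (unsigned char*)dst;

    for (size_t r = 0; r < height; ++r) {
        for (size_t c = 0; c < width; ++c) {
            /* Element (r, c) of the source becomes element (c, r). */
            memcpy(d + (c * height + r) * elem_size,
                   s + (r * width + c) * elem_size,
                   elem_size);
        }
    }
}
```

Because the element size is a parameter, the same routine can turn a kxn buffer into an nxk one for f32, f16, or integer test data without a per-type variant.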
-
Anton Bondarenko authored
Signed-off-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Emil Ohlsson <emil.ohlsson@arm.com>
-
- Nov 13, 2024
-
-
Emil Ohlsson authored
This change addresses testing issues related to the RHS and LHS packing tests. One issue related to reference data comparison is addressed by adding proper rounding for comparisons. The LHS packing tests did not correctly pass rolling parameters back to the packing kernel. RHS packing had issues with blocking, which affected portioned testing. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Emil Ohlsson authored
This change makes some readability changes to the testing framework, which allow printing of rectangles and intermediate values for easy dumping while debugging. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Nov 11, 2024
-
-
Anton Bondarenko authored
10 minutes should be enough for regular jobs. A job with an external dependency has a larger timeout of 1 hour for better robustness. Signed-off-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Nov 06, 2024
-
-
Signed-off-by:
Jens Elofsson <jens.elofsson@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
The `CMAKE_SOURCE_DIR` variable always refers to the top-level source directory of the project being processed by CMake. This causes issues for CMake projects that fetch KleidiAI using `FetchContent()`, because KleidiAI's dependencies are then incorrectly assumed to reside in the fetching project's top-level directory rather than in KleidiAI's source tree. Resolve this issue by using the `CMAKE_CURRENT_SOURCE_DIR` variable so that paths are relative to the KleidiAI project. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Nov 04, 2024
-
-
Jakub Sujak authored
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Jakub Sujak authored
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Now that the major third-party components are in the repository, there is no need to download them anymore. Signed-off-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Add variant to examples - Add unit test for variant Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Approved-by:
Gian Marco Iodice <gianmarco.iodice@arm.com>
-
Local third-party components provide better clarity about the KleidiAI library's external dependencies. Commands used to get the files:
  LICENSES/BSD-3-Clause.txt -> reuse download BSD-3-Clause
  third_party/benchmark-v1.8.4.zip -> wget https://github.com/google/benchmark/archive/refs/tags/v1.8.4.zip
  third_party/googletest-v.1.14.0.zip -> wget https://github.com/google/googletest/archive/refs/tags/v1.14.0.zip
Signed-off-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Anton Bondarenko authored
It is preferable to keep the original license text untouched, so the common project code style should not apply to it. Signed-off-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
This helps distinguish it from the other SME GEMV kernel. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
Felix Johnny Thomasmathibalan authored
This reverts commit 6b3c6fad. Signed-off-by:
Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
Anton Bondarenko authored
It is preferable to keep the original license text untouched, so the common project code style should not apply to it. Signed-off-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Oct 31, 2024
-
-
Signed-off-by:
Gunes Bayir <gunes.bayir@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
Adds an SME F32 MatMul (1xN) micro-kernel that computes on the same packed RHS as the main SME F32 MatMul (MxN) micro-kernel. In other words, both the (1xN) and (MxN) micro-kernels share a RHS with the same packing parameters `nr` and `kr`, indicated by 2vlx1. Having a (1xN) and (MxN) micro-kernel pairing is optimal to handle cases in AI frameworks where the LHS is dynamic (and as such the value of M can change) but where the RHS is shared so it only needs to be packed once. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Oct 30, 2024
-
-
Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
Signed-off-by:
Gunes Bayir <gunes.bayir@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-by:
Gunes Bayir <gunes.bayir@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Added a specialization for nr=4, kr=16, sr=2 - Improved the RHS packing function performance by 55% Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Oct 29, 2024
-
-
- Integrate ASM matmul micro-kernel for the GeMV and GeMM variants
- Refactor the LHS and RHS packing function to load the scale from the beginning of the block
- Add timer in the example for profiling the ukernels
Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
Viet-Hoa Do authored
* Add packing function for transposed RHS matrix for SME2 FP32. * Update the test. Signed-off-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Oct 25, 2024
-
-
* This patch also makes the clang-tidy job fail when there is any warning in the library code. Signed-off-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Oct 24, 2024
-
-
Emil Ohlsson authored
Currently the C++ standard version is set in the top-level CMake file, but the file does not require that version to be available. As a result, CMake will not actually pass `-std=c++17` to `g++`. This commit fixes the issue by marking all targets that use C++ as requiring C++17. It does not set this requirement at the top level, as C++ is not a requirement for using KleidiAI. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Oct 22, 2024
-
-
Update the following GEMV kernels with assembly micro-kernels using SDOT:
* kai_matmul_clamp_f32_qai8dxp1x8_qsi4cxp4x8_1x4x32_neon_dotprod
* kai_matmul_clamp_f32_qai8dxp1x8_qsi4cxp8x8_1x8x32_neon_dotprod
Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Oct 21, 2024
-
-
This commit adds:
- A bf16 x bf16 = fp32 matmul microkernel with an 8x12 output block size
- LHS/RHS packing functions that pack the inputs and convert them from fp32 to bf16
- Corresponding tests, along with modifications to the testing framework and the reference implementation
Signed-off-by:
Gunes Bayir <gunes.bayir@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Reviewed-by:
Gunes Bayir <gunes.bayir@arm.com> Approved-by:
Gian Marco Iodice <gianmarco.iodice@arm.com>
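The fp32 to bf16 conversion mentioned in the commit above can be sketched in scalar C. This is only an illustration: it assumes round-to-nearest-even, which the commit message does not state, and the function names are hypothetical rather than KleidiAI APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts an fp32 value to bf16 (the upper 16 bits
 * of the fp32 encoding), rounding to nearest with ties to even.
 * The rounding mode is an assumption made for this sketch. */
static uint16_t f32_to_bf16(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));

    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: truncate and force a mantissa bit so it stays a NaN. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7FFF plus the lowest kept bit, so exact ties round to even. */
    const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

/* Hypothetical helper: widens bf16 back to fp32 by appending 16 zero bits. */
static float bf16_to_f32(uint16_t value) {
    const uint32_t bits = (uint32_t)value << 16;
    float out;
    memcpy(&out, &bits, sizeof(out));
    return out;
}
```

Values such as 1.0f and -2.0f are exactly representable in bf16, so they survive a round trip through both helpers unchanged, which makes them convenient for sanity checks in a reference implementation.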
-
- Oct 17, 2024
-
-
Jakub Sujak authored
The RDSVL instruction allows for reading the effective Streaming SVE Vector Length (SVL) without having the Processing Element enter Streaming SVE mode. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Approved-by:
Viet-Hoa Do <viet-hoa.do@arm.com>
-
Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Oct 16, 2024
-
-
Jakub Sujak authored
Compute the Vector-Matrix multiply of F32 inputs to produce an F32 matrix, optimized using SME instructions. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Oct 15, 2024
-
-
Jens Elofsson authored
Remove functions when the architecture support they need is missing so that invalid use of them results in a build error. Signed-off-by:
Jens Elofsson <jens.elofsson@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
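The idea of removing functions when architecture support is missing can be sketched with a preprocessor guard. The example below is an assumption-laden illustration, not the repository's code: the function names are hypothetical, and the guard uses the compiler-defined `__aarch64__` and `__ARM_NEON` macros as stand-ins for whichever feature checks the library actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Portable reference that is always compiled. */
static float sum_f32_reference(const float* x, size_t n) {
    assert(x != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical NEON variant. It is declared only when the compiler
 * targets AArch64 with NEON, so calling it in any other build is a
 * compile-time error instead of a runtime fault. */
static float sum_f32_neon(const float* x, size_t n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = vaddq_f32(acc, vld1q_f32(x + i));
    }
    float total = vaddvq_f32(acc); /* horizontal add of the 4 lanes */
    for (; i < n; ++i) {
        total += x[i]; /* scalar tail */
    }
    return total;
}
#endif

/* Callers pick the implementation at compile time with the same guard. */
static float sum_f32(const float* x, size_t n) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    return sum_f32_neon(x, n);
#else
    return sum_f32_reference(x, n);
#endif
}
```

Code that calls `sum_f32_neon` directly without the guard fails to build on targets lacking the feature, which surfaces the mistake early.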
-