- Dec 24, 2024
-
-
- Add unit test Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
* GEMM and GEMV micro-kernels to compute the matrix multiplication of dynamically quantized 8-bit integer (QAI8DX) LHS matrix and quantized 4-bit integer (QSI4CX) RHS matrix and the accumulation of the result into a single-precision (F32) output, optimized for SME2 technology. Signed-off-by:
Mohamad Najem <mohamad.najem@arm.com> Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Signed-off-by:
Thomas Bamelis <thomas.bamelis@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Reviewed-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Dec 13, 2024
-
-
Emil Ohlsson authored
Changes reported release version, and updates changelog since last release Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Reviewed-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
There are a few items missing in the changelog since last relase. This commit updates the list with recent changes. Signed-off-by:
Emil Ohlsson <emil.ohlsson@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Optimize the generic RHS packing NxK. The performance improvement is around ~1.5x Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Emil Ohlsson <emil.ohlsson@arm.com> Reviewed-by:
alankelly <alankelly@google.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Dec 11, 2024
-
-
Jakub Sujak authored
The `__fp16` type is known to be troublesome for certain build environments. Instead use type `float` in the interface, and in the implementation use types `float16_t` (an alias for `__fp16`) for casting or `uint16_t` for pointer arithmetic. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Dec 02, 2024
-
-
- GEMM and GEMV Micro-kernels to compute the matrix multiplication of dynamically quantized symmetric signed 8-bit integer with per-block quantization (QSI8D32) LHS matrix and quantized symmetric 4-bit signed integer with per-block quantization (QSI4C32) RHS matrix and the accumulation of the result into a single-precision (F32) output, optimized for SME2 technology. Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Nov 29, 2024
-
-
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
- Nov 28, 2024
-
-
- Add GeMM-like micro-kernels - Add GeMV-like micro-kernels Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Reviewed-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Nov 27, 2024
-
-
Signed-off-by:
Michael Kozlov <michael.kozlov@arm.com> Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
- Nov 04, 2024
-
-
Jakub Sujak authored
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Anton Bondarenko <anton.bondarenko@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Oct 16, 2024
-
-
Jakub Sujak authored
Compute the Vector-Matrix multiply of F32 inputs to produce an F32 matrix, optimized using SME instructions. Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Oct 14, 2024
-
-
Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Approved-by:
Gian Marco Iodice <gianmarco.iodice@arm.com>
-
- Sep 27, 2024
-
-
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
Felix Johnny Thomasmathibalan authored
Signed-off-by:
Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Sep 25, 2024
-
-
This reverts commit f4f59599 The existing FP32 GEMM micro-kernel (matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla) has a dedicated path for M=1 (a "GEMV" operation). Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Sep 12, 2024
-
-
- The LHS matrix is Quantized (Q) Asymmetric (A) Signed 8-bit (I8) with per-row (DX) quantization parameters - The RHS matrix is quantized (Q) Symmetric (S) Signed 4-bit (I4) with per-block quantization - The destination is F32 - Implement micro-kernels to perform the matrix multiplication - Implement a micro-kernel to pack the RHS matrix Signed-off-by:
Gian Marco Iodice <gianmarco.iodice@arm.com> Signed-off-by:
Anitha Raj <anitha.raj@arm.com> Signed-off-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Anitha Raj <anitha.raj@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Max Ren <maxren@meta.com> Approved-by:
Viet-Hoa Do <viet-hoa.do@arm.com>
-
- Aug 30, 2024
-
-
Signed-off-by:
Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Aug 19, 2024
-
-
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Aug 16, 2024
-
-
* The LHS matrix is Quantized (Q) Symmetric (S) Signed 8-bit (I8) with per-block quantization (D32) quantization parameters * The RHS matrix is Quantized (Q) Symmetric (S) Signed 4-bit (I4) with per-block quantization(C32) F16 scale factors, * The destination is F32 * Implement micro-kernels to perform the matrix multiplication * Implement a micro-kernel to pack the LHS and RHS matrices * Added unit tests Signed-off-by:
Gian Marco <Iodice gianmarco.iodice@arm.com> Signed-off-by:
Anitha <Raj Anitha.Raj@arm.com> Signed-off-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Viet-Hoa Do <viet-hoa.do@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Reviewed-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Jakub Sujak <jakub.sujak@arm.com>
-
- Jul 15, 2024
-
-
Felix Johnny Thomasmathibalan authored
Signed-off-by:
Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
- Jul 05, 2024
-
-
Jakub Sujak authored
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>
-
- Jul 04, 2024
-
-
Jakub Sujak authored
Signed-off-by:
Jakub Sujak <jakub.sujak@arm.com> Reviewed-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com> Approved-by:
Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-