Matmul Micro-kernels F32 <- QAI8DXP(LHS) x QSI8CXP(RHS) optimized for SME (!378) · Merge requests · Kleidi / KleidiAI

Anitha Raj requested to merge f32_int8_sme into main May 19, 2025

Micro-kernels (1xN) to compute the matrix multiplication of dynamically quantized asymmetric 8-bit integer with per-channel quantization (QAI8DX) LHS matrix and quantized symmetric 8-bit integer with per-channel quantization (QSI8CX) RHS matrix and the accumulation of the result into a single-precision (F32) output, optimized for SME2 technology.
Micro-kernels (MxN) to compute the matrix multiplication of dynamically quantized asymmetric 8-bit integer with per-channel quantization (QAI8DX) LHS matrix and quantized symmetric 8-bit integer with per-channel quantization (QSI8CX) RHS matrix and the accumulation of the result into a single-precision (F32) output, optimized for SME2 technology.

Signed-off-by: Anitha Raj anitha.raj@arm.com

Edited May 28, 2025 by Anitha Raj

Matmul Micro-kernels F32 <- QAI8DXP(LHS) x QSI8CXP(RHS) optimized for SME