Commit ba20def9 authored by Gian Marco Iodice, committed by Jakub Sujak

Optimize F32 <- QSI8D32 (LHS) x QSI4C32 (RHS) for SME



- Add GEMM and GEMV micro-kernels, optimized for SME2, that multiply a dynamically quantized symmetric signed 8-bit integer LHS matrix with per-block quantization (QSI8D32) by a quantized symmetric signed 4-bit integer RHS matrix with per-block quantization (QSI4C32) and accumulate the result into a single-precision floating-point (F32) output.
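
For orientation, the arithmetic these micro-kernels implement can be sketched as a plain scalar reference in C. This is not the code added by this commit and does not use any SME2 instructions. It assumes that the "32" in QSI8D32 and QSI4C32 is the block length along the K dimension, that each block carries one F32 scale, and that the 4-bit RHS values are already unpacked to one value per `int8_t` in the range [-8, 7]. The function name, argument layout, and `BLOCK_LEN` are hypothetical and chosen only for illustration; the real kernels operate on packed data.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: quantization block length along K (the "32" in QSI8D32/QSI4C32). */
#define BLOCK_LEN 32

/* Hypothetical scalar reference for F32 <- QSI8D32 (LHS) x QSI4C32 (RHS).
 *
 * lhs_q     : m x k signed 8-bit values, row-major
 * lhs_scale : m x (k / BLOCK_LEN) per-block scales
 * rhs_q     : n x k signed 4-bit values, one per int8_t in [-8, 7],
 *             stored row-major per output column (i.e. RHS transposed)
 * rhs_scale : n x (k / BLOCK_LEN) per-block scales
 * dst       : m x n F32 output, row-major
 *
 * k is assumed to be a multiple of BLOCK_LEN.
 */
static void ref_matmul_f32_qsi8d32_qsi4c32(
    size_t m, size_t n, size_t k,
    const int8_t *lhs_q, const float *lhs_scale,
    const int8_t *rhs_q, const float *rhs_scale,
    float *dst)
{
    const size_t num_blocks = k / BLOCK_LEN;

    for (size_t row = 0; row < m; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float acc = 0.0f;

            for (size_t b = 0; b < num_blocks; ++b) {
                /* Integer dot product over one quantization block. */
                int32_t iacc = 0;
                for (size_t i = 0; i < BLOCK_LEN; ++i) {
                    const size_t idx = b * BLOCK_LEN + i;
                    iacc += (int32_t)lhs_q[row * k + idx] *
                            (int32_t)rhs_q[col * k + idx];
                }

                /* Rescale the block result with both per-block scales
                 * and accumulate in single precision. */
                acc += (float)iacc *
                       lhs_scale[row * num_blocks + b] *
                       rhs_scale[col * num_blocks + b];
            }

            dst[row * n + col] = acc;
        }
    }
}
```

With m = 1 this reduces to the GEMV case; the GEMM case is the same loop nest over several LHS rows.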

Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
parent ef4ffe0b
Pipeline #16804 passed with stages in 3 minutes and 18 seconds