Add QAI8 IGEMM kernels (!335)

Emil Ohlsson requested to merge feature/int8-igemm into main Apr 02, 2025

This change introduces three new kernels:

kai_imatmul_clamp_qai8_qai8p2vlx4_qsi8cxpsb2vlx4_2vlx2vl_sme2_mopa
kai_lhs_imatmul_pack_x8p2vlx4_x8p_sme
kai_rhs_imatmul_pack_kxn_qsi8cxp2vlx4sb_qs8cx_f32_i32_sme

These kernels implement indirect matmul (IGEMM). The main difference from the regular matmul kernels is that the LHS packing kernel takes an indirection buffer, in which each pointer refers to a chunk along the K dimension. The pointers are stored in a packed layout: instead of row-major order, each column of get_m_step chunk pointers is placed contiguously in the indirection buffer.
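As a rough illustration of the packed layout described above, the following sketch converts a row-major table of chunk pointers into blocks of m_step rows, with the m_step pointers for each K chunk stored contiguously. The helper names (`packed_indirection_index`, `pack_indirection`), the parameter names, and the use of a padding pointer for rows past the end of the matrix are assumptions made for illustration; they are not part of the KleidiAI API, and the real kernels may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a KleidiAI function): position of the pointer for
 * row m and K-chunk k in a packed indirection buffer. Rows are grouped into
 * blocks of m_step; inside a block, the m_step pointers belonging to one
 * K-chunk sit next to each other. */
static size_t packed_indirection_index(size_t m, size_t k, size_t m_step, size_t k_chunk_count) {
    assert(m_step > 0 && k < k_chunk_count);
    const size_t block = m / m_step;        /* which block of m_step rows */
    const size_t row_in_block = m % m_step; /* row offset inside the block */
    return block * (k_chunk_count * m_step) + k * m_step + row_in_block;
}

/* Hypothetical helper: repack a row-major pointer table (m_count rows by
 * k_chunk_count chunks) into the packed order. `packed` must hold
 * round_up(m_count, m_step) * k_chunk_count entries. Rows beyond m_count are
 * filled with `pad`, which is an assumption of this sketch. */
static void pack_indirection(const void* const* row_major, const void** packed,
                             size_t m_count, size_t k_chunk_count, size_t m_step,
                             const void* pad) {
    const size_t m_padded = (m_count + m_step - 1) / m_step * m_step;
    for (size_t m = 0; m < m_padded; ++m) {
        for (size_t k = 0; k < k_chunk_count; ++k) {
            const void* p = (m < m_count) ? row_major[m * k_chunk_count + k] : pad;
            packed[packed_indirection_index(m, k, m_step, k_chunk_count)] = p;
        }
    }
}
```

With m_step = 2 and three K chunks, for example, the first six packed entries would hold rows 0 and 1 for chunk 0, then rows 0 and 1 for chunk 1, then rows 0 and 1 for chunk 2, before the next block of rows begins.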

In addition to the kernels themselves, matmul_clamp_qai8_qai8p_qsi8cxp_test.cpp is extended to test the new kernels. The test flow for these kernels is slightly different: the packing kernels are not tested directly; only the end-to-end flow is tested.

Signed-off-by: Emil Ohlsson <emil.ohlsson@arm.com>
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Signed-off-by: Mohammed Suhail Munshi <MohammedSuhail.Munshi@arm.com>

Edited Apr 09, 2025 by Emil Ohlsson
