Fix segmentation faults in benchmark tool (!370) · Merge requests · Kleidi / KleidiAI

Jakub Sujak requested to merge jakub/benchmark_bugs into main Apr 28, 2025

Fix incorrect calculation of LHS matrix stride value

For kernels that take the LHS matrix stride in their API, namely kai_matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla and kai_matmul_clamp_f16_f16_f16p16x1biasf16_6x16x8_neon_mla, the LHS stride value was calculated incorrectly: it was computed in bits rather than bytes.
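To illustrate the unit mix-up, here is a minimal sketch, not the benchmark tool's actual code. The helper names `lhs_stride_bytes` and `lhs_stride_bits` are hypothetical, and the sketch assumes a dense row-major LHS whose element width is a multiple of 8 bits (true for f32 and f16).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the benchmark tool): row stride, in bytes,
// of a dense row-major LHS matrix with k columns.
// Assumes element_bits is a multiple of 8.
constexpr std::size_t lhs_stride_bytes(std::size_t k, std::size_t element_bits) {
    return k * element_bits / 8;
}

// The kind of mistake described above: the same stride left in bits.
// Passing this to a kernel that expects bytes makes it step 8x too far
// per row and read past the end of the LHS buffer.
constexpr std::size_t lhs_stride_bits(std::size_t k, std::size_t element_bits) {
    return k * element_bits;
}

// 64 columns of f32 (32-bit) elements: 256 bytes per row.
static_assert(lhs_stride_bytes(64, 32) == 256, "f32 row stride in bytes");
// 64 columns of f16 (16-bit) elements: 128 bytes per row.
static_assert(lhs_stride_bytes(64, 16) == 128, "f16 row stride in bytes");
// The bit-based value is exactly 8x the byte-based one.
static_assert(lhs_stride_bits(64, 32) == 8 * lhs_stride_bytes(64, 32),
              "bit stride is 8x byte stride");
```

An 8x oversized stride is consistent with the out-of-bounds reads that a sanitizer would report, although the exact failing expression is not shown in this merge request.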

Fix insufficient allocation of memory for SME kernels

For SME kernels, such as kai_matmul_clamp_f32_f32_f32p16vlx1b_1x16vl_sme2_mla, the tensor sizes are expressed in terms of the streaming SVE vector length (VL). Thus, when running SME kernels, the LHS/RHS/DST buffer sizes must be scaled by the VL.
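The scaling can be sketched as follows. This is an illustration under stated assumptions, not the fix applied in this merge request: `scaled_buffer_bytes` is a hypothetical helper, and the number of elements per streaming vector is passed in as a plain parameter rather than queried from the hardware.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the benchmark tool): byte size of a buffer
// whose element count is given in units of the streaming vector length.
//   count_in_vl     - element count expressed in multiples of VL
//   elems_per_vl    - elements per streaming vector (assumed input; on real
//                     hardware this would be queried at run time)
//   element_bytes   - size of one element in bytes
constexpr std::size_t scaled_buffer_bytes(std::size_t count_in_vl,
                                          std::size_t elems_per_vl,
                                          std::size_t element_bytes) {
    return count_in_vl * elems_per_vl * element_bytes;
}

// Example: 1024 VL-units of f32 with 16 elements per vector
// (a 512-bit streaming vector) needs 65536 bytes, not 4096.
static_assert(scaled_buffer_bytes(1024, 16, 4) == 65536, "scaled size");
static_assert(scaled_buffer_bytes(1024, 1, 4) == 4096, "unscaled size");
```

Allocating only the unscaled size leaves the buffer too small by a factor of the vector length, so the kernel writes or reads beyond the allocation.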

The segmentation faults were discovered when running the benchmark tool with AddressSanitizer enabled.

Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
