Fix segmentation faults in benchmark tool
- Fix incorrect calculation of LHS matrix stride value
For kernels that take the LHS matrix stride in their API, namely
kai_matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla and
kai_matmul_clamp_f16_f16_f16p16x1biasf16_6x16x8_neon_mla, the LHS
stride value was calculated incorrectly: it was computed in bits
rather than bytes.
- Fix insufficient allocation of memory for SME kernels
For SME kernels, such as
kai_matmul_clamp_f32_f32_f32p16vlx1b_1x16vl_sme2_mla, the tensor sizes
are expressed in terms of the streaming SVE vector length (VL). Thus,
when running SME kernels, the LHS/RHS/DST buffer sizes must be scaled
by the VL accordingly.
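As a rough illustration of the two fixes above, here is a minimal sketch.
It is not the benchmark tool's code: the helper names, the row-major layout,
and the exact scaling rule (a dimension given in units of vector lengths,
multiplied by the number of lanes per streaming vector) are all assumptions
made for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: row stride, in bytes, of a row-major LHS matrix
// with k columns of element type T.
template <typename T>
constexpr std::size_t lhs_stride_bytes(std::size_t k) {
    return k * sizeof(T);
}

// The kind of mistake described above: a stride expressed in bits is
// 8x larger than the byte stride, so reads run past the end of the buffer.
template <typename T>
constexpr std::size_t lhs_stride_bits(std::size_t k) {
    return k * sizeof(T) * 8;
}

// Hypothetical helper: bytes to allocate for a buffer of `rows` rows whose
// column count is given in units of vector lengths (assumption), where
// `vl_bytes` is the streaming vector length in bytes.
template <typename T>
constexpr std::size_t scaled_buffer_bytes(std::size_t rows,
                                          std::size_t cols_in_vl,
                                          std::size_t vl_bytes) {
    // Lanes per vector = vl_bytes / sizeof(T); each lane holds one element.
    return rows * cols_in_vl * (vl_bytes / sizeof(T)) * sizeof(T);
}
```

For example, with f32 elements and k = 4, the byte stride is 16 while the
bit-based value is 128. With an assumed 64-byte streaming vector, a 1 x 16vl
f32 buffer needs 1024 bytes, whereas ignoring the VL would allocate only 64.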
The segmentation faults were discovered when running with address sanitizer enabled.
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>