Optimize kai_rhs_pack_nxk_qsi4c32p_qsu4c32s1s0 using Advanced SIMD

Optimize the transposed RHS packing function used by matmul_clamp_f32_qai8dxp_qsi4c32p with Advanced SIMD, for the case kr / sr = 8.
Signed-off-by: Evie Wright <evie.wright@arm.com>
Signed-off-by: Anitha Raj <anitha.raj@arm.com>