Optimize kai_rhs_pack_nxk_qsi4c32p_qsu4c32s1s0 using advanced SIMD

Optimize the transposed RHS packing function for
matmul_clamp_f32_qai8dxp_qsi4c32p using advanced SIMD, for kr / sr = 8.

Signed-off-by: Evie Wright <evie.wright@arm.com>
Signed-off-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Anitha Raj <anitha.raj@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Evie Wright <evie.wright@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>