Improve packing performance for quantized Int4 per-block
Improves performance of `kai_rhs_pack_nxk_qsi4c32pnrx8_qsu4c32s1s0_neon` by vectorizing row summation.
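For reviewers unfamiliar with the technique, here is a minimal sketch of what vectorizing a row summation over packed 4-bit values can look like. It is illustrative only and is not the code in this change: the function names `row_sum_u4_scalar` and `row_sum_u4_vectorized` are hypothetical, and the layout (two unsigned 4-bit values per byte, summed without any zero-point adjustment) is an assumption. The NEON path is guarded so the sketch also builds on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical scalar reference: sums the unsigned 4-bit values
 * stored two per byte (assumed layout, not taken from the kernel). */
static int32_t row_sum_u4_scalar(const uint8_t* src, size_t num_bytes) {
    int32_t sum = 0;
    for (size_t i = 0; i < num_bytes; ++i) {
        sum += (int32_t)(src[i] & 0x0F) + (int32_t)(src[i] >> 4);
    }
    return sum;
}

/* Hypothetical vectorized variant: processes 16 bytes (32 nibbles)
 * per iteration when NEON is available, then finishes the tail
 * with the scalar loop. */
static int32_t row_sum_u4_vectorized(const uint8_t* src, size_t num_bytes) {
    int32_t sum = 0;
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x16_t nibble_mask = vdupq_n_u8(0x0F);
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 16 <= num_bytes; i += 16) {
        const uint8x16_t bytes = vld1q_u8(src + i);
        const uint8x16_t lo = vandq_u8(bytes, nibble_mask);
        const uint8x16_t hi = vshrq_n_u8(bytes, 4);
        /* lo + hi is at most 30 per lane, so it cannot overflow u8. */
        const uint8x16_t pair = vaddq_u8(lo, hi);
        /* Widen pairwise to u16, then accumulate pairwise into u32. */
        acc = vpadalq_u16(acc, vpaddlq_u8(pair));
    }
    sum = (int32_t)(vgetq_lane_u32(acc, 0) + vgetq_lane_u32(acc, 1) +
                    vgetq_lane_u32(acc, 2) + vgetq_lane_u32(acc, 3));
#endif
    /* Scalar tail (or the whole row when NEON is unavailable). */
    for (; i < num_bytes; ++i) {
        sum += (int32_t)(src[i] & 0x0F) + (int32_t)(src[i] >> 4);
    }
    return sum;
}
```

Both functions should return the same value for any input; the vectorized one simply replaces most of the per-byte work with 16-byte loads and widening pairwise adds.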
Signed-off-by: Evie Wright <evie.wright@arm.com>