Improve packing performance for quantized Int4 per-block
Improves performance of ‘kai_rhs_pack_nxk_qsi4c32pnrx8_qsu4c32s1s0_neon’ by vectorizing row summation Signed-off-by:Evie Wright <evie.wright@arm.com> Approved-by:
Anton Bondarenko <anton.bondarenko@arm.com>