Optimize scalar RHS packing function NxK F32 <- QAI8DXP x QSU4C32

- Optimize the generic NxK RHS packing function. The performance improvement is roughly 1.5x.

Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Approved-by: Anton Bondarenko <anton.bondarenko@arm.com>