Optimizes RHS packing qsu4c32s16s0->qsi4c32pscalef16
Optimizes this RHS packing by vectorizing the XOR operation. This is done for segment lenghts of 4 or 8 bytes. The unoptimized path is used for any other segment length. Signed-off-by:Dan Johansson <dan.johansson@arm.com> Approved-by:
Emil Ohlsson <emil.ohlsson@arm.com>