Add fp16 in/out bf16 Gemm kernel and relevant packing functions

This commit adds:
* A bf16 x bf16 = fp16 matmul microkernel with an 8x12 output block size
* Lhs/Rhs packing functions that pack the inputs and convert them from fp16 to bf16
* Corresponding tests, with modifications to the testing framework and reference implementation

Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Emil Ohlsson <emil.ohlsson@arm.com>
Approved-by: Emil Ohlsson <emil.ohlsson@arm.com>