Optimize F32 <- QAI8DXP 1x8 (LHS) x QSI4C32P 8x8 (RHS) for 1x8 sdot
- Add a new assembly ukernel, optimized with FEAT_DOTPROD, for matrix multiplication with a 1x8 block size.
- Update the build script.
- Add the new ukernel to the unit tests.

Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
parent 8e3073c9
Pipeline #22938 passed with stages in 5 minutes and 15 seconds