- Oct 21, 2024
-
-
This commit:
  - Adds a bf16 x bf16 = fp32 matmul microkernel with 8x12 output block size
  - Adds Lhs/Rhs packing functions that pack and convert the inputs from fp32 to bf16
  - Adds corresponding tests, modifications to the testing framework, and a reference implementation
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Reviewed-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Anton Bondarenko <anton.bondarenko@arm.com>
Reviewed-by: Gunes Bayir <gunes.bayir@arm.com>
Approved-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
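The fp32-to-bf16 conversion and fp32 accumulation described in this commit can be illustrated with a minimal sketch. This is not the library's code: the function names are hypothetical, round-to-nearest-even is an assumed rounding mode, and Python accumulates in double precision rather than fp32.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (assumed round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep the top bits and force a quiet-NaN mantissa bit.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a bf16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def matmul_bf16_sketch(lhs, rhs):
    """Hypothetical reference: round inputs to bf16, then accumulate in wider precision."""
    k, n = len(rhs), len(rhs[0])
    lhs_q = [[bf16_bits_to_fp32(fp32_to_bf16_bits(v)) for v in row] for row in lhs]
    rhs_q = [[bf16_bits_to_fp32(fp32_to_bf16_bits(v)) for v in row] for row in rhs]
    return [
        [sum(lhs_q[i][p] * rhs_q[p][j] for p in range(k)) for j in range(n)]
        for i in range(len(lhs))
    ]
```

Because bf16 keeps only 8 bits of significand precision, inputs such as 1.0 + 2^-8 lose their low bits during packing, while the products are still accumulated at higher precision.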
-
- Aug 13, 2024
-
-
Felix Johnny Thomasmathibalan authored
Block size: 6x8
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
- Aug 01, 2024
-
-
Felix Johnny Thomasmathibalan authored
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
-
Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
-
- Jul 15, 2024
-
-
Felix Johnny Thomasmathibalan authored
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Jens Elofsson <jens.elofsson@arm.com>
-
- Jul 08, 2024
-
-
Jakub Sujak authored
* Add instructions for building for various platforms using CMake.
* Provide a CMake toolchain file for Arm GNU Toolchain.
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Jens Elofsson <jens.elofsson@arm.com>
-
- Jul 04, 2024
-
-
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Felix Johnny Thomasmathibalan authored
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- May 31, 2024
-
-
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Approved-by: Viet-Hoa Do <viet-hoa.do@arm.com>
-
- May 23, 2024
-
-
  - The LHS matrix is quantized (Q) Asymmetric (A) 8-bit (8) with per-row (DX) quantization parameters
  - The RHS matrix is quantized (Q) Symmetric (S) 4-bit (4) with per-channel (cx) quantization parameters
  - The destination is F32
  - Implement matmul int4 micro-kernels with intrinsics by using the dotprod and i8mm extensions
  - Implement a micro-kernel to pack the RHS matrix
  - Implement two micro-kernels to dynamically quantize and pack the LHS matrix
  - Add README.md
  - No tests are added in this PR. Tests will be added in a separate PR
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Reviewed-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Reviewed-by: Jakub Sujak <jakub.sujak@arm.com>
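The quantization scheme named in this commit (asymmetric 8-bit per row for the LHS, symmetric 4-bit per channel for the RHS, F32 destination) can be sketched as follows. This is an illustrative reference only, not the micro-kernel code: all function names are hypothetical, "per channel" is assumed to mean one scale per RHS column, and the scale and zero-point formulas are one common choice rather than the library's confirmed ones.

```python
def quantize_row_asym_int8(row):
    """Dynamic asymmetric 8-bit quantization of one LHS row (assumed formulas)."""
    lo = min(min(row), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(row), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = max(-128, min(127, int(round(-128 - lo / scale))))
    q = [max(-128, min(127, int(round(v / scale)) + zero_point)) for v in row]
    return q, scale, zero_point


def quantize_channel_sym_int4(col):
    """Symmetric 4-bit quantization of one RHS column: zero point fixed at 0."""
    amax = max(abs(v) for v in col)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, int(round(v / scale)))) for v in col]
    return q, scale


def quantized_matmul_sketch(lhs, rhs):
    """Integer accumulate, then rescale to float with the row and column scales."""
    k, n = len(rhs), len(rhs[0])
    cols = [quantize_channel_sym_int4([rhs[p][j] for p in range(k)]) for j in range(n)]
    dst = []
    for row in lhs:
        q_row, row_scale, zp = quantize_row_asym_int8(row)
        out = []
        for q_col, col_scale in cols:
            # Subtract the LHS zero point before the integer dot product.
            acc = sum((q_row[p] - zp) * q_col[p] for p in range(k))
            out.append(acc * row_scale * col_scale)
        dst.append(out)
    return dst
```

In this sketch the LHS parameters are computed at run time from each row, which is what "dynamically quantize" suggests, while the RHS scales depend only on the weights and could be computed once during packing.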
-
Viet-Hoa Do authored
* Also remove trailing whitespace.
Signed-off-by: Viet-Hoa Do <viet-hoa.do@arm.com>
Approved-by: Felix Johnny Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
-
- Apr 09, 2024
-
-
Content is taken from a pending Pull Request.
Signed-off-by: Felix Thomasmathibalan <felixjohnny.thomasmathibalan@arm.com>
Co-authored-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Approved-by: Jakub Sujak <jakub.sujak@arm.com>
-