Cache optimizations
Add comments noting that ethosu_flush_dcache() is deprecated and that
implementing it is not recommended. Cache coherency for regions shared
by the CPU and NPU is to be handled by the application before an
inference is invoked, as the driver will otherwise do it for every
invocation, hurting performance.
Remove cache flush/clean and invalidation calls for all base pointers
and instead add a cache flush/clean and invalidation base pointer mask.
This mask defaults to only mark the scratch base pointer (tensor arena)
for both flush/clean and invalidation. The scratch base pointer is the
only one containing RW data shared between the CPU and NPU.
For the typical case, cache invalidation is only required to be done on
the scratch/tensor arena base pointer, as that contains the OFM data.
All other base pointers are either read-only or, when dedicated SRAM
mode is used, point to fast memory that is only meant to be used by the
NPU, so no cache coherency issues exist.
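
A minimal sketch of mask-driven cache maintenance as described above,
not the driver's actual code. All names here (DEFAULT_CACHE_MASK,
SCRATCH_INDEX, apply_cache_op, record_op) are hypothetical, and the
scratch base pointer being index 1 is an assumption made only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BASE_POINTERS 8u
/* Assumption for this sketch: the scratch (tensor arena) base pointer
 * is index 1. Check the driver headers for the real index. */
#define SCRATCH_INDEX 1u
#define DEFAULT_CACHE_MASK ((uint8_t)(1u << SCRATCH_INDEX))

typedef void (*cache_op_fn)(uintptr_t addr, size_t bytes);

/* Stand-in for a real cache routine: records the last range it saw. */
static uintptr_t last_addr;
static size_t last_bytes;

static void record_op(uintptr_t addr, size_t bytes)
{
    last_addr = addr;
    last_bytes = bytes;
}

/* Call `op` only for base pointers whose bit is set in `mask`.
 * Returns the number of regions that were maintained. */
static unsigned apply_cache_op(uint8_t mask,
                               const uintptr_t *base_addr,
                               const size_t *base_addr_size,
                               unsigned count,
                               cache_op_fn op)
{
    unsigned calls = 0;

    assert(op != NULL);
    for (unsigned i = 0; i < count && i < NUM_BASE_POINTERS; ++i) {
        if ((mask >> i) & 1u) {
            op(base_addr[i], base_addr_size[i]);
            ++calls;
        }
    }
    return calls;
}
```

With the default mask, only the scratch region is touched; read-only
regions and NPU-private fast memory are skipped.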
Add a helper function to allow the cache masks to be modified for
advanced use cases. The flush and invalidate cache masks are both 8-bit
masks where bit 0 corresponds to base pointer 0, bit 1 corresponds to
base pointer 1, etc.
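
The bit layout can be illustrated with a small sketch. The helper names
below (basep_bit, basep_is_marked) are hypothetical and are not the
helper function this change adds; they only show how a mask value maps
to base pointer indices.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BASE_POINTERS 8u

/* Return an 8-bit mask with only the bit for base pointer `index` set. */
static inline uint8_t basep_bit(unsigned index)
{
    assert(index < NUM_BASE_POINTERS);
    return (uint8_t)(1u << index);
}

/* Return non-zero if base pointer `index` is marked in `mask`. */
static inline int basep_is_marked(uint8_t mask, unsigned index)
{
    assert(index < NUM_BASE_POINTERS);
    return (int)((mask >> index) & 1u);
}
```

For example, a mask marking base pointers 1 and 2 would be
basep_bit(1) | basep_bit(2), that is 0x06.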
Correct the documentation, which previously stated that addresses passed
to the cache functions need to be 16-byte aligned; they need to be
32-byte aligned (or aligned to the cache line size of the CPU).
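
A sketch of the alignment arithmetic, assuming a 32-byte cache line.
CACHE_LINE_SIZE, align_down and aligned_span are hypothetical names used
for illustration; the real line size depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines. Must be a power of two. */
#define CACHE_LINE_SIZE 32u

/* Round an address down to the start of its cache line. */
static inline uintptr_t align_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Return non-zero if the address sits on a cache line boundary. */
static inline int is_line_aligned(uintptr_t addr)
{
    return (addr & (uintptr_t)(CACHE_LINE_SIZE - 1u)) == 0;
}

/* Number of bytes covered by all cache lines that the range
 * [addr, addr + bytes) touches. */
static inline size_t aligned_span(uintptr_t addr, size_t bytes)
{
    assert(bytes > 0); /* an empty range has nothing to maintain */
    uintptr_t start = align_down(addr);
    uintptr_t end = (addr + bytes + CACHE_LINE_SIZE - 1u) &
                    ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    return (size_t)(end - start);
}
```

Widening an invalidate range to whole cache lines can discard
neighbouring data that shares those lines, so allocating shared buffers
with cache line alignment and a size that is a multiple of the line size
is generally the safer choice.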
Invalidation of the complete cache is no longer supported as this is
potentially dangerous, especially in async use cases where the CPU might
be doing other things while the NPU is running. base_addr_size is now
required to be set for all invoke calls, or an assert will trigger.
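
Range-based maintenance needs a size for each region it touches, which
is why the sizes must be supplied. The following is a hypothetical
check, not the driver's assert: marked_sizes_are_set and
check_before_invoke are illustrative names, and treating a zero size as
"not set" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if a size array was supplied and every base pointer marked
 * in `mask` has a non-zero size, otherwise 0. */
static int marked_sizes_are_set(uint8_t mask,
                                const size_t *base_addr_size,
                                unsigned count)
{
    if (base_addr_size == NULL) {
        return 0;
    }
    for (unsigned i = 0; i < count && i < 8u; ++i) {
        if (((mask >> i) & 1u) && base_addr_size[i] == 0) {
            return 0;
        }
    }
    return 1;
}

/* Fail early instead of falling back to a whole-cache invalidate. */
static void check_before_invoke(uint8_t mask,
                                const size_t *base_addr_size,
                                unsigned count)
{
    assert(marked_sizes_are_set(mask, base_addr_size, count));
}
```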
Change-Id: Ica665ebfb84329ec5e56c224859516036fc08d2c
Signed-off-by: Jonny Svärd <jonny.svaerd@arm.com>