arm64: mm: Don't remap pgtables for allocate vs populate

The previous change reduced remapping in the fixmap during the
population stage, but the code was still separately
fixmapping/fixunmapping each table during allocation in order to clear
the contents to zero. This means each table still has 2 TLB
invalidations issued against it. Let's fix this so that each table is
only mapped/unmapped once, halving the number of TLBIs.

Achieve this by abstracting pgtable allocate, map and unmap operations
out of the main pgtable population loop code and into a `struct
pgtable_ops` function pointer structure. This allows us to formalize the
semantics of "alloc" to mean "alloc and map", requiring an "unmap" when
finished. So "map" is only performed (and also matched by "unmap") if
the pgtable has already been allocated.
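
For illustration only, the "alloc means alloc and map" contract could be
sketched in userspace C as below. This is not the kernel code: the member
names (alloc, map, unmap), the demo_* helpers, populate_slot() and the
live_mappings counter are all hypothetical, and "physical" addresses are
simply heap pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 4096

typedef uint64_t pte_t;

/* Hypothetical member names; the commit only names struct pgtable_ops. */
struct pgtable_ops {
	void *(*alloc)(uintptr_t *pa);	/* allocate a zeroed table, return it mapped */
	void *(*map)(uintptr_t pa);	/* map a table that already exists */
	void (*unmap)(void *va);	/* undo exactly one alloc() or map() */
};

static int live_mappings;		/* mappings not yet unmapped */

static void *demo_alloc(uintptr_t *pa)
{
	void *table = calloc(1, TABLE_SIZE);

	if (!table)
		return NULL;
	*pa = (uintptr_t)table;		/* userspace stand-in: "pa" == pointer */
	live_mappings++;		/* alloc also maps */
	return table;
}

static void *demo_map(uintptr_t pa)
{
	live_mappings++;
	return (void *)pa;
}

static void demo_unmap(void *va)
{
	(void)va;
	assert(live_mappings > 0);
	live_mappings--;
}

static const struct pgtable_ops demo_ops = {
	.alloc = demo_alloc,
	.map = demo_map,
	.unmap = demo_unmap,
};

/* Write one entry of the table referenced by *slot, creating it if needed. */
static int populate_slot(const struct pgtable_ops *ops, uintptr_t *slot,
			 size_t idx, pte_t val)
{
	pte_t *table;

	if (*slot == 0)
		table = ops->alloc(slot);	/* new table: arrives mapped */
	else
		table = ops->map(*slot);	/* existing table: map only */
	if (!table)
		return -1;

	table[idx] = val;
	ops->unmap(table);			/* one unmap per alloc or map */
	return 0;
}
```

In this sketch the population loop never maps a freshly allocated table a
second time; whichever of alloc() or map() ran is balanced by a single
unmap().
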
As a side effect of this refactoring, we no longer need to use the
fixmap at all once pages have been mapped in the linear map because
their "map" operation can simply do a __va() translation. So with this
change, we are down to 1 TLBI per table when doing early pgtable
manipulations, and 0 TLBIs when doing late pgtable manipulations.
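
The TLBI accounting above can be modelled with a small userspace C sketch.
It assumes, as a simplification, that each fixmap unmap costs exactly one
TLBI and that a linear-map "map" is a plain address translation with a
no-op "unmap". All names here (early_map, late_map, old_flow, new_flow,
count_tlbis, tlbi_count) are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES 512

typedef void *(*map_fn)(uintptr_t pa);
typedef void (*unmap_fn)(void *va);
typedef void (*flow_fn)(uint64_t *tbl, map_fn map, unmap_fn unmap);

static unsigned long tlbi_count;	/* modelled TLB invalidations */
static uint64_t table_pool[ENTRIES];	/* one reusable stand-in table */

/* Early model: table reached via the fixmap; tearing it down costs a TLBI. */
static void *early_map(uintptr_t pa) { return (void *)pa; }
static void early_unmap(void *va) { (void)va; tlbi_count++; }

/* Late model: table is in the linear map; map is a translation only. */
static void *late_map(uintptr_t pa) { return (void *)pa; }
static void late_unmap(void *va) { (void)va; }

/* Old flow: one mapping to zero the table, a second one to populate it. */
static void old_flow(uint64_t *tbl, map_fn map, unmap_fn unmap)
{
	uint64_t *va = map((uintptr_t)tbl);

	memset(va, 0, sizeof(table_pool));
	unmap(va);

	va = map((uintptr_t)tbl);
	va[0] = 1;
	unmap(va);
}

/* New flow: zero and populate under a single mapping. */
static void new_flow(uint64_t *tbl, map_fn map, unmap_fn unmap)
{
	uint64_t *va = map((uintptr_t)tbl);

	memset(va, 0, sizeof(table_pool));
	va[0] = 1;
	unmap(va);
}

/* Run a flow over ntables tables and report the modelled TLBI count. */
static unsigned long count_tlbis(flow_fn flow, map_fn map, unmap_fn unmap,
				 unsigned int ntables)
{
	tlbi_count = 0;
	for (unsigned int i = 0; i < ntables; i++)
		flow(table_pool, map, unmap);
	return tlbi_count;
}
```

Under these assumptions, 8 tables cost 16 TLBIs in the old early flow, 8 in
the new early flow, and 0 once the linear-map variant is used.
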
Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |   ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   77   (0%) |  429   (0%) | 1753   (0%) | 3796   (0%)
after          |   77   (0%) |  375 (-13%) | 1532 (-13%) | 3366 (-11%)

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>