arm64: mm: Don't remap pgtables for allocate vs populate

During linear map pgtable creation, each pgtable is fixmapped /
fixunmapped twice: once during allocation to zero the memory, and
again during population to write the entries. This means each table
has 2 TLB invalidations issued against it. Let's fix this so that each
table is only fixmapped/fixunmapped once, halving the number of TLBIs
and improving performance.

Achieve this by abstracting pgtable allocate, map and unmap operations
out of the main pgtable population loop code and into a `struct
pgtable_ops` function pointer structure. This allows us to formalize
the semantics of "alloc" to mean "alloc and map", requiring an "unmap"
when finished. So "map" is only performed (and also matched by
"unmap") if the pgtable has already been allocated.

As a side effect of this refactoring, we no longer need to use the
fixmap at all once pages have been mapped in the linear map, because
their "map" operation can simply do a __va() translation. So with this
change, we are down to 1 TLBI per table when doing early pgtable
manipulations, and 0 TLBIs when doing late pgtable manipulations.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   13   (0%) |  162   (0%) |  655   (0%) |  1656   (0%)
after          |   11 (-15%) |  109 (-33%) |  449 (-31%) |  1257 (-24%)

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Tested-by: Eric Chanudet <echanude@redhat.com>
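
The "alloc means alloc and map" semantics described in this commit message can be modelled in a small userspace sketch. This is not the kernel code from the patch: apart from the `struct pgtable_ops` name, every identifier below (`table_alloc`, `fixmap_unmap`, `early_ops`, `late_ops`, the counting helpers) is a hypothetical stand-in, `calloc()` stands in for the page allocator, and a counter stands in for the TLBI that a fixmap unmap would issue.

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_ENTRIES 512

/* Simulated TLB invalidations: one per fixmap unmap (assumption). */
static unsigned long tlbi_count;

/* Hypothetical ops structure modelled on the commit description. */
struct pgtable_ops {
	uint64_t *(*alloc)(uintptr_t *pa); /* allocate, zero, return it mapped */
	uint64_t *(*map)(uintptr_t pa);    /* map an already allocated table */
	void (*unmap)(uint64_t *va);       /* undo alloc() or map() */
};

/* calloc() stands in for "allocate a page and zero it while mapped". */
static uint64_t *table_alloc(uintptr_t *pa)
{
	uint64_t *va = calloc(TABLE_ENTRIES, sizeof(*va));

	*pa = (uintptr_t)va;
	return va;
}

/* Stand-in for both a fixmap mapping and a __va()-style translation. */
static uint64_t *table_map(uintptr_t pa)
{
	return (uint64_t *)pa;
}

/* Early case: tearing down a fixmap slot costs one TLBI. */
static void fixmap_unmap(uint64_t *va)
{
	(void)va;
	tlbi_count++;
}

/* Late case: the table lives in the linear map, nothing to tear down. */
static void linear_unmap(uint64_t *va)
{
	(void)va;
}

const struct pgtable_ops early_ops = { table_alloc, table_map, fixmap_unmap };
const struct pgtable_ops late_ops  = { table_alloc, table_map, linear_unmap };

/* Write some dummy entries, standing in for table population. */
static void fill(uint64_t *table)
{
	for (size_t i = 0; i < TABLE_ENTRIES; i++)
		table[i] = ((uint64_t)i << 12) | 1;
}

/* Old flow: map to zero, unmap, map again to populate, unmap. */
unsigned long count_tlbis_remap(const struct pgtable_ops *ops)
{
	uintptr_t pa;
	uint64_t *table;

	tlbi_count = 0;
	table = ops->alloc(&pa);
	if (!table)
		return 0;
	ops->unmap(table);    /* finished zeroing */
	table = ops->map(pa); /* mapped a second time to populate */
	fill(table);
	ops->unmap(table);
	free((void *)pa);
	return tlbi_count;
}

/* New flow: alloc returns the table mapped, so a single unmap suffices. */
unsigned long count_tlbis_single_map(const struct pgtable_ops *ops)
{
	uintptr_t pa;
	uint64_t *table;

	tlbi_count = 0;
	table = ops->alloc(&pa);
	if (!table)
		return 0;
	fill(table);
	ops->unmap(table);
	free((void *)pa);
	return tlbi_count;
}
```

In this model, the remapping flow with `early_ops` counts 2 invalidations per table, the single-map flow with `early_ops` counts 1, and the single-map flow with `late_ops` counts 0, matching the per-table TLBI figures given above.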