mm/filemap: Allow arch to request folio size for exec memory
Change the readahead code so that, when a read is requested for an
executable mapping, it performs a synchronous read of an arch-specified
size, in a naturally aligned manner, into a folio of the same size
(assuming an fs with large folio support).
On arm64, if memory is physically contiguous and naturally aligned to
the "contpte" size, we can use contpte mappings, which improves
utilization of the TLB. When paired with the "multi-size THP" feature,
this works well to reduce dTLB pressure. However, iTLB pressure is still
high because executable mappings have a low likelihood of being in the
required folio size and mapping alignment, even when the filesystem
supports readahead into large folios (e.g. XFS).
The reason for the low likelihood is that the current readahead
algorithm starts with an order-2 folio and increases the folio order by
2 every time the readahead mark is hit. But most executable memory tends
to be accessed randomly and so the readahead mark is rarely hit and most
executable folios remain order-2. To make things worse, readahead
reduces the folio order to 0 at the readahead window boundaries if
required for alignment to those boundaries.
So let's special-case the read(ahead) logic for executable mappings. The
trade-off is performance improvement (due to more efficient storage of
the translations in iTLB) vs potential read amplification (due to
reading too much data around the fault which won't be used), and the
latter is independent of base page size. I've chosen a 64K folio size
for arm64, which benefits both the 4K and 16K base page size configs and
shouldn't lead to any read amplification in practice, since the old
read-around path was (usually) reading blocks of 128K. I don't
anticipate any write amplification because text is always RO.
Note that the text region of an ELF file can be populated into the page
cache for reasons other than taking a fault in an mmapped area. The most
common case is the loader read()ing the header, which can share a page
with the beginning of text. So some text will still remain in small
folios, but this simple, best-effort change provides good performance
improvements as is.
Benchmarking
============
The below shows nginx and redis benchmarks on an Ampere Altra arm64
system.
First, confirmation that this patch causes more text to be contained in
64K folios:
| File-backed folios | system boot | nginx | redis |
| by size as percentage |-----------------|-----------------|-----------------|
| of all mapped text mem | before | after | before | after | before | after |
|========================|========|========|========|========|========|========|
| base-page-4kB | 26% | 9% | 27% | 6% | 21% | 5% |
| thp-aligned-8kB | 4% | 2% | 3% | 0% | 4% | 1% |
| thp-aligned-16kB | 57% | 21% | 57% | 6% | 54% | 10% |
| thp-aligned-32kB | 4% | 1% | 4% | 1% | 3% | 1% |
| thp-aligned-64kB | 7% | 65% | 8% | 85% | 9% | 72% |
| thp-aligned-2048kB | 0% | 0% | 0% | 0% | 7% | 8% |
| thp-unaligned-16kB | 1% | 1% | 1% | 1% | 1% | 1% |
| thp-unaligned-32kB | 0% | 0% | 0% | 0% | 0% | 0% |
| thp-unaligned-64kB | 0% | 0% | 0% | 1% | 0% | 1% |
| thp-partial | 1% | 1% | 0% | 0% | 1% | 1% |
|------------------------|--------|--------|--------|--------|--------|--------|
| cont-aligned-64kB | 7% | 65% | 8% | 85% | 16% | 80% |
The above shows that for both workloads (each isolated with cgroups), as
well as the general system state after boot, the amount of text backed
by 4K and 16K folios decreases and the amount backed by 64K folios
increases significantly. The amount of text that is contpte-mapped also
increases significantly (see the last row).
This is reflected in the performance improvements:
| Benchmark | Improvement |
+===============================================+======================+
| pts/nginx (200 connections) | 8.96% |
| pts/nginx (1000 connections) | 6.80% |
+-----------------------------------------------+----------------------+
| pts/redis (LPOP, 50 connections) | 5.07% |
| pts/redis (LPUSH, 50 connections) | 3.68% |
+-----------------------------------------------+----------------------+
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>