arm64/mm: Update tlb invalidation routines for FEAT_LPA2 (79d42881) · Commits · linux-arm / linux-rr

Commit 79d42881 authored Dec 01, 2022 by Ryan Roberts

arm64/mm: Update tlb invalidation routines for FEAT_LPA2

FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value for the
4KB granule (this is due to the extra level of translation). Secondly,
the BADDR field in the range tlbi instructions must be aligned to 64KB
when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
continue to operate correctly when LPA2 is in use.

We solve the first by always adding the level hint if the level is
between [0, 3] (previously anything other than 0 was hinted, which
breaks in the new level -1 case from kvm). When running on non-LPA2 HW,
0 is still safe to hint as the HW will fall back to non-hinted. We also
update kernel code to take advantage of the new hint for p4d flushing.
While we are at it, we replace the notion of 0 being the non-hinted
seninel with a macro, TLBI_TTL_UNKNOWN. This means callers won't need
updating if/when translation depth increases in future.

The second problem is tricker. When LPA2 is in use, we need to use the
non-range tlbi instructions to forward align to a 64KB boundary first,
then we can use range-based tlbi from there on, until we have either
invalidated all pages or we have a single page remaining. If the latter,
that is done with non-range tlbi. (Previously we invalidated a single
odd page first, but we can no longer do this because it could wreck our
64KB alignment). When LPA2 is not in use, we don't need the initial
alignemnt step. However, the bigger impact is that we can no longer use
the previous method of iterating from smallest to largest 'scale', since
this would likely unalign the boundary again for the LPA2 case. So
instead we iterate from highest to lowest scale, which guarrantees that
we remain 64KB aligned until the last op (at scale=0).

The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for incrementing scale:

However, in most scenarios, the pages = 1 when flush_tlb_range() is
called. Start from scale = 3 or other proper value (such as scale
=ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
to maximum, the flush order is exactly opposite to the example.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

parent 9a4946b7

Hide whitespace changes

Inline Side-by-side

Please register or to comment