arm64: Prefer non-broadcast maintenance for any task that has run on one CPU

__flush_tlb_range() always sends broadcast TLB maintenance to all CPUs
in the inner shareable domain. This is not necessary if a process has
only ever run on one CPU and that CPU is the current CPU. In this case,
non-broadcast maintenance can be used.

Set the bit in mm_cpumask() for each CPU that has called __switch_mm()
for the mm. (This field has always been allocated, but never used on
arm64.) The bits are only ever set, never cleared, so migration to a
CPU where a previous thread had exited, leaving stale TLB records, is
covered.

__flush_tlb_range() has a dsb to ensure the new page table entry is
visible to all CPUs before they receive the TLBI. mm_cpumask() is read
after this barrier. On the writer side, an additional dsb ensures the
write to mm_cpumask() is visible before the TTBR is written. Together
these ensure that if __flush_tlb_range() doesn't see the bit set in
mm_cpumask(), then __switch_mm() sees the updated page table entries,
and doesn't need to receive broadcast maintenance.

In the litmus test below, P0 stands for __flush_tlb_range() and P1 for
__switch_mm(); p is the page table entry and m is the mm_cpumask() bit.
The 'exists' clause is the outcome that must not occur.

| AArch64 MP
| {
| p=1;
| 0:X1=p; 0:X2=m;
| 1:X1=p; 1:X2=m; X4=(asid:37, base=...)
| }
|
|  P0            | P1                ;
|  MOV X0, #0    | MOV X0, #1        ;
|  STR X0, [X1]  | STR X0, [X2]      ;
|  DSB ISHST     | DSB ISHST         ;
|  LDR X0, [X2]  | MSR TTBR0_EL1, X4 ;
|                | ISB
|                | LDR X0, [X1]
|
| exists (0:X0=0 /\ 1:X0=1)

(the tools are not yet able to validate this)

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
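
The mm_cpumask() decision described in the commit message can be sketched
as a small user-space model. This is not the kernel code: mm_model,
model_switch_mm() and model_can_flush_local() are hypothetical names made
up for illustration, the mask is assumed to fit in 64 bits, and the dsb
ordering discussed above is deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for mm_cpumask(): one bit per CPU, at most 64 CPUs. */
typedef struct {
	uint64_t cpus_ever_run;
} mm_model;

/* Models the writer side (__switch_mm()): bits are set and never cleared. */
static void model_switch_mm(mm_model *mm, unsigned int cpu)
{
	mm->cpus_ever_run |= UINT64_C(1) << cpu;
}

/*
 * Models the reader side (__flush_tlb_range()): non-broadcast maintenance
 * is only safe when the mm has run on exactly one CPU and that CPU is the
 * one doing the flush. Any other mask falls back to broadcast.
 */
static bool model_can_flush_local(const mm_model *mm, unsigned int this_cpu)
{
	return mm->cpus_ever_run == (UINT64_C(1) << this_cpu);
}
```

Because bits are never cleared in this model, an mm that has run on a
second CPU keeps requiring broadcast maintenance even after it migrates
back, which matches the stale-TLB argument in the message.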