Commit 38bb0063 authored by Ryan Roberts

arm64/mm: Implement ptep_set_wrprotects() to optimize fork()

With the core-mm changes in place to batch-copy ptes during fork, we can
take advantage of this in arm64 to greatly reduce the number of TLBIs we
have to issue, and recover the fork performance lost when adding support
for transparent contiguous ptes.

If we are write-protecting a whole contig range, we can apply the
write-protection to the whole range and know that it won't change
whether the range should have the contiguous bit set or not. For ranges
that cover only part of a contig range, we still have to unfold, apply
the write-protection, then fold again if the change means the range is
now foldable.

This optimization is possible thanks to the tightening of the Arm ARM
with respect to the definition and behaviour when 'Misprogramming the
Contiguous bit'. See section D21194 at
https://developer.arm.com/documentation/102105/latest/

Performance tested with the following test written for the will-it-scale
framework:

-------

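#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SZ_128M (128UL * 1024 * 1024)

char *testcase_description = "fork and exit";

void testcase(unsigned long long *iterations, unsigned long nr)
{
	pid_t pid;
	char *mem;

	/* Populate 128M of private anonymous memory so fork() has ptes to copy. */
	mem = malloc(SZ_128M);
	assert(mem);
	memset(mem, 1, SZ_128M);

	while (1) {
		pid = fork();
		assert(pid >= 0);

		/* Child: exit straight away; the parent counts one fork/exit cycle. */
		if (!pid)
			exit(0);

		waitpid(pid, NULL, 0);

		(*iterations)++;
	}
}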

-------

I see a huge performance regression when PTE_CONT support was added,
and this change mostly fixes it. The following table shows the
regression relative to before PTE_CONT was enabled (a bigger negative
value is a bigger regression; 'before opt' and 'after opt' are without
and with this change):

|   cpus |   before opt |   after opt |
|-------:|-------------:|------------:|
|      1 |       -10.4% |       -5.2% |
|      8 |       -15.4% |       -3.5% |
|     16 |       -38.7% |       -3.7% |
|     24 |       -57.0% |       -4.4% |
|     32 |       -65.8% |       -5.4% |

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
parent 28ed6285