sched: Do opportunistic updates of remote rq_clock() (161fc96a) · Commits · linux-arm / linux-pg

Commit 161fc96a authored Oct 31, 2024 by Pierre Gondois

The PELT clock is scaled by CPU capacity, i.e. by the uArch CPU capacity
and the current frequency. This allows the task load/runnable/util
signals to be estimated accurately.

The schedutil cpufreq governor regularly checks that the operating
frequency matches the system utilization through
cpufreq_update_util(rq, X). The PELT rq clock is updated before this
call.

CPUs in the same perf. domain share the same frequency, but their PELT
clocks are not updated at the same time. Thus a PELT rq clock
contribution is scaled using the latest frequency, even though the
frequency may have been different for most of the time covered by that
contribution.

This effect is related to the schedutil 'rate_limit_us' tunable, which
sets the minimum delay between consecutive frequency requests.
(struct sugov_policy).last_freq_update_time keeps track of the last time
a frequency request was emitted for a perf. domain.

On a lightly loaded perf. domain, if a frequency request was emitted
just before a big task is enqueued:
- the task runs at a low frequency for 'rate_limit_us' us
- the frequency of the domain is then raised to a new, higher
  frequency. Those first ms of running time will be accounted as if
  the task had run at the higher frequency, which does not match reality.

If fast_switch is not enabled, frequency changes are emitted from the
sugov:X thread. So inside a perf. domain, the CPU running the sugov:X
thread is less likely to see this effect.

This effect can be seen with the following test. On a Pixel6, running
rt-app tasks on the big CPUs:
- 3 tasks pinned on CPU6 (where the sugov thread runs)
  - 3% duty cycle
  - 16ms period
  - tasks starting with a 4ms delay
These tasks trigger frequent frequency changes.

- 1 task pinned to CPU7 (where the sugov thread doesn't run)
  - 30% duty cycle
  - 128ms period
Due to the long period, the task raises the OPP of the perf. domain while
it runs, and the perf. domain drops back to a low OPP while it sleeps. If
the small tasks triggered a frequency change just before the big task
wakes up:
- the big task first runs at a low OPP
- a frequency change to a high OPP then happens from CPU6
- the next time update_rq_clock(rq7) is called, the whole elapsed time,
  including the time spent at the low OPP, is scaled using the high OPP.

Running the 10s workload 5 times for each rate_limit_us value, each
rt-app loop of the big task (128ms period) lasts, without and with this
patch (*_patch columns):
+------------+-----------+----------+-----------------+----------------+-----------+
| rate_limit | mean (ns) | std (ns) | mean_patch (ns) | std_patch (ns) | std_ratio |
+------------+-----------+----------+-----------------+----------------+-----------+
|     1000us |  39981873 |   298211 |        39871798 |          58267 |    -80.4% |
|    10000us |  40169796 |   495455 |        39887005 |          63122 |    -87.2% |
|    20000us |  40051687 |   329570 |        39882019 |          59167 |    -82.0% |
+------------+-----------+----------+-----------------+----------------+-----------+
The standard deviation of the task's running time is reduced by 80% to
87%.

The effect is more visible with:
- a higher 'rate_limit_us'
- a larger perf. domain, i.e. more CPUs are affected by the frequency
  changes of another CPU.

If fast_switch is enabled, (struct cpufreq_driver).fast_switch() is
called with the rq lock held. To avoid complex double-locking scenarios,
the PELT rq clocks of the other CPUs of the perf. domain are updated
opportunistically using raw_spin_rq_trylock(): if a rq lock cannot be
taken, that rq is skipped. If a rq is idle, PELT time is not scaled, so
there is no need to update its rq_clock().

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
parent 2941f3be