sched/fair: Decrease util_est in presence of idle time

util_est signal does not decay if the task utilization is lower than its
runnable signal by a value of 10. This was done to keep the util_est
signal high in case a task shares a rq with another task and doesn't
obtain a desired running time.

However, tasks sharing a rq obtain the running time they desire
provided that the rq has some idle time. Indeed, either:
- a CPU is always running. The utilization signal of tasks reflects
  the running time they obtained. This running time depends on the
  niceness of the tasks. A decreasing utilization signal doesn't
  reflect a decrease of the task activity and the util_est signal
  should not be decayed in this case.
- a CPU is not always running (i.e. there is some idle time). Tasks
  might be waiting to run, increasing their runnable signal, but
  eventually run to completion. A decreasing utilization signal
  does reflect a decrease of the task activity and the util_est
  signal should be decayed in this case.

------------

Running on a 1024 capacity CPU:
- TaskA:
  - duty_cycle=60%, period=16ms, duration=2s
- TaskB:
  - sleep=2ms (to let TaskA run first), duration=1s
  - first phase: duty_cycle=20%, period=16ms, duration=1s
  - second phase: duty_cycle=5%, period=16ms, duration=1s

When TaskB starts the second phase, the util_est signal can take up to
~150ms before starting to decrease. Indeed, if TaskB runs after TaskA,
its runnable signal will be higher than its util signal by more than 10
units.
This creates unnecessary frequency spikes: upon enqueuing TaskB,
util_est_cfs is increased by the old value of util_est of TaskB,
impacting frequency selection through:
sugov_get_util()
\-cpu_util_cfs_boost()
  \-cpu_util()
    util_est = READ_ONCE(cfs_rq->avg.util_est);

------------

Energy impact can also be measured as suggested by Hongyan at [2].
On a Pixel6, the following workload is run 10 times:
- TaskA:
  - duty_cycle=20%, duration=0.4s
  - one task per mid and big CPU (2 mid and 2 big CPUs, so 4 in total)
  - Used to increase the runnable signals of TaskB
- TaskB:
  - sleep=2ms (to let TaskA run first)
  - first phase: duty_cycle=20%, duration=0.2s
  - second phase: duty_cycle=5%, duration=0.2s
  - 4 occurrences of TaskB

The duration of the workload is low (0.4s) to emphasize the impact of
continuing to run at an overestimated frequency.

Energy measured with energy counters:
┌────────────┬────────────┬───────────┬───────────┬───────────┐
│ base mean  ┆ patch mean ┆ base std  ┆ patch std ┆ ratio (%) │
╞════════════╪════════════╪═══════════╪═══════════╪═══════════╡
│ 536.412419 ┆ 487.232935 ┆ 68.591493 ┆ 66.862019 ┆ -9.16     │
└────────────┴────────────┴───────────┴───────────┴───────────┘

Energy computed from util signals and energy model:
┌────────────┬────────────┬───────────┬───────────┬───────────┐
│ base mean  ┆ patch mean ┆ base std  ┆ patch std ┆ ratio (%) │
╞════════════╪════════════╪═══════════╪═══════════╪═══════════╡
│ 4.8318e9   ┆ 4.0823e9   ┆ 5.1044e8  ┆ 7.5558e8  ┆ -15.51    │
└────────────┴────────────┴───────────┴───────────┴───────────┘

------------

The initial patch [2] aimed to solve an issue detected while running
speedometer 2.0 [3].
While running speedometer 2.0 on a Pixel6, 3 versions are compared:
- base: the current version
- patch: the new version, with this patch applied
- revert: the initial version, with commit [2] reverted

Score (higher is better):
┌────────────┬────────────┬────────────┬─────────────┬──────────────┐
│ base mean  ┆ patch mean ┆revert mean ┆ ratio_patch ┆ ratio_revert │
╞════════════╪════════════╪════════════╪═════════════╪══════════════╡
│ 108.16     ┆ 104.06     ┆ 105.82     ┆ -3.94%      ┆ -2.16%       │
└────────────┴────────────┴────────────┴─────────────┴──────────────┘
┌───────────┬───────────┬────────────┐
│ base std  ┆ patch std ┆ revert std │
╞═══════════╪═══════════╪════════════╡
│ 0.57      ┆ 0.49      ┆ 0.58       │
└───────────┴───────────┴────────────┘

Energy measured with energy counters:
┌────────────┬────────────┬────────────┬─────────────┬──────────────┐
│ base mean  ┆ patch mean ┆revert mean ┆ ratio_patch ┆ ratio_revert │
╞════════════╪════════════╪════════════╪═════════════╪══════════════╡
│ 141262.79  ┆ 130630.09  ┆ 134108.07  ┆ -7.52%      ┆ -5.64%       │
└────────────┴────────────┴────────────┴─────────────┴──────────────┘
┌───────────┬───────────┬────────────┐
│ base std  ┆ patch std ┆ revert std │
╞═══════════╪═══════════╪════════════╡
│ 1347.13   ┆ 2431.67   ┆ 510.88     │
└───────────┴───────────┴────────────┘

Energy computed from util signals and energy model:
┌────────────┬────────────┬────────────┬─────────────┬──────────────┐
│ base mean  ┆ patch mean ┆revert mean ┆ ratio_patch ┆ ratio_revert │
╞════════════╪════════════╪════════════╪═════════════╪══════════════╡
│ 2.0539e12  ┆ 1.3569e12  ┆ 1.3637e+12 ┆ -33.93%     ┆ -33.60%      │
└────────────┴────────────┴────────────┴─────────────┴──────────────┘
┌───────────┬───────────┬────────────┐
│ base std  ┆ patch std ┆ revert std │
╞═══════════╪═══════════╪════════════╡
│ 2.9206e10 ┆ 2.5434e10 ┆ 1.7106e+10 │
└───────────┴───────────┴────────────┘

OU ratio in % (ratio of time being overutilized over total time).
The test lasts ~65s:
┌────────────┬────────────┬─────────────┐
│ base mean  ┆ patch mean ┆ revert mean │
╞════════════╪════════════╪═════════════╡
│ 63.39%     ┆ 12.48%     ┆ 12.28%      │
└────────────┴────────────┴─────────────┘
┌───────────┬───────────┬─────────────┐
│ base std  ┆ patch std ┆ revert std  │
╞═══════════╪═══════════╪═════════════╡
│ 0.97      ┆ 0.28      ┆ 0.88        │
└───────────┴───────────┴─────────────┘

The energy gain can be explained by the fact that the system is
overutilized during most of the test with the base version.

During the test, the base condition is evaluated to true ~40% of the
time. The new condition is evaluated to true ~2% of the time.
Preventing util_est signals from decaying with the base condition has a
significant impact on the overutilized state due to an overestimation
of the resulting utilization of tasks.

The score is impacted by the patch, but:
- it is expected to have slightly lower scores with EAS running more
  often
- since the base version makes the system run at higher frequencies by
  overestimating task utilization, it is expected to have higher scores

------------

Decrease util_est when the rq has some idle time instead of relying on
a delta between util and runnable signals.

[1] https://lore.kernel.org/lkml/39cde23a-19d8-4e64-a1d2-f26bce264883@arm.com/
[2] commit 50181c0c ("sched/pelt: Avoid underestimation of task utilization")
    https://lore.kernel.org/lkml/20231122140119.472110-1-vincent.guittot@linaro.org/
[3] https://lore.kernel.org/lkml/CAKfTPtDd-HhF-YiNTtL9i5k0PfJbF819Yxu4YquzfXgwi7voyw@mail.gmail.com/#t

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
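
The difference between the two decay conditions described in this
changelog can be illustrated with a small model. This is only a sketch
and not the kernel code: the function names, the boolean
rq_has_idle_time input and the signal values for TaskB are assumptions
made for illustration. Only the margin of 10 units and the 1024
capacity scale come from the text above.

```python
# Simplified model of the two "skip util_est decay" conditions discussed
# in the changelog. This is NOT the kernel implementation; names and
# inputs are hypothetical.

UTIL_EST_MARGIN = 10  # margin of 10 units mentioned in the changelog


def skip_decay_base(task_util: int, task_runnable: int) -> bool:
    """Base condition: skip the decay when the runnable signal exceeds
    the util signal by more than the margin."""
    return task_util + UTIL_EST_MARGIN < task_runnable


def skip_decay_patch(rq_has_idle_time: bool) -> bool:
    """Patched condition (assumed form): skip the decay only when the rq
    has no idle time, i.e. the CPU is always running."""
    return not rq_has_idle_time


# TaskB in its second phase (5% duty cycle on a 1024-capacity CPU),
# queued behind TaskA on a rq that still has idle time.
# These signal values are illustrative assumptions, not measurements.
task_util = 51        # roughly 5% of 1024
task_runnable = 120   # inflated by the time spent waiting behind TaskA

base_skips = skip_decay_base(task_util, task_runnable)
patch_skips = skip_decay_patch(rq_has_idle_time=True)
print(base_skips, patch_skips)
```

In this model the base condition keeps util_est high for TaskB because
of the gap between its util and runnable signals, while the patched
condition lets util_est decay since the rq is not always busy.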