sched/fair: Decay task util_avg during migration
Before being migrated to a new CPU, a task sees its util_avg value
synchronized with the runqueue clock. Once done, that same task will also
have its sched_avg last_update_time reset. This means the time between the
migration and the last clock update (B) will not be accounted for in
util_avg and a discontinuity will appear. This issue is amplified by the
PELT clock scaling. If the clock hasn't been updated while the CPU is
idle, clock_pelt will not be aligned with clock_task and that time (A)
will also be lost.

   ---------|----- A -----|-----------|------- B -----|>
        clock_pelt    clock_task    clock            now

This is especially problematic for asymmetric CPU capacity systems, which
need stable util_avg signals for task placement and energy estimation.

Ideally, this problem would be solved by updating the runqueue clocks
before the migration. But that would require taking the runqueue lock,
which is quite expensive [1]. Instead, estimate the missing time and
update the task util_avg with that value:

  A + B = clock_task - clock_pelt + sched_clock_cpu() - clock

Neither clock_task, clock_pelt nor clock can be accessed without the
runqueue lock. The new runqueue member clock_pelt_lag is therefore
created, and encodes those three values:

  clock_pelt_lag = clock - clock_task + clock_pelt

We can then write the missing time as follows:

  A + B = sched_clock_cpu() - clock_pelt_lag

The B part of the missing time is however an estimation that doesn't take
into account IRQ and paravirt time. The end result is then a function to
estimate what the PELT clock would be if it were updated now:

  rq_clock_pelt_estimator() = last_update_time + A + B
                            = last_update_time +
                              sched_clock_cpu() - clock_pelt_lag

[1] https://lore.kernel.org/all/20190709115759.10451-1-chris.redpath@arm.com/

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>