Skip to content
Commit ee9a4e92 authored by Andrea Righi's avatar Andrea Righi Committed by Tejun Heo
Browse files

sched_ext: idle: Properly handle invalid prev_cpu during idle selection



The default idle selection policy doesn't properly handle the case where
@prev_cpu is not part of the task's allowed CPUs.

In this situation, it may return an idle CPU that is not usable by the
task, breaking the assumption that the returned CPU must always be
within the allowed cpumask, causing inefficiencies or even stalls in
certain cases.

This issue can arise in the following cases:

 - The task's affinity may have changed by the time the function is
   invoked, especially now that the idle selection logic can be used
   from multiple contexts (i.e., BPF test_run call).

 - The BPF scheduler may provide a @prev_cpu that is not part of the
   allowed mask, either unintentionally or as a placement hint. In fact
   @prev_cpu may not necessarily refer to the CPU the task last ran on,
   but it can also be considered as a target CPU that the scheduler
   wishes to use for the task.

Therefore, enforce the right behavior by always checking whether
@prev_cpu is in the allowed mask, when using scx_bpf_select_cpu_and(),
and it's also usable by the task (@p->cpus_ptr). If it is not, try to
find a valid CPU nearby @prev_cpu, following the usual locality-aware
fallback path (SMT, LLC, node, allowed CPUs).

This ensures the returned CPU is always allowed, improving robustness to
affinity changes and invalid scheduler hints, while preserving locality
as much as possible.

Fixes: a730e3f7 ("sched_ext: idle: Consolidate default idle CPU selection kfuncs")
Signed-off-by: default avatarAndrea Righi <arighi@nvidia.com>
Signed-off-by: default avatarTejun Heo <tj@kernel.org>
parent 0f70f5b0
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment