Skip to content
Commit bc4394e5 authored by Kan Liang's avatar Kan Liang Committed by Peter Zijlstra
Browse files

perf: Fix the throttle error of some clock events

Both ARM and IBM CI reports RCU stall, which can be reproduced by the
below perf command.
  perf record -a -e cpu-clock -- sleep 2

The issue is introduced by the generic throttle patch set, which
unconditionally invoke the event_stop() when throttle is triggered.

The cpu-clock and task-clock are two special SW events, which rely on
the hrtimer. The throttle is invoked in the hrtimer handler. The
event_stop()->hrtimer_cancel() waits for the handler to finish, which is
a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
be used to stop the timer.

There may be two ways to fix it:
 - Introduce a PMU flag to track the case. Avoid the event_stop in
   perf_event_throttle() if the flag is detected.
   It has been implemented in the
   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
   The new flag was thought to be an overkill for the issue.
 - Add a check in the event_stop. Return immediately if the throttle is
   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
   method to stop the timer.

The latter is implemented here.

Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
the order the same as perf_event_unthrottle(). Except the patch, no one
checks the hw.interrupts in the stop(). There is no impact from the
order change.

When stops in the throttle, the event should not be updated,
stop(event, 0). But the cpu_clock_event_stop() doesn't handle the flag.
In logic, it's wrong. But it didn't bring any problems with the old
code, because the stop() was not invoked when handling the throttle.
Checking the flag before updating the event.

Fixes: 9734e25f ("perf: Fix the throttle logic for a group")
Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
Closes: https://lore.kernel.org/lkml/4ce106d0-950c-aadc-0b6a-f0215cd39913@maine.edu/


Reported-by: Leo Yan's avatarLeo Yan <leo.yan@arm.com>
Reported-by: Aishwarya TCV's avatarAishwarya TCV <aishwarya.tcv@arm.com>
Reported-by: default avatarAlexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: default avatarVenkat Rao Bagalkote <venkat88@linux.ibm.com>
Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers's avatarIan Rogers <irogers@google.com>
Link: https://lkml.kernel.org/r/20250606192546.915765-1-kan.liang@linux.intel.com
parent 49b393af
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment