rcu: Fix rcu_read_unlock() deadloop due to IRQ work
During rcu_read_unlock_special(), if this happens during irq_exit(), we can lockup if an IPI is issued. This is because the IPI itself triggers the irq_exit() path causing a recursive lock up. This is precisely what Xiongfeng found when invoking a BPF program on the trace_tick_stop() tracepoint As shown in the trace below. Fix by managing the irq_work state correctly. irq_exit() __irq_exit_rcu() /* in_hardirq() returns false after this */ preempt_count_sub(HARDIRQ_OFFSET) tick_irq_exit() tick_nohz_irq_exit() tick_nohz_stop_sched_tick() trace_tick_stop() /* a bpf prog is hooked on this trace point */ __bpf_trace_tick_stop() bpf_trace_run2() rcu_read_unlock_special() /* will send a IPI to itself */ irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu); A simple reproducer can also be obtained by doing the following in tick_irq_exit(). It will hang on boot without the patch: static inline void tick_irq_exit(void) { + rcu_read_lock(); + WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true); + rcu_read_unlock(); + Reported-by:Xiongfeng Wang <wangxiongfeng2@huawei.com> Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/ Tested-by:
Qi Xi <xiqi2@huawei.com> Signed-off-by:
Joel Fernandes <joelagnelf@nvidia.com> Reviewed-by:
"Paul E. McKenney" <paulmck@kernel.org> Reported-by:
Linux Kernel Functional Testing <lkft@linaro.org> [neeraj: Apply Frederic's suggested fix for PREEMPT_RT] Signed-off-by:
Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Loading
Please register or sign in to comment