Skip to content
Unverified Commit 8c44dac8 authored by Nam Cao's avatar Nam Cao Committed by Christian Brauner
Browse files

eventpoll: Fix priority inversion problem

The ready event list of an epoll object is protected by read-write
semaphore:

  - The consumer (waiter) acquires the write lock and takes items.
  - the producer (waker) takes the read lock and adds items.

The point of this design is enabling epoll to scale well with large number
of producers, as multiple producers can hold the read lock at the same
time.

Unfortunately, this implementation may cause scheduling priority inversion
problem. Suppose the consumer has higher scheduling priority than the
producer. The consumer needs to acquire the write lock, but may be blocked
by the producer holding the read lock. Since read-write semaphore does not
support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y),
we have a case of priority inversion: a higher priority consumer is blocked
by a lower priority producer. This problem was reported in [1].

Furthermore, this could also cause stall problem, as described in [2].

To fix this problem, make the event list half-lockless:

  - The consumer acquires a mutex (ep->mtx) and takes items.
  - The producer locklessly adds items to the list.

Performance is not the main goal of this patch, but as the producer now can
add items without waiting for consumer to release the lock, performance
improvement is observed using the stress test from
https://github.com/rouming/test-tools/blob/master/stress-epoll.c

. This is
the same test that justified using read-write semaphore in the past.

Testing using 12 x86_64 CPUs:

          Before     After        Diff
threads  events/ms  events/ms
      8       6932      19753    +185%
     16       7820      27923    +257%
     32       7648      35164    +360%
     64       9677      37780    +290%
    128      11166      38174    +242%

Testing using 1 riscv64 CPU (averaged over 10 runs, as the numbers are
noisy):

          Before     After        Diff
threads  events/ms  events/ms
      1         73        129     +77%
      2        151        216     +43%
      4        216        364     +69%
      8        234        382     +63%
     16        251        392     +56%

Reported-by: default avatarFrederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/

 [1]
Reported-by: default avatarValentin Schneider <vschneid@redhat.com>
Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/

 [2]
Signed-off-by: default avatarNam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/20250527090836.1290532-1-namcao@linutronix.de


Tested-by: default avatarK Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: default avatarFrederic Weisbecker <frederic@kernel.org>
Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
parent dbee298c
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment