epoll: autoremove wakers even more aggressively
If a process is killed or otherwise exits while having active network
connections and many threads waiting on epoll_wait, the threads will all
be woken immediately, but not removed from ep->wq. Then when network
traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will
not remove the entries from the list.
This means that the cost of the wakeup attempt is far higher than usual,
does not decrease, and this also competes with the dying threads trying to
actually make progress and remove themselves from the wq.
Handle this by removing visited epoll wq entries unconditionally, rather
than only when the wakeup succeeds - the structure of ep_poll means that
the only potential loss is the timed_out->eavail heuristic, which now can
race and result in a redundant ep_send_events attempt. (But only when
incoming data and a timeout actually race, not on every timeout)
Shakeel added:
: We are seeing this issue in production with real workloads and it has
: caused hard lockups. Particularly network heavy workloads with a lot
: of threads in epoll_wait() can easily trigger this issue if they get
: killed (oom-killed in our case).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ben Segall <[email protected]>
Tested-by: Shakeel Butt <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Roman Penyaev <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Khazhismel Kumykov <[email protected]>
Cc: Heiher <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 file changed