Browse Source

jobserver.py: _try_read()'s alarm timeout needs to throw an exception.

In python3, os.read() automatically retries after EINTR, which breaks
our ability to interrupt on SIGALRM.

Instead, throw an exception from the SIGALRM handler, which should work
on both python2 and python3.

This fixes a rare deadlock during parallel builds on python3.

For background:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

"Applications relying on the fact that system calls are interrupted
with InterruptedError will hang. The authors of this PEP don't think
that such applications exist [...]"

Well, apparently they were mistaken :)
main redo-0.42c
Avery Pennarun 8 months ago
parent
commit
670abbe305
1 changed files with 6 additions and 2 deletions
  1. +6
    -2
      redo/jobserver.py

+ 6
- 2
redo/jobserver.py View File

@ -142,8 +142,10 @@ def release_mine():
_release(1)
class TimeoutError(Exception): pass
def _timeout(sig, frame):
pass
raise TimeoutError()
# We make the pipes use the first available fd numbers starting at startfd.
@ -171,11 +173,13 @@ def _try_read(fd, n):
return None # try again
# ok, the socket is readable - but some other process might get there
# first. We have to set an alarm() in case our read() gets stuck.
oldh = signal.signal(signal.SIGALRM, _timeout)
try:
oldh = signal.signal(signal.SIGALRM, _timeout)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01) # emergency fallback
try:
b = os.read(fd, 1)
except TimeoutError:
return None # try again
except OSError as e:
if e.errno in (errno.EAGAIN, errno.EINTR):
# interrupted or it was nonblocking


Loading…
Cancel
Save