    Second attempt to fix the NPTL problem (input thread hanging) · 03ba75e6
    deuce authored
    So, looking into this issue has shown me a number of things...
    1) THE CAUSE FOR SIGWAIT FAILURE!!!
       The new Linux pthread model sends a "real-time signal" (SIG33) to every
       thread when the saved, real, or effective user or group ID changes, which
       then allows every thread to synchronize its IDs.  You cannot prevent
       this signal, and it causes the current syscall to fail.
    2) Synchronet calls do_seteuid() quite often during startup and, when thread
       setXid is broken, every time a thread starts up.  As a result, you can
       get a storm of SIG33s.  Further, calling setXid() when not all threads
       have processed their SIG33 apparently results in a deadlock.
    3) There is no apparent way of telling if all other threads have processed
       their SIG33s yet.
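
For illustration only: one common way to tolerate a wait that is cut short by an unrelated signal is to retry it. This is a minimal sketch and not what this commit changes. The helper name sigwait_retry() is hypothetical, and it assumes a platform where sigwait() reports the interruption as EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Synchronet): wait for one of the
 * signals in `set`, retrying if the wait itself is interrupted.
 * Returns 0 on success or an error number, like sigwait().
 */
int sigwait_retry(const sigset_t *set, int *sig)
{
    int err;

    assert(set != NULL && sig != NULL);
    do {
        /* sigwait() returns the error number directly; it does not set errno. */
        err = sigwait(set, sig);
    } while (err == EINTR);  /* assumption: interruption is reported as EINTR */
    return err;
}
```

The signals in `set` must already be blocked in the calling thread before the helper is called, as sigwait() requires.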
    
    As a result, I've added a 10ms pause after a setXid() call to give it time
    to work.  Since this happens while the mutex is held, it should work, but
    10ms may be too large or too small or not effective.
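
The pause described above can be sketched roughly as follows. This is not the actual Synchronet code: the names setid_mutex, settle_pause() and set_euid_with_settle() are hypothetical, and the delay is passed in as a parameter because 10ms is only a guess.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical mutex serializing all ID changes in the process. */
static pthread_mutex_t setid_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Sleep for `ms` milliseconds, resuming if a signal cuts the sleep short. */
static void settle_pause(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    /* On EINTR, nanosleep() stores the remaining time back into ts. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/*
 * Hypothetical wrapper: change the effective UID, then pause while still
 * holding the mutex so that no other thread can start a second ID change
 * before the first one has had time to reach every thread.
 * Returns the result of seteuid() with errno preserved.
 */
int set_euid_with_settle(uid_t uid, long settle_ms)
{
    int ret;
    int saved_errno;

    assert(settle_ms >= 0);
    pthread_mutex_lock(&setid_mutex);
    ret = seteuid(uid);
    saved_errno = errno;
    settle_pause(settle_ms);   /* the "10ms pause" when settle_ms == 10 */
    pthread_mutex_unlock(&setid_mutex);
    errno = saved_errno;
    return ret;
}
```

A caller would use something like set_euid_with_settle(new_uid, 10). The pause only gives the other threads time; it does not confirm that they have finished.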
    
    This will add 10ms to EVERY thread's startup time, which will negatively
    affect just about everything.  I'll be looking into the possibility that
    setXid() is no longer actually broken on Linux, so that we can limit this
    issue to the period while Synchronet is still starting up, rather than
    having it recur for every thread.
    
    Thanks to everyone for their patience on the sigwait() failure issue.
    Hopefully we'll be able to actually resolve this relatively soon.
    
    NOTE: There will still be the sigwait failures during startup... the hope
    is that we can prevent them from being an ongoing issue.