Augustin Cavalier 2c588b031f kernel: Properly separate and handle THREAD_BLOCK_TYPE_USER.
Consider this scenario:
 * A userland thread puts its ID into some structure so that it
   can be woken up later, sets its wait_status to initiate the
   begin of the wait, and then calls _user_block_thread.
 * A second thread finishes whatever task the first thread
   intended to wait for, reads the ID almost immediately
   after it was written, and calls _user_unblock_thread.
 * _user_unblock_thread was called so soon that the first
   thread is not yet blocked on the _user_block_thread block,
   but is instead blocked on e.g. the thread's main mutex.
 * The first thread's thread_block() call returns B_OK.
   As in this example it was inside mutex_lock, it thinks
   that it now owns the mutex.
 * But it doesn't own the mutex, and so (until yesterday)
   all sorts of mayhem and then a random crash occurs, or
   (after yesterday) an assert-failure is tripped that
   the thread does not own the mutex it expected to.

The above scenario is not a hypothetical, but is in fact the
exact scenario behind the strange panics in #15211.

The solution is to only have _user_unblock_thread actually
unblock threads that were blocked by _user_block_thread,
so I've introduced a new BLOCK_TYPE to differentiate these.
While I'm at it, remove the BLOCK_TYPE_USER_BASE, which was
never used (and now never will be.) If we want to differentiate
different consumers of _user_block_thread for debugging
purposes, we should use the currently-unused "object"
argument to thread_block, instead of cluttering the
relatively-clean block type debugging code with special
types.

One final note: The race condition which was the case of
this bug does not, in fact, imply a deadlock on the part
of the rw_lock here. The wait_status is protected by the
thread's mutex, which is acquired by both _user_block_thread
and _user_unblock_thread, and so if _user_unblock_thread
succeeds faster than _user_block_thread can initiate
the block, it will just see that wait_status is already
<= 0 and return immediately.

Fixes #15211.
2019-08-05 22:31:02 -04:00
..
2019-02-19 18:33:25 +00:00