Wednesday, February 6, 2008

A Critical[Section] Difference: Windows XP vs. Windows Vista

No, this isn't one of those comparisons!  This is just something somewhat interesting about how the implementation of a critical section differs between Windows XP and Windows Vista.  It seems that Windows Vista is much more resilient in how it handles the misuse of a critical section.  One degenerate, blatantly obvious case of misuse is doing the following:

LeaveCriticalSection(CSec);
EnterCriticalSection(CSec);

Yes, this is wrong on so many levels, but the interesting thing to note is that under Windows XP, the above code hangs at the EnterCriticalSection call because the LeaveCriticalSection call did some very bad damage to the critical section structure.  The problem is that the RecursionCount field of the critical section is decremented and left non-zero (-1), which causes the LockCount field to also be decremented in order to keep the two counts in sync.  When the EnterCriticalSection call is made, the LockCount field is incremented, but the result is still non-zero, which makes it think there is another owner.  The OwningThread field is then compared with the calling thread's ID; since OwningThread is 0, it does not match the calling thread.  But no other thread owns the lock!  It blithely forges ahead, allocates the wait event, and proceeds to block.  Oops!
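To make the sequence above easier to follow, here is a rough single-threaded model of that bookkeeping in C.  It is only a sketch of the behavior described in this post, not the real ntdll code: the ModelCS type and the model_* functions are made up for illustration, there are no atomics and no wait event, and the starting value of -1 for LockCount is an assumption about the XP-era layout.

```c
#include <assert.h>

/* Simplified, single-threaded model of the XP-era bookkeeping.
   NOT the real implementation: no atomics, no wait event. */
typedef struct {
    long LockCount;             /* assumed to start at -1 when free */
    long RecursionCount;        /* 0 when nobody owns the lock */
    unsigned long OwningThread; /* 0 when nobody owns the lock */
} ModelCS;

void model_init(ModelCS *cs)
{
    cs->LockCount = -1;
    cs->RecursionCount = 0;
    cs->OwningThread = 0;
}

/* Returns 1 if the caller gets the lock, 0 if it would go on to block. */
int model_enter(ModelCS *cs, unsigned long tid)
{
    if (++cs->LockCount == 0) {      /* count went from -1 to 0: lock was free */
        cs->OwningThread = tid;
        cs->RecursionCount = 1;
        return 1;
    }
    if (cs->OwningThread == tid) {   /* recursive entry by the current owner */
        cs->RecursionCount++;
        return 1;
    }
    return 0;                        /* looks owned by someone else: would wait */
}

void model_leave(ModelCS *cs)
{
    if (--cs->RecursionCount != 0) {
        cs->LockCount--;             /* keep the two counts in sync */
    } else {
        cs->OwningThread = 0;
        cs->LockCount--;             /* the real code would wake a waiter here */
    }
}

/* Replays the misuse: Leave first, then Enter. Returns 1 if Enter would block. */
int model_misuse_blocks(void)
{
    ModelCS cs;
    model_init(&cs);
    model_leave(&cs);                /* stray Leave on an unowned section */
    assert(cs.RecursionCount == -1); /* left non-zero */
    assert(cs.LockCount == -2);      /* dragged down along with it */
    return model_enter(&cs, 1) == 0; /* increment gives -1, not 0 */
}
```

In this model, model_misuse_blocks() returns 1: after the stray Leave, the increment in Enter lands on -1 rather than 0, and OwningThread is 0, so the caller falls through to the "would wait" path even though nobody holds the lock.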


Windows Vista, in contrast, has changed how it manages the LockCount field.  Rather than accumulating the waiters (and recursions) by simply incrementing (or decrementing) the LockCount field, it uses the low bit as the actual indication of holding the lock and accumulates the waiter and recursion counts in the remaining bits.  This means that the critical section can now only have 2^31 recursions and waiters. (UPDATE: Upon further examination, only the waiters are indicated in the LockCount field)  Still plenty of space.  The upshot of this is that the above degenerate case will no longer cause a complete hang in many cases where it would have.  Why they made the change is anybody's guess;  maybe Raymond Chen can chime in and explain the reasons...  Seems to me that anyone doing the above deserves to be put into the penalty box.  Now all that is going to happen is that a program that once used to hang will appear to work, only to probably show up some other flaw downstream.  Seems like a hang/crash now or hang/crash later kind of thing.  The end result is still the same.
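
For the curious, here is a small C sketch of the general idea of keeping a "held" flag in the low bit and a waiter count in the remaining bits.  This is purely illustrative and is not the actual Vista layout: the sketch_* names, the polarity of the bit (set means held), and the way the count is stored are all assumptions picked to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing, for illustration only:
   bit 0      = lock is held (polarity assumed)
   bits 1..31 = number of waiters */
#define SKETCH_HELD_BIT 1u

int sketch_is_held(uint32_t word)
{
    return (word & SKETCH_HELD_BIT) != 0;
}

uint32_t sketch_waiters(uint32_t word)
{
    return word >> 1;
}

/* Ownership is decided by the low bit alone, not by the whole count. */
int sketch_try_acquire(uint32_t *word)
{
    if (*word & SKETCH_HELD_BIT)
        return 0;
    *word |= SKETCH_HELD_BIT;
    return 1;
}

void sketch_add_waiter(uint32_t *word)
{
    *word += 2u;                 /* adds 1 to the count stored above bit 0 */
}

void sketch_release(uint32_t *word)
{
    *word &= ~SKETCH_HELD_BIT;   /* clearing an already-clear bit changes nothing */
}

/* Replays Leave-then-Enter against the sketch. Returns 1 if Enter succeeds. */
int sketch_stray_release_is_harmless(void)
{
    uint32_t word = 0;           /* free, no waiters */
    sketch_release(&word);       /* stray release */
    assert(word == 0);           /* state is unchanged */
    return sketch_try_acquire(&word);
}
```

With this kind of layout, a stray release has nothing to corrupt, so the following acquire still succeeds; sketch_stray_release_is_harmless() returns 1.  That is the flavor of the difference, even if the real structure handles the details differently.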