Friday, January 18, 2008

Simmering Unicode, bring DPL to a boil (Part 2)

As promised, I'll be covering the next group of methods on the TMonitor class.  Wait, Pulse, and PulseAll.  Before I do that, I want to quickly cover the spin-lock section of the Enter method from this post.  (UPDATE: For the record, the TInterlocked class does not exist yet in any shipping VCL.  You can simply replace these calls with the direct call to the Windows InterlockedXXXX functions).

It may seem counter-intuitive when you look at the loop in the first section of the Enter method.  Why "waste" time in a loop when you could just block and release this thread's time-slice for some other thread?  It has to do with the overhead and expense that goes along with blocking on a kernel object, returning to the OS thread scheduler and finally passing control to another thread.  This could take literally thousands of CPU cycles and by the time this has all happened, the lock may have been released and the "waiting" thread can have the lock.  Spin-locks are not for every case and if used improperly can actually do exactly what it looks like they do, waste cycles.  The spin-lock will work best in situations where you have more than one CPU core, and more than one thread vying for the same resource in a relatively tight loop. 

If a given core can just wait for a few cycles before doing the expensive and dreaded full-on kernel-level block, you can increase performance by drastically reducing user-level caused kernel-mode switches.  Yes, the OS pre-emptive scheduler is still going to pre-empt the thread at various points, but those are things you cannot control.  You can, however, control voluntarily giving up your time-slice.  Remember, having a non-zero spin-count only makes sense for a system with more than one CPU core.  In the case of systems with one CPU core, you must relinquish your time-slice in order to give the other "thread" a chance to actually complete its task.  For this reason, the TMonitor makes sure there is more than one CPU core before allowing the internal FSpinCount field to be set to anything other than 0.  Since the lock count is not a fixed value, that also means it is a tunable item.  It allows you to "tune" the value that works best given the specific code in which it is used.  The Windows critical section allows this value to be "tuned" in the same way using the InitializeCriticalSectionAndSpinCount API.

Some of the comments I've gotten from the previous post seem to me to indicate that some folks didn't read the whole post or simply missed a few key points.  The whole point about that post was not that I'm trying to make a better critical section than what is provided by the OS.  It was that until Windows Vista(and Win2008) you cannot use a critical section object very easily with other synchronization primitives.  There was also some comments about how we shouldn't be spending any time on traditional locking since it is not "innovative."  Not every little problem out there can (and should) be solved using things like continuation passing style, or other functional language-like constructs.  Ultimately, some form of locking is going to be needed.

Let's start with the Wait function.  This function's job is to atomically release the monitor lock and then block until some other thread calls Pulse or PulseAll.  So you can actually be deeply nested in Enter calls on the monitor and then call Wait and the lock is completely released and the thread will block.   This will allow another thread that may be blocked on the Enter call to gain the lock.  Before Wait returns it will have re-acquired the lock and have properly restored the recursion state.  Calling Wait implies a contract that some other thread must call Pulse or PulseAll before this thread can continue.  If that never happens and the Timeout is INFINITE, it will block forever.  Here's the Enter function:

function TMonitor.Wait(Timeout: Cardinal): Boolean;
var
RecursionCount: Integer;
WaitingThread: TWaitingThread;
begin
WaitingThread.Next := nil;
WaitingThread.Thread := CheckOwningThread;
// This event should probably be cached someplace.
// Probably not on the instance since this is a per-thread-per-instance resource
WaitingThread.WaitEvent := CreateEvent(nil, False, False, nil);
try
// Save the current recursion count
RecursionCount := FRecursionCount;
// Add the current thread to the waiting queue
QueueWaiter(WaitingThread);
// Set it back to almost released
FRecursionCount := 0;
// This thread is no longer the owner either
FOwningThread := 0;
// Release and get in line for someone to do a Pulse or PulseAll
if TInterlocked.Add(Integer(FLockCount), -RecursionCount) >= 0 then
Result := SignalObjectAndWait(GetEvent, WaitingThread.WaitEvent, Timeout, False) = WAIT_OBJECT_0
else
Result := WaitForSingleObject(WaitingThread.WaitEvent, Timeout) = WAIT_OBJECT_0;
// Got to get the lock back and block waiting for it.
Enter;
// Remove any dangling waiters from the list
RemoveWaiter(WaitingThread);
// Lets restore all the recursion and lock counts
TInterlocked.Add(Integer(FLockCount), RecursionCount - 1);
FRecursionCount := RecursionCount;
finally
CloseHandle(WaitingThread.WaitEvent);
end;
end;

(Note: Before someone chimes in and complains about the heavy handed CreateEvent/CloseHandle calls... please read the comments).


Since Wait can only be called while holding the lock, the CheckOwningThread call will raise an exception if the calling thread doesn't own the lock.  Now that we know we own the lock, we start the process of "unwinding" the lock.  We need to save off the recursion count since this has to be restored exactly.  The TWaitingThread type is merely a record and is strung together as a linked list among all the potential waiters by the call to QueueWaiter.  In fact, QueueWaiter doesn't need to do any "lock-free" style linked list machinations since it can only be called while the lock is held so no other threads can get there.  The queue is simply a FIFO (First-In-First-Out) in order to keep some level of fairness. Yes, I know, it is not totally fair if the waiting threads all have differing priorities.  In general, this Wait locking style is used among several equal priority worker threads, so that priority-inversions are less common.


Once we've added this thread to wait queue, we need to set about relinquishing the actual lock and prepare to block on the event created above.  This is done by explicitly setting the FRecursionCount and FOwningThread fields back to 0.  The next tricky part is to now remove all the recursion counts from the FLockCount field.  Since this field is touched from other concurrent threads, we need to interlock it.  There is no Interlocked subtract function so we simply add the twos-compliment negation of the the RecursionCount.  Just like in the Leave function we need to see if there are other threads currently waiting for the lock.  If the result of the subtraction of the recursion count from the lock still yields a value greater than or equal to 0, there are waiters.  In this case we need to use the nice atomic function SignalObjectAndWait.  This function will atomically signal the lock's event handle and then turn around and immediately block on the event handle we created within this function.  If there are no other threads waiting for the lock, there is no need to signal the lock event and a simple WaitForSingleObject is all that is needed.


Once the wait event is signaled by the Pulse or PulseAll method, or the wait times out, the first thing we need to do is reacquire the lock.  Yes, Virginia, this may also mean that this method will block, again!  That is the contract to the caller.  When Wait returns the calling thread has the lock.  Once we have the lock again, we need to do some house keeping and state restoration.  The first thing is to remove this thread from the wait queue.  This is really *only* necessary in the case where the wait above timed out.  Nobody dequeued the WatingThread record and set the event so it is still in the queue someplace.  RemoveWaiter does nothing if we were signaled.  Once I've proven to myself that this is the only case where the WaitingThread record could ever still be in the wait queue, I can add some code to check the result and only call RemoveWaiter if the result is false.


After we've cleaned up the potentially "dirty" queue, we now need to restore the nesting or recursion state.  Returning from the Enter call we know that the FLockCount field has our lock added to it and the FRecursionCount field will be 1.  We need to add back in our recursive locks, less the one Enter already gave us.  Because we hold the lock we can simply blast over the FRecursionCount field.


Like the Exit method, the Pulse and PulseAll methods are very simple:


procedure TMonitor.Pulse;
var
WaitingThread: PWaitingThread;
begin
CheckOwningThread;
WaitingThread := DequeueWaiter;
if WaitingThread <> nil then
SetEvent(WaitingThread.WaitEvent);
end;

procedure TMonitor.PulseAll;
var
WaitingThread: PWaitingThread;
begin
CheckOwningThread;
WaitingThread := DequeueWaiter;
while WaitingThread <> nil do
begin
SetEvent(WaitingThread.WaitEvent);
WaitingThread := DequeueWaiter;
end;
end;

There is the CheckOwningThread call again.  Pulse and PulseAll can only be called while holding the lock.  Since we are holding the lock, the DequeueWaiter function doesn't need any complicated locking code on the linked-list queue.  It simply removes an item from the queue and returns it.  Pulse will simply release the first thread in the queue if one exists.  An added check could be made to make sure there really are waiting threads.  PulseAll will loop and dequeue all waiting thread records and release them all.


So there we go.  A Win32 implementation of a monitor.  The TMonitor class is, right now, a proof-of-concept and will probably look much different should it make it into a future version of Delphi.  The current thinking is that the field variables should actually be somehow "attached" to any arbitrary object instance, thus enabling a whole object to be locked.  We're also looking at integrating this concept into the language itself through the use of some sort of LockObject standard function.  Lots of possibilities.  Stay tuned.