Friday, January 18, 2008

Simmering Unicode, bring DPL to a boil (Part 2)

As promised, I'll be covering the next group of methods on the TMonitor class.  Wait, Pulse, and PulseAll.  Before I do that, I want to quickly cover the spin-lock section of the Enter method from this post.  (UPDATE: For the record, the TInterlocked class does not exist yet in any shipping VCL.  You can simply replace these calls with the direct call to the Windows InterlockedXXXX functions).

It may seem counter-intuitive when you look at the loop in the first section of the Enter method.  Why "waste" time in a loop when you could just block and release this thread's time-slice for some other thread?  It has to do with the overhead and expense that goes along with blocking on a kernel object, returning to the OS thread scheduler and finally passing control to another thread.  This could take literally thousands of CPU cycles and by the time this has all happened, the lock may have been released and the "waiting" thread can have the lock.  Spin-locks are not for every case and if used improperly can actually do exactly what it looks like they do, waste cycles.  The spin-lock will work best in situations where you have more than one CPU core, and more than one thread vying for the same resource in a relatively tight loop. 

If a given core can just wait for a few cycles before doing the expensive and dreaded full-on kernel-level block, you can increase performance by drastically reducing user-level caused kernel-mode switches.  Yes, the OS pre-emptive scheduler is still going to pre-empt the thread at various points, but those are things you cannot control.  You can, however, control voluntarily giving up your time-slice.  Remember, having a non-zero spin-count only makes sense for a system with more than one CPU core.  In the case of systems with one CPU core, you must relinquish your time-slice in order to give the other "thread" a chance to actually complete its task.  For this reason, the TMonitor makes sure there is more than one CPU core before allowing the internal FSpinCount field to be set to anything other than 0.  Since the lock count is not a fixed value, that also means it is a tunable item.  It allows you to "tune" the value that works best given the specific code in which it is used.  The Windows critical section allows this value to be "tuned" in the same way using the InitializeCriticalSectionAndSpinCount API.

Some of the comments I've gotten from the previous post seem to me to indicate that some folks didn't read the whole post or simply missed a few key points.  The whole point about that post was not that I'm trying to make a better critical section than what is provided by the OS.  It was that until Windows Vista(and Win2008) you cannot use a critical section object very easily with other synchronization primitives.  There was also some comments about how we shouldn't be spending any time on traditional locking since it is not "innovative."  Not every little problem out there can (and should) be solved using things like continuation passing style, or other functional language-like constructs.  Ultimately, some form of locking is going to be needed.

Let's start with the Wait function.  This function's job is to atomically release the monitor lock and then block until some other thread calls Pulse or PulseAll.  So you can actually be deeply nested in Enter calls on the monitor and then call Wait and the lock is completely released and the thread will block.   This will allow another thread that may be blocked on the Enter call to gain the lock.  Before Wait returns it will have re-acquired the lock and have properly restored the recursion state.  Calling Wait implies a contract that some other thread must call Pulse or PulseAll before this thread can continue.  If that never happens and the Timeout is INFINITE, it will block forever.  Here's the Enter function:

function TMonitor.Wait(Timeout: Cardinal): Boolean;
RecursionCount: Integer;
WaitingThread: TWaitingThread;
WaitingThread.Next := nil;
WaitingThread.Thread := CheckOwningThread;
// This event should probably be cached someplace.
// Probably not on the instance since this is a per-thread-per-instance resource
WaitingThread.WaitEvent := CreateEvent(nil, False, False, nil);
// Save the current recursion count
RecursionCount := FRecursionCount;
// Add the current thread to the waiting queue
// Set it back to almost released
FRecursionCount := 0;
// This thread is no longer the owner either
FOwningThread := 0;
// Release and get in line for someone to do a Pulse or PulseAll
if TInterlocked.Add(Integer(FLockCount), -RecursionCount) >= 0 then
Result := SignalObjectAndWait(GetEvent, WaitingThread.WaitEvent, Timeout, False) = WAIT_OBJECT_0
Result := WaitForSingleObject(WaitingThread.WaitEvent, Timeout) = WAIT_OBJECT_0;
// Got to get the lock back and block waiting for it.
// Remove any dangling waiters from the list
// Lets restore all the recursion and lock counts
TInterlocked.Add(Integer(FLockCount), RecursionCount - 1);
FRecursionCount := RecursionCount;

(Note: Before someone chimes in and complains about the heavy handed CreateEvent/CloseHandle calls... please read the comments).

Since Wait can only be called while holding the lock, the CheckOwningThread call will raise an exception if the calling thread doesn't own the lock.  Now that we know we own the lock, we start the process of "unwinding" the lock.  We need to save off the recursion count since this has to be restored exactly.  The TWaitingThread type is merely a record and is strung together as a linked list among all the potential waiters by the call to QueueWaiter.  In fact, QueueWaiter doesn't need to do any "lock-free" style linked list machinations since it can only be called while the lock is held so no other threads can get there.  The queue is simply a FIFO (First-In-First-Out) in order to keep some level of fairness. Yes, I know, it is not totally fair if the waiting threads all have differing priorities.  In general, this Wait locking style is used among several equal priority worker threads, so that priority-inversions are less common.

Once we've added this thread to wait queue, we need to set about relinquishing the actual lock and prepare to block on the event created above.  This is done by explicitly setting the FRecursionCount and FOwningThread fields back to 0.  The next tricky part is to now remove all the recursion counts from the FLockCount field.  Since this field is touched from other concurrent threads, we need to interlock it.  There is no Interlocked subtract function so we simply add the twos-compliment negation of the the RecursionCount.  Just like in the Leave function we need to see if there are other threads currently waiting for the lock.  If the result of the subtraction of the recursion count from the lock still yields a value greater than or equal to 0, there are waiters.  In this case we need to use the nice atomic function SignalObjectAndWait.  This function will atomically signal the lock's event handle and then turn around and immediately block on the event handle we created within this function.  If there are no other threads waiting for the lock, there is no need to signal the lock event and a simple WaitForSingleObject is all that is needed.

Once the wait event is signaled by the Pulse or PulseAll method, or the wait times out, the first thing we need to do is reacquire the lock.  Yes, Virginia, this may also mean that this method will block, again!  That is the contract to the caller.  When Wait returns the calling thread has the lock.  Once we have the lock again, we need to do some house keeping and state restoration.  The first thing is to remove this thread from the wait queue.  This is really *only* necessary in the case where the wait above timed out.  Nobody dequeued the WatingThread record and set the event so it is still in the queue someplace.  RemoveWaiter does nothing if we were signaled.  Once I've proven to myself that this is the only case where the WaitingThread record could ever still be in the wait queue, I can add some code to check the result and only call RemoveWaiter if the result is false.

After we've cleaned up the potentially "dirty" queue, we now need to restore the nesting or recursion state.  Returning from the Enter call we know that the FLockCount field has our lock added to it and the FRecursionCount field will be 1.  We need to add back in our recursive locks, less the one Enter already gave us.  Because we hold the lock we can simply blast over the FRecursionCount field.

Like the Exit method, the Pulse and PulseAll methods are very simple:

procedure TMonitor.Pulse;
WaitingThread: PWaitingThread;
WaitingThread := DequeueWaiter;
if WaitingThread <> nil then

procedure TMonitor.PulseAll;
WaitingThread: PWaitingThread;
WaitingThread := DequeueWaiter;
while WaitingThread <> nil do
WaitingThread := DequeueWaiter;

There is the CheckOwningThread call again.  Pulse and PulseAll can only be called while holding the lock.  Since we are holding the lock, the DequeueWaiter function doesn't need any complicated locking code on the linked-list queue.  It simply removes an item from the queue and returns it.  Pulse will simply release the first thread in the queue if one exists.  An added check could be made to make sure there really are waiting threads.  PulseAll will loop and dequeue all waiting thread records and release them all.

So there we go.  A Win32 implementation of a monitor.  The TMonitor class is, right now, a proof-of-concept and will probably look much different should it make it into a future version of Delphi.  The current thinking is that the field variables should actually be somehow "attached" to any arbitrary object instance, thus enabling a whole object to be locked.  We're also looking at integrating this concept into the language itself through the use of some sort of LockObject standard function.  Lots of possibilities.  Stay tuned.


  1. Very interesting and exciting stuff.

    Just one thing... aiui TInterlocked is just a class with a bunch of static methods, i.e. a way of grouping together related "Interlocked" functions for no other reason than they are related - there is no "class" of object that is a TInterlocked, there's just a group of Interlocking functions.

    Yet done with something that looks, smells and tastes like an object.

    It's this sort of OO for the sake of it that drives me mad.

    We already have (in Delphi) a very neat means of grouping things that are related in this way - the UNIT. You can even use that unit name to qualify things if you need/want to.

    By using a class with nowt but static methods all we do is introduce unnecessary verbosity just to give our inner academic a nice warm fuzzy feeling.

    Borland used to have a "No Nonsense" license agreement.

    Delphi used to have a No Nonesense approach to sophisticated software development.

    It used to be Java and C++ (and latterly C#) that were lumbered with high faluting language foibles appreciated only only by the highest of high-brow, self aggrandising coders, with all the baggage and confusion that came with it.

    There is increasingly less reason to choose Delphi over those other languages.


  2. I disagree with Jolyon. The unit is merely a project building block with name space features. If a group of procedures/functions work together logically then they either belong to a class or record. Having loose routines belongs to the "structured programming" era.

    If you shake the code vigorously you can here them rattling around. :-)

  3. The Interlocked functions do NOT "work together". Far from it in fact - if they worked together in any way shape or form then they would not do the job that they do. They just all happen to be "Interlocked" operations.

    One might just as well have a class THandle to "group" all related "Handle" functions (but that doesn't itself represent a specific handle).

    THandle.GetDC( hWnd )
    THandle.CreatePen( ... )
    THandle.Close( aHandle )

  4. Oh and as for the final point - they are STILL" loose routines", they are just loose routines neatly painted with a thin (VERY thin) OO veneer in order to give them an air of respectability in an era when "if it aint an object it's bad" is rapidly becoming the dogma of "correct" coding.

    dOOgma, if you like.

    Shake the code - you'll still hear them rattle, except it'll be more of a clanking sound since things are coalesced into bigger chunks of looseness.


    (Look ma! - I didn't pollute the river because I put the chemical waste in a paper bag before dumping it! Aren't I clever?)

  5. Jolyon,

    I completely understand where you're coming from. I, in spirit, agree with you. Sometimes these forms of "encapsulation" are more of a "documentation encapsulation " and a logical grouping, over and above what you can get from the unit-level alone. I absolutely do not want to become dogmatic about these and if it doesn't make sense, why bother?


  6. If the generated code for the compiler is the same, then this class is not a big deal. Other than that, I guess it's not needed as all interlocked functions starts with "Interlocked" and whenever you use code completion, they will be the first to show when entering "Interlocked".

  7. Atle,

    There are many overloaded versions of the various functions on the TInterlocked class. Many are also inlined, so they'll simply get folded into the code stream. Finally, some of the functions are implemented in direct assembly because there are no OS supplied versions that will, say, atomically swap a Double floating point or Int64 value. These functions implement it using the LOCK CMPXCHG8B instruction.



Please keep your comments related to the post on which you are commenting. No spam, personal attacks, or general nastiness. I will be watching and will delete comments I find irrelevant, offensive and unnecessary.