Friday, February 22, 2008

Thread pools Task handling.

The Delphi Parallel Library (DPL) tends to occupy a lot of my otherwise idle "thought" time. DPL has been redesigned in my head more times than I can count.  The good news is that it is conceptually beginning to actually take some shape and is finding a direction.  As I've stated before, a lot of the concepts behind the DPL are taken from Intel's TBB and Microsoft's TPL.

When I started down this road, it all began as an investigation into various ways of implementing a thread pool.  It turns out that this simplistic mechanism really doesn't truly address parallelism and concurrent processing.  It is merely a way of shoveling off a task into another thread, which may or may not execute concurrently.  A thread pool doesn't do anything to "load-balance" the work in a constructive manner, nor does it purport to.  That is left as an exercise for the user.

While reading and researching all the various approaches to handling parallelism, a few light bulbs started coming on.  I've been approaching this whole thing from the wrong angle.  I'd been trying to shoehorn task handling (a task being the smallest unit of work that can be made parallel) concepts into the more general notion of a thread pool.  I admit that sometimes I'm not the sharpest knife in the drawer, but I can recognize when one approach is superior to another.

Rather than starting with the concept of thread pools, a parallel library should really begin at the most basic level, the task or just a unit of work.  A task is merely some chunk of code that may or may not execute concurrently.  Another thing to remember is that a parallel library is about getting these tasks done as fast as absolutely possible.  It is also about keeping as many of the CPU cores as busy as possible with as little idle time and overhead.  It is not about fairness. It is about speed.

To best do this, the internal scheduling and management of tasks should control how they are scheduled and processed on the CPU cores.  Remember, we're talking about the actual real, hardware CPU core (or virtual core, like Hyperthreading).  Not a thread.  A thread is merely a way to easily segregate tasks among the cores.  We'll rely on the operating system to make sure that of the available cores, then there are equally as many "threads" actually burning CPU cycles for real work.  Any other threads are stuck waiting for their chance at a core.  Because of this, a task manager should endeavor to keep the number of running threads as close to equal to the number of available cores.  It should also equally handle 1 or 100 cores by making sure as many cores are a busy as possible.

Using a task based approach also requires that the threading and scheduling is completely (or nearly completely) hidden.  The user should never think of threads.  The user only deals with Tasks and the extra abstractions built on top of them.  So it looks like DPL is headed for another rewrite, which is good because each time I do this it is only getting better. I've not spent a whole lot of time on the current iteration as it stands, so this isn't much of a setback. 

Microsoft's TPL is one of the few things to come out in recent years that looks great and performs well.  The concepts behind it are very well suited for Delphi.  To get an idea of some of the things you can do with it (and eventually the DPL ;-) you really should watch this video.  The really interesting bits come on toward the last 1/2 to 1/3 of the video where they go into the idea behind work stealing and work load balancing.  Really, really interesting stuff.  Joe Duffy also addresses how you need to think about parallelism and how it applies to your situation.

As part of this, I just ordered Joe's soon to be released book on Concurrent Programming on Windows Vista.  You can actually buy the book online ahead of the actual print release and download a PDF of the work in progress until then.  From reading only a few of the online excerpts, it looks to be one of those "must have" tomes for anyone doing any kind of parallel programming in native Windows and in .NET.  Even though a lot of the information I know, it is good to have it codified in a consistent organized manner.  As I stated in my first post about thread pools, as I dig deeper into this whole subject, I want to make sure that Delphi developers at all levels will find some value in such a library.  The best thing you can do at this point is to learn as much about these concepts as you can.  Hopefully this blog and others I've been referencing and will reference in the future will be an excellent source of information.  I'm finding that as I write about this, it give me a chance to review my assumptions and design ideas with a critical eye.  Hopefully, we can all learn something along the way as well.  I know I have so far.

Tuesday, February 19, 2008

Lock my Object... Please!

Here's a quick recap of all the DPL (Delphi Parallel Library) related posts over the last few months:

A Critical[Section] Difference: Windows XP vs. Windows Vista
Breaking the rules
Simmering Unicode, bring DPL to a boil (Part 2)
Simmering Unicode, bring DPL to a boil
Placing your code in the forge - Refining a technique
When code lies - A better solution
Stupid Enumerator Tricks - And now for something completely different
Magical Assembler Incantations - Nested functions and anonymous methods
The Life and Times of a Thread Pool
I cut and I cut and it was still too short!
Spot the deadlock
Wading in the shallow end of the pool - Thread Pools

As another follow on to my discussion of the TMonitor (see the above "Bring DPL to a boil" articles) things have changed a little. The TMonitor class is no longer a separate class that you can create and use. It is now tied directly to any TObject instance or derivative (which means any instance of a Delphi class).  There is still the notion of a TMonitor but only to serve as a way to tie together the "monitor" functionality. Rather than creating one and setting about to locking/unlocking it, you now only need an object instance. For Tiburón, you merely need to call System.TMonitor.Enter(<obj>); or System.TMonitor.Exit(<obj>); among the other related methods.

I've contemplated putting these methods directly on TObject itself, but there is the issue with calling these methods when a descendant class has a method of the same name. The code will compile and function normally due to Delphi's scoping rules, but it could make it harder for users to call those methods in these cases. Thread synchronization should never be done lightly and should require some forethought and planning. A simple rule of thumb is that you should never call unknown code (or code that you're not entirely sure where it goes and what it does) while holding a lock. 

This now paves the way to add some interesting intrinsic functionality such as using this as the basis for "synchronized" methods or even synchronized code blocks similar to the C# lock keyword. While you've always been able to create an OS critical section, or use some of the helper classes in SyncObjs.pas, this new mechanism will be available on any object.

I merely design it then write about it... you get to kvetch :-)

Wednesday, February 6, 2008

A Critical[Section] Difference: Windows XP vs. Windows Vista

No, this isn't one of those comparisons!  This is just something somewhat interesting about the difference in the implementation of a critical section in Windows XP vs. Windows Vista.  It seems that Windows Vista is much more resilient in how it handle's the misuse of a critical section.  One such degenerate, blatantly obvious case of misuse is doing the following:

LeaveCriticalSection(CSec);
EnterCriticalSection(CSec);

Yes this is wrong on so many levels, but the interesting thing to note is that under Windows XP, the above code hangs at the EnterCriticalSection call because the LeaveCriticalSection call did some very bad damage to the critical section structure.  The problem is that the RecursionCount field of the critical section is decremented and left as non-zero (-1) which causes the LockCount field to also be decremented in order to keep the two field counts in sync.  When the EnterCriticalSection call is made, the LockCount field is incremented, but it is still non-zero which makes it think there is another owner.  The OwningThread Field is compared with the calling thread's ID which being 0, is not equal to the calling thread.  But no other thread owns the lock!  It blithely forges ahead and allocates the wait event and proceeds to block.  Oops!


Windows Vista, in contrast, has changed how it manages the LockCount field.  Rather than only accumulating the waiters (and recursions) in terms of simply incrementing (or decremented) the LockCount field, it uses the low-bit as the actual indication of holding the lock and accumulates the waiter and recursion counts in the remaining bits.  This means that the critical section can now only have 2^31 recursions and waiters. (UPDATE: Upon further examination, only the waiters are indicated in the LockCount field)  Still plenty of space.  The upshot of this is that the above degenerate case no longer will cause a complete hang in many cases where it would have.  Why they made the change is anybody's guess;  Maybe Raymond Chen can pipe in and explain the reasons...  Seems to me that anyone with doing the above deserves to be put into the penalty box.  Now all that is going to happen is that when a program that once used to hang, will now appear to work only to probably show up some other flaw downstream.  Seems like a hang/crash now or hang/crash later kind of thing.  The end result is still the same.

Friday, February 1, 2008

ODPC - One Delphi Per Child

Ok, I just made that up.  But that is the crux of the 1 million seat licensing deal just closed with the Russian Federal Agency of Education.  Infoworld also has an article about the deal right here.  A whole new generation of Delphi developers are now in the making!