Friday, December 21, 2007

The Unicode Soft Hyphen vs. the Latin 1 Soft Hyphen (Cage Death Match)

As I delve further into the move to Unicode support in Tiburón (the next RAD Studio release), poring over the Unicode standards documents and what Windows thinks about Unicode, I've stumbled across an interesting controversy.  A quick Google search turned up this interesting summary of the controversy.  I went looking for information about this single code point, U+00AD, because I clearly noticed a discrepancy between how the Microsoft .NET Framework's System.Char.GetUnicodeCategory() function categorizes this character and what the actual Unicode.org database says.  Microsoft and Latin 1 (ISO-8859-1) insist that this is a visible character, while the Unicode.org standard says it is not.  I'm guessing that MS based their code on pre-4.0 versions of the Unicode standard.

Here is a rebuttal to the above article

"That text is unfortunately too easy to misread (and overinterpret!)."

Now folks are parsing words!  Uh... excuse me??!  Standards text should be clearer than that. Heck, even Unicode.org had "misread" or "overinterpreted" it for the first three major revisions of the standard.  Here's the relevant text from the ISO-8859-1 standard:

5.3.3 SOFT HYPHEN (SHY)
A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing HYPHEN, for use when a line break has been established within a word. [emphasis mine]

OK... when I stand on my head and the phase of the moon is just right (that's called sarcasm, folks), I can see a vague reference to it not being a rendered character except in specific circumstances...  Had they really meant this character not to be considered graphically rendered, maybe they should have adopted text similar to that for the non-breaking space:

5.3.2 NO-BREAK SPACE (NBSP)
A graphic character the visual representation of which consists of the absence of a graphic symbol, for use when a line break is to be prevented in the text as presented. [emphasis mine]

I'm going to be blunt here.  This is stupid.  Latin 1, ISO-8859-1, EBCDIC, etc. all pre-date the establishment of the Unicode standard (which was established in 1991).  The first 256 code points in the Unicode standard purport to be equivalent to ISO-8859-1.  In this case, I'm inclined to just special-case that one character and change its Unicode classification from "Cf" (Control, Format) to "Pd" (Punctuation, Dash).  This would make it consistent with how MS Office regards this character and how it is classified in the .NET Framework.  Open Notepad or Word and type Alt+0173 on the numpad.  You'll see a hyphen show up.  Save the document in Notepad as Unicode, then dump it as hex and you'll get something like this:

000000: FF FE AD 00 0D 00 0A 00  00 00 00 00 00 00 00 00 ................

There's the little-endian Byte Order Mark (BOM) as the first two bytes, then the next two bytes are the U+00AD Soft Hyphen character.  Maybe I'm imagining things...  But I'm sure I see that character on the display.
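For anyone who wants to check the Unicode character database themselves, Python's unicodedata module (which follows the current Unicode standard) reports the "Cf" classification, and the UTF-16LE encoding matches the bytes in the dump above:

```python
import unicodedata

shy = '\u00ad'  # U+00AD SOFT HYPHEN
print(unicodedata.category(shy))         # Cf -- Control, Format, per Unicode.org
print(shy.encode('utf-16-le').hex(' '))  # ad 00 -- the same bytes as the hex dump
```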

Who would have thought that one code point out of the whole of the Unicode standard, which covers many thousands of code points, would be so polarizing?  Maybe this is all hyperbole on my part, but the first article linked above really goes out of its way to try and smack down Unicode.org once and for all.  Also, the rebuttal has an air of indignant arrogance and throughout the text just rudely brushes off (and rewords) the various assertions, which, from my point of view, seemed reasonable.  I've not found a well-reasoned explanation as to why Unicode.org decided to make the change.  Even something like "Oops!  We screwed the pooch on that one!  Sorry folks!"  Maybe they could have taken "common usage" (even though it may be incorrect usage) into account and simply introduced a new code point to do what they wanted (it's not like they're going to run out of code points anytime soon :).

I guess that's the beauty of standards: there are so many to choose from :).  Oh, and there are all those folks that are going to immediately jump on this and start bashing MS as being hostile toward standards or simply ignoring them.  Yeah, that's right, folks, there's a whole committee at MS called "How can we mess with the standards bodies?".  I'm not agreeing with MS simply because they're MS; I just looked at the history and facts (as I've been able to discover them) and drew my own conclusion.

I've been exposed to the ISO standards process.  I know the lengths to which they're supposed to go to look out for the interests of the relevant industry as a whole.  I can imagine the "Soft Hyphen" sub-committee's heated discussions :).  "Punctuation!"  "Format Control!" "Punctuation!" "Format Control!" "I know you are but what am I!" "Neener Neener!"

All I want is my unit tests to pass!

With that, have a great Holiday Season.  Make sure you spend some quality time with family and friends.  Always designate a driver or call a cab.

Wednesday, November 21, 2007

Placing your code in the forge - Refining a technique

While preparing for this posting, I had the chance to review the code in my Parallel.pas unit.  I was trying to get my implementation of TNestedTask<T> to work properly with nested functions.  The problem I had is that with Win32 generics, you cannot have BASM (Built-in ASseMbler) blocks in the bodies of the methods in a parameterized type.  So I needed to find a way to call a function pointer that can take a type parameter as the result type.  The thing is that depending upon the result type, the calling code has to handle it very differently.  For instance, if the result type were a floating point value, the result would appear on the top of the floating point stack.  If it were a structure > 4 bytes in size, the caller has to pass in a hidden pointer parameter that tells the function where to place the result.  And finally, for simple ordinal types they're returned in a CPU register (EAX).  Even if the compiler were to allow assembler instructions in the bodies of methods in a parameterized type, there is no way for that code to properly "morph" into the correct sequence for handling all the various types of return types.

The compiler doesn't currently (aside from some internal research I've been doing in the compiler) support the notion of a nested procedure pointer, but it does support procedure pointers for any global procedure or function.  My solution was to create three little assembler functions to serve as helpers.  The Delphi compiler is smart enough to forgo the creation of a stack frame for most very simple procedures and functions.  This is especially true for any procedure or function that is nothing more than a pure block of assembly code.  Here are the little bits of code:

function GetFrame: Pointer;
asm
  MOV EAX,[EBP]
end;

procedure PushFrame(AEBP: Pointer);
asm
  XCHG EAX,[ESP]
  PUSH EAX
end;

function PopFrame: Pointer;
asm
  POP EDX
  POP EAX
  PUSH EDX
end;

The first function, GetFrame, returns the calling function's frame.  Note that it must be called directly from the function whose frame you want; if it's called through an intermediate function, you'll get the wrong result.  The other two functions are there for calling the nested procedure/function.  PushFrame takes the frame value stored off from a previous call to GetFrame and injects it onto the stack.  Since this is a function call, we have to swap it with the current top of the stack, which is the return address for this function.  The return address is then pushed back onto the stack so the "RET" instruction that is generated by the compiler can go back to the right location.  PopFrame just undoes what PushFrame did.


Here's a little bit of code that demonstrates how to use them to call a nested proc:


type
  TNestedFunction = function: Double;
  TCalculatorFunction = (cAdd, cSubtract, cMultiply, cDivide);

function CallNestedFunction(ANestedFunction: TNestedFunction): Double;
var
  LEBP: Pointer;
begin
  LEBP := GetFrame;
  PushFrame(LEBP);
  Result := ANestedFunction;
  PopFrame;
end;

procedure CalcFunction(Left, Right: Double; CalculatorFunction: TCalculatorFunction);

  function Add: Double;
  begin
    Result := Left + Right;
  end;

  function Subtract: Double;
  begin
    Result := Left - Right;
  end;

  function Multiply: Double;
  begin
    Result := Left * Right;
  end;

  function Divide: Double;
  begin
    Result := Left / Right;
  end;

var
  Result: Double;
  NestedFunc: TNestedFunction;
begin
  case CalculatorFunction of
    cAdd: NestedFunc := @Add;
    cSubtract: NestedFunc := @Subtract;
    cMultiply: NestedFunc := @Multiply;
    cDivide: NestedFunc := @Divide;
  end;
  Result := CallNestedFunction(NestedFunc);
  Writeln(Result);
end;

By using the assembler functions above, I can make CallNestedFunction a parameterized function like this:


type
  TNestedFunction<T> = function: T;
  TCalcClass = class
    class function CallNestedFunction<T>(ANestedFunction: TNestedFunction<T>): T; static;
  end;

class function TCalcClass.CallNestedFunction<T>(ANestedFunction: TNestedFunction<T>): T;
var
  LEBP: Pointer;
begin
  LEBP := GetFrame;
  PushFrame(LEBP);
  Result := ANestedFunction;
  PopFrame;
end;

... Same as above ...

var
  Result: Double;
  NestedFunc: TNestedFunction<Double>;
begin
  case CalculatorFunction of
    cAdd: NestedFunc := @Add;
    cSubtract: NestedFunc := @Subtract;
    cMultiply: NestedFunc := @Multiply;
    cDivide: NestedFunc := @Divide;
  end;
  Result := TCalcClass.CallNestedFunction<Double>(NestedFunc);
  Writeln(Result);
end;

In this way, I simply let the compiler figure out the right way to call through the procedure/function pointer and all those assembler functions do is to make sure the calling frame reference is on the stack.  Before someone asks, you should not wrap the PushFrame, call, PopFrame sequence into a try..finally block (I know it does kind of look like that should be done).  First of all, it will mess up the stack because exception frames use the stack to keep them linked together.  Secondly, it is wholly unnecessary.  If an exception occurs in the call to the nested proc, the system will unwind the stack and make sure each finally block and except block has the proper local frame.  How that all works is a rather complicated topic that would take up a lot of posts.  If you look at how the compiler generates code for a call to a nested procedure, it doesn't try to do any kind of exception wrapping either.  The extra frame value on the stack will be properly cleaned up.

When code lies - A better solution

Many of you retched and guffawed at my silly post about Stupid Enumerator Tricks. You know what? I agree. It was a hideous misuse of a feature. One comment from Joe White that stood out was the notion that the code was lying. He is right. I was using a side-effect as the primary feature, not what the code was actually saying. However, from casually observing the code in question, it could easily be misconstrued. The code was lying.

How would the maintainer of this code figure out the intent a year or more from now? What if I were the maintainer? How many times have you looked at a chunk of code you wrote a long time ago and about all you remember about it was "how utterly awesome and clever you were?" But now, you don't have the faintest idea what it does and how it works. Sure, you could have put comments in, but what if they were wrong and misleading? (this is where I like to say that no comment is far better than a wrong or bad comment). After staring at the code for what seems like hours... you suddenly have a forehead-slapping moment and exclaim, "What was I thinking!!"

But is there a better method of handling the automatic cleanup of this task? Another commenter, Jolyon Smith, suggested the use of interfaces as a way to automatically manage the lifetime of an object. I, too, have used this little trick. Let's look at my TTask, TTask<T> and TReplicableTask implementation. Here's the declaration of the classes and the interfaces. Yes, I'm using Win32 generics... because, well... they work (for the most part) on my machine :).

type
  TFunctionEvent<T> = function (Sender: TObject): T of object;

  ITask = interface
    procedure Wait;
    procedure Cancel;
    function GetIsComplete: Boolean;

    property IsComplete: Boolean read GetIsComplete;
  end;

  ITask<T> = interface(ITask)
    function GetValue: T;

    property Value: T read GetValue;
  end;

  EOperationCanceled = class(Exception);

  TTask = class(TInterfacedObject, ITask)
  private
    FDoneEvents: THandleObjectArray;
    FCanceled: Boolean;
    FException: TObject;
  protected
    FEvent: TNotifyEvent;
    function GetIsComplete: Boolean;
    procedure WorkEvent(Sender: TObject);
    procedure QueueEvents(Sender: TObject); virtual;
    procedure Wait;
    procedure Cancel;
    property IsComplete: Boolean read GetIsComplete;
  public
    constructor Create(Sender: TObject; Event: TNotifyEvent);
    destructor Destroy; override;
    procedure CheckCanceled;
  end;

  TTask<T> = class(TTask, ITask<T>)
  private
    FEvent: TFunctionEvent<T>;
    FResult: T;
    procedure RunEvent(Sender: TObject);
    function GetValue: T;
    property Value: T read GetValue;
  public
    constructor Create(Sender: TObject; Event: TFunctionEvent<T>);
  end;

  TReplicableTask = class(TTask)
  protected
    procedure QueueEvents(Sender: TObject); override;
  end;

So now the usage is like this:


procedure TForm1.Button1Click(Sender: TObject);
var
  IntTask: ITask<Integer>;
begin
  IntTask := TTask<Integer>.Create(Self, CalcResult);
  Caption := Format('Result = %d', [IntTask.Value]);
end;

function TForm1.CalcResult(Sender: TObject): Integer;
var
  I: Integer;
begin
  Result := 1;
  for I := 2 to 9 do
    Result := Result * I;
end;

Much cleaner, don't you think? And remember, even if the task isn't complete yet, the call to IntTask.Value does an implicit wait until the task has obtained the value. By making the methods private or protected (or, even better, strict private and protected) on the TTask and TTask<T> classes, you can be more assured that someone doesn't decide to use the object directly. You must obtain the interface in order to use the class instance.
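The implicit wait on Value is the same idea as a future. For comparison, here's the equivalent shape sketched with Python's concurrent.futures (an analogy only, not the Delphi implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def calc_result():
    result = 1
    for i in range(2, 10):  # mirrors CalcResult above: 2..9
        result *= i
    return result

with ThreadPoolExecutor() as pool:
    int_task = pool.submit(calc_result)       # kick off the task
    print('Result = %d' % int_task.result())  # result() implicitly waits
# prints: Result = 362880
```

As with ITask<T>, the caller never frees anything by hand; the pool and the future manage their own lifetimes.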


Update: I just updated this Code Central entry with a new Parallel.pas unit that demonstrates some of the above, along with new TTask, TReplicableTask, and TNestedReplicableTask objects and their requisite interfaces.  A compiled version of the Life demo was also included per some requests.  Finally, the Parallel.pas unit demonstrates a new technique for calling nested procedures without having to litter your code with a bunch of assembler blocks.  I'll post another blog entry describing it.

Wednesday, November 14, 2007

Automated Testing - Learn it, Live it, Love it

 

Steve Trefethen may be an ex-CodeGear employee, but he is still adding significant value to the developer community as a whole.  Looks like he's started a series of posts where he'll be talking about test automation and the advantages it gives you.  Sure, everyone knows about unit testing (at least you should), but Steve is talking about a much more holistic approach to automated testing.  Here at CodeGear we have a significant internal investment in automated testing tools, especially for Delphi.  So much so that we have several patents related to the technology.  There are also a lot of excellent automated testing tools out there, such as TestComplete by AutomatedQA.  I'm sure there will be some excellent information as he delivers the blog postings.

Monday, November 12, 2007

Stupid Enumerator Tricks - And now for something completely different

Sometimes I get admittedly kooky, crazy ideas.  While playing with Tasks, ParallelFor, and Thread Pooling, a rather interesting thought popped into my head.  I've been poring over the very interesting ideas and techniques outlined in this article on MSDN referring to the Parallel FX library that is in the works for the .NET framework.  One thing I've been looking at is how to make something like this applicable to natively compiled code in Delphi for Win32.  While I understand that .NET has a lot of great features and that a managed runtime environment has some interesting advantages, right now I'm all about researching how to take native Win32 code and Delphi to a new level.

What does this all have to do with Enumerators and Stupid Tricks?  Well... it turns out that in Win32, the for..in language construct is the only case where the compiler will actually properly manage the lifetime of a class.  If you're curious as to what I mean, make sure you look into this excellent rundown on how to write them.  If you look at some of the code in the referenced MSDN article (ignoring, for now, the use of anonymous methods) there is this bit of code:

static void For(int from, int to, Action<int> body)
{
    int index = from;
    ReplicableTask rtask = new ReplicableTask(delegate {
        int i;
        while ((i = Interlocked.Increment(ref index) - 1) < to) {
            body(i);
        }
    });
    rtask.Wait();
}

Since .NET is a nice garbage-collected environment, there is no need to explicitly worry about freeing the "rtask" variable.  It will be collected for you.  However, in the cold, cruel world of natively compiled code, you are afforded no such luxury*... sort of.  Ignoring that whole anonymous method thing, but using the nested procedure trick I mentioned here, here's a potential implementation in Delphi for Win32:



procedure ParallelFor(AFrom, ATo: Integer; AEvent: TIteratorEvent);
var
  RTask: TReplicableTask;
  Index: Integer;

  procedure ExecuteEvent;
  var
    I: Integer;
  begin
    I := InterlockedIncrement(Index) - 1;
    while I < ATo do
    begin
      AEvent(I);
      I := InterlockedIncrement(Index) - 1;
    end;
  end;

begin
  Index := AFrom;
  RTask := TReplicableTask.Create(@ExecuteEvent);
  try
    RTask.Wait;
  finally
    RTask.Free;
  end;
end;

But what if we could eliminate writing that whole try...finally thing?  Here's where it gets a little out there and on the edge :).  What if, instead we write this:



procedure ParallelFor(AFrom, ATo: Integer; AEvent: TIteratorEvent);
var
  RTask: TReplicableTask;
  Index: Integer;

  procedure ExecuteEvent;
  var
    I: Integer;
  begin
    I := InterlockedIncrement(Index) - 1;
    while I < ATo do
    begin
      AEvent(I);
      I := InterlockedIncrement(Index) - 1;
    end;
  end;

begin
  Index := AFrom;
  for RTask in TReplicableTask.Create(@ExecuteEvent) do
    RTask.Wait;
end;

So how does this work?  If you read up on enumerators on Primoz' and Hallvard's blogs, you'll know that this works as long as the TReplicableTask object has a method called "GetEnumerator" that returns an enumerator with a Current property of type TReplicableTask.  The compiler will also automatically embed a try..finally block into the code and properly destroy the enumerator object for you.  Like I mentioned above, this is the only place where the compiler does this for you.  So we add a GetEnumerator method to TReplicableTask that looks something like this:



type
  TReplicableTask = class;

  TTaskDestructor = class
  private
    FTask: TReplicableTask;
    FMoved: Integer;
  public
    constructor Create(ATask: TReplicableTask);
    destructor Destroy; override;
    function GetCurrent: TReplicableTask; inline;
    function MoveNext: Boolean; inline;
    property Current: TReplicableTask read GetCurrent;
  end;

  TReplicableTask = class
  public
    ...
    function GetEnumerator: TTaskDestructor;
    ...
  end;

{ TTaskDestructor }

constructor TTaskDestructor.Create(ATask: TReplicableTask);
begin
  inherited Create;
  FTask := ATask;
  FMoved := -1;
end;

destructor TTaskDestructor.Destroy;
begin
  FTask.Free;
  inherited;
end;

function TTaskDestructor.GetCurrent: TReplicableTask;
begin
  Result := FTask;
end;

function TTaskDestructor.MoveNext: Boolean;
begin
  // True exactly once: FMoved starts at -1, so the first increment
  // yields 0; every subsequent increment yields a value > 0.
  Result := InterlockedIncrement(FMoved) = 0;
end;

function TReplicableTask.GetEnumerator: TTaskDestructor;
begin
  Result := TTaskDestructor.Create(Self);
end;


So there you go...  An enumerator that is not really an enumerator, but actually a poor-man's implementation of an auto-destroyed object.  Yep, a Stupid Enumerator Trick of the highest order :).  So... what if we really did have anonymous methods in some future version of the compiler?  What would the above code look like then?  This is pure speculation, but I'd imagine it might look something like this:



procedure ParallelFor(AFrom, ATo: Integer; AEvent: TIteratorEvent);
var
  RTask: TReplicableTask;
  Index: Integer;
begin
  Index := AFrom;
  for RTask in TReplicableTask.Create(
    procedure
    var
      I: Integer;
    begin
      I := InterlockedIncrement(Index) - 1;
      while I < ATo do
      begin
        AEvent(I);
        I := InterlockedIncrement(Index) - 1;
      end;
    end
  ) do
    RTask.Wait;
end;


And, just to remind you... I'm talking natively compiled code here :).  Let the rampant speculation begin....


 


*Some would argue that this is a "good thing," but that is a can-of-worms I will refuse to open at this point :)

Wednesday, November 7, 2007

Magical Assembler Incantations - Nested functions and anonymous methods

A few folks left some comments on my Life and Times of a Thread Pool post asking to comment on and explain what those little snippets of assembler do in the LocalParallelLoop() function in the TThreadPool class.  You can get the code to which I'm referring in this Code Central entry.

First, a little background.  Object Pascal has long supported the notion of nesting functions and procedures within each other.  This is a very powerful way to better hide implementation details and make your code much easier to understand and maintain.  Many times a nested function is used to hide a deeply recursive function such as a quick sort algorithm.  The nice thing about using a nested function is that it has full access to all the local variables of the surrounding procedure (provided they're declared prior to the nested function implementation).  So rather than having to define a peer method and pass in all the values needed, a nested procedure need only take the parameters that may change between each invocation or are the result of an inline expression.

Let's peek under the hood at how the compiler accomplishes this.  On the x86 platform, there are several conventions that are used to store and access local variables.  This involves the use of the hardware stack and either the stack pointer (ESP) or the Base Pointer (EBP).  Direct stack indexing is used by the compiler when the function does little stack manipulation and usage; however, in most cases, the compiler will generate a common sequence of instructions to establish a stack frame.  This usually consists of pushing the current Base Pointer (EBP) onto the stack, then loading the Base Pointer register (EBP) with the top of the stack (ESP).  Then the stack pointer is adjusted to "allocate" the local variables by decrementing the stack pointer register by the total size of the local variables.  Now, to access the passed-in parameters, you use positive indices off the Base Pointer, and to access local variables, negative indices are used.  Once you've established what the Base Pointer is, passed-in parameters and local variables are at a predetermined (at compile-time) offset.

If you can get access to the Base Pointer for a given instance (or invocation) of a function, you can now easily get access to all the parameters and local variables.  This is exactly how a nested procedure grabs the values from its outer procedure.  The compiler does not actually nest the body of the nested procedure into the middle of the outer procedure, but rather generates it as its own, stand-alone procedure.  The only difference is that it expects a hidden parameter, which is always placed onto the stack prior to calling the procedure.  This hidden parameter is the Base Pointer from the outer procedure.

So now let's look at how this little magical incantation operates.  In LocalParallelLoop(), there is this little bit of assembler:

asm
  MOV EAX,[EBP]
  MOV LEBP,EAX
end;

What this little snippet does is load the Base Pointer of the calling function.  If you recall from above, when creating a stack frame, the calling procedure's Base Pointer value is saved by pushing it onto the stack, and then the current procedure's Base Pointer is established by loading the current value of the stack pointer into the base pointer register (mov ebp,esp).  At a 0 (zero) offset from the location referenced by the currently executing procedure's base pointer is the caller's base pointer.  That is what the mov eax,[ebp] instruction does.  It loads this value into the eax register.  This value is then stored into a local variable called LEBP (the compiler knows how to generate the right instruction to address the LEBP variable on the stack).  Now that we have the caller's base pointer, we can call the function referenced by the LocalProc parameter.

This is where things can go horribly wrong, and also where the compiler is really no help at all to you.  Notice that the LocalProc parameter is just a raw Pointer.  No type information, no type checking... In other words, you're drifting in "you had better get it right or you're screwed" land.  The caller just takes the address of the local procedure using the '@' operator, which doesn't do any type checking.  LocalParallelLoop() is taking it on faith that you've properly declared the local procedure to take a single Integer parameter and that it is using the default calling convention.

In this case, the default calling convention is to try to pass as many parameters as it can in CPU registers before pushing the remaining ones onto the stack.  This is the 'register' calling convention and is the default.  In this case only one parameter will be passed in a register, the index.  Recall from above that for a nested procedure, the caller's base pointer is always pushed onto the stack.  Since we have that value in LEBP, it is a simple matter to now make the call.  That is what the following code does:

asm
  MOV EAX,Index
  MOV EDX,LEBP
  PUSH EDX
  MOV ECX, LocalProc
  CALL ECX
  POP EDX
end;

What this little bit of code does is load up the EAX register with the Index, which is the parameter it is passing into the LocalProc, get the LEBP value and push it onto the stack, and then finally get the address of the LocalProc and perform an indirect call.  Upon return, the stack needs to be cleaned up from pushing the base pointer, so the pop edx instruction handles that task.

As you can see, there is no checking to make sure that LocalProc points to a properly declared procedure, or even that it points to code at all!  Another little problem is that you should never, ever retain the address of the LocalProc any place other than in a parameter or another local variable.  Without the proper base pointer also passed along, you risk corrupting memory at worst, or crashing your application at best.  Allowing the LocalProc parameter to live past the lifetime of this function is a clear no-no.  Even though the LocalParallelLoop() function does lots of things and even calls back to the caller's local function, as long as the logical thread of execution remains in this function, LocalProc and LEBP are valid.  In this case, LocalParallelLoop() doesn't return until all the work is completed, which ensures that the caller's frame remains on the stack and intact.

What about anonymous methods?  In .NET, C# has the notion of anonymous methods, which are simply inline-declared delegates that contain little snippets of code.  Basically, you pass the code as a parameter to a method that takes a delegate.  Under the hood, the compiler generates a method and gives it some made-up name (to satisfy the CLR).  Since .NET is a garbage-collected runtime environment, they are more free to play fast-and-loose with the calling function's local variables.  In this case those variables are "captured" [EDIT: clarification in the comments from Barry Kelly, fellow CodeGear colleague].  Then a reference to these variables is passed along.  Once all the delegates that refer to them are released, the captured variables are then released.  Simple. [not really, because that is a very naive and oversimplified way of looking at it, but the point is that the surrounding run-time helps a lot]
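The capture behavior is easy to see in any garbage-collected language. Here's a quick Python sketch of the same idea (my own illustration, not code from the article): the captured local outlives the call that created it, precisely because the runtime keeps it alive for as long as the closure references it.

```python
def make_counter():
    count = 0  # a local variable of the enclosing call
    def increment():
        nonlocal count  # 'count' is captured, not copied
        count += 1
        return count
    return increment  # the closure outlives make_counter's activation

counter = make_counter()
counter()
counter()
print(counter())  # prints 3: the captured variable kept its state
```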

For the unmanaged, natively compiled space, things are a little more complicated since not all data types are managed.  Sure, things like strings, variants, interfaces, and dynamic arrays are properly reference counted and destroyed, but most other data types require the user to explicitly manage the instance.  You simply cannot "capture" the caller's stack frame and hope that it remains valid for as long as you reference it.  If, and only if, the usage of a calling procedure's stack frame is purely limited to the lifetime of the currently executing procedure, then it is safe and there is no need to "capture" the caller's frame.  In most cases, this is actually what happens.  It is those little fringe cases that can really cause the headaches and mysterious bugs to show up.

It turns out that the next version of the C++ standard will introduce what are essentially anonymous functions they're calling "lambdas."  They're also trying to address the "capture" problem by allowing the user to specify how the locals are to be handled.  Were Delphi to get anonymous methods/functions on the native-compiled side, it is very likely that we'd follow very closely the work being done on the next C++ standard as a guideline.

Monday, November 5, 2007

Delphi, Adelphi, & Flickr

Back in March of this year, you may recall that I'd traveled throughout Australia and parts of Asia.  While in Australia I visited Sydney, Melbourne, and Brisbane.  While in Melbourne I stayed at a rather eclectic hotel with the rather apropos name, Adelphi.  It was very much a "bohemian" place to stay.  So much so that I had to take a few photos of the room.  Throughout the trip, I'd uploaded some of these photos to my Flickr account, thinking of it no further.

Then on October 24th, I got a rather odd email through my Flickr account.  Normally I'd have disregarded it as spam since I didn't know the sender, but since it was specifically through my Flickr account I looked into it.  It was from someone associated with a site called Schmap.  I manually navigated to the site and looked around.  Looked legit.  Apparently they troll through Flickr and other photo sites for people's vacation photos associated with many of the highlighted locations on their site.  They contact these folks asking for permission to link to their photo and to use it in their online promotions.  I read through the online agreement and it seemed reasonable.  I retained all my rights to the photo, and I only grant them specific rights to use the photo.  When the photo is displayed on their site it links back to my Flickr account and proper attribution is given.  It was only one photo of the room I stayed in at the Adelphi hotel, so I figured it couldn't hurt.

Last week, I got word that the photo had been selected and was live on their site.  Here's the link that contains the photo.  You may need to scroll through the photos on the right, but mine should be there :).  It looks like their site is a just a clever mash-up using Google maps, links to Priceline, and Google adwords.

Friday, November 2, 2007

The Life and Times of a Thread Pool

As promised in this post, I've uploaded a demo project to Code Central. This is an implementation of Conway's Game of Life using a thread pool and a parallel loop. It uses a 500x500 wrapping grid in which only a portion is displayed in the UI. No calculations of the next generation are done in the UI thread in either parallel or serial mode. While the implementation of the life algorithm is interesting, that is not the intent of this demo. The Parallel.pas unit is the main focus of this demo. This unit contains the thread pool and a couple of parallel for loop implementations. One implementation takes a method pointer as a callback and the other uses an old trick where you can pass in the address of a nested procedure. This is the technique used by default, but a simple modification will allow it to use the other version. This was developed using Delphi 2007+ (internal developer build). It should work fine with shipping D2007. It is also Win32 only.
Here's a simple screen-shot:

You can check and uncheck the "Parallel" check box while the application is processing generations to see the effect of going parallel. For those of you with single core systems, you will not see much improvement by checking or unchecking Parallel.
For more information on Conway's Life and on various other cellular automaton algorithms, here are a few links:
http://psoup.math.wisc.edu/mcell/ca_files_formats.html
http://www.radicaleye.com/lifepage/
Finally, here is a very good implementation of Life written in Delphi:
http://psoup.math.wisc.edu/Life32.html

Wednesday, October 31, 2007

I cut and I cut and it was still too short!

In keeping with the notion that I'm a highly fallible and flawed human like so many others, I just have to relay an important epiphany I had while playing some more with thread pools, parallelization of loops, and a different thread pool technique. Actually, more than a week ago I'd planned on writing up this simply stellar posting about how "simple" parallelizing can be done and the overall results of some "research" I'd been doing (you know, the 'R' in R&D). I'd even posted a little challenge for folks in this message. However, I'd actually spent the last few weeks becoming even more follicularly challenged.

I've discovered a crucial bit about how you divide a task up into smaller chunks for dispatching into a thread pool. The technique you use for chunking the task is highly dependent on the nature of the task itself [and the crowd sighs with a huge DUH!!!]. Obviously your algorithm cannot have interdependencies between the iterations, directly or indirectly. I got that one. Here's what I discovered.

I took my thread pool class and wrote a ParallelFor function that takes a low and high bound and a method pointer (this will become a generic function once we have Win32 generics fully functional). The routine would divide up the range into smaller "chunks" based on the number of processors available and then dispatch these smaller chunks into the thread pool. It would then block on a list of events until they were all signaled. This meant that all the sub-tasks were complete and the function can return. Works great!
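The chunking logic described here can be sketched roughly as follows. The callback type name, the use of a CPUCount value, and the pool interaction are my assumptions based on this description, not the actual Parallel.pas code:

```pascal
type
  // Hypothetical callback covering one sub-range of the loop
  TRangeProc = procedure(ALow, AHigh: Integer) of object;

procedure ParallelFor(ALow, AHigh: Integer; Proc: TRangeProc);
var
  I, ChunkSize, ChunkLow, ChunkHigh, ChunkCount: Integer;
begin
  ChunkCount := CPUCount; // one chunk per available processor
  // Round up so the whole range is covered
  ChunkSize := (AHigh - ALow + ChunkCount) div ChunkCount;
  for I := 0 to ChunkCount - 1 do
  begin
    ChunkLow := ALow + I * ChunkSize;
    ChunkHigh := ChunkLow + ChunkSize - 1;
    if ChunkHigh > AHigh then
      ChunkHigh := AHigh;
    // Queue (ChunkLow, ChunkHigh, Proc) into the thread pool here,
    // keeping one event per chunk...
  end;
  // ...then block on the list of events until all are signaled,
  // at which point every sub-task is complete and we can return.
end;
```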

Now I needed a sufficiently long running algorithm that also had the trait where each iteration is independent from all the others. I also wanted one that folks were familiar with. I've got it, the Sieve of Eratosthenes! Simple, iterative algorithm, right? Or so I thought. The first step was to code it up in a single-threaded manner and test it, get a base-line on the performance, and make sure I got the algorithm correct. Easy. Check. Now I need to figure out how to use my new handy-dandy "Ginsu Knife" ParallelFor procedure. So the sieve consists of an outer loop and an inner loop. Just pass the outer loop range to ParallelFor and then in the method callback perform the inner loop. Simple. I whipped this all up into my test application where I'd done some very simple "QueryPerformanceCounter()" profiling. I already knew the baseline performance of the algorithm, so I was all ready to see both cores on my "Yonah" based Dell D820 light up. You know that sinking feeling in your gut when you're all ready for something big and when it gets to the moment, it is a huge let-down? That's just what happened... The performance was nearly identical to the single-threaded version!

I spent the next couple of weeks, on and off, doing things like rewriting my thread pool using this interesting technique referred to me by Chee Wee. I adjusted the number of chunks used by my ParallelFor function. Disabled SpeedStep on my machine. Verified that both cores were, in fact, being exercised. I wrote a little assembler function using the cpuid instruction to identify each core and make sure I was seeing some distribution between the cores. Yep, both cores were lighting up... but why wasn't the performance any better??? Logic would dictate that it should be. I had conservative expectations: anywhere between 15 and 30% improvement would be great, and close to 50% would be perfect. I made sure the array was aligned... wait, that doesn't matter since it was an array of bytes, which is mostly unaligned in any case. So I made it an array of Integers... nope, virtually no performance change! Back to array of Byte. Maybe I needed to increase the range of primes to calculate, since there is some overhead in the ParallelFor procedure. So I ultimately increased it to 1 million. Took longer, sure. But it was still the same performance as the single-threaded version!!! By now I was certain I was destined to be immortalized for all to see on The Daily WTF.

Ok, time to drag out the big guns! I fired up my trusty AQTime. After a couple of runs, I started noticing something interesting. Of the two threads being dispatched into, one was eating significantly more CPU cycles than the other. Why was that?? I rechecked the chunking in my ParallelFor function... nope, it evenly divided the range up just fine. (Ok, the more astute among you figured out my error long ago.) Finally, I went back and analyzed the sieve algorithm... Suddenly I realized what the problem was.

If you look at how the Sieve of Eratosthenes actually works, it doesn't really calculate which numbers are prime, but rather which ones are not prime. When it is done, the elements of the array that are not marked are the prime numbers. As the algorithm runs, it marks all the numbers that have the current iteration's value as a factor. For instance, in a given finite range there are far more numbers that have 2 as a factor than, say, 10. By chunking the algorithm into even linear chunks, the lower range has to do far more work than the higher range.

The solution in this case was to provide a way to interleave the work among multiple threads such that each thread takes a fair share of the high-factor lower integers. Once I rewrote the ParallelFor algorithm to do this and retested the application I was greeted with a performance improvement very close to 50%! I'd finally got each core equally lit up!
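The interleaved split can be sketched as below, under the same hypothetical naming as the earlier chunking sketch: worker K of N handles iterations K, K+N, K+2N, and so on, so every worker gets a fair share of the expensive low-value iterations:

```pascal
type
  // Hypothetical per-iteration callback
  TIterationProc = procedure(Index: Integer) of object;

// Each of WorkerCount pool tasks runs this with its own 0-based WorkerIndex.
// Instead of taking a contiguous block, each worker strides through the range.
procedure InterleavedWorker(WorkerIndex, WorkerCount, ALow, AHigh: Integer;
  Proc: TIterationProc);
var
  I: Integer;
begin
  I := ALow + WorkerIndex;
  while I <= AHigh do
  begin
    Proc(I);
    Inc(I, WorkerCount); // stride by the worker count
  end;
end;
```

With two workers over 2..1000000, worker 0 takes 2, 4, 6, ... and worker 1 takes 3, 5, 7, ..., so the heavily-marked small factors are no longer concentrated in a single thread's share.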

In conclusion, based on this information, it is very clear that the way you break apart an algorithm is highly dependent on the nature of that algorithm. In the case of the Sieve of Eratosthenes, an interleaving algorithm is the optimal way to make it parallel. However, in the case of, say, doing an alpha-blend over a large in-memory image, the larger linear chunks may work just fine. Suppose you did have an algorithm with dependencies between adjacent iterations. In that case, the large linear chunks may work better because you only need to worry about the edges. However, even then, such an algorithm is in danger of degrading back to being serial, because you cannot calculate iteration 100 before you finish iterations 1-99. Sometimes you can minimize these interdependencies at the expense of using more memory.

Another interesting algorithm to make parallel is Conway's Game of Life. In this case you calculate the new generation based on a static version of the previous generation. I think I'll tackle that one and see if interleaving or chunking makes any difference. I'll be back with a report. This multi-core/parallel processing stuff is interesting, challenging, and overall forces you to think a little differently about your problem space.

Tuesday, October 30, 2007

You want to do what?

I always get a kick out of reading Raymond Chen's blog, The Old New Thing.  So many of his posts hit home way too often :).  One of his latest posts simply highlights one thing I always try to do, and get folks to do, when they ask for support or help with such-and-such.  I've caught myself doing this same thing many times.  Don't ask how to implement your grandiose solution; simply present the problem you're trying to solve and then present your proposed solution.

Why present the problem?  You've already thought about it (I presume) up one side and down the other.  I shouldn't have to bother this person with my problems; they're mine to solve, dag-nabit!  First of all, why are you even asking for help?  If you're supposed to be "God's gift to programming," why are you seeking advice?  You're not?  You're a fallible, imperfect human?  Me, too!  Good, now that we're all on the same level here, back to why you should present the problem.  When you present and frame the question in terms of the problem you're attempting to solve, you have the highest chance of getting the information you want.  In fact, in cases too numerous to count, I've gotten better solutions to my original problem than my own "awe-inspiring" solution.

I see this same thing all too often in the peer-support forums and newsgroups.  A person will pop in asking for help with how to implement their solution, only to be disappointed by the responses due to a lack of common information.  Too much inference is going on.  The same is also true of QualityCentral reports.  Rather than simply stating that such-and-such is broken, a detailed explanation along with detailed steps goes a long way toward getting your point across.  My parents always used to tell me, "To be terrific, you must be specific."

Wednesday, October 17, 2007

Enumerators, Inlining, methods on records, for-in and deadlocks

The ever-present, over-achieving Hallvard Vassbotn has just posted an excellent analysis of how the Delphi compiler generates code for the "for-in" statement. He also presented many suggestions on how to make this as efficient as possible.  I will be taking his suggestions back to the team to get folded into the next release.  You should also be sure to read Primoz Gabrijelcic's blog posts on writing enumerators.

 

I noticed that I haven't gotten any "bites" on my challenge to spot the deadlock...  Since returning to the office after my trip to Kona, HI, I've been working on several things not related to multi-core, thread-pooling, paralleling, etc...  Mostly surrounding the next Delphi/C++Builder release.  I'll come back to threading soon.

Friday, October 5, 2007

Spot the deadlock

This is the final full day I'll be here in Kona, Hawaii for the ANSI/ISO C++ standards meeting, and since they're just going through some of the minor issues (or actually "bugs") in the standard, I've been playing with the thread pool class from my post about thread pools. I included a small project to demonstrate the concept. However, in working with it, I found a subtle deadlock condition. Can you spot it? I'll give you a hint: SendMessage. I think this highlights the complexity that can creep into even the seemingly simplest multithreaded code. Of course, I wasn't being particularly careful while writing the code, as it was only a conceptual exercise.

One common thing you may want to do with thread pools is to dispatch several tasks to the pool, do some other work, and then wait for the result. Another twist is that you want to continue working in the UI and then at some point in the future, a method is called when the set of tasks are completed. I've been adding some code to the SyncObjs.pas unit, and specifically the THandleObject class. I've added a class function called WaitForMultiple(). This allows you to pass in an array of THandleObject class instances on which to wait. You can wait for one or all. This is just a wrapper around WaitForMultipleObjectsEx. In the thread pool class I've added a parameter to the QueueXXWorkItem methods allowing you to optionally pass in a TEvent object that will be signaled once the work item completes. Since you may want to dispatch many work items, all of which must be completed before continuing, you'd call THandleObject.WaitForMultiple() or a new class method on the TThreadPool class.
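Putting those pieces together might look something like the sketch below. The exact signatures of the extended QueueWorkItem overload and of THandleObject.WaitForMultiple are my guesses from the description above, and DoChunk is a hypothetical worker method:

```pascal
var
  Events: array[0..3] of TEvent;
  I: Integer;
begin
  for I := Low(Events) to High(Events) do
  begin
    // Manual-reset event, initially unsignaled; the pool signals it
    // when the corresponding work item completes
    Events[I] := TEvent.Create(nil, True, False, '');
    TThreadPool.QueueWorkItem(Self, DoChunk, Events[I]); // hypothetical overload
  end;
  // Do other work here...
  // ...then block until all four work items have finished (wait-all)
  THandleObject.WaitForMultiple(Events, INFINITE, True);
  for I := Low(Events) to High(Events) do
    Events[I].Free;
end;
```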

An interesting thing to note is that if the UI thread blocks waiting for the work items, it seems that the items are not dispatched onto the worker threads at all and the application hangs. There is nothing about this in the MSDN documentation for QueueUserWorkItem. If I perform the wait on a separate wait thread, it seems to work OK. Any ideas? Are the tasks dispatched onto the worker threads via synchronization with the message queue? I would have figured that there was a control thread that waits for things to appear in the work item queue and then does the dispatching.

Thursday, October 4, 2007

ANSI/ISO C++ meeting Day 4

Shameless namedrop: sitting across the table from Herb Sutter (from MS) and Bjarne Stroustrup (from AT&T Labs) in the Evolution/Concurrency Working Group discussing http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2407.html.  Just to make it clear to the Delphi and C++Builder folks: we've had this for many years in the form of packages.  It's good that the standards bodies are getting on board with this concept.  Of course, there are many issues with defining a standard that can be implemented across various platforms, mainly Unix (and similar) and Windows...  Well, it looks like this proposal has died in committee, because many felt they would be unable to finalize the proposal in time to get a new standard out by 2009 (a.k.a. C++09, formerly C++0x).

Next discussion is more informal and is related to garbage collection... should be interesting.  Apparently there will be a minimal specification of garbage collection in the next standard (at least at this point).  The main sticking points are obscured pointers, leak detection, and destructors.  No, no... GC, as defined in the original proposal, is not in. Geez, it is hard to track all the different proposals and their status as they wind their way through the various subgroups.  Looks like they've decided to put in some special library functions to allow certain annotations to give "hints" to any collector about whether or not a memory region contains reachable pointers.

Great googly-moogly, now the discussion is about the definition of "may"...  now they're discussing that "need not" is the negative form of "may."  Stop before I go insane :-).

Now in the afternoon session, where they're finally trying to nail down the final proposal for the threading library.  Working on some interesting problems with the try_lock() function.  It seems that there are cases where try_lock() can return false even if the lock is available.  The details are somewhat abstract and platform-specific.  In properly written code, a "spurious failure" of try_lock() is not really a problem.  Here's the one degenerate case, as presented by Hans Boehm (from HP), translated to Delphi code:

var
  x: Integer;
  M: TMutex;

{ thread #1 }              { thread #2 }
x := 1;                    while M.TryLock do
M.Lock;                      M.Unlock;
                           Assert(x = 1);

It was quickly pointed out that this is considered "bad" programming practice.  The issue is that there is actually a race if try_lock() can fail spuriously (failure, in this case, being the notion of not acquiring the lock even though it is available): thread #2 would fall out of its loop and assert before thread #1 had assigned x.  On multi-processor/core systems with guaranteed cache coherency, it is hard to see a case where a "compare and swap" (the typical underlying mechanism for implementing a mutex) operation would fail spuriously.  I do agree that with properly written code (NOT the code above :-)), this should not be a problem.
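For contrast, here is a sketch of what "properly written" try-lock code looks like: a failed TryLock only means "the lock is busy right now, do something else," never anything about another thread's progress, so a spurious failure is harmless. TMutex is the same hypothetical class as in the snippet above; Process, SharedData, and DoSomethingElse are illustrative names:

```pascal
if M.TryLock then
try
  // We actually own the lock: safe to touch the shared state
  Process(SharedData);
finally
  M.Unlock;
end
else
  // Lock busy (or a spurious failure): back off and retry later.
  // No conclusion is drawn about what any other thread has done.
  DoSomethingElse;
```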

Wednesday, October 3, 2007

ANSI/ISO C++ meeting Day 3

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2320.html - Went over this paper some more and rehashed the issue surrounding whether the thread object should detach or join the underlying OS thread.  Join and detach are POSIX thread notions and roughly correspond to the Windows WaitForSingleObject(<thread handle>) and CloseHandle(<thread handle>) calls.  Looks like they're going with the detach rather than the join, mainly because they had no known prior art where a thread object does a join in the destructor.  Alisdair, at my prompting, did make it known that the Delphi TThread object does, in fact, do a join in its destructor.  So... there's my contribution to the C++ standards process... they're still going with the detach, but it was a long, long discussion.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2410.html - This paper was just covered in the final session of the day.  What is interesting is how they define the thread-safety guarantee.  There is the notion of basic thread-safety and strong thread-safety.  Adding "strong" thread-safety, in which a globally shared object can be mutated by more than one thread in a safe manner, has a significant negative impact on performance.  There are so many notions of "thread-safety" that it is hard to nail down just one.  Some are small in scope, dealing only with atomic operations on single "memory locations"; then there is the much broader notion that many folks seem to relate to, which is the strong one.  The problem is that even this "strong" notion is not the case where "I can write my program in any way I see fit without regard to how the threads interact."  Even all the work going on to leverage multi-core systems, both in the libraries and intrinsically in the compiler, is not "strongly" thread-safe in that broad, naive sense.

Now they're talking about random number generators...  Cool.. I needed a nap :-).

Tuesday, October 2, 2007

ANSI/ISO C++ meeting Day 2

OK, it is actually getting somewhat interesting.  Aside from the myriad straw polls and various other procedural items, the technical discussions have some interesting bits.  We've just finished discussing atomics per this paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2393.html.  It was actually moderated by one of the authors, Lawrence.

For the Delphi folks on the x86 architecture, it is generally assumed that most memory operations are sequentially consistent without the need for explicit memory fences (this is in user mode, where Delphi apps typically live).  However, for some other architectures, like PowerPC, Itanium and others, explicit memory barriers need to be inserted.  The notion of atomics codifies this distinction, whereby a "normal" variable does not necessarily have acquire/release semantics, while an atomic variable does.  It turns out that the C++ volatile modifier is wholly inadequate for describing the notion of acquire/release.

Another funny item that became a sticking point was whether or not bool (a.k.a. Boolean in Delphi parlance) is bitwise comparable.  Since a bool has one and only one value for "false" (typically zero), yet many possible bit patterns for "true" (any non-zero value), you cannot guarantee an atomic "compare and swap" for a bool type.  It is intriguing that such a seemingly simple, fundamental type can present such a problem.

We're now talking about this proposal for a multi-threading library: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2320.html, being presented by one of the authors, Howard.  The version being presented has had the thread cancellation section removed and explicitly made undefined.  This is because of the notion that cancellation is done via an exception... which could be caught, effectively canceling the cancel request.  This ended up being rife with other issues (besides the fact that a thread can deny a cancellation request... nay, demand), like intermixing non-C++ frames or even intervening OS frames.

Discussion of condition variables, for which Windows doesn't have an intrinsic primitive (not until Windows Vista), has been somewhat interesting (and another sticking point).  Pete Becker (former Borland C++ RTL guru) has raised the issue that the current proposal seems designed with a bias toward POSIX (through Boost) and unnecessarily complicates the implementation of condition variables for systems that don't have a native condition-variable-like thingy.  In looking at potential implementations for a condition variable on Windows in the absence of native support, it becomes abundantly clear that this isn't an easy problem to solve.  Geez... the machinations that these various algorithms go through are astounding.  However, a condition variable construct makes it much easier to implement things like thread-safe message queues and stacks, where the producer wants to know the "not full" state and the consumer is interested in the "not empty" state.
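To make the "not full"/"not empty" point concrete, here is a deliberately naive sketch of the producer side of a bounded queue, approximated on pre-Vista Windows with two events instead of a native condition variable. All the names (NotFull, NotEmpty, Lock, Queue, MaxItems) are illustrative, and the lost-wakeup races lurking in exactly this kind of event-based emulation are precisely why the "machinations" mentioned above are needed:

```pascal
// Producer side of a bounded queue, naively emulated with two TEvent
// instances from SyncObjs and a TCriticalSection. A consumer would do
// the mirror image: wait on NotEmpty, pop, then signal NotFull.
procedure Produce(const Item: TItem);
begin
  NotFull.WaitFor(INFINITE);   // block while the queue is full
  Lock.Enter;
  try
    Queue.Push(Item);
    if Queue.Count = MaxItems then
      NotFull.ResetEvent;      // queue just became full
  finally
    Lock.Leave;
  end;
  NotEmpty.SetEvent;           // wake a consumer waiting for "not empty"
end;
```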

 

Afternoon session:

OK, we just spent over an hour wrestling with the notion of whether or not a reference is also a memory location (as defined by this document: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2334.htm).  Even though you cannot take the address of a C++ reference, it may in fact occupy a memory location.  So given that, should it share the same semantics as a regular memory location?  It all sounded so metaphysical for such an analytical group.  I understand the arguments on both sides; however, the one argument I could not disagree with is that unless you call it out, some "clever" compiler writer may decide to do something "special" with references, regardless of whether or not they occupy a memory location (references can be collapsed by optimization).  Finally past that... now we're trying to define what a "thread" is...  Pretty soon my head is going to explode :-).  blah blah blah...

 

I think that is it for day two... we'll see if I have the gumption to say anything about day 3 :-)

Monday, October 1, 2007

ANSI/ISO C++ meeting Day 1

Two words.  Boring.  Tedious.  This is simply where previous meeting minutes are approved by the voting members, the agenda for the week is set, and specific mechanical meeting issues are discussed.  This includes when we're going to get WiFi in the meeting rooms.  I don't particularly care since I'm using my cell phone as a data channel :-).  If I can stay awake, I'll try to report some more...  Once we get into the breakouts where the actual concepts are being discussed, I'm sure things will perk up.

Tuesday, September 25, 2007

Who is going where to attend what!? The ISO C++ committee meetings.

Yep.  That's right.  This weekend I'll be boarding a plane to attend the ISO C++ meetings next week.  So why is the long-time Delphi guy going to attend something for C++?  Since I do things for CodeGear across the product lines in different capacities, it isn't too much of a stretch.  But there are a couple of really interesting things happening within the C++ community surrounding the standards that may be of interest on the Delphi side of things: concurrency and garbage collection.  I'll be attending these meetings along with Alisdair Meredith, the new C++Builder product manager, and one of the C++ compiler engineers.  Apparently, many of the topics of discussion will be on the concurrency side, with debates on memory models, variable volatility, and memory fences.  Some discussion will certainly ensue surrounding the OpenMP folks and their requirements.  On the GC front, things are a ways off from being finalized, as there are, apparently, several camps aligned with different views on the subject.  This is interesting in light of the fact that we've actually had several internal conversations about GC on the Delphi/Win32 side (nothing to see here folks, move along, move along ;-).  I'd like to see how they're going to handle adding GC to a language with such a long legacy of deterministic resource manipulation.

So if you're going to be in Kona, Hawaii from September 29th through October 6th, let me know and maybe we can meet up and chat.  Oh this is going to be such a rough trip... ;-).

Friday, September 21, 2007

Super editor zoom - proof of concept.

David Clegg mentions in this post a new super-secret "super zoom" feature.  First of all, the reason this "feature" isn't on by default is that it was implemented as a very quick hack, more or less simply a proof of concept for the team to play with.  As such, it is riddled with problems that you may encounter if you start using it.  It is actually somewhat ironic that this comes up now, because for the last couple of days I've been looking at this feature and how it should actually be implemented.

The way it is currently implemented is to simply hide/show the dock sites on the editor form.  This causes a couple of major issues.  First of all, when in the zoomed mode, you cannot easily get back to the other views without unzooming first (and you must use the mouse for this operation).  Zoom the editor, then select View|Project Manager... nothing happens.  That is because the form is visible, but the site into which it is docked is not.  Try switching desktop states, by starting the debugger or manually from the desktop state dropdown... then it gets really interesting.  Again, it was only for some internal research and a proof of concept.

Similar to my post about adding auto-save and recovery of files to the IDE, I'll go through and outline the behavior I'd expect this feature to adhere to.  As I mentioned in the previous paragraph, this feature takes the somewhat naive approach of hiding and showing the dock sites.

The first modification to this code is to iterate through all the dock sites on the editor form, then go through all the docked clients within each site and hide them individually.  This will allow the docking system to properly track their state.  It will also allow you to still selectively re-show specific views.  Selecting View|Project Manager will now properly display the Project Manager view in its properly docked position.  The next change is how the "unzoom" is handled.  As there can be several views that are not visible when the view is zoomed, you want to make sure they remain hidden when unzoomed unless they were explicitly shown while zoomed.  Another requirement is that all "unpinned" views should remain unpinned and accessible, so there could still be a thin strip on the sides and bottom where the views can pop up.  Finally, any large desktop changes, such as switching states, loading a new project with a saved desktop, etc., should cancel the zoom state.

I'm sure the next question on everyone's mind is why wasn't this feature just finished and done right for RAD Studio?  The simple truth is that it fell too low on the priority list to make it into the product.  Things like this are always going to happen where good intentions just sometimes don't make it because of the reality of the schedule and other priorities.  Another side effect of this is that we can also get some direct feedback from the field as to the overall usefulness of the concept (irrespective of the implementation) before devoting actual resources to it.  By all means, follow the instructions from David and play with it.  Even given its limitations and shortcomings there is enough there to get the idea of what it would do.

Wednesday, September 19, 2007

Wading in the shallow end of the pool - Thread Pools

A few weeks ago I was following with interest the announcement by Intel where they open-sourced their Threading Building Blocks (TBB) library. OK, so it is written in C++, makes heavy use of templates, and is generally like nearly every C++ library out there, with heavy macro usage and umpteen levels of nested macros... Despite all of that (either I've gotten way better at reading and understanding C++, or these libraries are simplifying things), I was able to understand it quite well. This got me thinking a bit, so I decided to take a quick look at Windows thread pools.

A thread pool is simply a way to more efficiently utilize the CPU(s) while incurring as little thread context-switching overhead as possible. If your application needs to do lots of quick little tasks and can do them concurrently, the build-up and tear-down overhead of threads may actually cancel out any gains from trying to push these tasks into threads. Think of thread pooling as simply a way to queue up a bunch of work items and allow them to run on a number of available threads that are created once and used over and over. Thus you can amortize the build-up and tear-down of the threads across all the work they do. Once a work item is assigned to a thread, that work item has exclusive access to that particular thread until it is finished, at which point the thread returns to the pool. A thread pool is not the solution for everything, though. For instance, if you have a task that is blocked most of the time, you should not run it on a thread pool thread.

Off I went on a discovery mission to see how Windows thread pooling works. I wanted to visually see what was going on as I scheduled or queued work items. I came up with a simple TThreadPool class. Here's the declaration of what I came up with:

type
  TThreadPool = class
  private
    type
      TUserWorkItem = class
        FSender: TObject;
        FWorkerEvent: TNotifyEvent;
      end;
    class procedure QueueWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent; Flags: ULONG); overload; static;
  public
    class procedure QueueWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent); overload; static;
    class procedure QueueIOWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent); static;
    class procedure QueueUIWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent); static;
  end;


You'll notice that this is designed to be a singleton-style class, since all methods are class static. The main method of interest is QueueWorkItem. What this does is simply schedule the WorkerEvent to be called from a thread pool thread, whenever that may be. It is up to you to make sure that the instance on which the WorkerEvent event is called is still valid at the time it is called. The other two methods simply correspond to some of the flags you can pass to the QueueUserWorkItem API. They're not used right now. Sender is passed through to the event handler specified by WorkerEvent, so that object should contain the context in which that work item is to do its work.
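Hypothetical usage of the class above might look like this; the task class, field names, and ComputePrimes are illustrative, not part of the posted code:

```pascal
type
  // Carries the context for one work item; passed through as Sender
  TPrimeTask = class
    FLow, FHigh: Integer;
    procedure Run(Sender: TObject); // matches the TNotifyEvent signature
  end;

procedure TPrimeTask.Run(Sender: TObject);
begin
  // Executes on a pool thread; Sender is the object passed as the
  // first argument to QueueWorkItem (here, the task itself)
  ComputePrimes(TPrimeTask(Sender).FLow, TPrimeTask(Sender).FHigh);
end;

// Queue it up; the task object must stay alive until Run completes
var
  Task: TPrimeTask;
begin
  Task := TPrimeTask.Create;
  Task.FLow := 1;
  Task.FHigh := 100000;
  TThreadPool.QueueWorkItem(Task, Task.Run);
end;
```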

Now here's the implementation of that class:

function InternalThreadFunction(lpThreadParameter: Pointer): Integer; stdcall;
begin
  Result := 0;
  try
    try
      with TThreadPool.TUserWorkItem(lpThreadParameter) do
        if Assigned(FWorkerEvent) then
          FWorkerEvent(FSender);
    finally
      TThreadPool.TUserWorkItem(lpThreadParameter).Free;
    end;
  except
    // Eventually this will need to somehow synchronously notify the main thread and either reraise the exception over there or
    // otherwise provide some information about the exception to the main thread.
  end;
end;

{ TThreadPool }

class procedure TThreadPool.QueueWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent);
begin
  QueueWorkItem(Sender, WorkerEvent, WT_EXECUTEDEFAULT);
end;

class procedure TThreadPool.QueueIOWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent);
begin
  QueueWorkItem(Sender, WorkerEvent, WT_EXECUTEINIOTHREAD);
end;

class procedure TThreadPool.QueueUIWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent);
begin
  QueueWorkItem(Sender, WorkerEvent, WT_EXECUTEINUITHREAD);
end;

class procedure TThreadPool.QueueWorkItem(Sender: TObject; WorkerEvent: TNotifyEvent; Flags: ULONG);
var
  WorkItem: TUserWorkItem;
begin
  if Assigned(WorkerEvent) then
  begin
    IsMultiThread := True;
    WorkItem := TUserWorkItem.Create;
    try
      WorkItem.FWorkerEvent := WorkerEvent;
      WorkItem.FSender := Sender;
      if not QueueUserWorkItem(InternalThreadFunction, WorkItem, Flags) then
        RaiseLastOSError;
    except
      WorkItem.Free;
      raise;
    end;
  end;
end;


To see just what is going on I wrote this little application:




The numbers in the list box represent the thread ID for the thread that is currently running. The band of colors visually show how the threads are scheduled. What is interesting is that this is what it looks like after about the 3rd or 4th time it runs. The first time it runs, each color is painted in sequence in a clearly serialized manner. Subsequent iterations seem to interleave more and more. This is running on a dual core system. You can get the whole application in Code Central.

As multi-core systems become more and more mainstream (aren't they already??), your applications really should begin to take advantage of them. The problem is that multi-threaded, or concurrent, programming is not very easy, since we humans tend to think serially, and so it is conceptually a little tricky to understand all the various nuances of concurrency. This is where CodeGear is looking to help. By providing simple, easy-to-understand tools and libraries, we can help bring multi-core programming out of the realm of voodoo and black magic and into the hands of developers of all skill levels. This will involve providing both library and compiler/tool support.

Friday, September 14, 2007

Cleggy joins the CodeGear team

The ever-present David Clegg has just joined the CodeGear crew working on the Developer Network! Welcome to the crucible (it does get a little "hot" around here at times!) :-)  Be sure to stop by his blog and say, "Hi."

Thursday, September 13, 2007

A Generic Playground, an intro to parameterized types in Delphi for .NET

The other day, I presented an introduction to the new parameterized types language feature in the latest release of Delphi for .NET at the Developer Days event. First of all, I'm not what one would call a consummate expert in the realm of language generics. In fact, a lot of it just makes my head hurt. For real hard-core, bleeding-edge looks at the kinds of things you can do with generics, you should be looking here or here. I'm sure you may recognize the guys who run those blogs.

So given that disclaimer, I'm going to start presenting some topics on using the new parameterized types in the Delphi language. Keep in mind that for this release this only applies to Delphi for .NET. Check out the roadmap for some guidance on when this will be coming to the Win32 side of the house. I'm going to approach this from the perspective that, as a Delphi programmer, you've never used generics or parameterized types. You may not even know what they are, how to use them, or why. That's ok... Me neither. :-)

In many ways, "generic" programming takes the code reuse tenet of Object Oriented Programming (OOP) to a whole new level. When OOP was becoming more prominent throughout the '80s, a key advantage of the concept was the notion of being able to reuse and extend already-written code in ways not considered by the original author. Rather than achieving reuse through inheritance and polymorphism alone (although parameterized types strongly leverage OOP), generic programming does it by separating the type of the data being operated on from the algorithm.

Suppose you've written a killer new sorting algorithm. Now you want to use that algorithm to sort a variety of different types of items. In one case you want to sort a simple list of integers. Other times you may want to sort some complex data structure based on a collating sequence unique to that data. Without "generics" you probably would have written this algorithm using traditional OOP polymorphism. So when you wanted to use your killer sort function, you'd have to create a unique descendant of the base class that implements the algorithm, override a few methods, and finally create an instance of this new type to use it.

Generics allow you to, in many cases, forgo the sometimes tedious, often mind-numbing task of creating yet another descendant class just to sort that new data structure you just defined. With a parameterized type, you write your algorithm as if you were writing it for a specific data type (yes, I know there is a little more to consider, but this is the basic gist of it). So let's start with something simple.

So I find myself constantly needing to reverse the items in an array. I have this awesome algorithm that I keep writing over and over. Sometimes I need to reverse an array of integers and other times it's strings. So, how could I use parameterized types to write this killer algorithm only once and then tell the compiler that I need a version that works with integers, or strings, or some other structure, only when needed? Here's what I came up with:

type
  TArrayReverser<T> = class
    procedure ReverseIt(var A: array of T);
  end;

procedure TArrayReverser<T>.ReverseIt(var A: array of T);
var
  I: Integer;
  E: T;
begin
  for I := Low(A) to High(A) div 2 do
  begin
    E := A[I];
    A[I] := A[High(A) - I];
    A[High(A) - I] := E;
  end;
end;


Eat your heart out, Mr. Knuth ;-). Now I can actually use the above parameterized type anyplace that I want to have the order of an array reversed, regardless of what the element types and sizes are. I don't have to create a descendant class and override some methods. I just instantiate the above type using the type of the array elements as the parameter. For example:

var
  I: Integer;
  IntegerArray: array of Integer;
  IntegerArrayReverser: TArrayReverser<Integer>;

begin
  SetLength(IntegerArray, 10);
  for I := Low(IntegerArray) to High(IntegerArray) do
    IntegerArray[I] := I * 10;
  IntegerArrayReverser := TArrayReverser<Integer>.Create;
  IntegerArrayReverser.ReverseIt(IntegerArray);
  for I := Low(IntegerArray) to High(IntegerArray) do
    Writeln('IntegerArray[', I, '] = ', IntegerArray[I]);
end;


So there is a quick intro to using parameterized types in Delphi for .NET. I'm sure you can see some of the interesting things you can do with this newfound power. Another interesting fact about using generics is that in many cases the code is actually more compact and faster than if you'd tried to make a general-purpose class using traditional OOP techniques. This is because the body of the ReverseIt procedure is compiled (at runtime in .NET, at compile-time for Win32) as if 'T' were actually declared as an 'Integer'. We could take the class I wrote above and use it to reverse an array of TPoint or TRect structures. Any type you declare, or an intrinsic language type, can be used to create a concrete version of TArrayReverser. If you have any questions about this post, please post them in a comment.
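
As a footnote to that last point, the very same class instantiates just as easily for other element types. Here's a sketch of the string version (the variable names are mine, not from the original example):

```delphi
var
  Names: array of string;
  StringArrayReverser: TArrayReverser<string>;
begin
  SetLength(Names, 3);
  Names[0] := 'Anders';
  Names[1] := 'Danny';
  Names[2] := 'Chuck';
  // Same algorithm, compiled as if T were declared as 'string'.
  StringArrayReverser := TArrayReverser<string>.Create;
  StringArrayReverser.ReverseIt(Names);
  // Names is now 'Chuck', 'Danny', 'Anders'
end;
```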

Wednesday, September 12, 2007

Oh bugger! Comments are being moderated...

I know a lot of you have posted comments and then sent me email directly since your comment didn't show up... well it turns out that I'm still learning this new blog engine and comments are all moderated by default. I also noticed that the spam has already started :-(. So for now, I'll leave the moderation on. I'll try to moderate as often as I can. I'll also review all the commenting options and see if there are any settings that would allow comments to go unmoderated...

UPDATE:  Ok, I think I see what is going on.  Most comments will get posted immediately.  However, if your post has 2 or more links in it, it goes into the moderation queue.  So if you post with a link to your blog and a link in the body of the comment, it gets moderated.  I've upped this to 3 or more links (and may bump it up further if I see more "normal" posts getting moderated).  This makes sure those silly SEO folks who like to post comments with a huge list of links will get moderated.  Nice feature!  Oh, and if you use profanity, it goes into moderation... I felt a little strange typing in all those words your mother would wash your mouth out with soap over if you ever said them.

Tuesday, September 11, 2007

IDE Autosave and recovery

I mentioned it in this post, which got me thinking about a new RAD Studio IDE feature; autosave. So what do you think, is some sort of time-based autosave feature something nice to have in a development tool?

Let's spec this thing out. I think you should be able to configure the idle delay; after it expires, the file is saved off to the side. The "modified" state isn't cleared, and the original file isn't overwritten (nor are the __history backups cycled). So if you're working on MyUnit.pas, it autosaves to a file called MyUnit.pas.autosave. Once you explicitly save the file, the MyUnit.pas.autosave file is deleted; at that point the normal __history backups are cycled and the working file is overwritten. So if certain random sub-atomic particles crash into your memory and cause a meltdown, you can at least recover the autosaves. When the IDE opens a file, it should detect that an orphaned autosave file is out there and ask if you want to recover it by overwriting the main working file. You should even be allowed to view the differences before you make that decision. How's that for really designing something on the fly? Anything else I missed? The compare tool should be pluggable/configurable, of course. Maybe allow you to enable this only for certain files and file types?
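
Purely to illustrate the recovery-detection step, here's a back-of-the-napkin sketch. This is nothing like actual IDE code; CheckForAutoSave and PromptToRecover are names I made up for the sake of the example:

```delphi
procedure CheckForAutoSave(const FileName: string);
var
  AutoSaveName: string;
begin
  AutoSaveName := FileName + '.autosave';
  if FileExists(AutoSaveName) then
    // An orphaned autosave means the IDE didn't shut down cleanly.
    // Offer a diff, then recover only if the user agrees.
    if PromptToRecover(FileName, AutoSaveName) then
      CopyFile(PChar(AutoSaveName), PChar(FileName), False);
end;
```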

New blogging server and engine...

So this is officially my first post on the live version of the Wordpress based blog-server. Lots of really cool new toys and features to play with. I'll miss some things like customizations and personal themes, but I'm promised that it's coming. No longer will I have to upload an image to my homepages.borland.com account and then reference them from my blog. This one allows me to upload attachments and images then inject the link. It even does thumbnailing for me.

OK, OK... so this stuff is expected these days for a blog engine... but after using .Text, ancient by today's internet time, this is a welcome relief. Of course all of this won't make me any better of a blogger... I'll still suck as bad as I ever did as far as content and topics. :-) Oh, and I just noticed... the built-in post editor autosaves after a timeout... This sounds like an interesting IDE feature...

Friday, September 7, 2007

Steve Shaughnessy on Blackfish SQL and dbExpress4.

If you're doing any kind of database work with Delphi and/or C++Builder (I'd say that a very large number of you are), then you have got to start reading Steve Shaughnessy's blog. Yesterday he posted an excellent introduction to the new Blackfish SQL database with some interesting information about its lineage. Then today there is some great information on the new rich metadata support in the latest version of dbExpress coming in RAD Studio 2007. Be sure to post some comments on his blog and encourage him to keep posting more information!

Wednesday, September 5, 2007

Just in case...RAD Studio 2007

Just in case you missed all the other CodeGear bloggers, the press release, and the article on the CodeGear Developer Network... I'll go ahead and mention it here: CodeGear has announced RAD Studio 2007. If you've been waiting for the next studio release to upgrade, now is the time to do it. This release will also include Update 3 for the previously released Delphi and C++Builder 2007. If you already have Studio Level Software Assurance, you'll be getting this release automatically. If you still haven't signed up for SA, now is the time to do it. By maintaining your annual SA account, you are assured that you'll have all the latest versions and features available.

While RAD Studio 2007 includes many updates for the Delphi and C++Builder 2007 personalities, the main addition is the Delphi for .NET personality. This includes the new language feature, parameterized types (generics). This allows you to both create and consume generic data types as supported by the latest .NET 2.0 framework. An updated version of VCL for .NET is included, with the same level of support for Windows Vista. You can also create ASP.NET web applications.

Ok.. the commercial is over... :-)

Wednesday, August 29, 2007

Driven to Distraction...

Yeah, yeah, yeah... It's been a while since there has been any activity on this blog. Now that school has started, the summer is officially over and, hopefully, I'll be able to get back into a regular schedule. I had a very full schedule, both here at CodeGear and personally, throughout the summer. I was able to take more time off than I've been able to in the past. We spent a week in Las Vegas at our vacation property, and a week in Hawaii (Kona) where my 16-year-old son, who is on the Scotts Valley High School Varsity Football (American) Team, played the Kealakehe Waveriders Varsity Team.

Earlier in the summer I broke down and bought my own XBox 360. I rationalized the purchase by telling myself that I'll work with some of the other Delphi team members and get Delphi for .NET working with the managed XNA framework... yeah.. yeah.. that's the ticket. The truth of the matter is that Forza Motorsports 2 was my real motivation. I bought the system and the MS Force Feedback Wheel. After playing for about a week... I determined that you can't use the wheel just by sitting it on your lap... I needed a racing chair! I researched all the various chairs online and they were all either too expensive (never mind the fact that my wife was already rolling her eyes at me...) or just cheap looking. So I decided to build my own. I went to Summit Racing and purchased a cheap racing seat and adjustable sliding tracks. Then I headed down to the closest Home Depot and purchased some 2x3s, 2x2s, and some 5/8” (16mm) plywood. Here's what I built over a weekend:

This is after the first day I had the base completed and the seat mounted. The base consists of a plywood sheet with 2x2s framing underneath that provide rigidity and raise the bolt heads that attach the seat rails off the floor.



It took a bit of planning and thinking to get the wheel and pedal mounts the way I wanted. I knew I wanted the height of the wheel mount to be adjustable, and for it to be able to swing out of the way for easy entering and exiting of the seat. So I created a simple moveable parallelogram mounting table. I mounted the pedals onto an angled platform between the “legs” of the wheel mount table. Here's the final result:





To limit the movement of the table and to allow for adjustment, I simply placed extra blocks of wood on the bottom. The uprights wedge against them and rest there with their own weight to hold the table in place. Since these photos were taken, I've revised the mounting of the wheel by eliminating the need for the provided clamp (which is somewhat weak) and screwing two blocks of wood to the top of the mounting table that overlap the “wings” that stick out on the sides of the wheel housing.

Now, I've never really been what one would call a “hard-core” gamer, as I've always preferred to take the game apart and figure out how it works rather than actually playing it :-). However, with this setup and the big screen TV, I must say that the racing game experience is on a whole different level. When we went to Las Vegas earlier this past summer, we visited the NASCAR Cafe, which has an excellent linear-accelerated roller coaster called Speed-The Ride, and the Cyber Speedway. When we first started visiting Las Vegas regularly about 7 years ago, I remember being totally blown away at the level of “realism” that the motion-controlled cars provided... that was late-'90s technology. I must now say that, short of having a true motion-controlled experience, the graphics and detail of the racing in Forza 2 make the Cyber Speedway feel like playing the old Pong video game.

So if you've got an XBox 360, my gamertag is “kylix rd”. I've yet to sign up for a full Xbox Live “gold” membership, so once I do, maybe we can race :-).

Ok, so it hasn't been all vacations, and fun and games... there actually have been some things going on here at CodeGear. If you haven't noticed, there's been a lot of information regarding Highlander, the full release of what was once the Borland Developer Studio, now called RAD Studio 2007. The betablogging program is in full swing and there is a lot of excellent information flowing about this new release. I suppose for the folks that view the language as a critical piece of the product, you'll be pleased to know that we'll finally be releasing full support for generics on the Delphi for .NET side. The next RAD Studio release, codenamed Tiburón, will include support for generics on Win32 as well as full support for Unicode throughout the VCL and the RTL, top to bottom, left to right. Nick has an excellent post with links to the various betabloggers out there. If you know of others that Nick hasn't mentioned, make sure you contact him with a link and he'll update his blog entry.

While not specifically Delphi related, it is CodeGear related. We've just announced the availability of JGear. JGear allows you to purchase some of the excellent JBuilder-specific “mojo” and integrate it into an existing Eclipse or Eclipse-derived 3.2 installation! You no longer have to be a JBuilder customer to get access to the features that make JBuilder the most powerful IDE in the Java space. Of course, if you want the best experience, you should consider moving to the new Eclipse-derived JBuilder.

Over the coming months, I'll try and get back into the swing of more regular blogging by helping to chronicle our move to Unicode with VCL and the IDE itself. I plan on beginning to highlight some of the little gotchas and things to watch for now so you can be best prepared for the switch when Tiburón is released next year. As we encounter these things and solve them, there will be things you can do today to your code to make sure the transition is as smooth as possible. Details will follow throughout the coming months.

Another item of note is that supposedly we're due to move to some new mo-betta blogging software sometime soon... I'm hoping that it will go smoothly, but just in case there are a few burps, I hope you will be understanding while the dust settles.