Wednesday, November 21, 2007

Placing your code in the forge - Refining a technique

While preparing for this posting, I had the chance to review the code in my Parallel.pas unit.  I was trying to get my implementation of TNestedTask&lt;T&gt; to work properly with nested functions.  The problem I had is that with Win32 generics, you cannot have BASM (Built-in ASseMbler) blocks in the bodies of the methods in a parameterized type.  So I needed to find a way to call a function pointer that can take a type parameter as the result type.  The thing is that depending upon the result type, the calling code has to handle it very differently.  For instance, if the result type is a floating point value, the result appears on the top of the floating point stack.  If it is a structure > 4 bytes in size, the caller has to pass in a hidden pointer parameter that tells the function where to place the result.  And finally, simple ordinal types are returned in a CPU register (EAX).  Even if the compiler were to allow assembler instructions in the bodies of methods in a parameterized type, there is no way for that code to properly "morph" into the correct sequence for handling all the various return types.

The compiler doesn't currently (aside from some internal research I've been doing in the compiler) support the notion of a nested procedure pointer, but it does support procedure pointers for any global procedure or function.  My solution was to create three little assembler functions to serve as helpers.  The Delphi compiler is smart enough to forgo the creation of a stack frame for most very simple procedures and functions.  This is especially true for any procedure or function that is nothing more than a pure block of assembly code.  Here are the little bits of code:

function GetFrame: Pointer;
asm
  MOV EAX,[EBP]
end;

procedure PushFrame(AEBP: Pointer);
asm
  XCHG EAX,[ESP]
  PUSH EAX
end;

function PopFrame: Pointer;
asm
  POP EDX
  POP EAX
  PUSH EDX
end;

The first function, GetFrame, returns the calling function's frame.  You should never call it through another intervening function if you're expecting to get the right result.  The other two functions are there for calling the nested procedure/function.  PushFrame takes the frame value stored from a previous call to GetFrame and injects it onto the stack.  Since this is a function call, we have to swap it with the current top of the stack, which is the return address for this function; the return address is then pushed back onto the stack so the "RET" instruction that is generated by the compiler can go back to the right location.  PopFrame just undoes what PushFrame did.


Here's a little bit of code that demonstrates how to use them to call a nested proc:


type
  TNestedFunction = function: Double;
  TCalculatorFunction = (cAdd, cSubtract, cMultiply, cDivide);

function CallNestedFunction(ANestedFunction: TNestedFunction): Double;
var
  LEBP: Pointer;
begin
  LEBP := GetFrame;
  PushFrame(LEBP);
  Result := ANestedFunction;
  PopFrame;
end;

procedure CalcFunction(Left, Right: Double; CalculatorFunction: TCalculatorFunction);

  function Add: Double;
  begin
    Result := Left + Right;
  end;

  function Subtract: Double;
  begin
    Result := Left - Right;
  end;

  function Multiply: Double;
  begin
    Result := Left * Right;
  end;

  function Divide: Double;
  begin
    Result := Left / Right;
  end;

var
  Result: Double;
  NestedFunc: TNestedFunction;
begin
  case CalculatorFunction of
    cAdd: NestedFunc := @Add;
    cSubtract: NestedFunc := @Subtract;
    cMultiply: NestedFunc := @Multiply;
    cDivide: NestedFunc := @Divide;
  end;
  Result := CallNestedFunction(NestedFunc);
  Writeln(Result);
end;

By using the assembler functions above, I can make CallNestedFunction a parameterized function like this:


type
  TNestedFunction<T> = function: T;

  TCalcClass = class
    class function CallNestedFunction<T>(ANestedFunction: TNestedFunction<T>): T; static;
  end;

class function TCalcClass.CallNestedFunction<T>(ANestedFunction: TNestedFunction<T>): T;
var
  LEBP: Pointer;
begin
  LEBP := GetFrame;
  PushFrame(LEBP);
  Result := ANestedFunction;
  PopFrame;
end;

... Same as above ...

var
  Result: Double;
  NestedFunc: TNestedFunction<Double>;
begin
  case CalculatorFunction of
    cAdd: NestedFunc := @Add;
    cSubtract: NestedFunc := @Subtract;
    cMultiply: NestedFunc := @Multiply;
    cDivide: NestedFunc := @Divide;
  end;
  Result := CallNestedFunction<Double>(NestedFunc);
  Writeln(Result);
end;

In this way, I simply let the compiler figure out the right way to call through the procedure/function pointer and all those assembler functions do is to make sure the calling frame reference is on the stack.  Before someone asks, you should not wrap the PushFrame, call, PopFrame sequence into a try..finally block (I know it does kind of look like that should be done).  First of all, it will mess up the stack because exception frames use the stack to keep them linked together.  Secondly, it is wholly unnecessary.  If an exception occurs in the call to the nested proc, the system will unwind the stack and make sure each finally block and except block has the proper local frame.  How that all works is a rather complicated topic that would take up a lot of posts.  If you look at how the compiler generates code for a call to a nested procedure, it doesn't try to do any kind of exception wrapping either.  The extra frame value on the stack will be properly cleaned up.

When code lies - A better solution

Many of you retched and guffawed at my silly post about Stupid Enumerator Tricks. You know what? I agree. It was a hideous misuse of a feature. One comment from Joe White that stood out was the notion that the code was lying. He is right. I was using a side-effect as the primary feature, not what the code was actually saying. To anyone casually observing the code in question, it could easily be misconstrued. The code was lying.

How would the maintainer of this code figure out the intent a year or more from now? What if I were the maintainer? How many times have you looked at a chunk of code you wrote a long time ago and about all you remember about it was "how utterly awesome and clever you were"? But now you don't have the faintest idea what it does or how it works. Sure, you could have put comments in, but what if they were wrong and misleading? (This is where I like to say that no comment is far better than a wrong or bad comment.) After staring at the code for what seems like hours... you suddenly have a forehead-slapping moment and exclaim, "What was I thinking?!"

But is there a better method of handling the automatic cleanup of this task? Another commenter, Jolyon Smith, suggested using interfaces as a way to automatically manage the lifetime of an object. I've used this little trick myself. Let's look at my TTask, TTask&lt;T&gt; and TReplicableTask implementation. Here's the declaration of the classes and the interfaces. Yes, I'm using Win32 generics... because, well... they work (for the most part) on my machine :).

type
  TFunctionEvent<T> = function (Sender: TObject): T of object;

  ITask = interface
    procedure Wait;
    procedure Cancel;
    function GetIsComplete: Boolean;

    property IsComplete: Boolean read GetIsComplete;
  end;

  ITask<T> = interface(ITask)
    function GetValue: T;

    property Value: T read GetValue;
  end;

  EOperationCanceled = class(Exception);

  TTask = class(TInterfacedObject, ITask)
  private
    FDoneEvents: THandleObjectArray;
    FCanceled: Boolean;
    FException: TObject;
  protected
    FEvent: TNotifyEvent;
    function GetIsComplete: Boolean;
    procedure WorkEvent(Sender: TObject);
    procedure QueueEvents(Sender: TObject); virtual;
    procedure Wait;
    procedure Cancel;
    property IsComplete: Boolean read GetIsComplete;
  public
    constructor Create(Sender: TObject; Event: TNotifyEvent);
    destructor Destroy; override;
    procedure CheckCanceled;
  end;

  TTask<T> = class(TTask, ITask<T>)
  private
    FEvent: TFunctionEvent<T>;
    FResult: T;
    procedure RunEvent(Sender: TObject);
    function GetValue: T;
    property Value: T read GetValue;
  public
    constructor Create(Sender: TObject; Event: TFunctionEvent<T>);
  end;

  TReplicableTask = class(TTask)
  protected
    procedure QueueEvents(Sender: TObject); override;
  end;

So now the usage is like this:


procedure TForm1.Button1Click(Sender: TObject);
var
  IntTask: ITask<Integer>;
begin
  IntTask := TTask<Integer>.Create(Self, CalcResult);
  Caption := Format('Result = %d', [IntTask.Value]);
end;

function TForm1.CalcResult(Sender: TObject): Integer;
var
  I: Integer;
begin
  Result := 1;
  for I := 2 to 9 do
    Result := Result * I;
end;

Much cleaner, don't you think? And remember, even if the task isn't complete yet, the call to IntTask.Value does an implicit wait until the task has obtained the value. By making the methods private or protected (or, even better, strict private and protected) on the TTask and TTask&lt;T&gt; classes, you can better ensure that no one decides to use the object directly. You must obtain the interface in order to use the class instance.


Update: I just updated this Code Central entry with a new Parallel.pas unit that demonstrates some of the above, with new TTask, TReplicableTask, and TNestedReplicableTask classes along with the requisite interfaces.  A compiled version of the Life demo was also included per some requests.  Finally, the Parallel.pas unit demonstrates a new technique for calling nested procedures without having to litter your code with a bunch of assembler blocks.  I'll post another blog entry describing it.

Wednesday, November 14, 2007

Automated Testing - Learn it, Live it, Love it

 

Steve Trefethen may be an ex-CodeGear employee, but he is still adding significant value to the developer community as a whole.  Looks like he's started a series of posts where he'll be talking about test automation and the advantages it gives you.  Sure, everyone knows about unit-testing (at least you should), but Steve is talking about a much more holistic approach to automated testing.  Here at CodeGear we have a significant internal investment in automated testing tools, especially for Delphi.  So much so that we have several patents related to the technology.  There are also a lot of excellent automated testing tools out there, such as TestComplete by AutomatedQA.  I'm sure there will be some excellent information as he delivers the blog postings.

Monday, November 12, 2007

Stupid Enumerator Tricks - And now for something completely different

Sometimes I get admittedly kooky, crazy ideas.  While playing with Tasks, ParallelFor, and Thread Pooling, a rather interesting thought popped into my head.  I've been poring over the very interesting ideas and techniques outlined in this article on MSDN that is referring to the Parallel FX library that is in the works for the .NET framework.  One thing I've been looking at is how to make something like this applicable to natively compiled code in Delphi for Win32.  While I understand that .NET has a lot of great features and that a managed runtime environment has some interesting advantages, right now I'm all about researching how to take native Win32 code and Delphi to a new level.

What does this all have to do with Enumerators and Stupid tricks?  Well... it turns out that in Win32, the for..in language construct is the only case where the compiler will actually properly manage the lifetime of a class.  If you're curious as to what I mean, make sure you look into this excellent rundown on how to write them.  If you look at some of the code in the referenced MSDN article (ignoring, for now, the use of anonymous methods) there is this bit of code:

static void For( int from, int to, Action<int> body )
{
    int index = from;
    ReplicableTask rtask = new ReplicableTask( delegate {
        int i;
        while ((i = Interlocked.Increment(ref index)-1) < to) {
            body(i);
        }
    });
    rtask.Wait();
}

Since .NET is a nice garbage collected environment, there is no need to explicitly worry about freeing the "rtask" variable.  It will be collected for you.  However, in the cold, cruel world of natively compiled code, you are afforded no such luxury*... sort of.  Ignoring that whole anonymous-method thing, and using the nested procedure trick I mentioned here, here's a potential implementation in Delphi for Win32:



procedure ParallelFor(AFrom, ATo: Integer; AEvent: TIteratorEvent);
var
  RTask: TReplicableTask;
  Index: Integer;

  procedure ExecuteEvent;
  var
    I: Integer;
  begin
    I := InterlockedIncrement(Index) - 1;
    while I < ATo do
    begin
      AEvent(I);
      I := InterlockedIncrement(Index) - 1;
    end;
  end;

begin
  Index := AFrom;
  RTask := TReplicableTask.Create(@ExecuteEvent);
  try
    RTask.Wait;
  finally
    RTask.Free;
  end;
end;

But what if we could eliminate writing that whole try...finally thing?  Here's where it gets a little out there and on the edge :).  What if, instead we write this:



procedure ParallelFor(AFrom, ATo: Integer; AEvent: TIteratorEvent);
var
  RTask: TReplicableTask;
  Index: Integer;

  procedure ExecuteEvent;
  var
    I: Integer;
  begin
    I := InterlockedIncrement(Index) - 1;
    while I < ATo do
    begin
      AEvent(I);
      I := InterlockedIncrement(Index) - 1;
    end;
  end;

begin
  Index := AFrom;
  for RTask in TReplicableTask.Create(@ExecuteEvent) do
    RTask.Wait;
end;

So how does this work?  If you read up on enumerators on Primoz' and Hallvard's blogs, you'll know that for..in accepts any object that has a method called "GetEnumerator" returning an enumerator with a MoveNext method and a Current property; here the Current property is of type TReplicableTask.  The compiler will also automatically embed a try..finally block into the code and properly destroy the enumerator object for you.  Like I mentioned above, this is the only place where the compiler does this for you.  So we add a GetEnumerator method to TReplicableTask that looks something like this:



type
  TReplicableTask = class;

  TTaskDestructor = class
  private
    FTask: TReplicableTask;
    FMoved: Integer;
  public
    constructor Create(ATask: TReplicableTask);
    destructor Destroy; override;
    function GetCurrent: TReplicableTask; inline;
    function MoveNext: Boolean; inline;
    property Current: TReplicableTask read GetCurrent;
  end;

  TReplicableTask = class
  public
    ...
    function GetEnumerator: TTaskDestructor;
    ...
  end;

{ TTaskDestructor }

constructor TTaskDestructor.Create(ATask: TReplicableTask);
begin
  inherited Create;
  FTask := ATask;
  FMoved := -1;
end;

destructor TTaskDestructor.Destroy;
begin
  FTask.Free;
  inherited;
end;

function TTaskDestructor.GetCurrent: TReplicableTask;
begin
  Result := FTask;
end;

function TTaskDestructor.MoveNext: Boolean;
begin
  // True exactly once: FMoved starts at -1, so only the first
  // increment lands on 0.
  Result := InterlockedIncrement(FMoved) = 0;
end;

function TReplicableTask.GetEnumerator: TTaskDestructor;
begin
  Result := TTaskDestructor.Create(Self);
end;


So there you go...  An enumerator that is not really an enumerator, but actually a poor-man's implementation of an auto-destroyed object.  Yep, a Stupid Enumerator Trick of the highest order :).  So... what if we really did have anonymous methods in some future version of the compiler?  What would the above code look like then?  This is pure speculation, but I'd imagine it might look something like this:



procedure ParallelFor(AFrom, ATo: Integer; AEvent: TIteratorEvent);
var
  RTask: TReplicableTask;
  Index: Integer;
begin
  Index := AFrom;
  for RTask in TReplicableTask.Create(
    procedure
    var
      I: Integer;
    begin
      I := InterlockedIncrement(Index) - 1;
      while I < ATo do
      begin
        AEvent(I);
        I := InterlockedIncrement(Index) - 1;
      end;
    end
  ) do
    RTask.Wait;
end;


And, just to remind you... I'm talking natively compiled code here :).  Let the rampant speculation begin....


 


*Some would argue that this is a "good thing," but that is a can-of-worms I will refuse to open at this point :)

Wednesday, November 7, 2007

Magical Assembler Incantations - Nested functions and anonymous methods

A few folks left some comments on my Life and Times of a Thread Pool post asking to comment on and explain what those little snippets of assembler do in the LocalParallelLoop() function in the TThreadPool class.  You can get the code to which I'm referring in this Code Central entry.

First a little background.  Object Pascal has long supported the notion of nesting functions and procedures within each other.  This is a very powerful way to better hide implementation details and make your code much easier to understand and maintain.  Many times a nested function is used to hide a deeply recursive function, such as a quick sort algorithm.  The nice thing about using a nested function is that it has full access to all the local variables of the surrounding procedure (provided they're declared prior to the nested function implementation).  So rather than having to define a peer-method and pass in all the values needed, a nested procedure need only take the parameters that may change between each invocation or are the result of an inline expression.

Let's peek under the hood at how the compiler accomplishes this.  On the x86 platform, there are several conventions that are used to store and access local variables.  This involves the use of the hardware stack and either the stack pointer (ESP) or the Base Pointer (EBP).  Direct stack indexing is used by the compiler when the function does little stack manipulation and usage; in most cases, however, the compiler will generate a common sequence of instructions to establish a stack frame.  This usually consists of pushing the current Base Pointer (EBP) onto the stack, then loading the Base Pointer register (EBP) with the top of the stack (ESP).  Then the stack pointer is adjusted to "allocate" the local variables by decrementing the stack pointer register by the total size of the local variables.  Passed-in parameters are then accessed at positive indices off the Base Pointer, while local variables are accessed at negative indices.  Once you've established what the Base Pointer is, passed-in parameters and local variables are at a predetermined (at compile-time) offset.
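Schematically, the standard prologue described above looks like this (the allocation size is whatever the function's locals require):

```
PUSH EBP           ; save the caller's base pointer (it ends up at [EBP+0])
MOV  EBP,ESP       ; establish this function's base pointer
SUB  ESP,<locals>  ; "allocate" locals at negative offsets from EBP
```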

If you can get access to the Base Pointer for a given instance (or invocation) of a function, you can now easily get access to all the parameters and local variables.  This is exactly how a nested procedure grabs the values from its outer procedure.  The compiler does not actually nest the body of the nested procedure into the middle of the outer procedure, but rather generates it as its own, stand-alone procedure.  The only difference is that it expects a hidden parameter, which is always placed onto the stack prior to calling the procedure.  This hidden parameter is the Base Pointer from the outer procedure.

So now let's look at how this little magical incantation operates.  In LocalParallelLoop(), there is this little bit of assembler:

asm
  MOV EAX,[EBP]
  MOV LEBP,EAX
end;

What this little snippet does is load the Base Pointer of the calling function.  If you recall from above, when creating a stack frame, the calling procedure's Base Pointer value is saved by pushing it onto the stack, and then the current procedure's Base Pointer is established by loading the current value of the stack pointer into the base pointer register (mov ebp,esp).  At a 0 (zero) offset from the location referenced by the currently executing procedure's base pointer is the caller's base pointer.  That is what the mov eax,[ebp] instruction does.  It loads this value into the eax register.  This value is then stored into a local variable called LEBP (the compiler knows how to generate the right instruction to address the LEBP variable on the stack).  Now that we have the caller's base pointer, we can call the function referenced by the LocalProc parameter.

This is where things can go horribly wrong and also where the compiler is really no help at all to you.  Notice that the LocalProc parameter is just a raw Pointer.  No type information, no type checking... In other words, you're drifting in "you had better get it right or you're screwed" land.  The caller just takes the address of the local procedure using the '@' operator, which doesn't do any type checking.  LocalParallelLoop() is taking it on faith that you've properly declared the local procedure to take a single Integer parameter and that it is using the default calling convention.

In this case, the default calling convention is to try to pass as many parameters as it can in CPU registers before pushing the remaining ones onto the stack.  This is the 'register' calling convention and is the default.  In this case only one parameter will be passed in a register, the index.  Recall from above that for a nested procedure, the caller's base pointer is always pushed onto the stack.  Since we have that value in LEBP, it is a simple matter to now make the call.  That is what the following code does:

asm
  MOV EAX,Index
  MOV EDX,LEBP
  PUSH EDX
  MOV ECX, LocalProc
  CALL ECX
  POP EDX
end;

What this little bit of code does is load up the EAX register with the Index, which is the parameter it is passing into the LocalProc, get the LEBP value and push it onto the stack, and then finally, get the address of the LocalProc and perform an indirect call.  Upon return, the stack needs to be cleaned up from pushing the base pointer, so the pop edx instruction handles that task.

As you can see, there is no checking to make sure that LocalProc points to a properly declared procedure, or even that it points to code at all!  Another little problem is that you should never, ever retain the address of the LocalProc any place other than in a parameter or another local variable.  Without the proper base pointer also passed along, you risk corrupting memory at worst, or crashing your application at best.  Allowing the LocalProc parameter to live past the lifetime of this function is a clear no-no.  Even though the LocalParallelLoop() function does lots of things and even calls back to the caller's local function, as long as the logical thread of execution remains in this function, LocalProc and LEBP are valid.  In this case, LocalParallelLoop() doesn't return until all the work is completed, which ensures that the caller's frame remains on the stack and intact.

What about anonymous methods?  In .NET, C# has the notion of anonymous methods, which are simply inline-declared delegates containing little snippets of code.  Basically, you pass the code as a parameter to a method that takes a delegate.  Under the hood, the compiler generates a method and gives it some made-up name (to satisfy the CLR).  Since .NET is a garbage-collected runtime environment, they are more free to play fast-and-loose with the calling function's local variables.  In this case those variables are "captured" [EDIT: clarification in the comments from Barry Kelly, fellow CodeGear colleague].  Then a reference to these variables is passed along.  Once all the delegates that refer to them are released, the captured variables are then released.  Simple.  [Not really, because that is a very naive and oversimplified way of looking at it, but the point is that the surrounding run-time helps a lot.]

For the unmanaged, natively compiled space, things are a little more complicated since not all data types are managed.  Sure, things like strings, variants, interfaces and dynamic arrays are properly reference counted and destroyed, but most other data types require the user to explicitly manage the instance.  You simply cannot "capture" the caller's stack frame and hope that it remains valid for as long as you reference it.  If, and only if, the usage of a calling procedure's stack frame were purely limited to the lifetime of the currently executing procedure, then it is safe and there is no need to "capture" the caller's frame.  In most cases, this is actually what happens.  It is those little fringe cases that can really cause the headaches and mysterious bugs to show up.

It turns out that the next version of the C++ standard will introduce what are essentially anonymous functions, which they're calling "lambdas."  They're also trying to address the "capture" problem by allowing the user to specify how the locals are to be handled.  Were Delphi to get anonymous methods/functions on the native-compiled side, it is very likely that we'd follow very closely the work being done on the next C++ standard as a guideline.

Monday, November 5, 2007

Delphi, Adelphi, & Flickr

Back in March of this year, you may recall that I'd traveled throughout Australia and parts of Asia.  While in Australia I visited Sydney, Melbourne and Brisbane.  While in Melbourne I stayed at a rather eclectic hotel with the rather apropos name, Adelphi.  It was very much a "bohemian" place to stay.  So much so that I had to take a few photos of the room.  Throughout the trip, I'd uploaded some of these photos to my Flickr account, thinking of it no further.

Then on October 24th, I got a rather odd email through my Flickr account.  Normally I'd have disregarded it as spam since I didn't know the sender, but since it was specifically through my Flickr account I looked into it.  It was from someone associated with a site called Schmap.  I manually navigated to the site and looked around.  Looked legit.  Apparently they troll through Flickr and other photo sites for people's vacation photos associated with many of the highlighted locations on their site.  They contact these folks asking for permission to link to their photo and to use it in their online promotions.  I read through the online agreement and it seemed reasonable.  I retained all my rights to the photo, and I only grant them specific rights to use the photo.  When the photo is displayed on their site it links back to my Flickr account and proper attribution is given.  It was only one photo of the room I stayed in at the Adelphi hotel, so I figured it couldn't hurt.

Last week, I got word that the photo had been selected and was live on their site.  Here's the link that contains the photo.  You may need to scroll through the photos on the right, but mine should be there :).  It looks like their site is a just a clever mash-up using Google maps, links to Priceline, and Google adwords.

Friday, November 2, 2007

The Life and Times of a Thread Pool

As promised in this post, I've uploaded a demo project to Code Central. This is an implementation of Conway's Game of Life using a thread pool and a parallel loop. It uses a 500x500 wrapping grid in which only a portion is displayed in the UI. No calculations of the next generation are done in the UI thread in either parallel or serial mode. While the implementation of the life algorithm is interesting, that is not the intent of this demo. The Parallel.pas unit is the main focus of this demo. This unit contains the thread pool and a couple of parallel for loop implementations. One implementation takes a method pointer as a callback and the other uses an old trick where you can pass in the address of a nested procedure. This is the technique used by default, but a simple modification will allow it to use the other version. This was developed using Delphi 2007+ (internal developer build). It should work fine with shipping D2007. It is also Win32 only.
Here's a simple screen-shot:

You can check and uncheck the "Parallel" check box while the application is processing generations to see the effect of going parallel. For those of you with single core systems, you will not see much improvement by checking or unchecking Parallel.
For more information on Conway's life and information of various other algorithms here's a few links:
http://psoup.math.wisc.edu/mcell/ca_files_formats.html
http://www.radicaleye.com/lifepage/
Finally, here is a very good implementation of Life written in Delphi:
http://psoup.math.wisc.edu/Life32.html