Wednesday, November 7, 2007

Magical Assembler Incantations - Nested functions and anonymous methods

A few folks left some comments on my Life and Times of a Thread Pool post asking to comment on and explain what those little snippets of assembler do in the LocalParallelLoop() function in the TThreadPool class.  You can get the code to which I'm referring in this Code Central entry.

First a little background.  Object Pascal has long supported the notion of nesting functions and procedures within each other.  This is a very powerful way to better hide implementation details and making your code much easier to understand and maintain.  Many times a nested function is used to hide a deeply recursive function such as a quick sort  algorithm.  The nice thing about using a nested function is that it has full access to all the local variables of the surrounding procedure (provided they're declared prior to the nested function implementation).  So rather than having to define a peer-method and pass in all the values needed, a nested procedure need only take the parameters that may change between each invocation or are the result of an inline expression.

Let's peek under the hood at how the compiler accomplishes this.  On the x86 platform, there are several conventions that are used to store and access local variables.  This involves the use of the hardware stack and either the stack pointer (ESP) or the Base Pointer (EBP).  Direct stack indexing is used by the compiler when the function does little stack manipulation and usage, however in most cases, the compiler will generate a common sequence of instructions to establish a stack frame.  This usually consists of pushing the current Base Pointer (EBP) onto the stack, then loading the Base Pointer register (EBP) with the top of the stack (ESP).  Then the stack pointer is adjusted to "allocate" the local variables by decrementing the stack pointer register by the total size of the local variables.  Now to access the passed in parameters, you use positive indecies off the Base Pointer and then to access local variable negative indices are used.  Once you've established what the Base Pointer is, passed in parameters and local variables are at a predetermined (at compile-time) offset.

If you can get access to the Base Pointer for a given instance (or invocation) of a function, you can now easily get access to all the parameters and local variables.  This is exactly how a nested procedure grabs the value from it's outer procedure.  The compiler does not actually nest the body of the nested procedure into the middle of the outer procedure, but rather generates it as it's own, stand-alone procedure.  The only difference is that it expects a hidden parameter, which is always placed onto the stack prior to calling the procedure.  This hidden parameter is the Base Pointer from the outer procedure.

So now let's look at how this little magical incantation operates.  In LocalParallelLoop(), there is this little bit of assembler:


What this little snippet does is to load the Base Pointer of the calling function.  If you recall from above, when creating a stack frame, the calling procedure's Base Pointer value is saved by pushing it onto the stack, and then the current procedure's Base Pointer is established by loading the current value of the stack pointer into the base pointer register (mov ebp,esp).  At a 0 (zero) offset from the location referenced by the currently executing procedure's base pointer is the caller's base pointer.  That is what the mov eax,[ebp] instruction does.  It loads this value into the eax register.  This value is then stored into a local variable called LEBP (the compiler knows how to generate the right instruction to address the LEBP variable on the stack).  Now that we have the caller's base pointer,  we can now call the function referenced by the LocalProc parameter.

This is where things can go horribly wrong and also where the compiler is really no help at all to you.  Notice that the LocalProc parameter is just a raw Pointer.  No type information, no type checking... In other words, you're drifting in "You had better get it right or you're screwed, land."  The caller just takes the address of the local procedure using the '@' operator which doesn't do any type checking.  The LocalParallelProc is taking it on faith that you've properly declared the local procedure to take a single Integer parameter and that it is using the default calling convention.

In this case, the default calling convention is to try to pass as many parameters as it can in CPU registers before pushing the remaining ones onto the stack.  This is the 'register' calling convention and is the default.  In this case only one parameter will be passed in a register, the index.  Recall from above that for a nested procedure, the caller's base pointer is always pushed onto the stack.  Since we have that value in LEBP, it is a simple matter to now make the call.  That is what the following code does:

  MOV EAX,Index
  MOV ECX, LocalProc

What this little bit of code does is to load up the EAX register with the Index which is the parameter it is passing into the LocalProc, get the LEBP value and push it onto the stack, and then finally, get the address of the LocalProc and perform an indirect call.  Upon return, the stack needs to be cleanup from pushing the base pointer so the pop edx instruction handles that task.

As you can see there is no checking to make sure that LocalProc points to a properly declared procedure, or even that it points to code at all!  Another little problem is that you should never, ever, retain the address of the LocalProc any place other than in a parameter or another local variable.  Without the proper base pointer also passed along, you can risk corrupting memory at worse, or crashing your application at best.  Allowing the LocalProc parameter to live past the lifetime of this function is a clear no-no.  Even though the LocalParallelFor() function does lots of things and even calls back to the caller's local function as long as the logical thread of executing remains in this function, LocalProc and LEBP are valid.  In this case , LocalParallelFor() doesn't return until all the work is completed which ensures that the caller's frame remains on the stack and intact.

What about anonymous methods?  In .NET, C# has the notion of anonymous methods which are simply inline-declared delegates which contain little snippets of code.  Basically, you pass the code as a parameter to a method that takes a delegate.  Under the hood, the compiler generates a method, gives it some made-up name (to satisfy the CLR).  Since .NET is a garbage-collected runtime environment, they are more free to play fast-and-loose with the calling functions local variables.  In this case those variables are "captured" and copied [EDIT: clarification in the comments from Barry Kelly, fellow CodeGear colleague].  Then a reference to this copy these variables is passed along.  Once all the delegates that refer to this copy are released, the captured copies variables are then released.  Simple. [not really because that is a very naive and over simplification way of looking at it, but the point is that the surrounding run-time helps a lot]

For the unmanaged, natively compiled space, things are a little more complicated since not all data types are managed.  Sure, things like strings, variants, interfaces and dynamic arrays are properly reference counted and destroyed, most other data-types require the user to explicitly manage the instance.  You simply cannot "capture" the caller's stack frame and hope that it remains valid for as long as you reference it.  If, and only if, the usage of a calling procedure's stack frame were purely limited to the lifetime of the currently executing procedure, then it is safe and there is no need to "capture" the caller's frame.  In most cases, this is actually what happens.  It is those little fringe cases that can really cause the headaches and mysterious bugs to show up.

It turns out that the next version of the C++ standard will introduce what is essentially anonymous functions they're calling "lambdas."  They're also trying to address the "capture" problem by allowing the user to specify how the locals are to be handled.  Were Delphi to get anonymous methods/functions on the native-compiled side, it is very likely that we'd follow the very closely the work being done on the next C++ standard as a guideline.