Wednesday, November 7, 2007

Magical Assembler Incantations - Nested functions and anonymous methods

A few folks left some comments on my Life and Times of a Thread Pool post asking to comment on and explain what those little snippets of assembler do in the LocalParallelLoop() function in the TThreadPool class.  You can get the code to which I'm referring in this Code Central entry.

First a little background.  Object Pascal has long supported the notion of nesting functions and procedures within each other.  This is a very powerful way to better hide implementation details and making your code much easier to understand and maintain.  Many times a nested function is used to hide a deeply recursive function such as a quick sort  algorithm.  The nice thing about using a nested function is that it has full access to all the local variables of the surrounding procedure (provided they're declared prior to the nested function implementation).  So rather than having to define a peer-method and pass in all the values needed, a nested procedure need only take the parameters that may change between each invocation or are the result of an inline expression.

Let's peek under the hood at how the compiler accomplishes this.  On the x86 platform, there are several conventions that are used to store and access local variables.  This involves the use of the hardware stack and either the stack pointer (ESP) or the Base Pointer (EBP).  Direct stack indexing is used by the compiler when the function does little stack manipulation and usage, however in most cases, the compiler will generate a common sequence of instructions to establish a stack frame.  This usually consists of pushing the current Base Pointer (EBP) onto the stack, then loading the Base Pointer register (EBP) with the top of the stack (ESP).  Then the stack pointer is adjusted to "allocate" the local variables by decrementing the stack pointer register by the total size of the local variables.  Now to access the passed in parameters, you use positive indecies off the Base Pointer and then to access local variable negative indices are used.  Once you've established what the Base Pointer is, passed in parameters and local variables are at a predetermined (at compile-time) offset.

If you can get access to the Base Pointer for a given instance (or invocation) of a function, you can now easily get access to all the parameters and local variables.  This is exactly how a nested procedure grabs the value from it's outer procedure.  The compiler does not actually nest the body of the nested procedure into the middle of the outer procedure, but rather generates it as it's own, stand-alone procedure.  The only difference is that it expects a hidden parameter, which is always placed onto the stack prior to calling the procedure.  This hidden parameter is the Base Pointer from the outer procedure.

So now let's look at how this little magical incantation operates.  In LocalParallelLoop(), there is this little bit of assembler:

asm
  MOV EAX,[EBP]
  MOV LEBP,EAX
end;

What this little snippet does is to load the Base Pointer of the calling function.  If you recall from above, when creating a stack frame, the calling procedure's Base Pointer value is saved by pushing it onto the stack, and then the current procedure's Base Pointer is established by loading the current value of the stack pointer into the base pointer register (mov ebp,esp).  At a 0 (zero) offset from the location referenced by the currently executing procedure's base pointer is the caller's base pointer.  That is what the mov eax,[ebp] instruction does.  It loads this value into the eax register.  This value is then stored into a local variable called LEBP (the compiler knows how to generate the right instruction to address the LEBP variable on the stack).  Now that we have the caller's base pointer,  we can now call the function referenced by the LocalProc parameter.

This is where things can go horribly wrong and also where the compiler is really no help at all to you.  Notice that the LocalProc parameter is just a raw Pointer.  No type information, no type checking... In other words, you're drifting in "You had better get it right or you're screwed, land."  The caller just takes the address of the local procedure using the '@' operator which doesn't do any type checking.  The LocalParallelProc is taking it on faith that you've properly declared the local procedure to take a single Integer parameter and that it is using the default calling convention.

In this case, the default calling convention is to try to pass as many parameters as it can in CPU registers before pushing the remaining ones onto the stack.  This is the 'register' calling convention and is the default.  In this case only one parameter will be passed in a register, the index.  Recall from above that for a nested procedure, the caller's base pointer is always pushed onto the stack.  Since we have that value in LEBP, it is a simple matter to now make the call.  That is what the following code does:

asm
  MOV EAX,Index
  MOV EDX,LEBP
  PUSH EDX
  MOV ECX, LocalProc
  CALL ECX
  POP EDX
end;

What this little bit of code does is to load up the EAX register with the Index which is the parameter it is passing into the LocalProc, get the LEBP value and push it onto the stack, and then finally, get the address of the LocalProc and perform an indirect call.  Upon return, the stack needs to be cleanup from pushing the base pointer so the pop edx instruction handles that task.

As you can see there is no checking to make sure that LocalProc points to a properly declared procedure, or even that it points to code at all!  Another little problem is that you should never, ever, retain the address of the LocalProc any place other than in a parameter or another local variable.  Without the proper base pointer also passed along, you can risk corrupting memory at worse, or crashing your application at best.  Allowing the LocalProc parameter to live past the lifetime of this function is a clear no-no.  Even though the LocalParallelFor() function does lots of things and even calls back to the caller's local function as long as the logical thread of executing remains in this function, LocalProc and LEBP are valid.  In this case , LocalParallelFor() doesn't return until all the work is completed which ensures that the caller's frame remains on the stack and intact.

What about anonymous methods?  In .NET, C# has the notion of anonymous methods which are simply inline-declared delegates which contain little snippets of code.  Basically, you pass the code as a parameter to a method that takes a delegate.  Under the hood, the compiler generates a method, gives it some made-up name (to satisfy the CLR).  Since .NET is a garbage-collected runtime environment, they are more free to play fast-and-loose with the calling functions local variables.  In this case those variables are "captured" and copied [EDIT: clarification in the comments from Barry Kelly, fellow CodeGear colleague].  Then a reference to this copy these variables is passed along.  Once all the delegates that refer to this copy are released, the captured copies variables are then released.  Simple. [not really because that is a very naive and over simplification way of looking at it, but the point is that the surrounding run-time helps a lot]

For the unmanaged, natively compiled space, things are a little more complicated since not all data types are managed.  Sure, things like strings, variants, interfaces and dynamic arrays are properly reference counted and destroyed, most other data-types require the user to explicitly manage the instance.  You simply cannot "capture" the caller's stack frame and hope that it remains valid for as long as you reference it.  If, and only if, the usage of a calling procedure's stack frame were purely limited to the lifetime of the currently executing procedure, then it is safe and there is no need to "capture" the caller's frame.  In most cases, this is actually what happens.  It is those little fringe cases that can really cause the headaches and mysterious bugs to show up.

It turns out that the next version of the C++ standard will introduce what is essentially anonymous functions they're calling "lambdas."  They're also trying to address the "capture" problem by allowing the user to specify how the locals are to be handled.  Were Delphi to get anonymous methods/functions on the native-compiled side, it is very likely that we'd follow the very closely the work being done on the next C++ standard as a guideline.

11 comments:

  1. Hmm, you just gave me more to add to my own multithreading projects. ;)

    ReplyDelete
  2. One nit: C# anonymous methods capture variables, not values, so the values aren't copied. This can cause confusion:

    Action<int> a = null;
    for (int i = 0; i < 10; ++i)
    {
    if (i == 0)
    a = delegate
    { Console.WriteLine(i); };
    }
    a(0); // will print 10, not 0, the value that i had when it was captured.

    On the other hand, this prevents a possibly confusing difference between how reference types and value types would be handled - copying a value of a value type would result in capturing 0 in the example above, but of course the values of reference type instances can't be generally copied.

    (Second try; quoting angle brackets.)

    ReplyDelete
  3. Very interesting post, Allen!

    >> Were Delphi to get anonymous methods/functions on the native-compiled side, it is very likely that we’d follow the very closely the work being done on the next C++ standard as a guideline.
    That's the right direction.

    ReplyDelete
  4. So essentially, the variable LEBP stores the address of TLifeForm.TLifeThread.Execute. Then from LocalProc, all the private data of LifeThread can be accessed, right?

    ReplyDelete
  5. Wuping,

    LEBP stores that address of that functions *stack frame*. The LocalProc parameter is the actually address of the procedure itself.

    Allen.

    ReplyDelete
  6. Allen, Thank you! You are right!! Actually what i mean the stack frame adress.

    ReplyDelete
  7. Thanks for the post, Allen. The last ones are very good ones. Unfortunately I don’t have much time now to give you more feedback. But let’s go to the subject. Yes, ‘lambdas’ are a very useful feature for Delphi - I filled some time ago a QC with an implementation approach (#49996) something a little bit similar with what C ISO Guys try to do (even if I didn't had time to look so much at their papers - but if you're interested perhaps I'll 'drop an eye' on them). Basically, I think that it's best to tell the users that anonymous functions can be seen as a 'syntactic sugar', a quick, concise and clear way to declare a certain kind of objects. Another thing to mention here is that this, imho, should make pair with an user-controlled garbage collecting engine (details in QC #52966 and somewhere in .non-tech few weeks ago, IIRC - if something isn't clear please feel free to contact me). From our (ie. humble mortal developers) POV, the lambdas will be a very very sweet relief (imho) when it comes, at least, to do multi-threading programming - as you were forced to do also - in fact your approach was a 'poor-man' way to pass an 'anonymous code' to another structure. One more thing to mention: Your way of thinking is quite similar with Barry's, AFAIS, from his (very old) posts on his blog on the 'anonymous' theme. As a small recommendation, perhaps you can treat me as a 'dumb' (as, in fact, I am) and do a more detailed post about lambdas (what they are, usability, Pascal-like syntax aso.) and ask the folks for feedback, perhaps more brains will have more ways of thinking an perhaps that a different light will shine.

    ReplyDelete
  8. Me again... A better approach to anonymous functions it would be if you'll use the records with methods as internal representation for 'stack frame'. Another thing to note is the way of passing by reference/by value. In 'by reference' way, now I'm think (from top of my head) that the internal record will have a pointer (let's say PInteger) instead of the actual value copied. Imho, this is the expected behavior, for a developer POV.

    ReplyDelete
  9. I was one of the people who asked for more info about the assembler snippets - so, thankyou!

    It's very interesting, I had no idea about a lot of that.

    ReplyDelete
  10. Mark Andrews (The Other One)November 11, 2007 at 6:44 PM

    Allen, Barry Kelly's blog link should go to http://barrkel.blogspot.com/. (Note the extra "r".)

    ReplyDelete

Please keep your comments related to the post on which you are commenting. No spam, personal attacks, or general nastiness. I will be watching and will delete comments I find irrelevant, offensive and unnecessary.