Friday, June 1, 2007

A look at "array of const" for fun and profit

Back when Delphi was being developed we began to lament the lack of a function that would be able to format a string with format specifiers and some unknown number of parameters. C and C++ folks have long enjoyed the “...” variable number of parameters declaration used so effectively on things like printf, sprintf and others. Without getting too deep into the reasons and mechanics behind why this was not actually possible given the way function parameters were passed in Pascal, suffice it to say it had to do with the order in which parameters were pushed onto the stack and who was responsible for cleaning said stack. Actually, if you really want the gory details, I'm sure Hallvard would be able to wax poetic on the whole underpinnings and machine-level workings :-).

So here we were, wanting a nice string format function that allows you to specify any number of parameters, both as constants and variables, for maximum flexibility. Sure, we could have just introduced a direct clone of printf and even followed the same syntax, but that just didn't seem to “fit” the whole idea of maximum type safety. See, the problem with the printf function is that if you want to format a string with the text representation of the value of a byte followed by an integer, there is no information passed in that clearly indicates that the 'x' param is a byte. It also forces you to specify the same parameter multiple times if you want to use it in more than one place. It is essentially a variable length array with elements of varying sizes without any information as to the overall length and size of each element! Gee... I wonder why the world is filled with so many buffer overrun errors? Anyway, I digress :-).

So the requirements were simple. We needed a language construct that would allow us to more-or-less declare a function to take a variable number of parameters. Since a function's parameter list is essentially a compiler generated array pushed onto the stack (or passed in CPU registers), why not just allow an array to be declared in place and passed to the function? We also wanted this array to be self-describing and type-safe. So when you declare a parameter as an “array of const” the compiler actually makes that into an open array parameter as an “array of TVarRec.” The declaration for TVarRec is as follows:

    PVarRec = ^TVarRec;
    TVarRec = record { do not pack this record; it is compiler-generated }
      case Byte of
        vtInteger:    (VInteger: Integer; VType: Byte);
        vtBoolean:    (VBoolean: Boolean);
        vtChar:       (VChar: Char);
        vtExtended:   (VExtended: PExtended);
        vtString:     (VString: PShortString);
        vtPointer:    (VPointer: Pointer);
        vtPChar:      (VPChar: PChar);
        vtObject:     (VObject: TObject);
        vtClass:      (VClass: TClass);
        vtWideChar:   (VWideChar: WideChar);
        vtPWideChar:  (VPWideChar: PWideChar);
        vtAnsiString: (VAnsiString: Pointer);
        vtCurrency:   (VCurrency: PCurrency);
        vtVariant:    (VVariant: PVariant);
        vtInterface:  (VInterface: Pointer);
        vtWideString: (VWideString: Pointer);
        vtInt64:      (VInt64: PInt64);
    end;

It's just a variant record. If you also look closely, all the data fields are the same size. They all max out at pointer size (which has some interesting implications for 64bit, but that is a subject for another day). So if the data being passed in is larger than 4 bytes, it is passed as a pointer to that data. An array with elements of this type will be a constant element size array. Also, since “array of const” becomes an “array of TVarRec”, it is an open array, so the length of the array is also passed in. This satisfies one objection to the C-style “...” construct. The other objection is solved by the fact that the compiler will encode into the VType field a value representing the type of that element. Another side benefit of this is that you can now refer to the position of an element in addition to its value. This allows you to do interesting things like explicitly refer to the element you want to format in the string using the format specifier '%x:y', where x is the ordinal position of the element (0 being the first element) and y is the usual conversion type, as in '%0:d'.
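
As a rough sketch of those points, here is a small console program. The names ArgCount and Manual are made up for illustration; the only library routine used is Format from SysUtils. It assumes the compiler will accept a plain array of TVarRec wherever an “array of const” is expected, which follows from the equivalence described above.

```pascal
program ArgsLayoutDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Hypothetical helper: reports how many elements the caller supplied.
  An open array carries its bounds, so no count argument or terminator is needed. }
function ArgCount(const Args: array of const): Integer;
begin
  Result := High(Args) - Low(Args) + 1;
end;

var
  { Hypothetical hand-built argument list, filled in the way the compiler would. }
  Manual: array[0..1] of TVarRec;
begin
  Writeln(ArgCount([1, 'c', True]));   { prints 3 }
  Writeln(ArgCount([]));               { prints 0: an empty list is allowed }

  { Index specifiers: %1:s is element 1, %0:d is element 0, and an element
    can be used more than once without being passed twice. }
  Writeln(Format('%1:s has %0:d items; %1:s is done', [3, 'List']));
  { prints: List has 3 items; List is done }

  { Each element is a TVarRec: a type tag plus a pointer-sized payload. }
  Manual[0].VType := vtInteger;
  Manual[0].VInteger := 2;
  Manual[1].VType := vtInteger;
  Manual[1].VInteger := 5;
  Writeln(Format('%d of %d', Manual)); { prints: 2 of 5 }
  Writeln(ArgCount(Manual));           { prints 2 }
end.
```

Building the array by hand is rarely needed, but it can be handy when the argument list is only known at run time.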

So how do you use this special array construct? If you declare your function or method with a parameter of type “array of const”, then in the body of the method you just treat that parameter as an “array of TVarRec” and use all the standard array indexing and range functions, like Low and High. Here's a simple function that just writes to the console the type and value of each element:

    { Requires SysUtils for BoolToStr, FloatToStr and CurrToStr }
    procedure PrintArrayOfConst(const Args: array of const);
    var
      I: Integer;
    begin
      for I := Low(Args) to High(Args) do
      begin
        Write('Arg[', I, ']:');
        { VType tells us which field of the variant record is valid }
        case Args[I].VType of
          vtInteger: Writeln('Integer = ', Args[I].VInteger);
          vtBoolean: Writeln('Boolean = ', BoolToStr(Args[I].VBoolean, True));
          vtChar: Writeln('Char = ''', Args[I].VChar, '''');
          vtExtended: Writeln('Extended = ', FloatToStr(Args[I].VExtended^));
          vtString: Writeln('ShortString = ''', Args[I].VString^, '''');
          vtPChar: Writeln('PChar = ''', Args[I].VPChar, '''');
          vtAnsiString: Writeln('AnsiString = ''', AnsiString(Args[I].VAnsiString), '''');
          vtWideChar: Writeln('WideChar = ''', Args[I].VWideChar, '''');
          vtPWideChar: Writeln('PWideChar = ''', Args[I].VPWideChar, '''');
          vtWideString: Writeln('WideString = ''', WideString(Args[I].VWideString), '''');
          vtInt64: Writeln('Int64 = ', Args[I].VInt64^);
          vtCurrency: Writeln('Currency = ', CurrToStr(Args[I].VCurrency^));
        else
          Writeln('Unsupported');
        end;
      end;
    end;

Of course it doesn't support every type, but it should give you an idea of what you can do with this little-known but widely used (you've used Format(), right?) language feature. Here's a sample of how to call this function:
    PrintArrayOfConst([1, 'c', 'this is a string', 12.5, True, PChar('This is a pchar'), Int64(123456)]);

I had to do some typecasting in order to get the compiler to recognize some of the literals as a specific type because it will automatically pick the most natural type to use. Of course if you pass in variables for the array elements, the type of that variable is preserved. So go have fun, and definitely do try this at home.
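
One pattern that falls out of this is forwarding: a routine that takes its own “array of const” can hand it straight to Format. The following is only a sketch, and the function name Tagged and the variables Count and Item are invented for the example:

```pascal
program ForwardArgsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Hypothetical wrapper: accepts its own "array of const" and passes it to
  Format unchanged, then prefixes the result with a tag. }
function Tagged(const Tag, Fmt: string; const Args: array of const): string;
begin
  Result := '[' + Tag + '] ' + Format(Fmt, Args);
end;

var
  Count: Integer;
  Item: string;
begin
  Count := 3;
  Item := 'widgets';
  { Variables, literals and expressions can be mixed in the same list. }
  Writeln(Tagged('info', '%d %s in stock', [Count, Item]));
  { prints: [info] 3 widgets in stock }
  Writeln(Tagged('info', 'total: %d', [Count * 2]));
  { prints: [info] total: 6 }
end.
```

Because the open array is passed along as-is, the wrapper never has to inspect VType itself; Format does the type checking against the format string at run time.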

17 comments:

  1. This is one of the things that sold me on Delphi way back in 1994. I remember seeing a short example in Jeff Duntemann's PC Techniques (or was it Visual Developmer Magazine by then?) and being incredibly annoyed that the short accompanying article didn't explain what the square brackets were doing there and how this trick worked! But another article in the same or the next issue cleared it up, and after that it was just a matter of time until I managed to convince someone at Borland Australia that they did, in fact, have a product named Delphi and that they should sell one to me as soon as it was in stock.

  2. I built a version of sscanf that uses this functionality. Unfortunately, you can't make the arguments function like a var parameter, so everything is passed as a pointer, EXCEPT longstrings (shortstrings are passed as a pointer).


    ie: sscanf(inputstring,'%d %d %s',[@i,@j,s]);


    Needless to say, typesafety is out the window, except for the strings.


    It would be nice if a future revision could handle that (everything is a pointer to data, but you still get the type), but I doubt you would extend the functionality for just one person's function (ah well...)


    I should probably put the code back up on my website one of these days. Maybe once I finish rewriting it to improve the parser's performance. Anyone still interested in sscanf functionality or has everyone learned to live without it?

  3. Oh, that reminds me.


    The format functionality needs to be better integrated into the VCL.. I'm forever creating object descendants that only really add functionality that adds formatted strings where strings are allowed..


    ie: I typically overload TStringList's .Add functionality to include:


    Function TMyStringList.Add(FormatStr : String; Const Args : Array Of Const) : Integer;
    Begin
      Result := Add(Format(FormatStr,Args));
    End;


    and all the rest of the .add features. I wish it was part of the TStrings definition - this sort of thing saves a LOT of typing later.


    Actually, there are a LOT of places the VCL would benefit from this type of overloading (including ShowMessageFmt, which should just be overloaded into ShowMessage with ShowMessageFmt deprecated, as it was added before overloading as I recall..)


    Just my 2 cents.

  4. > which has some interesting implications for

    > 64bit


    Hm, I hope this means what I hope it means ;)

  5. We have a TDialogForm class which is responsible for doing a Windows Readln on steroids (think of something similar to Tcl/Tk).


    Something like


    TDialogForm = class(TCustomForm)
    ...
    public
      procedure AddLabel(aCaption: string; aLineReductionPercent: integer=20);
      procedure AddEdit(var aTextVal: string); overload;
      procedure AddEdit(aCaption: string; var aTextVal: string); overload;
      procedure AddDDList(aCommaTextVal: string; var aItemIndex: integer; aAutoSize: boolean=True); overload;
      procedure AddDDList(aCaption, aCommaTextVal: string; var aItemIndex: integer; aAutoSize: boolean=True); overload;
      procedure AddCheckBox(aCaption: string; var aChecked: boolean);
      procedure AddButton(aCaption: string; aResult: TModalResult; aNewLine: boolean=False; aAutoSize: boolean=False; aDefault: boolean=False);
      procedure AddSeparator(aSpaceOnly: boolean=False);
      procedure AddDatePicker(var aDate: TDateTime); overload;
      procedure AddDatePicker(aCaption: string; var aDate: TDateTime); overload;
      procedure AddTimePicker(var aTime: TDateTime);
      procedure AddTrackBar(aMin, aMax: integer; var aPosition: integer); overload;
      procedure AddTrackBar(aCaption: string; aMin, aMax: integer; var aPosition: integer); overload;
      procedure AddMemo(var aText: string); overload;
      procedure AddMemo(aCaption: string; var aText: string); overload;
      procedure AddRadioGroup(aCaption: string; aCommaTextVal: string; var aItemIndex: integer);
      procedure AddList(aCommaTextVal: string; var aItemIndex: integer);
      procedure AddCrLf;
      ...
      function Execute(aCaption: string; Buttons: TMsgDlgButtons; aAccept: TModalResult=mrOk; aDefault: TMsgDlgBtn=mbHelp): TModalResult;


    which pushes the values from the UI controls which are dynamically generated, laid out, and filled based on 'Add' procedures from above. A more general approach, based on variants like:

    DlgRead(aCaption: string; aValue: array of const);

    DlgReadLn(aCaption: string; aValue: array of const);


    ...in which the class will be responsible for choosing the right input controls to read the values will be, imho, a welcome addition to Delphi.


    hth.

  6. As it happens I posted about how array of const could be used by user code back in the last millennium (http://groups.google.com/group/comp.lang.pascal/browse_thread/thread/d86f368e885c98e0/164e0ba6c7051d35) ;) - it was undocumented at the time.


    "Actually, if you really want the gory details"


    Heh - thanks for the link and vote of confidence, Allen!


    We'll see - maybe I'll pick you up on this and write something about calling conventions and order of parameter passing in the future.


    "Another side benefit of this is that you can now refer to the position of an element "


    Incidentally, this feature is available in C-style xprintf functions, too (and is a well-known source of security holes).

  7. The one additional step that the compiler needs now is something akin to the C# params keyword. The square brackets are only there because the compiler requires them at present - internal implementation need not be affected. Oh, and by the way, I use this functionality a lot and it has saved me from lots of repetitive code - nice going.

  8. Your blog makes me laugh. Every day with BDS 2006, i have tens of access violations, and lot of exotic errors. Imagine, I made a windows script which kills the BDS process. And what are you doing at CodeGear ? Blogs & chatting. Web. Oh, what a shame, your web site is not Web 2.0... But i am sure you are working hard on it. It is so funny.

  9. Arthur -> D2007 is significantly more stable, but some problems can be caused by components.


    Which isn't to say the D2007 IDE is perfect.


    Mine has learned a new phrase : "catastrophic failure".


    Since I've gone a month without ever seeing it, I gotta wonder what suddenly drove it insane (aside from being coded to run in an infinite loop when it happens to the structure view...)


    When the compiler has a catastrophic failure, it at least just turns off the compile functionality and makes you restart the IDE (unlike the structure view... ugh)..


    Annoyingly, it will probably vanish without me ever pinning down what upsets it so.

  10. "it will automatically pick the most natural type to use"


    This natural choice behaviour has changed since Delphi 5. So beware if you upgrade your code to a new version of Delphi.


    An integer with value 10 stored in a variant was always of type vtInteger, but Delphi 2006 converts it to a vtSmallInt, resulting in a lot of:

    "else Writeln('Unsupported');"


    Btw, it is good to see the team blogging.


    Best regards,

    Bas

  11. That case statement reminded me of this quote:


    "I’m not against types, but I don’t know of any type systems that aren’t a complete pain, so I still like dynamic typing." - Alan Kay in The Meaning of Object-oriented Programming


    I find myself using dynamically-typed languages more and more frequently.

  12. Java 5 added something similar with their new static class function format() of the String class. Made possible with their new varargs in Java 5.


    Basically, it brings printf like formatting to Strings in Java.

  13. m. Th.


    What a difficult post to read. I had to really struggle intellectually. If this is the standard of communication between the developers in the Delphi group - then that explains a lot.


    I think it's a cop-out to ask customers to upgrade because they are having problems with the software you sold them. It's probably morally wrong and also, perhaps could be said that Delphi is not 'fit-for-purpose'. Yes, of course, building an IDE is going to be a complicated

  14. What happened to the rest of my post?


    Well, I'm not going to type it twice.

  15. I noticed that you use VWideChar for the vtWideString case. Shouldn't that be VWideString?

  16. A very interesting article which helped me understand open arrays. However, you write "and definately (sic) do try this at home". Can you not afford a spellcheck program? Did you miss / forget your elementary education? Didn't your job description include something like "excellent communication skills, both written and verbal" as a criterion?

    It might be worth clarifying your assertion "If you also look closely, all the data fields are the same size" by pointing out the reason it is:
    TVarRec = record { do not pack this record; it is compiler-generated }
    case Byte of
    vtInteger: (VInteger: Integer; VType: Byte);
    vtBoolean: (VBoolean: Boolean);

    and not:

    TVarRec = record { do not pack this record; it is compiler-generated }
    VType: Byte;
    case Byte of
    vtInteger: (VInteger: Integer);
    vtBoolean: (VBoolean: Boolean);

    is to preserve the ability to typecast. Putting the byte at the front wouldn't affect speed due to alignment because the record isn't packed.

  17. Mr. Watson,

    Did that make you feel better to impugn my intelligence and competence by ridiculing a simple spelling mistake? Starting off a comment with insult and ridicule is a sure way to be completely ignored...

