Friday, June 1, 2007

A look at "array of const" for fun and profit

Back when Delphi was being developed, we began to lament the lack of a function that could format a string using format specifiers and some unknown number of parameters. C and C++ folks have long enjoyed the “...” variable-number-of-parameters declaration used so effectively on things like printf, sprintf and others. I won't get too deep into the reasons and mechanics behind why this was not actually possible given the way function parameters were passed in Pascal; suffice it to say it had to do with the order in which parameters were pushed onto the stack and who was responsible for cleaning said stack. Actually, if you really want the gory details, I'm sure Hallvard would be able to wax poetic on the whole underpinnings and machine-level workings :-).

So here we were, wanting a nice string format function that allows you to specify any number of parameters, both constants and variables, for maximum flexibility. Sure, we could have just introduced a direct clone of printf and even followed the same syntax, but that just didn't seem to “fit” the whole idea of maximum type safety. See, the problem with the printf function is that if you want to format a string with the text representation of the value of a byte followed by an integer, no information is passed in that clearly indicates that the 'x' param is a byte. It also forces you to specify the same parameter multiple times if you want to use it in more than one place. The argument list is essentially a variable-length array with elements of varying sizes, without any information as to the overall length or the size of each element! Gee... I wonder why the world is filled with so many buffer overrun errors? Anyway, I digress :-).

So the requirements were simple. We needed a language construct that would allow us to more-or-less declare a function to take a variable number of parameters. Since a function's parameter list is essentially a compiler-generated array pushed onto the stack (or passed in CPU registers), why not just allow an array to be declared in place and passed to the function? We also wanted this array to be self-describing and type-safe. So when you declare a parameter as an “array of const”, the compiler actually treats it as an open array parameter of type “array of TVarRec.” The declaration for TVarRec is as follows:

  PVarRec = ^TVarRec;
  TVarRec = record { do not pack this record; it is compiler-generated }
    case Byte of
      vtInteger:    (VInteger: Integer; VType: Byte);
      vtBoolean:    (VBoolean: Boolean);
      vtChar:       (VChar: Char);
      vtExtended:   (VExtended: PExtended);
      vtString:     (VString: PShortString);
      vtPointer:    (VPointer: Pointer);
      vtPChar:      (VPChar: PChar);
      vtObject:     (VObject: TObject);
      vtClass:      (VClass: TClass);
      vtWideChar:   (VWideChar: WideChar);
      vtPWideChar:  (VPWideChar: PWideChar);
      vtAnsiString: (VAnsiString: Pointer);
      vtCurrency:   (VCurrency: PCurrency);
      vtVariant:    (VVariant: PVariant);
      vtInterface:  (VInterface: Pointer);
      vtWideString: (VWideString: Pointer);
      vtInt64:      (VInt64: PInt64);
  end;

It's just a variant record. If you look closely, none of the data fields is larger than a pointer (which has some interesting implications for 64bit, but that is a subject for another day). So if the data being passed in is > 4 bytes, it is passed as a pointer to that data. An array with elements of this type therefore has a constant element size. Also, since “array of const” becomes an “array of TVarRec”, it is an open array, so the length of the array is also passed in. This satisfies one objection to the C-style “...” construct. The other objection is solved by the fact that the compiler encodes into the VType field a value representing the type of that element. Another side benefit is that you can now refer to the position of an element in addition to its value. This allows you to do interesting things like explicitly refer to the element you want to format in the string, using the format specifier '%x:y', where x is the ordinal position of the element (0 being the first element) and y is the rest of the usual specifier, such as the type letter.
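
As a minimal sketch of that positional specifier, the small console program below calls Format() from SysUtils with explicit indexes. The program name and the sample values are made up for illustration; only Format() itself comes from the discussion above.

```pascal
program FormatIndexDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  // Element 0 is 'Cart' and element 1 is 3. The indexes let the format
  // string reorder the elements and reuse element 1 without listing it twice.
  Writeln(Format('%1:d items in %0:s; again %1:d', ['Cart', 3]));
  // Prints: 3 items in Cart; again 3
end.
```

Because each element carries its own type tag, Format() can also raise an error when a specifier does not match the element it points at, instead of silently reading the wrong bytes.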

So how do you use this special array construct? If you declare your function or method with a parameter of type “array of const”, then in the body of the method you just treat that parameter as an “array of TVarRec” and use all the standard array indexing and range functions, like Low and High. Here's a simple function that just writes to the console the type and value of each element:

// BoolToStr, FloatToStr and CurrToStr come from SysUtils.
procedure PrintArrayOfConst(const Args: array of const);
var
  I: Integer;
begin
  for I := Low(Args) to High(Args) do
  begin
    Write('Arg[', I, ']:');
    // VType tells us which field of the variant record is valid.
    case Args[I].VType of
      vtInteger:    Writeln('Integer = ', Args[I].VInteger);
      vtBoolean:    Writeln('Boolean = ', BoolToStr(Args[I].VBoolean, True));
      vtChar:       Writeln('Char = ''', Args[I].VChar, '''');
      vtExtended:   Writeln('Extended = ', FloatToStr(Args[I].VExtended^));
      vtString:     Writeln('ShortString = ''', Args[I].VString^, '''');
      vtPChar:      Writeln('PChar = ''', Args[I].VPChar, '''');
      vtAnsiString: Writeln('AnsiString = ''', string(Args[I].VAnsiString), '''');
      vtWideChar:   Writeln('WideChar = ''', Args[I].VWideChar, '''');
      vtPWideChar:  Writeln('PWideChar = ''', Args[I].VPWideChar, '''');
      vtWideString: Writeln('WideString = ''', WideString(Args[I].VWideString), '''');
      vtInt64:      Writeln('Int64 = ', Args[I].VInt64^);
      vtCurrency:   Writeln('Currency = ', CurrToStr(Args[I].VCurrency^));
    else
      Writeln('Unsupported');
    end;
  end;
end;

Of course it doesn't support every type, but it should give you an idea of what you can do with this little-known but widely used (you've used Format(), right?) language feature. Here's a sample of how to call this function:
    PrintArrayOfConst([1, 'c', 'this is a string', 12.5, True, PChar('This is a pchar'), Int64(123456)]);

I had to do some typecasting in order to get the compiler to recognize some of the literals as specific types, because it will automatically pick the most natural type to use. Of course, if you pass in variables for the array elements, the type of each variable is preserved. So go have fun, and definitely do try this at home.
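
To see that last point about variables in action, here is a minimal sketch that passes typed variables instead of literals and prints the tag the compiler recorded for each one. ShowTags and the variable names are hypothetical, not part of the original example. Note that “preserved” means the element lands in the matching TVarRec category, so a Byte should show up as vtInteger and a Double as vtExtended.

```pascal
program VarTagsDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: names the VType tag of each element.
procedure ShowTags(const Args: array of const);
var
  I: Integer;
begin
  for I := Low(Args) to High(Args) do
    case Args[I].VType of
      vtInteger:  Writeln(I, ': vtInteger');
      vtBoolean:  Writeln(I, ': vtBoolean');
      vtExtended: Writeln(I, ': vtExtended');
      vtString:   Writeln(I, ': vtString');
      vtInt64:    Writeln(I, ': vtInt64');
    else
      // Any tag this sketch does not name is printed as its numeric value.
      Writeln(I, ': other tag ', Args[I].VType);
    end;
end;

var
  B: Byte;
  D: Double;
  S: ShortString;
  N: Int64;
  Flag: Boolean;
begin
  B := 7;
  D := 12.5;
  S := 'short';
  N := 123456;
  Flag := True;
  // No casts needed here: each variable's declared type picks the tag.
  ShowTags([B, D, S, N, Flag]);
end.
```

The string types are left out on purpose, since which tag a plain string variable gets depends on what string means in your compiler version.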