Thursday, January 24, 2008

Taking a big PByte of pointer math

Many of you have done it.  We here have done it.  It is just so danged convenient and satisfying.  Sure, it is not regarded as a "safe" thing to do.  Without the proper precautions, you can easily get into some serious trouble.  But, hey, the C/C++ programmers do it all the time!  What is "it"?  Pointer math.  Pointer math is simply treating any given typed pointer in some narrow instances as a scaled ordinal where you can perform simple arithmetic operations directly on the pointer variable.  It also allows you to treat such a pointer variable as an unbounded array using the array [] operator.  In Delphi there are only a few cases where this is possible.  Let's set the WABAC machine for a little history lesson.

Around 1990, when Borland was moving Turbo Pascal to the new up-and-coming GUI operating system, Windows 3.0/3.1, the programming interface was a pure C-based API (with a "Pascal" calling convention, ironically).  That meant that character strings were handled as null-terminated arrays of chars.  These were mostly passed around by "reference" as a char* type.  Given this, it was very common to scan through a string by treating the typed pointer as an array and applying the array operator, [].  Likewise, simple arithmetic operations such as addition or subtraction were also common and convenient.  In order to better match the Windows programming model, Turbo Pascal for Windows introduced a special PChar type.  This type was unique in that the compiler had intrinsic knowledge of it, which allowed the same behaviors described above.

Return to today.  Because of this special "PChar" type behavior, many developers (ourselves included) would do crazy things such as casting a pointer of one type to a PChar and then doing some pointer arithmetic.  This was done because all other user-defined typed pointers would not allow these types of operations.  The result is code littered either with a lot of pointers cast to PChar or with direct use of PChar pointers, even when the data being manipulated isn't specifically byte-sized characters.  In other words, PChars were used to manipulate byte-buffers of arbitrary data.

During the development of Tiburón, we discovered some of our own code was doing a lot of the above things. (I told you we've all done it!)  Using the PChar trick was simply the only thing available, and it made a lot of the code simpler and a little easier to read.  Now that PChar is an alias for PWideChar, things started falling over because the element type is now 2 bytes.  As I mentioned above, typed pointers are scaled based on the size of the element.  Anyplace there is a PChar(Ptr) + 1, Inc(PChar(Ptr)), or similar expression, the result is an increment by 2 instead of 1.  As you can imagine, this has caused a few headaches.  In looking at the code, it was clear that the intent was to access the data buffer as an array of bytes, and that PChar was merely being used as a convenience for array access or simple arithmetic.
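
To make the scaling concrete, here is a minimal sketch that measures how many bytes PChar(P) + 1 actually moves.  The program and variable names are made up for illustration.  It relies on PAnsiChar having byte-sized elements, so subtracting two PAnsiChar values gives a distance in bytes.  On a compiler where PChar is PWideChar the reported stride should be 2, and on an older compiler where PChar is PAnsiChar it should be 1.

```pascal
program PCharStride; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  Buffer: array[0..7] of Byte;
  P: Pointer;
  Stride: Integer;
begin
  P := @Buffer[0];
  // PChar(P) + 1 advances by one Char element.
  // Casting both ends to PAnsiChar and subtracting yields the distance in bytes.
  Stride := PAnsiChar(PChar(P) + 1) - PAnsiChar(P);
  Writeln('SizeOf(Char) = ', SizeOf(Char));
  Writeln('PChar(P) + 1 advances by ', Stride, ' byte(s)');
end.
```

Code that meant "move one byte" but wrote PChar(Ptr) + 1 silently picks up whatever stride this reports.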

What about using a PByte?  PByte has been declared in the Delphi System unit for many releases.  It is simply a pointer to a byte: PByte = ^Byte.  One solution would be to give the compiler special intrinsic knowledge of the PByte type, just like the current PAnsiChar and PWideChar types, but I didn't like where that was heading (what about the next type we'd want to do this to, and the next?).  What if you wanted a typed pointer to some other type, such as an Integer or a record?  What if you wanted to do the same kinds of scaled pointer arithmetic and array indexing?

So for Tiburón, we're introducing a new compiler directive, {$POINTERMATH <ON|OFF>}.  If you declare a typed pointer while this directive is on, any variable of that type will allow all the scaled pointer arithmetic and array indexing you like.  Likewise, any block of code surrounded by that directive will allow the same syntax for any typed pointers within that block, regardless of whether or not they were originally declared that way.  PByte is declared with that directive ON.  This means that all the places that use the PChar type simply for byte-pointer and byte-array access can now use the PByte type instead, and none of the existing code statements and logic need to change.  A simple search and replace over the source is all that is needed.  Sure, we could have changed the PChar references to PAnsiChar, but that simply serves to perpetuate the lack of clarity over what the intent of the code actually is.

This directive is not on by default, and the only existing type that is declared with it on is PByte.  It also only affects typed pointers.  Variables of type "Pointer" do not allow the pointer math features, since such a variable is effectively pointing to a "void" element, which has a size of 0.  Untyped "var" or "const" parameters are not affected because they're not really pointers (even though internally a "reference" in the form of an address is passed in).
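
As a rough sketch of both uses described above, the following declares a pointer type while the directive is on and also indexes a PByte directly.  The names PIntItem and SumBytes are hypothetical, and the sketch assumes a compiler that supports {$POINTERMATH} and declares PByte with it on, as described here.

```pascal
program PointerMathSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

type
  {$POINTERMATH ON}
  PIntItem = ^Integer;  // hypothetical type; declared while the directive is on
  {$POINTERMATH OFF}

// Sums Count bytes starting at Data, using array indexing on PByte.
function SumBytes(Data: PByte; Count: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Count - 1 do
    Inc(Result, Data[I]);
end;

var
  Bytes: array[0..3] of Byte = (1, 2, 3, 4);
  Ints: array[0..2] of Integer = (10, 20, 30);
  P: PIntItem;
begin
  Writeln(SumBytes(@Bytes[0], Length(Bytes))); // 1 + 2 + 3 + 4
  P := @Ints[0];
  Writeln((P + 2)^);  // scaled by SizeOf(Integer): reads Ints[2]
  Writeln(P[1]);      // array indexing: reads Ints[1]
end.
```

Note that the directive is switched off again right after the type declaration, yet variables of type PIntItem keep the pointer math behavior because it travels with the type.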

This code will now compile with the new Tiburón compiler:

{$POINTERMATH ON}
procedure MoveRects(Rects: PRect; Count: Integer);
begin
  while Count > 0 do
  begin
    MoveRect(Rects[Count - 1]); // This line will not compile today.
    MoveRect((Rects + Count - 1)^); // Neither will this line.
    Dec(Count);
  end;
end;

Little known factoid: typed pointers have always been able to be manipulated using the Inc() and Dec() standard procedures†.  The above code is very contrived and could be replaced with the following, which will compile today:


procedure MoveRects(Rects: PRect; Count: Integer);
begin
  while Count > 0 do
  begin
    MoveRect(Rects^);
    Inc(Rects); // This will scale the increment by the size of the element. 16 in this case.
    {Rects := Rects + 1;} // The above is the same as this, which won't compile today.
    Dec(Count);
  end;
end;

Yes, this functionality can be considered very dangerous, especially since you can easily overrun the end of a buffer.  It should be used judiciously and prudently, in some very narrow instances.  If you don't want to use it, simply never turn on that directive and any code that tries to do this will not compile.
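
One way to stay on the prudent side is to keep the raw indexing behind a small helper that checks the index first.  This is only a sketch under stated assumptions: ByteAt is a hypothetical helper, not something in the RTL, and the caller has to supply the correct buffer size because the pointer itself carries none.

```pascal
program CheckedByteAccess; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: validates Index against the caller-supplied Size
// before using PByte array indexing.
function ByteAt(Data: PByte; Size, Index: Integer): Byte;
begin
  if (Index < 0) or (Index >= Size) then
    raise ERangeError.CreateFmt('Index %d is outside a %d-byte buffer',
      [Index, Size]);
  Result := Data[Index];
end;

var
  Buffer: array[0..3] of Byte = (10, 20, 30, 40);
begin
  Writeln(ByteAt(@Buffer[0], SizeOf(Buffer), 2)); // in range: reads Buffer[2]
  try
    ByteAt(@Buffer[0], SizeOf(Buffer), 4);        // one past the end
  except
    on E: ERangeError do
      Writeln('Rejected: ', E.Message);
  end;
end.
```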


† There is an interesting bit of history with the Inc() and Dec() standard procedures.  When PChar was introduced, Inc() and Dec() were also modified to operate on a PChar variable.  It turned out that this worked for all pointer types.  Additionally, it also properly scaled the increment or decrement operation based on the size of the pointed-to element.  Since a PChar pointed to a byte-sized element, all the testing focused on that type.  The product was released, and it was then discovered that Inc() and Dec() had this additional behavior.  The following release, we decided to "fix" the problem.  You would have thought we were the angel of death and had just killed the first-born of every household!  The hue and cry from the field testers was overwhelming.  We finally relented and put the "bug" back in, and it has remained to this day as a "feature."  I wonder what the reaction will be to this change?  I predict there will be hails of praise from one camp, and sneering and guffaws from another.  The largest camp will be the ones ambivalent to this change.  To which camp do you belong?