Thursday, January 24, 2008

Taking a big PByte of pointer math

Many of you have done it.  We here have done it.  It is just so danged convenient and satisfying.  Sure, it is not regarded as a "safe" thing to do.  Without the proper precautions, you can easily get into some serious trouble.  But, hey, the C/C++ programmers do it all the time!  What is "it"?  Pointer math.  Pointer math is simply treating any given typed pointer in some narrow instances as a scaled ordinal where you can perform simple arithmetic operations directly on the pointer variable.  It also allows you to treat such a pointer variable as an unbounded array using the array [] operator.  In Delphi there are only a few cases where this is possible.  Let's set the WABAC machine for a little history lesson.

Around 1990 when Borland was moving Turbo Pascal to the new up-and-coming GUI operating system, Windows 3.0/3.1, the programming interface was a pure C-based API (with a "Pascal" calling convention, ironically).  That meant that character strings were handled as null-terminated arrays of chars.  These were mostly passed around by "reference" as a char* type.  Given this, it was very common to scan through a string by treating the typed pointer as an array by applying the "array operator []."  Likewise, simple arithmetic operations such as addition or subtraction are also common and convenient.  In order to better match the Windows programming model, Turbo Pascal for Windows introduced a special PChar type.  This type was unique in that the compiler had intrinsic knowledge of this type which allowed those same behaviors described previously.

Return to today.  Because of this special "PChar" type behavior, many developers (ourselves included) would do crazy things such as casting a pointer of one type to a PChar and then doing some pointer arithmetic.  This was done because no other user-defined typed pointer would allow these kinds of operations.  The result is code that is littered either with a lot of pointers cast to PChars or with the direct use of PChar pointers, even when the data being manipulated isn't specifically byte-sized characters.  In other words, PChars were used to manipulate byte-buffers of arbitrary data.

During the development of Tiburón, we discovered some of our own code was doing a lot of the above things. (I told you we've all done it!)  Using the PChar trick was simply the only thing available, and it made a lot of the code simpler and a little easier to read.  Now that PChar is an alias for PWideChar, things started falling over because the element type is now 2 bytes.  As I mentioned above, typed pointers are scaled based on the size of the element.  Anyplace there is a PChar(Ptr) + 1, Inc(PChar(Ptr)), or similar expression, the result is an increment by 2 instead of 1.  As you can imagine, this has caused a few headaches.  In looking at the code, it was clear that the intent was to access the data buffer as an array of bytes, and that PChar was merely a convenience for array access or simple arithmetic.
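A minimal sketch of the scaling problem described above, assuming a Delphi console program; the program and variable names are hypothetical.  It measures, in bytes, how far a pointer moves when 1 is added through a PAnsiChar cast versus a PWideChar cast.  Code written as PChar(Ptr) + 1 behaves like the first case before Tiburón and like the second case after it.

```pascal
program PCharScalingSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  Buffer: array[0..7] of Byte;
  P: Pointer;
begin
  P := @Buffer[0];

  // PAnsiChar points to 1-byte elements, so "+ 1" advances one byte.
  Writeln(PAnsiChar(PAnsiChar(P) + 1) - PAnsiChar(P)); // prints 1

  // PWideChar points to 2-byte elements, so "+ 1" advances two bytes.
  // The result is cast back to PAnsiChar so the difference is in bytes.
  Writeln(PAnsiChar(PWideChar(P) + 1) - PAnsiChar(P)); // prints 2
end.
```

The pointer subtraction is done on PAnsiChar values on purpose: subtracting two pointers yields a count of elements, and a 1-byte element makes that count equal to the byte distance.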

What about using a PByte?  PByte has been a type declared in the Delphi System unit for many releases.  It is declared simply as a pointer to a byte, or PByte = ^Byte.  One solution would be to give the compiler special intrinsic knowledge of the PByte type, just like the current PAnsiChar and PWideChar types, but I didn't like where that was heading (what about the next type we'd want to do this to, and the next?).  What if you wanted to have a typed pointer to some other type such as an Integer or a record?  What if you wanted to do the same kinds of scaled pointer arithmetic and array indexing?

So for Tiburón, we're introducing a new compiler directive, {$POINTERMATH <ON|OFF>}.  If you declare a typed pointer while this directive is on, any variable of that type will allow all the scaled pointer arithmetic and array indexing you like.  Likewise, any block of code that is surrounded by that directive will allow the same syntax for any typed pointers within that block, regardless of whether or not they were originally declared that way.  PByte is declared with that directive ON.  This means that all the places that are using the PChar type simply for byte-pointer and byte-array access can now use the PByte type instead, and none of the existing code statements and logic need to change.  A simple search and replace over the source is all that is needed.  Sure, we could have changed the PChar references to PAnsiChar, but that simply serves to perpetuate the lack of clarity over what the intent of the code actually is.

This directive is not on by default, and the only existing type that is declared with it on is PByte.  It also only affects typed pointers.  Variables of type "Pointer" do not allow the pointer math features, since they effectively point to a "void" element which has a size of 0.  Untyped "var" or "const" parameters are not affected because they're not really pointers (even though internally a "reference" in the form of an address is passed in).
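As a rough sketch of the first form described above (a typed pointer declared while the directive is on), assuming a compiler that supports {$POINTERMATH}; the type name PIntElem and the program name are hypothetical and not part of any shipping unit.  Because the type itself was declared with the directive on, variables of that type keep the indexing and arithmetic syntax even after the directive is switched off again.

```pascal
program PointerMathTypeSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

type
  {$POINTERMATH ON}
  PIntElem = ^Integer; // hypothetical type, declared with pointer math enabled
  {$POINTERMATH OFF}

var
  Values: array[0..3] of Integer = (10, 20, 30, 40);
  P: PIntElem;
begin
  P := @Values[0];

  // Array indexing on the typed pointer, scaled by SizeOf(Integer).
  Writeln(P[2]);      // prints 30

  // Scaled addition followed by a dereference.
  Writeln((P + 3)^);  // prints 40
end.
```

Nothing here checks bounds: P[4] would compile just as happily and read past the end of Values.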

This code will now compile with the new Tiburón compiler:

{$POINTERMATH ON}
procedure MoveRects(Rects: PRect; Count: Integer);
begin
  while Count > 0 do
  begin
    MoveRect(Rects[Count - 1]); // This line will not compile today.
    MoveRect((Rects + Count - 1)^); // Neither will this line.
    Dec(Count);
  end;
end;

Little known factoid: typed pointers have always been able to be manipulated using the Inc() and Dec() standard procedures†.  The above code is very contrived and could be replaced with the following, which will compile today:


procedure MoveRects(Rects: PRect; Count: Integer);
begin
  while Count > 0 do
  begin
    MoveRect(Rects^);
    Inc(Rects); // This will scale the increment by the size of the element. 16 in this case.
    {Rects := Rects + 1;} // The above is the same as this, which won't compile today.
    Dec(Count);
  end;
end;
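For readers who want to see the Inc() scaling on something smaller than a TRect, here is a minimal sketch that needs no directive; the program and variable names are hypothetical.  It walks an Integer array with Inc() and then reports how many bytes a single Inc() moved the pointer.

```pascal
program IncScalingSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  Values: array[0..2] of Integer = (1, 2, 3);
  Start, P: PInteger;
  I, Sum: Integer;
begin
  Start := @Values[0];
  P := Start;
  Sum := 0;

  // Walk the array one element at a time using Inc().
  for I := 0 to 2 do
  begin
    Sum := Sum + P^;
    Inc(P); // advances by SizeOf(Integer) bytes, not by 1
  end;
  Writeln(Sum); // prints 6

  // Measure a single Inc() in bytes via 1-byte PAnsiChar pointers.
  P := Start;
  Inc(P);
  Writeln(PAnsiChar(P) - PAnsiChar(Start)); // prints 4
end.
```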

Yes, this functionality can be considered very dangerous, especially since you can easily overrun the end of a buffer.  It should be used judiciously and prudently in some very narrow instances.  If you don't want to use it, simply never turn on that directive and any code that tries to do this will not compile.


† There is an interesting bit of history with the Inc() and Dec() standard procedures.  When PChar was introduced, Inc() and Dec() were also modified to operate on a PChar variable.  It turned out that they worked for all pointer types.  Additionally, they properly scaled the increment or decrement operation based on the size of the pointed-to element.  Since a PChar pointed to a byte-sized element, all the testing focused on that type.  The product was released and it was then discovered that Inc() and Dec() had this additional behavior.  In the following release we decided to "fix" the problem.  You would have thought we were the angel of death and had just killed the first-born of every household!  The hue and cry from the field testers was overwhelming.  We finally relented and put the "bug" back in, and it has remained to this day as a "feature."  I wonder what the reaction will be to this change?  I predict there will be hails of praise from one camp, and sneering and guffaws from another.  The largest camp will be the ones ambivalent to this change.  To which camp do you belong?

18 comments:

  1. Praise, definitely.

    But now that pointers come up - I know it's probably a bit too early to ask - but can you shed a bit of light on what we can expect from the move to x64 in Commodore and what we can do to prepare for it now? That would be much appreciated.

    Cheers, Phil

  2. WAY TO GO DALLAS
    CODEGEAR !!!!!

    I have never quite understood why it's OK to do pointer math with PChar but not any other pointer type. This leads directly to the "hack" of using PChar inappropriately as a way of doing pointer math on non-Char buffers, which in turn leads to a greater chance of error.

    Only allowing Pointer Math on PChar didn't/doesn't stop people doing pointer math and getting it wrong, it just increases the chances that if they do it that they will get it wrong.

    Even disallowing it on PChar wouldn't stop someone casting any old Pointer to Integer, doing the math, then casting it back to Pointer.

    It's quite easy to make a case for having Pointer Arithmetic being SAFER than not having it (an unnecessary danger..? ... you said it yourself - we've all done it! ergo, we need it).

    I'll say again: Way to go CodeGear!!

    :)


    Just one Q: Will there be a project switch to set a default ON state for this directive across an entire (i.e. a new) project or will we have to manually toggle it on in each unit as required?

  3. Dang - there's supposed to be 6x backspaces after "DALLAS".

  4. That must be the most useful bug I have ever encountered. If you took it away I would start screaming too as I use it all over the place. Just curious: When did you guys try to fix the Inc() pointer arithmetic bug?

    This pointer arithmetic stuff is a really nice addition. Will it be possible to switch it on globally in the project options?

  5. Jolyon,
    That is certainly an argument one can make. By not having an officially sanctioned mechanism, people resorted to ugly hacks which tended to get them into irrevocable trouble. And, yes, we have all done it :-).
    As for a project wide setting, that will probably be the case. Maybe we should put it on an "Advanced Options" page that the user has to acknowledge that they're delving into the dark-arts and playing with fire and voodoo :-).
    Allen.

  6. Jan,

    "When did you guys try to fix the Inc() pointer arithmetic bug?"

    I *think* it was early field tests for Borland Pascal 7. I do clearly remember the angst it caused. Internally we laughed about the whole affair for quite awhile. It was heralded as one of those "happy accidents" you so rarely ever see :-).

    Allen.

  7. Allen, this is great news! I spent a minute or two thinking about code that needs to compile under Tiburon as well as earlier versions of Delphi. My idea is...

    type
      {$IFDEF Delphi12}
      PBinary = PByte;
      {$ELSE}
      PBinary = PAnsiChar;
      {$ENDIF}

    With this, we can write code to move data around in buffers, and it'll compile in any version of Delphi. There's probably a better name than PBinary, but that's the best I can think of so far.

    -Jay

  8. Allen wrote :
    "Now that PChar is an alias to PWideChar, ..."
    Still supporting that idea of pushing everybody to Unicode by force ?
    Without some sort of a compatibility switch (Unicode / non Unicode) at the project or IDE level, it is really a very bad thing.
    At least, tell us for how long do you intend to support (and sell) D2007 as it will be (for many of your users) the last usable Delphi.

  9. I have been thinking about this wonderful addition a bit more:

    Why would you want a compiler directive or project option at all? Is it only to protect those who do not want to use it from using it? That sounds a bit strange especially since buffer overruns can also happen with arrays, lists or strings. Plus if it is the reason then why is there no compiler directive to switch on/off the use of pointers?

    The reason I am asking is because too many compiler directives and project options obscure the Delphi interface already.

    Or is there a performance penalty too?

  10. Now this should have come earlier because of the proposed change of Char / string to their Unicode equivalents. Currently a lot of code that uses pointer arithmetic uses the PChar() trick, and hence the change will lead to *a lot* of work. If this can be backported to Delphi 2007 it would be a great help in porting these applications.

  11. David McCammond-Watts, January 25, 2008 at 1:49 AM

    Great addition to the language. I agree with the previous posters: I don't see a need for a compiler switch. It would not affect existing code one way or another. Allowing pointer arithmetic, when you need pointers, makes code simpler and more maintainable. At least having the switch on by default would make sense to me.

  12. I tried pointer arithmetic with every new Delphi version. I've always been disappointed. Well, better late than never.

    PS: Why $POINTERMATH directive at all? Just turn the thing ON!

  13. +1 for pointer math.

    And the $POINTERMATH warning is very appreciated too.

  14. Why pointer should not allow pointer math? Why something like

    var
    P: Pointer;
    I, J: Integer;
    P := @Something;
    I := 20;
    J := PInteger(P + I)^

    should not work? My application processes a lot of raw data, and I would like to be able to work with pointer math as I could do in C. I understand it can be very unsafe, but why deny it if the programmer knows what he's doing?
    Often I really wonder why CodeGear keeps on inhibiting Delphi's full potential, lowering it to the level of far less powerful languages (VB, C#, etc.) instead of unleashing its full power and making it a "better C++".

  15. While I agree that it can be dangerous, it is also very useful and makes a lot of currently tricky code (casting to Integer or PChar and back) much more readable. The fact that it must be turned on explicitly(*) makes it safe enough, IMO.

    But I am quite happy with this.

    (*)One suggestion: perhaps it should be auto-turned off again at the end of a routine, so the next routine should explicitly set it to ON again. I know that no other directive does this, but hey, it could be a first.

  16. I used this method:

    'var
    ' p : Pointer;
    ...
    'p := @a;
    'asm
    ' inc p
    'end;

    It works fine:)

  17. Hi. Pointer math is sometimes necessary. I solved the "pchar/pbyte" math in Delphi by making a "pbyte" library and set of functions. Interested?

  18. How can I convert this code so that it still works?
    var
    pt_rec : any_pt; //pointer to a record;
    buf : array[0..1024] of char;
    begin
    PChar(pt_rec) := buf;
    ...
    ...
    end;
    if in case I declare
    var
    buf : array[0..1024] of byte;
    begin
    PByte(pt_rec) := buf;

    thanks

