The Oracle at Delphi: January 2008

Monday, January 28, 2008

Meanwhile, back at the (Unicode) ranch

I use the TStrings.ReadFromFile/WriteToFile methods all the time to read/write text files and manipulate them. Does this mean all these files are now Unicode and I can't read/write the files in the old format?

I've gotten this specific question a few times recently, so I figured I'd address it directly. The bottom line here is that we've got you covered. The defaults for those TStrings methods will now write the files ANSI encoded based on the active code page and will read the files based on whether or not the file contains a Byte-Order-Mark (BOM). If a BOM is found it will read the data encoded as the BOM indicates. If no BOM is found it will read it as ANSI and up-convert based on the current active codepage. All your files written with pre-Tiburón versions of Delphi will still be read in just fine, with the caveat that as long as you read with the active codepage the same as with what was written. Likewise, any file written with Tiburón (unless you override things which I'll describe in a moment) should be readable with a the pre-Tiburón version. At this point, only the most common BOM formats are detected (UTF16 Little-Endian, UTF16 Big-Endian and UTF8).

That's great news! But what if I want these files to be written as Unicode?

We realize that many Delphi applications will need to continue to interact with other applications or datasources, many of which can only handle data in ANSI or even ASCII. This is why TStrings exhibit the default behavior I described above. However, many of you will want to read/write text data using the TStrings class a loss-less Unicode format, be that Little-Endian UTF16, Big-Endian UTF16, UTF8, UTF7, etc... This led us to introduce a new TEncoding class to SysUtils.pas. This class is very similar in methods and functionality that you can find in the System.Text.Encoding class in the .NET Framework. That class contains several static functions that return standard instances of TEncoding class descendants that handle many of the common encoding formats. You can also create your own TEncoding class descendant should you feel the need to handle encodings that may be unique to your situation (say, for Base64 or Quoted-Printable).

The TEncoding class is key to how the TStrings class now handles being able to read/write its contents in the various formats. There are now overloaded versions of the TStrings.ReadFrom/WriteToXXXX functions that now take an instance of a TEncoding class. In most cases you'll just want to pass in one of the singleton versions that you can obtain by calling one of the class static methods on the TEncoding class. For instance, suppose you want to write the data out as UTF8, you'd call it like this:

var
  S: TStringList;
begin
  S := TStringList.Create;
  ...
  S.WriteToFile('config.txt', TEncoding.GetUTF8);
  ...
end;

Without the extra parameter, 'config.txt' would simply be converted and written out as ANSI encoded based on the current active codepage. The interesting thing to note here is that you don't need to change the read code since TStrings will automatically detect the encoding based on the BOM and do the right thing. If you wanted to force the file to read and written using a specific codepage, you can create an instance of TMBCSEncoding and pass in the code page you want to use into the constructor. Then you use that instance to read and write the file since the specific codepage may not match the user's active codepage.

I use INI files for configuration and extensively use TIniFile and TMemIniFile. What about those?

The same thing holds for these classes in that the data will be read and written as ANSI data. Since INI files have always been traditionally ANSI (ASCII) encoded, it may not make sense to convert these. It will depend on the needs of your application. If, you do wish to change to use a Unicode format, we'll offer ways to use the TEncoding classes to accomplish that as well.

In all the above cases, the internal storage will be Unicode and any data manipulation you do with string will continue to function as expected. (Unless you're doing some of those odd data-payload tricks I've mentioned previously). Conversions will automatically happen when reading and writing the data. Those of you that are familiar with VCL for .NET, you already know how all of the above works since the new overloads added to TStrings were introduced with VCL for .NET and use the System.Text.Encoding class for all these operations.

Thursday, January 24, 2008

Taking a big PByte of pointer math

Many of you have done it. We here have done it. It is just so danged convenient and satisfying. Sure, it is not regarded as a "safe" thing to do. Without the proper precautions, you can easily get into some serious trouble. But, hey, the C/C++ programmers do it all the time! What is "it"? Pointer math. Pointer math is simply treating any given typed pointer in some narrow instances as a scaled ordinal where you can perform simple arithmetic operations directly on the pointer variable. It also allows you to treat such a pointer variable as an unbounded array using the array [] operator. In Delphi there are only a few cases where this is possible. Let's set the WABAC machine for a little history lesson.

Around 1990 when Borland was moving Turbo Pascal to the new up-and-coming GUI operating system, Windows 3.0/3.1, the programming interface was a pure C-based API (with a "Pascal" calling convention, ironically). That meant that character strings were handled as null-terminated arrays of chars. These were mostly passed around by "reference" as a char* type. Given this, it was very common to scan through a string by treating the typed pointer as an array by applying the "array operator []." Likewise, simple arithmetic operations such as addition or subtraction are also common and convenient. In order to better match the Windows programming model, Turbo Pascal for Windows introduced a special PChar type. This type was unique in that the compiler had intrinsic knowledge of this type which allowed those same behaviors described previously.

Return to today. Because of this special "PChar" type behavior, many developers (ourselves included) would do crazy things such as casting a pointer of one type to a PChar and then do some pointer arithmetic. This was done because all other user-defined typed pointers would not allow these types of operations. What this has done is created cases where some code is littered either with a lot of pointers cast to PChars or the direct use of PChar pointers even when the data being manipulated isn't specifically byte-sized characters. In other words, PChars were used to manipulate byte-buffers of arbitrary data.

During the development of Tiburón, we discovered some of our own code was doing a lot of the above things. (I told you we've all done it!) Using the PChar trick was simply the only thing available and made a lot of the code simpler and a little easier to read. Now that PChar is an alias to PWideChar, things started falling over because the element type is now 2 bytes. As I mentioned above, a typed pointers are scaled based on the size of the element. Anyplace there is a PChar(Ptr) + 1, Inc(PChar(Ptr)), or similar expression, the result is an increment by 2 instead of 1. As you can imagine, this has caused a few headaches. In looking at the code, it was clear that the intent was to access this data buffer as an array of bytes, and was merely using a PChar as a convenience for accessing as an array or performing simple arithmetic.

What about using a PByte? PByte has been a type declared in the Delphi System unit for many releases. It is declared as simply a pointer to a byte or PByte = ^Byte. One solution would be to teach the compiler special intrinsic knowledge of the PByte type just like the current PAnsiChar and PWideChar types, but I didn't like where that was heading (what about the next type we'd want to do this to, and the next?). What if you wanted to have a typed pointer to some other type such as an Integer or a record? What if you wanted to do the same kinds of scaled pointer arithmetic and array indexing? So for Tiburón, we're introducing a new compiler directive {$POINTERMATH <ON|OFF>}. If you declare a typed pointer while this directive is on, any variable of that type will allow all the scaled pointer arithmetic and array indexing you like. Likewise, any block of code that is surrounded by that directive, will allow the same syntax for any typed pointers within that block regardless of whether or not it was originally declared that way. PByte is declared with that directive ON. This means that all the places that are using a the PChar type simply for the byte-pointer and byte-array access, can now use the PByte type instead and none of the existing code statements and logic needs to change. A simple search and replace over the source is what is needed. Sure, we could have changed the PChar references to PAnsiChar, but that simply serves to perpetuate the lack of clarity over what the intent of the code actually is. This directive is not on by default, and the only existing type that is declared with it on is PByte. It also only affects typed pointers. Variables of type "Pointer" do not allow the pointer math features since it is effectively pointing to a "void" element which is 0 size. Untyped "var" or "const" parameters are not affected because they're not really pointers (even though internally a "reference" in the form of an address is passed in).

This code will now compile with the new Tiburón compiler:

{$POINTERMATH ON}
procedure MoveRects(Rects: PRect; Count: Integer);
begin
  while Count > 0 do
  begin
    MoveRect(Rects[Count - 1]); // This line will not compile today.
    MoveRect((Rects + Count - 1)^); // Neither will this line.
    Dec(Count);
  end;
end;

Little known factoid: typed pointers have always been able to manipulated using the Inc() and Dec() standard procedures†. The above code is very contrived and could be replaced the following that will compile today:

procedure MoveRects(Rects: PRect; Count: Integer);
begin
  while Count > 0 do
  begin
    MoveRect(Rects^);
    Inc(Rects);  // This will scale the increment by the size of the element. 16 in this case.
    {Rects := Rects + 1;}  // The above is the same as this, which won't compile today.
    Dec(Count);
  end;
end;

Yes, this functionality can be considered very dangerous, especially where you can easily overrun of the end of a buffer. This should be used judiciously and prudently in some very narrow instances. If you don't want to use it, simply never turn on that directive and any code that tries to do this will not compile.

† There is an interesting bit of history with the Inc() and Dec() standard procedures. When the PChar was introduced, Inc() and Dec() were also modified to operate on a PChar variable. It turned out that it worked for all pointer types. Additionally, it also properly scaled the increment or decrement operation based on the size of the pointed-to element. Since a PChar pointed to a byte-sized element all the testing focused on that type. The product was released and it was then discovered that Inc() and Dec() had this additional behavior. The following release we decided to "fix" the problem. You would have thought we were the angel of death and had just killed the first-born of every household! The hew and cry from the field testers was overwhelming. We finally relented and put the "bug" back in and it has remained to this day as a "feature." I wonder what the reaction will be to this change? I predict there will be hails of praise from one camp, and sneering and guffaws from another. The largest camp will be the ones ambivalent to this change. To which camp to you belong?

Tuesday, January 22, 2008

Breaking the rules

Sometimes you just have to break the rules in order to learn something new. It is inherent in everyone, from the first time you ignored your parents about getting burned by the hot stove, to doing 10+ over the speed limit because "you can." Even though I mentioned in this post that you should never go mucking with the fields of a critical section structure, I just could not keep my grimy little hands off :-). One reason is that in Windows Vista, Microsoft introduced new condition variable APIs. My previous posts were about creating a full monitor object which is the combination of a mutual-exclusion lock (critical section) and a condition variable (wait/notify/notifyall). However there are cases where you would want to use more than one condition variable with the same lock. That is what these new Windows APIs allow you to do. After reading as much about the underlying implementation of these new bits of synchronization goodness, I found that they use a relatively new internal set of undocumented APIs for "Keyed Events." You can read this post by Joe Duffy for a high-level explanation. What if I could "backfill" these new APIs for older version of Windows within the DPL? I would need to break some rules in order to do that. I'd also have to forego any use of keyed events since those are only Windows XP and up (and not documented either). Windows 2000 would be out of luck. Let's get started with this hack-fest. For now let's ignore the "slim" reader/writer locks.

There are four key APIs to duplicate and one structure.

type
  PRTLConditionVariable = ^TRTLConditionVariable;
  _RTL_CONDITION_VARIABLE = record
    Ptr: Pointer;
  end;
  TRTLConditionVariable = _RTL_CONDITION_VARIABLE;

procedure InitializeConditionVariable(out ConditionVariable: TRTLConditionVariable);
procedure WakeConditionVariable(var ConditionVariable: TRTLConditionVariable);
procedure WakeAllConditionVariable(var ConditionVariable: TRTLConditionVariable);
function SleepConditionVariableCS(var ConditionVariable: TRTLConditionVariable;
  var CriticalSection: TRTLCriticalSection; dwMilliseconds: DWORD): BOOL;

The condition variable structure has one field which is not much space to store state information. My first thought was to simply allocate a structure that would be pointed to by the single field of the condition variable. The problem is there is no Delete/Destroy/FreeConditionVariable API. How would this structure be freed? Rats! That's a bust. This is where it is time to get all geeky and put on the propeller cap. If you recall from my discussion of the TMonitor, the Wait method would insert a little structure into a linked list that represented all the current waiters. We need to do something similar. There is a little twist here as well because the WakeCondtionVariable and the WakeAllConditionVariable are not required to be called from within the lock. In fact, all they take is the TRTLConditionVariable structure as the lone parameter. This makes managing that internal list of waiters a lot more tricky since we're not guaranteed to be in a lock while we need to update that list. This is compounded by the fact that the condition variable structure has only one pointer sized field. My first thought was to use a lock-free queue. The problem here is that we need two pointers, one to the head of the queue and one to the tail of the queue. I only have one pointer. Before we get into SleepConditionVariableCS where we're going to do some unholy things to the poor critical section structure, let's get the wait queue out of the way.

To set the stage we need a little review of field alignment. Alignment of fields of a structure, the local variables on the stack and the data on the heap is very important to making sure reads and writes are fast and atomic. Packing a structure that knocks the fields out of their "natural" alignment can sometimes have unwanted side-effects. Yes, there are times where you will get a few "dead bytes" between fields. A double word sized entity on the 32bit x86 is 4 bytes in size. This means that the "natural" alignment of this entity is on even 4 byte boundaries. As long as the entity falls on those boundaries, you know that the lowest 2 bits of any pointer to that entity will always be 0. We can exploit that fact with a bit of (OMG! is he going to do what I think he's going to do!!???) propeller-head bit twiddling. Now before you scroll down right now and hit the comment button and fire up the flame thrower, please bear with me.

Just like the Wait method of the TMonitor class, the SleepConditionVariableCS will simply place a structure allocated on the stack into a queue right before releasing the crit-sec, and block on an event. Because interacting with the queue does not use any inherent locking we need a thread-safe queue. We also only have that one field. How do we "lock" the list while it is being updated? The key is using one of those extra bits that we know will always be 0. If one of those bits is not 0, the list is locked. Here is the LockQueue and UnlockQueue functions:

type
  PWaitingThread = ^TWaitingThread;
  TWaitingThread = record
    Next: PWaitingThread;
    WaitEvent: THandle;
  end;

//UPDATE:  Slight modification based on comment
function LockQueue(var ConditionVariable: TRTLConditionVariable): PWaitingThread;
var
  ReleaseSlice: Boolean;
begin
  ReleaseSlice := GetProcessorCount < 2;  // This local could be eliminated by using a global
  while True do
  begin
    Result := PWaitingThread(Integer(ConditionVariable.Ptr) and not 1);
    if TInterlocked.CompareExchange(Integer(ConditionVariable.Ptr), 
      Integer(Result) or 1, Integer(Result)) = Integer(Result) then
      Break;
    if ReleaseSlice then
      SwitchToThread
    else
    asm
      PAUSE
    end;
  end;
end;

procedure UnlockQueue(var ConditionVariable: TRTLConditionVariable; WaitQueue: PWaitingThread);
begin
  ConditionVariable.Ptr := Pointer(Integer(WaitQueue) and not 1);
end;

It should be immediately obvious that the LockQueue function will never return until it can successfully lock the queue. It should also be noted that it will also spin-lock indefinitely, which can burn lots of cycles. Under a single proc system, doing a busy-wait is not cool so we should just release this thread's timeslice and let the scheduler get another thread to run. (Sleep(0); is not appropriate since that only allows threads with equal or greater priority to run, SwitchToThread is far more fair and much more like a traditional "yield"). Now that we have the tail of the linked list returned from LockQueue, we can manipulate the queue all we want. We just cannot "touch" the ConditionVariable.Ptr field and can only operate with the value returned from LockQueue. Once we're done we unlock the Queue by passing the potentially updated tail pointer to the UnlockQueue which will simply update the Ptr field of the ConditionVariable. About the performance implications; In theory, the queue should never grow very (one entry per waiting thread per condition variable) deep so the queue operations should be fast enough to not adversely affect performance (although I know it will a little bit). A final note about this technique is that it does not allow any type of recursion, so you much make sure there are no calls that could call LockQueue on the same condition variable while the lock is held.

If you haven't lost your lunch or broken out in hives by now, let's move on to the SleepConditionVariableCS function. It is commented similar to that of the TMonitor.Wait function.

function SleepCriticalSectionCS(var ConditionVariable: TRTLConditionVariable;
  var CriticalSection: TRTLCriticalSection; dwMilliseconds: DWORD): BOOL;
var
  WaitingThread: TWaitingThread;
  RecursionCount: Integer;
begin
  if CriticalSection.OwningThread = GetCurrentThreadId then
  begin
    WaitingThread.Next := nil;
    WaitingThread.Thread := CriticalSection.OwningThread;
    WaitingThread.WaitEvent := CreateEvent(nil, False, False, nil);
    try
      // Save the current recursion count
      RecursionCount := CriticalSection.RecursionCount;
      // Add the current thread to the waiting queue
      QueueWaiter(ConditionVariable, WaitingThread);
      // Set it back to almost released
      CriticalSection.RecursionCount := 1;
      TInterlocked.Add(CriticalSection.LockCount, -(RecursionCount - 1));
      // Release and get in line for someone to do a Wake or WakeAll
      LeaveCriticalSection(CriticalSection);
      // This is, admitedly, a potential race condition... what do you think?
      Result := WaitForSingleObject(WaitingThread.WaitEvent, Timeout) = WAIT_OBJECT_0;
      // Got to get the lock back and block waiting for it.
      EnterCriticalSection(CriticalSection);
      // Remove any dangling waiters from the list
      RemoveWaiter(ConditionVariable, WaitingThread);
      // Lets restore all the recursion and lock counts
      TInterlocked.Add(CriticalSection.LockCount, RecursionCount - 1);
      CriticalSection.RecursionCount := RecursionCount;
    finally
      CloseHandle(WaitingThread.WaitEvent);
    end;
  end else
    Result := False;  // should call SetLastError with the appropriate error code.
end;

Upon entry, we make sure that the calling thread owns the lock and if not return false (and as the comment suggests we should call SetLastError with the appropriate error code). Once we determine that the calling thread owns the lock, we need to start the process of tearing down the critical section and blocking on the wait event. The main difference between TMonitor.Wait and this function is that we still want to make sure that any threads waiting for the critical section are properly woken up, so we want to first tear down the critical section to within 1 count of being released, then call LeaveCriticalSection() which will do the final deed and unblock an waiters. We then immediately block on the wait event. Once we return from the wait either by being signaled or timing out, we have to re-acquire the critical section. This will only be a recursion level of 1, so we now need to restore any recursions to it. The QueueWaiter and RemoveWaiter functions use the LockQueue and UnlockQueue functions above to lock and manage the condition variable's waiting thread queue. Like the TMonitor.Pulse and TMonitor.PulseAll functions, the Wake and WakeAll functions are very simple.

procedure WakeAllConditionVariable(var ConditionVariable: TRTLConditionVariable);
var
  WaitingThread: PWaitingThread;
begin
  WaitingThread := DequeueWaiter(ConditionVariable);
  if WaitingThread <> nil then
    SetEvent(WaitingThread.WaitEvent);
end;

procedure WakeAllConditionVariable(var ConditionVariable: TRTLConditionVariable);
var
  WaitQueue, WaitingThread: PWaitingThread;
begin
  WaitQueue := LockQueue(ConditionVariable);
  try
    WaitingThread := DequeueWaiterNoLock(WaitQueue);
    while WaitingThread <> nil do
    begin
      SetEvent(WaitingThread.WaitEvent);
      WaitingThread := DequeueWaiterNoLock(WaitQueue);
    end;
  finally
    UnlockQueue(ConditionVariable, WaitQueue);
  end;
end;

For the WakeAllConditionVariable function, we want to atomically wake up all current waiters and don't want any intervening threads to be injected into the queue. So we have a special no lock version of DequeueWaiter which only takes the tail pointer passed by reference ("var" parameter). It will update the tail pointer appropriately, so when UnlockQueue is called, it should be nil. In fact DequeueWaiter simply locks the queue and then calls DequeueWaiterNoLock then unlocks the queue.

I'll totally understand if you have the sudden urge to take a shower, or to stick red-hot needles into your eyeballs... but sometimes it is just a little fun to break the rules (evil grin :-). I will not be surprised if this shows up on Coding Horror or The Daily WTF for public ridicule :-) As for an alternative implementation, all I want is an efficient user-level implementation of keyed events. That would eliminate the queue and make this function work more like the Vista version.

Friday, January 18, 2008

Simmering Unicode, bring DPL to a boil (Part 2)

As promised, I'll be covering the next group of methods on the TMonitor class. Wait, Pulse, and PulseAll. Before I do that, I want to quickly cover the spin-lock section of the Enter method from this post. (UPDATE: For the record, the TInterlocked class does not exist yet in any shipping VCL. You can simply replace these calls with the direct call to the Windows InterlockedXXXX functions).

It may seem counter-intuitive when you look at the loop in the first section of the Enter method. Why "waste" time in a loop when you could just block and release this thread's time-slice for some other thread? It has to do with the overhead and expense that goes along with blocking on a kernel object, returning to the OS thread scheduler and finally passing control to another thread. This could take literally thousands of CPU cycles and by the time this has all happened, the lock may have been released and the "waiting" thread can have the lock. Spin-locks are not for every case and if used improperly can actually do exactly what it looks like they do, waste cycles. The spin-lock will work best in situations where you have more than one CPU core, and more than one thread vying for the same resource in a relatively tight loop.

If a given core can just wait for a few cycles before doing the expensive and dreaded full-on kernel-level block, you can increase performance by drastically reducing user-level caused kernel-mode switches. Yes, the OS pre-emptive scheduler is still going to pre-empt the thread at various points, but those are things you cannot control. You can, however, control voluntarily giving up your time-slice. Remember, having a non-zero spin-count only makes sense for a system with more than one CPU core. In the case of systems with one CPU core, you must relinquish your time-slice in order to give the other "thread" a chance to actually complete its task. For this reason, the TMonitor makes sure there is more than one CPU core before allowing the internal FSpinCount field to be set to anything other than 0. Since the lock count is not a fixed value, that also means it is a tunable item. It allows you to "tune" the value that works best given the specific code in which it is used. The Windows critical section allows this value to be "tuned" in the same way using the InitializeCriticalSectionAndSpinCount API.

Some of the comments I've gotten from the previous post seem to me to indicate that some folks didn't read the whole post or simply missed a few key points. The whole point about that post was not that I'm trying to make a better critical section than what is provided by the OS. It was that until Windows Vista(and Win2008) you cannot use a critical section object very easily with other synchronization primitives. There was also some comments about how we shouldn't be spending any time on traditional locking since it is not "innovative." Not every little problem out there can (and should) be solved using things like continuation passing style, or other functional language-like constructs. Ultimately, some form of locking is going to be needed.

Let's start with the Wait function. This function's job is to atomically release the monitor lock and then block until some other thread calls Pulse or PulseAll. So you can actually be deeply nested in Enter calls on the monitor and then call Wait and the lock is completely released and the thread will block. This will allow another thread that may be blocked on the Enter call to gain the lock. Before Wait returns it will have re-acquired the lock and have properly restored the recursion state. Calling Wait implies a contract that some other thread must call Pulse or PulseAll before this thread can continue. If that never happens and the Timeout is INFINITE, it will block forever. Here's the Enter function:

function TMonitor.Wait(Timeout: Cardinal): Boolean;
var
  RecursionCount: Integer;
  WaitingThread: TWaitingThread;
begin
  WaitingThread.Next := nil;
  WaitingThread.Thread := CheckOwningThread;
  // This event should probably be cached someplace.
  // Probably not on the instance since this is a per-thread-per-instance resource
  WaitingThread.WaitEvent := CreateEvent(nil, False, False, nil);
  try
    // Save the current recursion count
    RecursionCount := FRecursionCount;
    // Add the current thread to the waiting queue
    QueueWaiter(WaitingThread);
    // Set it back to almost released
    FRecursionCount := 0;
    // This thread is no longer the owner either
    FOwningThread := 0;
    // Release and get in line for someone to do a Pulse or PulseAll
    if TInterlocked.Add(Integer(FLockCount), -RecursionCount) >= 0 then
      Result := SignalObjectAndWait(GetEvent, WaitingThread.WaitEvent, Timeout, False) = WAIT_OBJECT_0
    else
      Result := WaitForSingleObject(WaitingThread.WaitEvent, Timeout) = WAIT_OBJECT_0;
    // Got to get the lock back and block waiting for it.
    Enter;
    // Remove any dangling waiters from the list
    RemoveWaiter(WaitingThread);
    // Lets restore all the recursion and lock counts
    TInterlocked.Add(Integer(FLockCount), RecursionCount - 1);
    FRecursionCount := RecursionCount;
  finally
    CloseHandle(WaitingThread.WaitEvent);
  end;
end;

(Note: Before someone chimes in and complains about the heavy handed CreateEvent/CloseHandle calls... please read the comments).

Since Wait can only be called while holding the lock, the CheckOwningThread call will raise an exception if the calling thread doesn't own the lock. Now that we know we own the lock, we start the process of "unwinding" the lock. We need to save off the recursion count since this has to be restored exactly. The TWaitingThread type is merely a record and is strung together as a linked list among all the potential waiters by the call to QueueWaiter. In fact, QueueWaiter doesn't need to do any "lock-free" style linked list machinations since it can only be called while the lock is held so no other threads can get there. The queue is simply a FIFO (First-In-First-Out) in order to keep some level of fairness. Yes, I know, it is not totally fair if the waiting threads all have differing priorities. In general, this Wait locking style is used among several equal priority worker threads, so that priority-inversions are less common.

Once we've added this thread to wait queue, we need to set about relinquishing the actual lock and prepare to block on the event created above. This is done by explicitly setting the FRecursionCount and FOwningThread fields back to 0. The next tricky part is to now remove all the recursion counts from the FLockCount field. Since this field is touched from other concurrent threads, we need to interlock it. There is no Interlocked subtract function so we simply add the twos-compliment negation of the the RecursionCount. Just like in the Leave function we need to see if there are other threads currently waiting for the lock. If the result of the subtraction of the recursion count from the lock still yields a value greater than or equal to 0, there are waiters. In this case we need to use the nice atomic function SignalObjectAndWait. This function will atomically signal the lock's event handle and then turn around and immediately block on the event handle we created within this function. If there are no other threads waiting for the lock, there is no need to signal the lock event and a simple WaitForSingleObject is all that is needed.

Once the wait event is signaled by the Pulse or PulseAll method, or the wait times out, the first thing we need to do is reacquire the lock. Yes, Virginia, this may also mean that this method will block, again! That is the contract to the caller. When Wait returns the calling thread has the lock. Once we have the lock again, we need to do some house keeping and state restoration. The first thing is to remove this thread from the wait queue. This is really *only* necessary in the case where the wait above timed out. Nobody dequeued the WatingThread record and set the event so it is still in the queue someplace. RemoveWaiter does nothing if we were signaled. Once I've proven to myself that this is the only case where the WaitingThread record could ever still be in the wait queue, I can add some code to check the result and only call RemoveWaiter if the result is false.

After we've cleaned up the potentially "dirty" queue, we now need to restore the nesting or recursion state. Returning from the Enter call we know that the FLockCount field has our lock added to it and the FRecursionCount field will be 1. We need to add back in our recursive locks, less the one Enter already gave us. Because we hold the lock we can simply blast over the FRecursionCount field.

Like the Exit method, the Pulse and PulseAll methods are very simple:

procedure TMonitor.Pulse;
var
  WaitingThread: PWaitingThread;
begin
  CheckOwningThread;
  WaitingThread := DequeueWaiter;
  if WaitingThread <> nil then
    SetEvent(WaitingThread.WaitEvent);
end;

procedure TMonitor.PulseAll;
var
  WaitingThread: PWaitingThread;
begin
  CheckOwningThread;
  WaitingThread := DequeueWaiter;
  while WaitingThread <> nil do
  begin
    SetEvent(WaitingThread.WaitEvent);
    WaitingThread := DequeueWaiter;
  end;
end;

There is the CheckOwningThread call again. Pulse and PulseAll can only be called while holding the lock. Since we are holding the lock, the DequeueWaiter function doesn't need any complicated locking code on the linked-list queue. It simply removes an item from the queue and returns it. Pulse will simply release the first thread in the queue if one exists. An added check could be made to make sure there really are waiting threads. PulseAll will loop and dequeue all waiting thread records and release them all.

So there we go. A Win32 implementation of a monitor. The TMonitor class is, right now, a proof-of-concept and will probably look much different should it make it into a future version of Delphi. The current thinking is that the field variables should actually be somehow "attached" to any arbitrary object instance, thus enabling a whole object to be locked. We're also looking at integrating this concept into the language itself through the use of some sort of LockObject standard function. Lots of possibilities. Stay tuned.

Thursday, January 17, 2008

Simmering Unicode, bring DPL to a boil

Now that most of the ingredients are in the Unicode pot, I'm going to move it to the back burner and turn it down to a low simmer. We'll come back to it later and give it a few stirs and make sure the seasoning is right. Meanwhile I want to start work in the next course, DPL, or the Delphi Parallel Library. Remember this is merely a working name for all the work and research I've been doing around adding more support for parallel/concurrent processing. DPL, IS NOT and official feature of Tiburón.

As a part of the my research into parallel computing I've also been reading up whole bunch on the notion of lock-free data structures. These little bits of computer science wonders really help take some of the most common and mundane data structures and algorithms and make them much more friendly to the world of multi-core systems. However, they're also not a panacea and not all environments lend themselves to making the most out of them. Some of the best write-ups on lock-free algorithms are the ones from Julian Bucknall of Developer Express. He has some excellent write-ups on lock-free Queues and Stacks. Although these examples are written in C# and do depend on the nicely garbage collected universe in which it runs (.NET), the concepts mostly transfer. In one of the articles Julian even threatened that he'd translated some of these items into Delphi/Win32 code... Hmm... I wonder if he solved the one somewhat intractable problem about node deletion in the underlying linked lists? I'm still trying to work on that one myself, so Julian, if you've got something clever, let me know.

While trying to avoid locks is a laudable goal, at some point you're just going to have to use some form of locking. So I've been working on an implementation of a TMonitor class. .NET has another nice advantage here that the notion of a Monitor is intrinsic to any object instance in .NET. Eventually this is how I want the TMonitor to work as well. So what is a Monitor? It is basically the combination of a lightweight mutual exclusion lock and a condition variable. In fact the wikipedia article about a condition variable mentions the monitor as being that combination. Up until Windows Vista, Windows has not had any notion of a condition variable. Even Microsoft has admitted that the event based technique of using an auto-reset event with the PulseEvent API was fraught with problems. Even the PulseEvent documentation says you should no longer use it and instead use the new Condition variable APIs. Cool! Great! We now have intrinsic OS support for creating a full TMonitor class, right? Not so fast. The problem is that these APIs only appeared in Windows Vista and Server 2008! Grrrrr. I want my code to work at least with Windows 2000.

So, now I'm stuck with figuring out how to create a monitor without the benefit of the OS. So, off I went to research the bits an pieces of what I need to do this. I knew that POSIX threads (pthreads) have the notion of a condition variable so I wondered how those POSIX Win32 layers implement them. That was when I stumbled upon this cool article about implementing POSIX condition variables on Win32. It is a few years old but was also very informative. So off I went and was able to add a new TConditionVariable class to the SyncObjs.pas unit. The problem is that the underlying implementation, how shall I say it?, less than modern and is very academic. The other problem is that it depends on a kernel-level mutex handle as the surrounding lock. Mutexes are great if you want cross-process mutual exclusions or you need a handle on which to pass to a WaitForMultipleObjectEx call. Other than that, the real mutual exclusion work-horse is the critical section. Critical sections are process local, lightweight and fast. The nice thing about them is that if there is no contention (only one thread tries to acquire the crit-sec at a time), no locking happens. This means the code executes purely in user-space and no expensive kernel-mode switch is needed. Another problem is that critical sections are (supposed to be) opaque structures. Even though the Windows API headers clearly show all the fields and give them nice names like LockCount, RecursionCount, etc... one should not muck with them directly. If for whatever reason, Microsoft decided to change the structure or change the underlying implementation, any code that directly touched those fields is asking to to be spanked, and spanked hard.

What to do... what to do... While the specific critical section structure and related APIs are opaque, the underlying implementation is fairly straight forward. The tricky part will be in adding the condition variable-esque aspects, notably the Wait/Notify(Pulse)/NotifyAll(PulseAll) functionality. Lets start with the mutex (critical section) bits. We need only three APIs for this, Enter, Leave(Exit), and TryEnter. Just like the OS' critical section we need a few fields to track the state of the critical section. Here's the declaration of what is needed so far:

TMonitor = class sealed
strict private
  FLockCount: Integer;  // Initialized to -1, no lock holders
  FRecursionCount: Integer; // Keep track of how many times we've reentered
  FOwningThread: Cardinal; // Who currently owns the lock
  FSpinCount: Integer; // Special busy-wait spin-lock for multi-proc systems
  FLockEvent: THandle; // The auto-reset event on which to wait when in contention
  function GetEvent: Cardinal;
  function CheckOwningThread: Cardinal;
  class var FProcessorCount: Integer;
  class function GetProcessorCount: Integer; static;
public
  constructor Create; overload;
  constructor Create(ASpinCount: Integer); overload;
  destructor Destroy; override;

  procedure Enter; overload; inline;
  function Enter(Timeout: Cardinal): Boolean; overload;
  procedure Exit;
  function TryEnter: Boolean;
end;

Let's start with the TryEnter function since is the simplest and most straight forward since it never blocks the calling process. Here it is including the comments! (wow, he commented the code :-).

function TMonitor.TryEnter: Boolean;
begin
  // First things first - initially check to see if we can gain ownership
  if TInterlocked.CompareExchange(FLockCount, 0, -1) = -1 then
  begin
    //  Yep, got it.  Now claim ownership
    FOwningThread := GetCurrentThreadId;
    FRecursionCount := 1;
    Result := True;
  end else if FOwningThread = GetCurrentThreadId then  // check for recursion
  begin
    // We're recursing now, but FLockCount still needs to be guarded
    TInterlocked.Increment(FLockCount);
    // Only the owning thread can increment this value so no need to guard it
    Inc(FRecursionCount);
    Result := True;
  end else
    Result := False;
end;

The first thing we want to do is just try to get the lock. That is what the CompareExchange does. (oh, BTW the TInterlocked class is just a collection of thin class static functions that wrap the OS interlockedXXXX functions). CompareExchange is more commonly referred to as a Compare and Swap or CAS for short. This atomic function is the basis by which nearly all other atomics can be implemented. Learn it. Understand it. It is that important. Since the FLockCount is initialized to -1, we try to swap in a 0 if and only if FLockCount is a -1. This is done as an atomic operation which means while this operation proceeds, no thread switching, or other memory modifications can occur from anywhere (other CPUs, DMA operations, etc.). The CPU Bus is locked down for the duration of that one operation which is typically a single CPU instruction, CMPXCH on the the x86 platform. This is extremely important for this all to work. If CompareExchange returns a -1, then we know that the original value of FLockCount was -1 and the 0 was successfully "swapped" in. If it returns anything other than -1, someone else has already gotten here first or we already hold the lock. We check the latter by looking at the FOwningThread field and comparing it to the current thread's Id. If we already own the lock, then this is a recursion case and we just need to do some housekeeping and return. Notice that we don't need to use the Interlocked inc on the FRecursionCount field since only the owner of the lock can touch that field. If some other thread owns the lock, then we simply return indicating that we cannot get the lock.

If TryEnter was the vegatables for this course, then Enter is the meat. This is where things can get interesting. Here's the Enter function in all it's glory.

function TMonitor.Enter(Timeout: Cardinal): Boolean;
var
  SpinCount: Integer;
begin
  // Get the spin count
  if FSpinCount > 0 then
  begin
    Result := TryEnter;
    if Result then
      System.Exit;
    SpinCount := FSpinCount;
    while SpinCount > 0 do
    begin
      // if there are already waiters, don't bother spinning
      if FLockCount > 0 then
        Break;
      // Try to get the lock
      if FLockCount = -1 then
        if TInterlocked.CompareExchange(FLockCount, 0, -1) = -1 then
        begin
          FOwningThread := GetCurrentThreadId;
          FRecursionCount := 1;
          Result := True;
          System.Exit;
        end;
      asm
        REP NOP // Just do nothing here...
      end;
      Dec(SpinCount);
      // Keep trying until the spin count expires
    end;
  end;
  Result := True;
  // Add our count to the lock
  if TInterlocked.Increment(FLockCount) > 0 then
  begin
    // Now check if we're the owner
    if FOwningThread = GetCurrentThreadId then
      // Yep, we're the owner so just recurse
      Inc(FRecursionCount)
    else
    begin
      // We're not the owner, so blocking is needed
      // GetEvent does a "safe" allocation of the Event
      Result := WaitForSingleObject(GetEvent, Timeout) = WAIT_OBJECT_0;
      if Result then
      begin
        FOwningThread := GetCurrentThreadId;
        FRecursionCount := 1;
      end else
        // We timed out, remove our presence from the lock count
        TInterlocked.Decrement(FLockCount);
    end;
  end else
  begin
    FOwningThread := GetCurrentThreadId;
    FRecursionCount := 1;
  end;
end;

Let's ignore the spin-lock at the beginning of this function for now. That section deserves to be covered separately. Starting right after the spin-lock loop, we first try to add our count and/or acquire the lock with a simple interlocked inc of the FLockCount field. If this lock was not already owned, the result will be 0 (remember it starts at -1) so the else section is taken and we claim ownership of the lock. If the result is greater than 0, then that means someone else has to the lock or we're in a recursion. We do the same FOwningThread check as was done in TryEnter and if we already own the lock simply add to the recursion count. If we don't own the lock, then we're in contention and have to block. Unlike TryEnter, Enter either returns and the lock is now owned by the calling thread, or it timed out and never was able to get the lock. This timeout is something that the OS' critical section does not support. We'll cover it in a moment, but another neat concurrency trick is in the GetEvent method. We call WaitForSingleObject on the event handle, which is an auto-reset event, with the timeout the caller specified. The overloaded/inlined Enter method simply calls this version of Enter with INFINITE as the timeout. If the wait returns that the object was signaled that means we now own the lock (FLockCount should be >= 0 at this point) and we can claim it by setting the FOwningThread and the initial recursion count. However, if we timed out and could not get the lock, we simply need to decrement the FLockCount field to remove the fact that we were waiting in line for the lock.

The simplest and easiest to understand function is Exit. Here's what it looks like:

procedure TMonitor.Exit;
begin
  CheckOwningThread;
  Dec(FRecursionCount);
  if FRecursionCount > 0 then
    TInterlocked.Decrement(FLockCount)
  else
  begin
    FOwningThread := 0;
    if TInterlocked.Decrement(FLockCount) >= 0 then
      SetEvent(GetEvent);
  end;
end;

The CheckOwningThread thread is simply a function that ensures that the calling thread does in fact own the lock. It will raise an exception if that is not the case because Exit is only valid to be called if the lock is owned by the calling thread. If we do own the lock (an exception wasn't raised), we just decrement the FRecursionCount field. Remember, this is safe because only the lock owner can touch that field. Now we check if we still have recursion counts because Enter was called more than once. If so we just need to remove our presence from the FLockCount field. If FRecursionCount drops to 0 (we still "own" the lock) we know we need to release the lock and make sure one of the waiters are woken up and given the lock. We clear the FOwningThread field to relinquish the claim of ownership and then remove the final count on the FLockCount field. If there are still counts on it the result of the Interlocked decrement will be greater than or equal to 0 which means there is at least one thread blocked waiting for the lock. In that case we just set the event and then return.

What about the GetEvent method I mentioned above? This is where our good friend the CAS function really helps us out. Here's the GetEvent method:

function TMonitor.GetEvent: THandle;
var
  Event: THandle;
begin
  Result := FLockEvent;
  if Result = 0 then
  begin
    Event := CreateEvent(nil, False, False, nil);
    Result := TInterlocked.CompareExchange(Integer(FLockEvent), Event, 0);
    if Result = 0 then
      // We won!  Nobody else was trying to allocate the Event.
      Result := Event
    else
      // Oh Well. We tried. Close the handle if someone got to it first.
      CloseHandle(Event);
  end;
end;

The first thing we do is simply check if it has already been allocated. If so, just return it. If not, we now enter a race to see which thread gets to create the event. This is an intentional race and if you look closely at the code the result of the CAS operation tells us whether or not we won the race. So we create the new event object, then try and swap it into the FLockEvent field. The CAS will make sure FLockEvent is 0 and of so assign the Event value to it. It will then return the original value of FLockEvent. If by the time the CAS operation is performed, another thread was able to assign the FLockEvent field, then the CAS operation won't change the FLockEvent field. It will still return the original value of FLockEvent which is assigned to the result. This leave the dangling Event local variable, which we simply close it to release it back to the OS. Theoretically, there could be as many events as there are threads running the GetEvent method created, but only one will ever be assigned to the FLockEvent. Remember, the code within the if block will only execute if at the point of the test, the FLockEvent field isn't assigned. Once some thread is able to allocate the event, the next time this function simply grabs the value of the field and returns it.

There you have it, we just recreated a lightweight critical section with the same advantages that the OS critical section possess. It tries to remain in user-space for most things and only uses a kernel object when there is contention for the resource. Next time I'll cover the spin-lock in the Enter method and explain how it can serve to boost performance on multiple processor (or multi-core) systems. I'll also present the Wait/Pulse/PulseAll methods.

NOTE: Yes, I know that there is a chance that the CreateEvent call in GetEvent can fail if the system is low on resources. There is an excellent explanation on Joe Duffy's blog about this along with some interesting tidbits about the new condition variables in Vista. If you're interested in threading, concurrency, and parallelism, Joe's blog is a must-read. Just today he's posted some random thoughts on handling blocked threads and work schedulers. Which is rather interesting because all this stuff about a TMonitor class will eventually come back around to my latest foray into creating a thread pool... which is actually along the lines of his #2 commonplace solution. Until Delphi gets some kind of continuation passing style (CPS) that will probably continue to be the solution.

Friday, January 11, 2008

Unicode Character Categorization.

Another very common item that will hit many folks is how do you easily determine what category a particular Unicode code-point belongs to. In the simpler time of ASCII (American Standard Code for Information Interchange) and even the slightly more enlightened period of ISO-8859-x most folks would just assume that if a code-point fell between $41 and $5A, it was an upper-case letter (never mind all those ISO-8859-x characters with diacritics up above $80). For compilers (or even interpreted and scripting languages), it was just this simple since most compilers didn't allow identifier characters outside the normal ASCII range of ($00-$7F).

This just isn't the case anymore. With literally many thousands of different code-points available in the whole of the Unicode specification, simple range tests (and even the more sophisticated set "in" tests) just do not cut it. While you will still be able to do things like this:

function NextWord(const S: string; var I: Integer): string; var Start: Integer; begin // Skip any leading whitespace or control characters while (I <= Length(S)) and (S[I] in [#1..#31,' ']) do Inc(I); Start := I; // Now gather up the word while (I <= Length(S)) and (S[I] in ['A'..'Z', 'a'..'z']) do Inc(I); Result := Copy(S, Start, I - Start); end;

This code works fine for much of the western world... as long as there are no ä, û or similar characters. Sure this code could be updated to include those characters, but the definition of the code points > $80 differ between ISO-8859-1 (Latin1) and, say, ISO-8859-5 (Cyrillic). For example, the code-point for the Latin1 ä ($E4) is the Cyrillic Ф ($E4). This doesn't even take into account the far eastern languages, where the number of code-points far exceed the single-byte range (0-$FF). In those cases they have to provide escape maps. There is a "lead-byte" which tells processing code to interpret the next byte using a different table. There can be many different lead-bytes that denote different tables. That's the bad news.

Now for the good news. First of all, with Tiburón, the above code will compile just fine (you'll get a warning "WideChar reduced to byte char in set expressions" on the "in" expressions) and will actually function as originally designed. The compiler will still generate code to ensure that the WideChar at S[I] is within the byte range of the set (Object Pascal sets can only have 256 elements and can only be a maximum of 32 bytes in size). So you won't get any false-positives if the string contained, say, Ł (U+0141). Also, if you are OK with how the above works and still do not need this code to accept the full gamut of Unicode code-points and want to suppress the warning just add the {$WARN WIDECHAR_REDUCED OFF} directive to the source, or the -W-WIDECHAR_REDUCED command-line switch. The compiler told you about a potential problem and you now have an idea of how to fix it.

But what if you really wanted to make this code a little more Unicode-friendly? With the many thousands of Unicode code-points, it is simply impractical to provide a huge set that now holds > 256 elements. That is where we'll be introducing a full gamut of character categorization functions. To get an idea of what I'm talking about, we've modeled these functions after a similar set of functions in the Microsoft .NET framework. Take a look at the System.Char class. It has many static functions such as IsLetter, IsNumber, IsPunctuation, etc... How could the above code be written to be more Unicode friendly?

function NextWord(const S: string; var I: Integer): string; var Start: Integer; begin // Skip any leading whitespace or control characters while (I <= Length(S)) and (IsWhitespace(S, I) or IsControl(S, I)) do Inc(I); Start := I; // Now gather up the word while (I <= Length(S)) and IsLetter(S, I) do begin if IsSurrogate(S, I) then Inc(I); Inc(I); end; Result := Copy(S, Start, I - Start); end;

The implementation of these functions is fully table driven and is derived directly from information provided by the Unicode.org. The data is processed, compressed and accessed using very fast table look-ups. This processing of the tables is done during our build process, which generates a resource which is then linked in. It is based on the latest version (5.0) of the Unicode specification, so if that ever changes, all we'll need to do is get the latest data files and process them.

Thursday, January 10, 2008

And now the $100,000 question.

Will there be a switch to control when string = UnicodeString?

The current assumption about that is, no. Let me explain why.

DCU compatibility and linking problems - Suppose you built unit A with the switch to Unicode mode. Then you built unit B with it off. You can not link Unit A with B because of the type mismatches. Sure, some things might work, but lots of things won't. Things like assigning event handlers, var/out parameters and other would seem to fail with type mismatches. It gets even more confusing and frustrating for the user when they look at the source code and see string declared everywhere, yet things don't work. Even if we delivered two complete sets of DCUs, one for AnsiString and one for UnicodeString, there is still the problem with packages, design-time components, and third-parties.

The IDE would be natively built in Unicode mode - This requires that all design-time components be built in Unicode mode. You could not install a component package or IDE enhancement built in Ansi mode. The upshot of this is that the design-time experience could be very different than runtime.

Third-parties would have to deliver Ansi and Unicode versions of their components - I don't want to impose this kind of effort on our valuable and critical third-party market. From a source code standpoint, there is a lot (a majority) of code that can be compiled either way, but that doesn't help with testing, packaging, and install size. Delivering two versions of the same library for the same version of Delphi just doubled the testing effort.

Piecemeal moves to Unicode seem "safe" and are easier to "sell" to management, but there are real land mines - When you look at Unicode as merely a "way to display Kanji or Cyrillic characters on the screen" it is easy to conceptually think Unicode is only about display rendering. Unicode is really about a storage and data format for human-readable text. Display rendering is only one part of it. If you strictly relegate Unicode only to the visual portions of your application, you're ignoring the real "meat" of your application. A holistic approach is needed. If one were to take the seemingly "safe" route and only do a portion of the application at a time, you run the risk of hiding an untold number of "pinch-points" throughout your application where implicit or even explicit conversions are taking place. If a Unicode code-point does not map to the currently active Ansi codepage, that character(code-point) is dropped during the conversion. When it comes back out of the system and needs to be rendered, data is lost. The trick is finding all those "pinch-points" and figuring out what to do with them.

Character indexing and normal string manipulation remain unchanged - The UnicodeString is still a 1-based index, reference counted, lifetime managed entity. It is no different from AnsiString in this regard. The difference is that it has a payload of UTF16 (word-sized) elements instead of byte sized elements. String assignments, indexing, implicit conversions, etc all continue to work as expected. Length(UnicodeStringVar) returns the number of elements the same as Length(AnsiStringVar).

Code that must use AnsiStrings should be explicit - If your code absolutely must use AnsiStrings, you can explicitly change the declarations to AnsiString. You can do this right now with your existing code.

string is already a Unicode string - In the Delphi for .NET compiler, string has been equivalent to System.String which is a UTF16 element based string. Many of our customers have already had to deal with this fact, and have survived the transition very well.

An example.

As we've been working on making sure things compile with the new Unicode compiler, it has been surprising even for us as to how much code we have that simply just works. SysUtils.pas contains a lot of filename manipulation functions that do a lot of string manipulation internally. Take ExtractFileExt(), for example;

function ExtractFileExt(const FileName: string): string; var I: Integer; begin I := LastDelimiter('.' + PathDelim + DriveDelim, FileName); if (I > 0) and (FileName[I] = '.') then Result := Copy(FileName, I, MaxInt) else Result := ''; end;

This function simply recompiled and worked as is. Granted, it is only a few lines of code, but what I'm not showing here is code path for the LastDelimiter function. This function cases the Filename parameter to a PChar, then calls StrScan. Since all the functions that take PChar parameters do not do implicit conversions, we've provided overloaded versions of these functions. So even if you do a lot of "PChar" buffer manipulation, we've got those functions covered.

Beefed up warnings and hints.

Another thing we're doing to try and help folks easily identify sections of their code where they may need to inspect it, is the addition of more warnings. When the compiler sees certain code constructs such as implicit string conversions, strange pointer casting, etc extra diagnostic information will be output. Another compiler feature we've added is the ability to elevate any one or all warnings to be an error. We've actually been going through our own code (and I'm a little embarrassed to say we haven't been particularly "warning free") eliminating all warnings from the code and then elevating the warnings to a error. Now our own build processes will literally fail when someone checks in code that generates a warning.

Illusions, Delusions, Fantasy and Reality.

I hold no delusions that this change will be bump-free and every lick of code out there will work without a hitch. There will be a class of applications and libraries that will be affected far more than others. Our goal is to ensure that the vast majority of our users out there will see as little disruption as possible. Also, for those cases where disruption is bound to happen, we're working on providing tooling and education to assist in this transition. The cold-hard reality is that this change is arguably late in coming. This has been a perennial request for at least the last 5 - 6 years (probably longer). Getting on track, focusing our efforts, and addressing a real need for a large segment of our customers is sometimes a little painful.

Maintaining a strong bias for backward compatibility has come at a price. There are segments of customers clamoring for major sweeping changes to things like VCL (add skinning, a new data binding model, XML streaming, etc.). We, too, fantasize regularly about things like "what if we could ignore the past and just pick up the pieces and go for it?" The cold-hard reality is that we've built over the last 13 years high expectations about what customers have come to rely on from release to release. I know that.

For many of you, your first exposure to Delphi was maybe version 3 or later. Many of you never experienced the largest transition in Delphi's history, the move from 16bit Windows to 32bit Windows in the Delphi 1 to 2 cycle. Lots of changes happened there. The Integer data type grew to a 32 bit entity. string became a managed, reference counted, heap-allocated, entity that also managed to maintain a lot of "value semantics" encapsulated in an underlying reference type. The change was embraced because finally a string could hold huge amounts of data. You could "cast" the string to a PChar and call a Windows API function since it was also null-terminated.

Today, I realize that the landscape is different. There are many, many years of history. The Internet has permeated our very existence. The world has shrunk on account of this new level of connectivity. Countless millions of lines of code have been written. Given all of that, the need to communicate using a common, unified and standard encoding is paramount.

As Delphi moves into emerging markets, especially in the far east, if we are to continue to find acceptance and carve out our place, strong clear Unicode support is paramount. In many of those markets, governments are beginning to legislate and enforce how applications interact with character data. While this doesn't necessarily apply to the private sector, that sector does take a cue from those requirements and cannot afford to one day be shut out of certain jobs and markets. They too see the value and reasoning behind these rule and elect to follow suit.

Finally, I do not intend to fully shut the door on this issue as I know it will have (is having) a polarizing effect. I do, however, want to make sure people get as informed as possible. Agree or disagree, that's fine. One thing I learned early on in my career here at CodeGear (and Borland) was to truly think about a problem. Don't just pop-off with the first reason you can find about why something won't work. Also, continue to challenge your own conclusions and position. Don't be afraid to be wrong (and don't assume that is advice only for the "other guy" either). Get the facts. I'll help by presenting as many facts as I'm able. Let the games begin :-).

Wednesday, January 9, 2008

More FAQs about Unicode in Tiburón

Based on some of the comments I've already received, here are a few more common questions folks have:

Why not just provide a new set of controls for Unicode?

Because Unicode is not just about displaying characters. If Unicode were relegated only to the display surface, and all the rest of the program were left alone, you're not really Unicode enabled. To do Unicode correctly, you really have to start shedding the whole ASCII/ANSI notion of character data. Even with only one function remaining as Ansi within an application, you've introduced what I call a "pinch-point." If you look at how the character data flows through a system, if there are any places that perform implicit or explicit down/up conversions, the potential for data loss has increased. The more pinch-points in an application the less likely it will function properly with code-points outside of the active code page. We simply cannot take a piecemeal approach to this change. As we did the analysis, the problems with an incremental approach far outweigh the complete approach.

What about C++? All the .HPP files and code is generated as AnsiString.

This has been a major sticking point. Truth be told, this problem came nearly to scuttling the whole project if we could not find a workable solution. The solution we came up with probably deserves a post all to itself. Several other interesting things "fell" out of this solution that will benefit everyone.

All of my files are in ASCII/ANSI. Will they have to be Unicode now? Can I still read the old versions?

For this we "stole" a few items from the .NET playbook. There is now a TEncoding class that does exactly what it implies. In fact it closely matches the System.Text.Encoding class in interface and functionality. We also have done a lot of work to single-source the VCL for Win32 and the VCL for .NET. Because these code bases are heavily shared (with very few IFDEF's), we can provide like functionality. A lot of you use TStrings.[LoadFrom|SaveTo]File or TString.[LoadFrom|SaveTo]Stream. There are now overloaded versions that take an instance of the TEncoding class. The LoadFromFile function also now defaults to checking for the Byte Order Mark (BOM) and will select the correct encoding based on that. Other traditional File I/O or any manual I/O using TStream.Read or TStream.Write, will need to be reviewed for possible adjustments. Text File Device Drivers (TFDD), and if you don't know what they are you can ignore this, will always do down conversions from Unicode to Ansi since this is what redirected Console I/O expects, and what most existing code that uses them already expects.

Do all my sources files have to be Unicode now?

Nope. You can continue to leave your source files encoded as Ansi. The IDE code editor has supported many different encodings for several releases. UTF8 would be a good encoding to use if you plan on embedding higher order code points in strings and/or comments.

What about my existing project and the DFM files?

The streaming format for DFM files has not changed. It has supported strings encoded as Unicode for several releases already. This was required all the way back in the Kylix/CLX days and more recently because of VCL for .NET. Preemptive side comment: Hey! DFM files should really be XML! Different discussion and not germane to this post.

I'm sure there will be plenty more comments and speculation along with the odd panic-stricken comment from some. I'll try to address as many as I can.

DPL & Unicode - a toss up.

So far it's looking like a toss-up between folks wanting more information on the Delphi Parallel Library and those wanting more information about the shift to Unicode. I think both are extremely important and it is no surprise given the feedback. Since it is still not clear whether or not DPL will make it into the next release, I may opt to begin talking more about Unicode... then again, maybe not :-).

Right now, I'm wrestling with some compiler issues related to debugging when a generic type is instantiated... needless to say it's making the work on DPL a little tough. This is par for the course when you're trying to retrofit the airplane while it is still in flight :-). If it takes more than a few days before this is resolved, I'll probably jump back over to Unicode. That area is working and the team is full speed ahead on it.

Just to clear some things up, I'm going to answer a few of the common questions folks have about the move to Unicode.

Is there a new Unicode string type or are you just using WideString?

Yes, there is a new data type. UnicodeString. It will be reference-counted just like AnsiString and unlike WideString which is a BSTR. This new data type will be mapped to string which is currently an AnsiString underlying type depending on how it is declared. Any string declared using the [] operator (string[64]) will still be a ShortString which is a length prefixed AnsiChar array. The UnicodeString payload will match that of the OS, which is UTF16. This means you can, at times, have surrogate pairs for characters. For characters that fall outside the Basic Multilingual Plane (BMP).

Will I be able to still use the AnsiString type?

Yes. No existing types are being taken away.

What about Char and PChar?

Char will be an alias for WideChar and PChar will be an alias for PWideChar.

Will I have to explicitly call the "W" versions of the Windows API?

For all the Windows API header translations that CodeGear provides, your code should not have to change to call the "W" version. Since it has always been our intent to make this change at some point in the future, we have been specially processing the header translations (since Delphi 2 if you must know ;-) to ease this transition. If you want more details on how we do this you can visit the JEDI website for guidelines on how to use these tools. We'll be providing some updates for these tools in order to properly process a header to use the "W" versions.

Why didn't you just use UTF8? It's more compact than UTF16.

This was considered. However, this would have forced far more conversions throughout the VCL code as it talks to the Windows API, and it would have introduced a lot of very subtle breakages in much of user code. While a lot of code out there already handles DBCS (Double-byte character sets), that same code does not correctly handle characters that consist of > 2 bytes. ~~In UTF8 a single character can be represented by as many as 6 bytes.~~ [Correction: This is not the case in true UTF8. 5 and 6 byte sequences are illegal in UTF8 (thanks Aleksander)] In UTF8 a single character can be represented by as many as 4 bytes. Finally, UTF16 is the native format used internally by Windows itself. By calling directly to the "W" APIs, the "A" translation layer that Windows has is bypassed and should, in theory, increase performance in some cases.

OMG!! All my code is going to break! I can't handle this!!

Now hold on there. Before you get your knickers in a knot, please take a moment to fully understand the impact of this change and how to best prepare for it today. As we're in this process of working in Tiburon, we've been capturing a lot of the common pitfalls and idioms many of you are likely to encounter. We'll also be working on ways to get this information out to our customers. Blogs, Whitepapers, and other articles will be the vehicles by which we'll provide this information. We do understand that there are some types of applications that will be affected more than others. Many of you have written your own handy-dandy library of string processing functions and classes. The top categories of things you'll need to watch out for are:

Assumptions about the size of Char.
Use of string as a storage medium for data other than character data.
SizeOf(Buffer) <> Length(Buffer) where Buffer: array[0..x] of Char;
File I/O (console I/O will still be down converted to Ansi data since you it can be redirected)
set of Char; should be changed to set of AnsiChar;
- You should also consider starting to use the new character classification functions in Tiburon.
If your code must still operate on Ansi string data, then simply be more explicit about it. Change the declaration to AnsiString, AnsiChar, and PAnsiChar. This can be done today and will recompile unchanged in Tiburon.

What about the Windows 9x OS?

Not going to happen. If you absolutely must continue to support those operating systems, RAD Studio 2007 is a great choice. I realize this may not be a popular decision for some markets, but it is getting harder and harder to support an operating system that is barely even tacitly supported by MS themselves. We've even looked into MSLU (Microsoft Layer for Unicode) and that is not a very viable option since in order to get it to work with Delphi we'd have to duplicate a lot of the code that is in the COFF based .LIB file that is provided only for VC++. Yes there is the unicows.dll, but that is not where the "magic" happens. So, Windows 2000 and newer will be the supported target platforms.

In the coming months, I'll try and show some common code constructs that will need to be modified along with a lot of common code that will just work either way. It is has been pleasantly surprising how much code works as the latter, and how easy it has been to get the former to behave like the latter.

Monday, January 7, 2008

A new year and a fresh start

I've been wrestling throughout the past week with what direction I should be taking this blog in the coming year. There's a lot of cool stuff coming down the pike for Delphi & C++Builder. I'm just not sure where to start. There are all the Unicode changes, parameterized types (Generics) for Win32, some VCL enhancements, a Delphi Parallel Library (DPL) and some others that I just cannot talk about.

So, where should I start? The Delphi Parallel Library stuff is cool and fascinating. Information to prepare for the transition to Unicode is also important. So what do you think? Let's just make this post the official "suggestion box." I cannot promise I'll cover or answer your question since it also has to pique my interest :-).

Oh, BTW, the DPL stuff... I just made up that moniker. This is no official announcement.

The Oracle at Delphi