Thursday, September 18, 2008

A "Nullable" Post

Solving problems can be fun and challenging. What is really challenging is solving a problem using only the tools at hand. During the development of Delphi 2009, there were several language features that were dropped in favor of generics and anonymous methods. One such feature was what we called "managed records." This allowed for some very interesting things where you could now implement a constructor, destructor and several assignment class operators on a record that would be automatically called when the record came into scope, went out of scope, and was assigned to something or something was assigned to it (like another instance of itself). Early in the development cycle, I cobbled together a generic record type that would implement a Nullable<T> type. I modeled it after the Nullable<T> type present in the .NET framework. So in Delphi, it looked like this:

type
Nullable<T> = record
private
FValue: T;
FHasValue: Boolean;
function GetValue: T;
public
constructor Create(AValue: T);
function GetValueOrDefault: T; overload;
function GetValueOrDefault(Default: T): T; overload;
property HasValue: Boolean read FHasValue;
property Value: T read GetValue;
class operator Implicit(Value: Nullable<T>): T;
class operator Implicit(Value: T): Nullable<T>;
class operator Explicit(Value: Nullable<T>): T;
end;

However, it suffered from one major flaw that managed records would have cleanly solved. Mainly, that just declaring a variable of type Nullable<T> where T is some concrete type, would leave the resulting structure uninitialized.  Namely, the FHasValue field could contain random garbage, which would make it return false positives for the HasValue property. Managed records would have solved this because I could instruct the compiler to call my special constructor when a variable of this type came into scope, like when program flow entered a method with one of the local variables declared as Nullable<sometype>; Well, alas... that didn't happen so I figured I'd have to push the notion of having a Nullable<T> type to the next release... maybe :-).


During the 1980's I enjoyed watching a television show called "MacGyver." The premise was that the main character used his own clever ingenuity to get out of a jam using only the tools and items he had at his disposal at the moment. As sort of a pop-cultural icon, this MacGyver character would concoct some of the most unlikely solutions to get away from the "bad guys" each week. The jokes and hyperbole surrounding this was fun, "MacGyver saves the day by shutting down the critical nuclear reactor core with only a paper clip and plastic bandage!" Even though a lot of what this character did defied common-sense and logic, the premise was that using a little ingenuity, you can solve seemingly impossible problems.


In true MacGyver fashion I figured I'd dust off my Nullable<T> type and see if there is something about the available tools I can leverage to accomplish the above task. Part of what spurred this on was this post by Barry Kelly about implementing a "smart-pointer" in Delphi 2009. In this post his clever use of an interface reference to "catch" the in-to-and-out-of-scope transitions sparked an idea.


Let's go back to the key problem with the above type, all we really care about is that when a variable of type Nullable<sometype> is declared, we need to guarantee that the FHasValue field is initialized to a known value that we can interpret as "False." In the case where the above is embedded into a class, the above declaration would work fine because we know that all data areas of a class instance are specifically initialized to 0. This would mean that a field declared as FField: Nullable<Integer>; would initially be considered Null, meaning no value has ever been assigned to it. Even though the FValue field is also 0, that is a legitimate value so we cannot just say that '0' means Null. The real problem happens when we declare a local variable as Nullable<sometype>. Whatever happens to be on the stack at that location is what the values will be. Hmmm... this is different from interfaces, strings, variants, and dynamic arrays, which are always guaranteed to be initialized to a known value (zero) upon entry to the method. This is to ensure "exception safety" which I discussed a long time ago right here.


In the Nullable<T> case, all we really care about is knowing whether or not the FValue field was ever assigned. Enter interfaces; In this case we want the FHasValue field to always be initialized to a known state. Any field or variable declared as some interface type, the compiler ensures will be initialized to nil. Let's tweak the record above using the interface "paperclip."

type
Nullable<T> = record
private
FValue: T;
FHasValue: IInterface;
...
function GetHasValue: Boolean;
public
constructor Create(AValue: T);
...
property HasValue: Boolean read GetHasValue;
...
end;

So I've changed the FHasValue field to be an IInterface and changed the HasValue property to now call a getter method since HasValue is still a Boolean and FHasValue is an interface. All GetHasValue does is to return whether or not FHasValue is nil. Then we change the constructor to assign something to FHasValue, namely just create an instance of TInterfacedObject and assign the implemented IInterface to the field.

constructor Nullable<T>.Create(const AValue: T);
begin
FValue := AValue;
FHasValue := TInterfacedObject.Create;
end;

Now when you assign one Nullable<T> to another Nullable<T>, the compiler generates the proper code to ensure that the FHasValue field is copied properly and the the lifetime of the TInterfacedObject is properly managed. In other words, we don't leak the object.


But can this be made a little more efficient? After all, if we're "MacGyver-ing" this up, might as well go all the way. If we pull another trick out of MacGyver's bag-of-tricks (say a "plastic bandage"), we can eliminate the worry about leaking the object and get a slight performance gain to boot. This trick is used down inside Generics.Defaults.pas for creating a singleton interface. All we really care about is that FHasValue is properly initialized to nil and when we assign to the Nullable<T>, it gets properly set to a valid non-nil value. The compiler will still generate the AddRef and Release calls, but we can now be more efficient about it.


Down in the implementation section we declare the following:

function NopAddref(inst: Pointer): Integer; stdcall;
begin
Result := -1;
end;

function NopRelease(inst: Pointer): Integer; stdcall;
begin
Result := -1;
end;

function NopQueryInterface(inst: Pointer; const IID: TGUID; out Obj): HResult; stdcall;
begin
Result := E_NOINTERFACE;
end;

const
FlagInterfaceVTable: array[0..2] of Pointer =
(
@NopQueryInterface,
@NopAddref,
@NopRelease
);

FlagInterfaceInstance: Pointer = @FlagInterfaceVTable;

And in the Nullable<T> constructor, just change it to:

constructor Nullable<T>.Create(const AValue: T);
begin
FValue := AValue;
FHasValue := IInterface(@FlagInterfaceInstance);
end;

And now all instances of Nullable<T> will use the same "fake interface" instance, which will never leak since it isn't even on the heap. It is also a little faster since there isn't any bus-locking "Interlocked" calls to handle a reference-count since that is not even needed in this case. Here's the whole thing in all it's "MacGyver" gory... er, uh, glory :-):

unit Foo;

interface

uses Generics.Defaults, SysUtils;

type
Nullable<T> = record
private
FValue: T;
FHasValue: IInterface;
function GetValue: T;
function GetHasValue: Boolean;
public
constructor Create(AValue: T);
function GetValueOrDefault: T; overload;
function GetValueOrDefault(Default: T): T; overload;
property HasValue: Boolean read GetHasValue;
property Value: T read GetValue;

class operator NotEqual(ALeft, ARight: Nullable<T>): Boolean;
class operator Equal(ALeft, ARight: Nullable<T>): Boolean;

class operator Implicit(Value: Nullable<T>): T;
class operator Implicit(Value: T): Nullable<T>;
class operator Explicit(Value: Nullable<T>): T;
end;

procedure SetFlagInterface(var Intf: IInterface);

implementation

function NopAddref(inst: Pointer): Integer; stdcall;
begin
Result := -1;
end;

function NopRelease(inst: Pointer): Integer; stdcall;
begin
Result := -1;
end;

function NopQueryInterface(inst: Pointer; const IID: TGUID; out Obj): HResult; stdcall;
begin
Result := E_NOINTERFACE;
end;

const
FlagInterfaceVTable: array[0..2] of Pointer =
(
@NopQueryInterface,
@NopAddref,
@NopRelease
);

FlagInterfaceInstance: Pointer = @FlagInterfaceVTable;

procedure SetFlatInterface(var Intf: IInterface);
begin
Intf := IInterface(@FlagInterfaceInstance);
end;

{ Nullable<T> }

constructor Nullable<T>.Create(AValue: T);
begin
FValue := AValue;
SetFlagInterface(FHasValue);
end;

class operator Nullable<T>.Equal(ALeft, ARight: Nullable<T>): Boolean;
var
Comparer: IEqualityComparer<T>;
begin
if ALeft.HasValue and ARight.HasValue then
begin
Comparer := TEqualityComparer<T>.Default;
Result := Comparer.Equals(ALeft.Value, ARight.Value);
end else
Result := ALeft.HasValue = ARight.HasValue;
end;

class operator Nullable<T>.Explicit(Value: Nullable<T>): T;
begin
Result := Value.Value;
end;

function Nullable<T>.GetHasValue: Boolean;
begin
Result := FHasValue <> nil;
end;

function Nullable<T>.GetValue: T;
begin
if not HasValue then
raise Exception.Create('Invalid operation, Nullable type has no value');
Result := FValue;
end;

function Nullable<T>.GetValueOrDefault: T;
begin
if HasValue then
Result := FValue
else
Result := Default(T);
end;

function Nullable<T>.GetValueOrDefault(Default: T): T;
begin
if not HasValue then
Result := Default
else
Result := FValue;
end;

class operator Nullable<T>.Implicit(Value: Nullable<T>): T;
begin
Result := Value.Value;
end;

class operator Nullable<T>.Implicit(Value: T): Nullable<T>;
begin
Result := Nullable<T>.Create(Value);
end;

class operator Nullable<T>.NotEqual(const ALeft, ARight: Nullable<T>): Boolean;
var
Comparer: IEqualityComparer<T>;
begin
if ALeft.HasValue and ARight.HasValue then
begin
Comparer := TEqualityComparer<T>.Default;
Result := not Comparer.Equals(ALeft.Value, ARight.Value);
end else
Result := ALeft.HasValue <> ARight.HasValue;
end;

end.

Here's a little test that shows how it works:

procedure TestNullable;
var
NullInt: Nullable<Integer>;
NullInt2: Nullable<Integer>;
I: Integer;
begin
try
I := NullInt; // Exception raised here because NullInt was never assigned a value;
except
Writeln('Success');
end;
if not NullInt.HasValue then // Check if the FHasValue field is actually nil
NullInt := 10; // Exercise an Implicit operator
NullInt2 := NullInt; // Non-Null Nullable assigned to a Null Nullable makes it non-Null
if NullInt2 = 10 then // Exercise the Equal() class operator and the Implicit operator
Writeln('Success');
end;

begin
TestNullable;
end.

Now, in true "Raymond Chen" fashion, I'd like to relieve many of you from the compulsive need to post the following comments :-).


Pre-emptive snarky comments:


If you would just implement non-reference counted interfaces like any "modern" language (aka, C#, Java, etc..), you wouldn't have to resort to those ugly hacks to fool the compiler.


Ok, so why didn't you just implement this with the "?" syntax like C# and build it in the first place?