Thursday, September 25, 2008

News Flash! Someone has already invented the wheel!

I'm sure most, if not all, of you have heard the phrase "Why reinvent the wheel?" Another one I'm very fond of is "Standing on the shoulders of giants." The latter was often heard coming from Distinguished Engineer, Anders Hejlsberg during his tenure at Borland. So, why reinvent the wheel when there are giant shoulders to stand on ;-)?

If you've been following along for the last year or so, I've been working on something I loosely refer to as the Delphi Parallel Library. While I've not been blogging about the DPL for a while, I have been going back to it from time to time as I gain more knowledge and more information into the fascinating subject of parallelism and task dispatching. There is a veritable wellspring of information flooding the internet on this subject, so keeping up on it has been a daunting task. One excellent source of information is Joe Duffy the lead for the Task Parallel Libary team at Microsoft. Another is an interesting project at MIT called Cilk (that's pronounced "silk"). There is a spin-off company that is set to deliver on the work started from the Cilk project, called Cilk Arts. There is the Thread Building Blocks library from Intel. Finally, there are many smaller upstart projects, open source and otherwise out there. The point is that this is a hot topic.

One of the interesting facets about all of the projects I mentioned above, is that none of these libraries are really about "threading." They're about breaking down iterative algorithms into smaller and smaller chunks of work or "tasks" and then working on ways to execute as many of these tasks in parallel on separate cores. They're about scalability and performance. They try and minimize the overhead associated with traditional threading models. As I get deeper into creating a Delphi Parallel Library, the plan is to cull together some of the key concepts from all the existing research and libraries out there and use them as a "blueprint" for the DPL. I want to take these concepts and apply them in a "Delphi-like" manner that would be familiar and intuitive to the Delphi programmer while leveraging as much of the newer Delphi language features such as Generics and Anonymous methods. Joe Duffy's series about creating a Work Stealing Queue and combining it with a thread pool in which forms the basis for the .NET TPL is an excellent source of information. This whole notion of distributed work queues and work stealing is also something that the Cilk project has also done. A lot of the current research on parallelism also relies heavily on the latest in lock-free programming and algorithms. An excellent source for various lock-free lists, stacks and queues for the Delphi & C# folks is Julian Bucknall's blog.

The best way to learn is to actually do. While playing with some of these ideas and concepts, I've found that I really need to have an easy way to do some simple performance measuring. In .NET, there is an interesting little class, System.Diagnostics.Stopwatch. As I begin to lay the foundation, some of these little utility classes will be heavily used. Why not provide a stopwatch gizmo for Delphi? While the stopwatch is just a thin wrapper around the QueryPerformanceCounter() and QueryPerformanceFrequency() APIs, it provides a neatly encapsulated way to start, stop, reset, for measuring elapsed time. When I looked at the various use-cases for the stopwatch, it became clear that implementing this thing as a Delphi class would be require more housekeeping overhead than I wanted. Instead, I opted for doing this as a record with methods. Sorry, no Generics or Anonymous Methods in this one :-). Here's a Delphi version of the declaration:

  TStopwatch = record
strict private
FElapsed: Int64;
FRunning: Boolean;
FStartTimeStamp: Int64;
function GetElapsedMilliseconds: Int64;
function GetElapsedTicks: Int64;
function GetElapsedSeconds: Double;
class procedure InitStopwatchType; static;
public
class function Create: TStopwatch; static;
class function GetTimeStamp: Int64; static;
procedure Reset;
procedure Start;
class function StartNew: TStopwatch; static;
procedure Stop;
property ElapsedMilliseconds: Int64 read GetElapsedMilliseconds;
property ElapsedTicks: Int64 read GetElapsedTicks;
property ElapsedSeconds: Double read GetElapsedSeconds;
property IsRunning: Boolean read FRunning;
public class var Frequency: Int64;
public class var IsHighResolution: Boolean;
end;

Here's a simple use case using Barry's little benchmarker class (only the actual benchmarking functions):

class function TBenchmarker.Benchmark(const Code: TProc; Iterations,
Warmups: Integer): Double;
var
SW: TStopwatch;
i: Integer;
begin
for i := 1 to Warmups do
Code;

SW := TStopwatch.StartNew;
for i := 1 to Iterations do
Code;
SW.Stop;

Result := SW.ElapsedSeconds / Iterations;
end;

function TBenchmarker.Benchmark<T>(const Name: string; const Code: TFunc<T>): T;
var
SW: TStopwatch;
i: Integer;
begin
for i := 1 to FWarmups do
Result := Code;

SW := TStopwatch.StartNew;
for i := 1 to FIterations do
Result := Code;
SW.Stop;

FReportSink(Name, SW.ElapsedSeconds / Iterations - FOverhead);
end;

It's minor, yes, but it does reduce some of the thinking about calculating the time interval and having to declare a start and stop local variable. Also, since it is a record that is allocated on the stack, there are no heap allocations and there is no need to do any cleanup. Although, I admit that it would be nice to have a type constructor in order to better handle the initialization of the IsHighResolution and Frequency class variables.


And now the implementation:

class function TStopwatch.Create: TStopwatch;
begin
InitStopwatchType;
Result.Reset;
end;

function TStopwatch.GetElapsedMilliseconds: Int64;
begin
Result := GetElapsedTicks div (Frequency div 1000);
end;

function TStopwatch.GetElapsedSeconds: Double;
begin
Result := ElapsedTicks / Frequency;
end;

function TStopwatch.GetElapsedTicks: Int64;
begin
Result := FElapsed;
if FRunning then
Result := Result + GetTimeStamp - FStartTimeStamp;
end;

class function TStopwatch.GetTimeStamp: Int64;
begin
if IsHighResolution then
QueryPerformanceCounter(Result)
else
Result := GetTickCount * 1000;
end;

class procedure TStopwatch.InitStopwatchType;
begin
if Frequency = 0 then
begin
IsHighResolution := QueryPerformanceFrequency(Frequency);
if not IsHighResolution then
Frequency := 1000;
end;
end;

procedure TStopwatch.Reset;
begin
FElapsed := 0;
FRunning := False;
FStartTimeStamp := 0;
end;

procedure TStopwatch.Start;
begin
if not FRunning then
begin
FStartTimeStamp := GetTimeStamp;
FRunning := True;
end;
end;

class function TStopwatch.StartNew: TStopwatch;
begin
InitStopwatchType;
Result.Reset;
Result.Start;
end;

procedure TStopwatch.Stop;
begin
if FRunning then
begin
FElapsed := FElapsed + GetTimeStamp - FStartTimeStamp;
FRunning := False;
end;
end;

Now here's a question I've been unable to get a definite answer to: When would QueryPerformanceCounter and QueryPerformanceFrequency ever fail on a standard Windows system? Does this only occur for things like WinCE, Embedded NT, or systems with strange HAL layers? Even though I've taken into account that, according to the documentation, it could fail, in practice when is that?