The Oracle at Delphi: News Flash! Someone has already invented the wheel!

Thursday, September 25, 2008

News Flash! Someone has already invented the wheel!

I'm sure most, if not all, of you have heard the phrase "Why reinvent the wheel?" Another one I'm very fond of is "Standing on the shoulders of giants." The latter was often heard coming from Distinguished Engineer, Anders Hejlsberg during his tenure at Borland. So, why reinvent the wheel when there are giant shoulders to stand on ;-)?

If you've been following along for the last year or so, I've been working on something I loosely refer to as the Delphi Parallel Library. While I've not been blogging about the DPL for a while, I have been going back to it from time to time as I gain more knowledge and more information into the fascinating subject of parallelism and task dispatching. There is a veritable wellspring of information flooding the internet on this subject, so keeping up on it has been a daunting task. One excellent source of information is Joe Duffy the lead for the Task Parallel Libary team at Microsoft. Another is an interesting project at MIT called Cilk (that's pronounced "silk"). There is a spin-off company that is set to deliver on the work started from the Cilk project, called Cilk Arts. There is the Thread Building Blocks library from Intel. Finally, there are many smaller upstart projects, open source and otherwise out there. The point is that this is a hot topic.

One of the interesting facets about all of the projects I mentioned above, is that none of these libraries are really about "threading." They're about breaking down iterative algorithms into smaller and smaller chunks of work or "tasks" and then working on ways to execute as many of these tasks in parallel on separate cores. They're about scalability and performance. They try and minimize the overhead associated with traditional threading models. As I get deeper into creating a Delphi Parallel Library, the plan is to cull together some of the key concepts from all the existing research and libraries out there and use them as a "blueprint" for the DPL. I want to take these concepts and apply them in a "Delphi-like" manner that would be familiar and intuitive to the Delphi programmer while leveraging as much of the newer Delphi language features such as Generics and Anonymous methods. Joe Duffy's series about creating a Work Stealing Queue and combining it with a thread pool in which forms the basis for the .NET TPL is an excellent source of information. This whole notion of distributed work queues and work stealing is also something that the Cilk project has also done. A lot of the current research on parallelism also relies heavily on the latest in lock-free programming and algorithms. An excellent source for various lock-free lists, stacks and queues for the Delphi & C# folks is Julian Bucknall's blog.

The best way to learn is to actually do. While playing with some of these ideas and concepts, I've found that I really need to have an easy way to do some simple performance measuring. In .NET, there is an interesting little class, System.Diagnostics.Stopwatch. As I begin to lay the foundation, some of these little utility classes will be heavily used. Why not provide a stopwatch gizmo for Delphi? While the stopwatch is just a thin wrapper around the QueryPerformanceCounter() and QueryPerformanceFrequency() APIs, it provides a neatly encapsulated way to start, stop, reset, for measuring elapsed time. When I looked at the various use-cases for the stopwatch, it became clear that implementing this thing as a Delphi class would be require more housekeeping overhead than I wanted. Instead, I opted for doing this as a record with methods. Sorry, no Generics or Anonymous Methods in this one :-). Here's a Delphi version of the declaration:

  TStopwatch = record
  strict private
    FElapsed: Int64;
    FRunning: Boolean;
    FStartTimeStamp: Int64;
    function GetElapsedMilliseconds: Int64;
    function GetElapsedTicks: Int64;
    function GetElapsedSeconds: Double;
    class procedure InitStopwatchType; static;
  public
    class function Create: TStopwatch; static;
    class function GetTimeStamp: Int64; static;
    procedure Reset;
    procedure Start;
    class function StartNew: TStopwatch; static;
    procedure Stop;
    property ElapsedMilliseconds: Int64 read GetElapsedMilliseconds;
    property ElapsedTicks: Int64 read GetElapsedTicks;
    property ElapsedSeconds: Double read GetElapsedSeconds;
    property IsRunning: Boolean read FRunning;
  public class var Frequency: Int64;
  public class var IsHighResolution: Boolean;
  end;

Here's a simple use case using Barry's little benchmarker class (only the actual benchmarking functions):

class function TBenchmarker.Benchmark(const Code: TProc; Iterations,
  Warmups: Integer): Double;
var
  SW: TStopwatch;
  i: Integer;
begin
  for i := 1 to Warmups do
    Code;

  SW := TStopwatch.StartNew;
  for i := 1 to Iterations do
    Code;
  SW.Stop;

  Result := SW.ElapsedSeconds / Iterations;
end;

function TBenchmarker.Benchmark<T>(const Name: string; const Code: TFunc<T>): T;
var
  SW: TStopwatch;
  i: Integer;
begin
  for i := 1 to FWarmups do
    Result := Code;
  
  SW := TStopwatch.StartNew;
  for i := 1 to FIterations do
    Result := Code;
  SW.Stop;
  
  FReportSink(Name, SW.ElapsedSeconds / Iterations - FOverhead);
end;

It's minor, yes, but it does reduce some of the thinking about calculating the time interval and having to declare a start and stop local variable. Also, since it is a record that is allocated on the stack, there are no heap allocations and there is no need to do any cleanup. Although, I admit that it would be nice to have a type constructor in order to better handle the initialization of the IsHighResolution and Frequency class variables.

And now the implementation:

class function TStopwatch.Create: TStopwatch;
begin
  InitStopwatchType;
  Result.Reset;
end;

function TStopwatch.GetElapsedMilliseconds: Int64;
begin
  Result := GetElapsedTicks div (Frequency div 1000);
end;

function TStopwatch.GetElapsedSeconds: Double;
begin
  Result := ElapsedTicks / Frequency;
end;

function TStopwatch.GetElapsedTicks: Int64;
begin
  Result := FElapsed;
  if FRunning then
    Result := Result + GetTimeStamp - FStartTimeStamp;
end;

class function TStopwatch.GetTimeStamp: Int64;
begin
  if IsHighResolution then
    QueryPerformanceCounter(Result)
  else
    Result := GetTickCount * 1000;
end;

class procedure TStopwatch.InitStopwatchType;
begin
  if Frequency = 0 then
  begin
    IsHighResolution := QueryPerformanceFrequency(Frequency);
    if not IsHighResolution then
      Frequency := 1000;
  end;
end;

procedure TStopwatch.Reset;
begin
  FElapsed := 0;
  FRunning := False;
  FStartTimeStamp := 0;
end;

procedure TStopwatch.Start;
begin
  if not FRunning then
  begin
    FStartTimeStamp := GetTimeStamp;
    FRunning := True;
  end;
end;

class function TStopwatch.StartNew: TStopwatch;
begin
  InitStopwatchType;
  Result.Reset;
  Result.Start;
end;

procedure TStopwatch.Stop;
begin
  if FRunning then
  begin
    FElapsed := FElapsed + GetTimeStamp - FStartTimeStamp;
    FRunning := False;
  end;
end;

Now here's a question I've been unable to get a definite answer to: When would QueryPerformanceCounter and QueryPerformanceFrequency ever fail on a standard Windows system? Does this only occur for things like WinCE, Embedded NT, or systems with strange HAL layers? Even though I've taken into account that, according to the documentation, it could fail, in practice when is that?

20 comments:

El GiSeptember 25, 2008 at 9:38 AM
Just take a look also in fork/join Java API

Parallelism with Fork/Join in Java 7
http://www.infoq.com/news/2008/03/fork_join

Concurrency JSR-166 Interest Site
http://gee.cs.oswego.edu/dl/concurrency-interest/
ReplyDelete
Replies
Allen BauerSeptember 25, 2008 at 9:56 AM
El Gi,

Thanks for the links. Things in Java-land are looking very much like .NET and vice-versa.

Allen.
ReplyDelete
Replies
Mike PSeptember 25, 2008 at 11:02 AM
a favorite saying of mine too but it originated with Sir Isaac Newton in 1676:

"If I have seen further it is by standing on the shoulders of giants."

(the best part was his humility..."If I have seen further"...)

great blog. thank you!
ReplyDelete
Replies
Allen BauerSeptember 25, 2008 at 12:26 PM
Mike,
I didn't meant to indicate that Anders originated that comment, but merely his use of the phrase.

And, it is a great comment because that is really what we all do. We always build on the knowledge obtained by those that have gone before us.

Allen.
ReplyDelete
Replies
Allen BauerSeptember 25, 2008 at 12:29 PM
Xepol,

In the most technical sense, you're right. A CPU level bus lock is still a lock of sorts. In this case "lock" is referring to the heavy OS kernel-level locking primitives. How about "quick, lightweight, OS-lock free programming"?
:-)

Allen.
ReplyDelete
Replies
Jarle StabellSeptember 25, 2008 at 6:36 PM
The Newton quote is quite fascinating, it was likely meant as an insult to a "competitor" of Newton, Robert Hook. (Newton wrote it a letter to Robert Hooke).

From Wikipedia:

"Historians generally think the above quote was an attack on Hooke (who was short and hunchbacked), rather than – or in addition to – a statement of modesty. The two were in a dispute over optical discoveries at the time. The latter interpretation also fits with many of his other disputes over his discoveries – such as the question of who discovered calculus as discussed above."
ReplyDelete
Replies
Uwe RaabeSeptember 26, 2008 at 12:22 AM
I did a similar thing like the TStopwatch using an interface IStopWatch and a global function returning such an interface. Internally this is implemented by a reference counted object. As an advanatage I see the encapsulation of the intrinsics and the availabliliy to use it even with COM. Do you see any drawbacks?
ReplyDelete
Replies
Allen BauerSeptember 26, 2008 at 12:47 AM
Uwe,
Other than the added overhead of the reference counting and memory allocations, I can't think of any major drawbacks. If you've got something that works and serves your needs, I see no reason to abandon it.

Allen.
ReplyDelete
Replies
Bruce McGeeSeptember 26, 2008 at 4:28 AM
I don't know of any situation where the calls fail, but sometimes I see a big jump when using QueryPerformanceCounter, so I usually run tests more than once to rule out anomalies.
ReplyDelete
Replies
GielSeptember 26, 2008 at 5:38 AM
Bruce, do you see the 'big jump' on a multi-cpu machine? That's a 'feature' AFAIK.
ReplyDelete
Replies
Daniel LehmannSeptember 26, 2008 at 9:08 AM
You can fill your static class variables in the initialization section of the enclosing unit. Of course that will be run upon program start but a QueryPerformanceFrequency shouldn't matter ;)
ReplyDelete
Replies
Allen BauerSeptember 26, 2008 at 9:15 AM
Daniel,

"You can fill your static class variables in the initialization section of the enclosing unit"

Of course, however I didn't want to implicitly reference the type just by having the unit in my uses clause. By only doing the work on demand, I can ensure that the smart-linker will only bring in the requisite code if I actually *used* the type. This is where I'd like to have class constructors for classes and records in Win32. They're only available in .NET right now.

Allen.
ReplyDelete
Replies
Allen BauerSeptember 26, 2008 at 9:21 AM
Bruce, Joe,

You're probably seeing these "jumps" because the performance counters continue to run even when they're executing other threads/processes on the system. It is a single counter per-CPU. So it could be that your thread is running on a different core.

Allen.
ReplyDelete
Replies
Lars FosdalSeptember 26, 2008 at 8:12 PM
@Allen (and Barry :) )
Class Constructors - I really would love those.

I would also like class constants in a similar fashion to class variables. The type can be explicit or implied.

TMyClass = class(TObject)
class const Key = 1; unique; // optional, compile-time enforce to ensure uniqueness
class const WorkPoolSize = 8;
end;

TMyClassSub = class(TMyClass)
class const Key = 2; override;
end;
ReplyDelete
Replies
Bruce McGeeSeptember 27, 2008 at 7:12 AM
Thanks, Allen. I'll try setting processor affinity for benchmarking code and see if that helps.
ReplyDelete
Replies
Jolyon SmithSeptember 28, 2008 at 7:41 AM
In similar code of my own I hide the API behind a pair of function pointers:

type
TInt64Fn = function: Int64;

vars
GetTimerValue: TInt64Fn = NIL;
GetTimerFreq: TInt64Fn = NIL;

these are in a unit that has a little initialization code - if HPC is supported, these vars are hooked up directly to the HPC API functions. If not then they are hooked up to my own routines - one that returns the result of GetTickCount and the other that simply returns 1000, respectively.

The testing for HPC support is performed just once, automatically, and I simply write my code that uses whatever underlying timer is available using GetTimerValue and GetTimerFreq.

As for if/when HPC might not be supported... I have no idea but just as you, have always allowed for the possibility.

:)

General observation:

"Reinventing the wheel" is hopelessly over-used imho.

Nobody *EVER* re-invented a wheel.

What does happen - a lot - is that a new TYPE of wheel, and sometimes even a new axle and/or bearing, is required to be designed to suit the specific engineering problem to which a wheel provides the general solution.

To mix-in another metaphor:

The peg has already been invented - why re-invent the peg?

The answer is obvious if the peg is square and the hole you need to fit it in is round.
ReplyDelete
Replies
danielSeptember 28, 2008 at 12:53 PM
it was actually Sir Isac Newton who once said 3 things (all apparently related to this post)

1. action -> reaction
2. i've seen further climbing on giants's shoulders
3. i've just taken a tea spoon from an ocean

good luck a!
d
ReplyDelete
Replies
Ken KnopfliSeptember 28, 2008 at 5:32 PM
One thing I have wondered is what practical uses multicore/multithreading can be put to within a single application.

My list is very small.

Doesn't mean I'm not interested!
ReplyDelete
Replies
MarkusSeptember 29, 2008 at 6:26 AM
What failures of QueryPerformanceCounter is concerned, it has to do which underlying hardware clock Windows uses. In some cases when Windows decides to use the TSC, there can be problems on multicore systems especially on some types of AMD CPUs. Since on some CPUs each core has its own TSC, sometimes the TSC is read form one core sometimes form the other (and they may not be in sync depending on power saving states).

See aslo here:
http://support.microsoft.com/?scid=kb%3Ben-us%3B895980&x=12&y=14
ReplyDelete
Replies
Mason WheelerOctober 8, 2008 at 5:35 AM
Ken:
I'm building a game engine that can run scripts. I run each script on its own instance of the script engine, each one inside its own thread, because there are certain scripts that can take a noticeable amount of time to complete, especially if they have to wait on input from the user.
ReplyDelete
Replies

Please keep your comments related to the post on which you are commenting. No spam, personal attacks, or general nastiness. I will be watching and will delete comments I find irrelevant, offensive and unnecessary.

The Oracle at Delphi

Thursday, September 25, 2008

News Flash! Someone has already invented the wheel!

20 comments:

Blog Archive

Popular Posts

Labels

MVP

My Blog List