Friday, June 14, 2013

Give in to the ARC side

I thought I’d take a few moments (or more) to answer some questions regarding the Automatic Reference Counting (ARC) mechanism just introduced to Delphi on mobile platforms (namely iOS in XE4, and Android in a future release). Among some sectors of our customer base there seem to be some long-running discussions and kvetching about this change. I will also say that among many other sectors of our customer base, it’s been greeted with marked enthusiasm. Let’s start with some history…

Meet the new boss, same as the old boss.


For our long-time Delphi users, ARC is really nothing new. Beginning back in Delphi 2 (early 1996), the first Delphi release targeting 32-bit Windows, the Delphi compiler has been doing ARC. Long strings, which broke free from the long-standing Turbo Pascal limit of 255 characters, introduced the Delphi developer to the whole concept of ARC. Until then, strings (using the semi-reserved word “string”) had been declared and allocated “in place” and were limited to a maximum of 255 characters. You could declare a string type with fewer than 255 characters by using the “string[<1-255>]” type syntax.
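
Side by side, the two styles look like this (a minimal sketch; the identifiers are made up for illustration):

var
  OldName: string[80];  { Turbo Pascal-style short string: a fixed 80-character buffer, allocated in place }
  NewName: string;      { Delphi 2+ long string: a reference to heap-allocated, reference-counted data }
begin
  OldName := 'Turbo Pascal';  { copied into the in-place buffer, truncated if too long }
  NewName := 'Delphi';        { only the reference is stored; the character data lives on the heap }
end;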

Strings changed from being allocated at compile time to a reference type that pointed to the actual string data in a heap-allocated structure. There are plenty of resources that explain how this works. For the purposes here, the heap-allocated data structure could be shared among many string variables by using a “reference count” stored within that data structure, indicating how many variables are pointing to that data. The compiler managed this by calling special RTL helper functions when variables were assigned or left the current scope. Once the last string reference was removed, the actual heap-allocated structure could be returned to the free memory pool.
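
A rough sketch of what happens behind the scenes (the comments describe the bookkeeping; the actual RTL helper calls are inserted by the compiler):

var
  A, B: string;
begin
  A := 'Hello';   { heap structure allocated, reference count = 1 }
  B := A;         { no copy is made; both variables point at the same data, reference count = 2 }
  A := '';        { A drops its reference, reference count = 1 }
end;              { B leaves scope, reference count = 0, the heap structure is returned to the pool }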

Hey this model works pretty well… Oh look! MS’ COM/OLE/ActiveX uses ARC too!


Delphi 3 was a seminal release in the pantheon of Delphi releases. For those who remember history, this is the first Delphi release in which the founder of Turbo Pascal, and the key architect of Delphi, Anders Hejlsberg, was no longer with the company. However, even with the departure of Anders, Delphi 3 saw the release of many new features. Packages and interfaces were major new language enhancements. Of particular interest were interfaces, which placed support for COM/OLE/ActiveX-style interfaces directly into the language.

The Delphi compiler already had support and logic for proper handling of an ARC type, strings, so it was an incremental step forward to add such support for interfaces. By adding interface ARC directly into the language, the Delphi developer was freed from the mundane, error-prone task of manually handling the reference counting for COM interfaces. The Delphi developer could focus on the actual business of using and accessing interfaces and COM/ActiveX objects without dealing with all those AddRef/Release calls. Eventually those poor C++ COM developers got a little help through the introduction of “smart pointers”… But for a while, the Delphi developers were able to quietly snicker at those folks working with COM in C++… oh ok, we still do…
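
For example, something along these lines (the interface and class are invented for illustration) compiles and runs without a single explicit AddRef/Release in user code:

type
  IGreeter = interface
    procedure SayHello;
  end;

  TGreeter = class(TInterfacedObject, IGreeter)
    procedure SayHello;
  end;

procedure TGreeter.SayHello;
begin
  Writeln('Hello');
end;

procedure UseGreeter;
var
  G: IGreeter;
begin
  G := TGreeter.Create;  { the compiler inserts the _AddRef for this assignment }
  G.SayHello;
end;                     { G leaves scope; the compiler inserts the _Release, and the
                           instance destroys itself when the count reaches zero }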

Hey, let’s invite dynamic arrays to this party!


Introduced in Delphi 4, dynamic arrays freed developers from having to work with non-range-checked unbounded arrays or deal with the hassles of raw pointers. Sure, there are those who get a clear kick out of using raw memory locations and mud-wrestling and hog-tying some pointers… Dynamic arrays took a few pages from the string type playbook and applied them to an array data type whose elements are user-defined. Like strings, they too use the ARC model of memory management. This allows developers to simply set their length and pass them around without worrying about who should manage their memory.
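
For instance (a trivial sketch; SomeOtherProc is just a placeholder for whatever consumes the array):

var
  Values: array of Integer;   { a dynamic array: reference-counted like a long string }
begin
  SetLength(Values, 100);     { allocate; the reference count and length live with the heap block }
  Values[0] := 42;
  SomeOtherProc(Values);      { passing it around just adjusts references, no copying }
end;                          { last reference gone: the array memory is released }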

If ARC is good, Garbage Collection is sooo much better…(that’s a joke, son)


For a while, Delphi cruised along pretty well. There was Delphi 5, then a foray into the hot commodity of the time, Linux, where the non-Delphi-named Kylix was developed. Delphi 6 unified the whole Windows and Linux experience… however, not much really happened on the compiler front other than retargeting the x86 Delphi compiler to generate code fitting for Linux. By the time Delphi 7 was released, work on the Delphi compiler had shifted to targeting .NET (notice I have ignored that whole “oh look! didn’t Borland demonstrate a Delphi compiler targeting the Java VM?” thing… well… um… never mind…). .NET was the culmination (up to that time) of the efforts of Anders Hejlsberg since he’d left for Microsoft several years prior.

Yes, there has been a lot of discussion over the years about whether or not MS really said that .NET was the “future” of the Windows API… I distinctly remember that being explicitly stated from several sectors of the whole MS machine. Given the information at the time, it was clearly prudent for us to look into embracing the “brave new world” of .NET, its VM, and the whole notion of garbage collection. We’re not mind readers, so regardless of any skepticism about this direction (oh, and there was plenty of that), we needed to do something.

Since I have the benefit of 20/20 hindsight and of having been a first-hand witness and participant in the whole move to .NET, there are some observations and lessons to be learned from that experience. I remember the whole foray into .NET as generating more new and interesting language innovations and enhancements than I’d seen since Delphi 2: a new compiler backend generating CIL (aka MSIL), and the introduction of class helpers, which allow the injection of methods into the scope of an existing class (this was before C# “extension methods”, which are very similar to, if more limited than, helpers). I also think there were some missteps, which at the time, I know I vehemently defended. Honestly, I would probably have quite the heated discussion if the “me” of now were to ever meet the “me” of then ;-).
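
For those who haven’t used class helpers, here is a small sketch of the idea (the helper and its method are invented for the example):

uses
  Classes;

type
  TStringsHelper = class helper for TStrings
    function JoinedWithSemicolons: string;   { injected into TStrings’ scope wherever this helper is visible }
  end;

function TStringsHelper.JoinedWithSemicolons: string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Count - 1 do
    Result := Result + Self[I] + ';';
end;

{ usage: any TStrings (or descendant) in scope now appears to have the method, e.g. }
{   S := Memo1.Lines.JoinedWithSemicolons; }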

Rather than better embrace the .NET platform, we chose to mold that platform into Delphi’s image. This was reinforced by the clearly obvious link between the genesis behind Delphi itself and the genesis of .NET. They both were driven by the same key developer/architect. On several levels this only muddled things up. Rather than embracing GC (I have nothing really against GC, in fact, I find that it enables a lot of really interesting programming models), we chose to, more or less, hide it. This is where the me of now gets to ridicule the me of then.

Let’s look at one specific thing I feel we “got wrong”… mapping the “Free” method to call IDisposable.Dispose. Yes, at the time this made sense, and I remember saying how great it was because through the magic of class helpers you could call “Free” on any .NET object from Delphi and it would “do the right thing.” Yes, of course it “did the right thing”, but at the expense of holding the platform at arm’s length and never really embracing it. Cool your jets… here, I’m using the term “platform” to refer more to the programming model, not the built-in frameworks, and not the .NET platform as a whole… People still wrote (including us) their code as if that magic call to Free was still doing what it had always done.

The Free method was introduced onto TObject for the sole purpose of exception safety. It was intended to be used within object destructors. From a destructor, you never directly called Destroy on an instance reference; you called Free, which checked whether the instance reference was non-nil and then called Destroy. This was done to simplify the efforts of component developers (and those who write their own class types) by freeing them (no pun intended) from doing that nil check everywhere. We were very successful in driving home the whole create…try…finally…free coding pattern, which exists because of exceptions. However, that pattern really doesn’t need to use Free; it could call Destroy directly.
Foo := TMyObj.Create;
try
  {... work with Foo here}
finally
  Foo.Free; {--- Foo.Destroy is just as valid here}
end;

The reason that Foo.Destroy is valid in this instance is that if an exception were raised during the construction of the object, execution would never enter the try..finally block. We know that if it enters the try..finally block, the assignment to Foo has happened, so Foo isn’t nil and is now referencing a valid instance (even if it had been garbage or non-nil previously).

Under .NET, Free did no such thing as “free” memory… it may have caused the object to “dispose” of some resources because the object implemented the IDisposable interface pattern. We even went so far as to literally translate the destructor of classes declared in Delphi for .NET into the IDisposable pattern, even if all that destructor did was free other object instances and never touched any non-memory resource. IOW, under a GC environment it did a whole lot of nothing. This may sound like heresy to some, but this was a case where the power of the platform was sacrificed at the altar of full compatibility.

Come to the ARC side


What is different now? With XE4, we’ve introduced a Delphi compiler that directly targets the iOS platform and its ARM-derivative processor. Along with this, ARC has now come to full fruition and is the default manner in which the lifetime of all object instances is managed. This means that all object references are tracked and accounted for. This also means that, like strings, once the number of references drops to 0, the object is fully cleaned up, destroyed, and its memory is returned to the heap. You can read about this in detail in this whitepaper here.

If you ask many of my coworkers here, they’ll tell you that I will often say, “Names are important.” Names convey not only what something is, but in many cases what it does. In programming this is of paramount importance. The names you choose need to be concise, but also descriptive. Internal jargon shouldn’t be used, nor should some obscure abbreviation be used.

Since the focus of this piece (aside from the stroll down memory lane) centers on “Free”, let’s look at it. In this instance, Free is a verb. Free has become, unfortunately, synonymous with Destroy. However, that’s not really its intent. It was, as stated above, about writing exception-safe destructors of classes. Writing exception-safe code is also another topic deserving of its own treatment.

Rather than repeat the mistake we made in Delphi for .NET in how “Free” was handled, Free was simply removed. Yes, you read that right. Free has been removed. “What? But my code compiles when I call Free! I see it in the RTL source!” We know that there is a lot of code out there that uses the proverbial create..try..finally..free pattern, so what is the deal!? When considering what to do here, we saw a couple of options. One was to break a lot of code out there and force folks to IFDEF their code; the other was to find a way to make that common pattern still do something reasonable. Let’s analyze that commonly implemented pattern using the example I showed above.

Foo is typically a local variable. The intent of the code above is to ensure that the Foo instance is properly cleaned up. Under ARC, we know that the compiler will ensure that happens regardless. So what does Foo.Free actually do!? Rather than emit the well-known “Unknown identifier” error message, the compiler simply generates code similar to what would be automatically generated to clean up that instance. In simplistic terms, the Foo.Free; statement is translated to Foo := nil; which is then, later in the compile process, translated to an RTL call to drop the reference. This is, effectively, the same code the compiler will generate in the surrounding function’s epilogue. All this code has done is what was going to happen anyway, just a little earlier. As long as no other references to Foo are taken (typically there aren’t, even in non-ARC code), the Foo.Free line will do exactly what the developer expects!
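
In other words, under ARC the earlier pattern behaves roughly like this (a conceptual sketch of the transformation, not the literal compiler output):

Foo := TMyObj.Create;   { reference count = 1 }
try
  {... work with Foo here}
finally
  Foo := nil;           { what Foo.Free now effectively does: drop this reference; if it
                          was the last one, the destructor runs and the memory is released }
end;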

“But, but, but! Wait! My code intended to free, destroy, deallocate, etc… that instance! Now it’s not!” Are you sure? In all the code I’ve analyzed, which includes a lot of internal code, external open source, and even some massive customer projects used for testing and bug tracking, this pattern is not only common, it’s nearly ubiquitous. On that front we’ve succeeded in “getting the word out” about the need for exception safety. If the transient local instance reference is the only reference, then ARC dictates that once this one and only reference is gone, the object will be destroyed and deallocated as expected. Semantically, your code keeps functioning in the same manner as before.

To steal a line from The Matrix, “You have to realize the truth… there is no Free”.

Wait… but my class relies on the destructor running at that point!


Remember the discussion above about our handling of IDisposable under Delphi for .NET? We considered doing something similar… implement some well-known interface, place your disposal code in a Dispose method, and then query for the interface and call it if present. Yuck! That’s a lot of work to, essentially, duplicate what many folks already have in their destructors. What if you could force the execution of the destructor without actually returning the instance memory to the heap? Any reference to the instance would remain valid, but would be referencing a “disposed” instance (I coined the term a “zombie” instance… it’s essentially dead, but is still shambling around the heap). This is, essentially, the same model as the IDisposable pattern above, but you get it for “free” because you implemented a destructor. For ARC, a new method on TObject was introduced, called DisposeOf.

Why DisposeOf, and not simply Dispose? Well, we wanted to use the term Dispose; however, because Dispose is also a standard function, there are some scoping conflicts with existing code. For instance, if you had a destructor that called the Dispose standard function on a typed pointer to release some memory allocated using the New() standard function, it would fail to compile because the “Dispose” method on TObject would be “closer in scope” than the globally scoped Dispose. Bummer… So DisposeOf it is.
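
The kind of existing code that would have broken had the method been named Dispose looks something like this (a contrived sketch; TLegacyThing and FData are invented for the example):

destructor TLegacyThing.Destroy;
begin
  Dispose(FData);  { the standard Dispose(), releasing memory allocated with New();
                     a TObject.Dispose method would have hidden it in this scope }
  inherited;
end;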

We’ve found, in practice, after looking at a lot of code (we do have many millions of lines of Delphi code at our disposal, and most of it isn’t ours), that given the more deterministic nature of ARC vs. pure GC, the need to actively dispose of an instance on demand arises in only a small fraction of cases. In the vast majority (likely >90%) you can simply let the system work, especially in legacy code where the above-discussed create..try..finally..free pattern is used. The need for calling DisposeOf explicitly is the exception rather than the rule.

So what else does DisposeOf solve? It is very common among various Delphi frameworks (VCL and FireMonkey included) to place active notification or list-management code within the constructor and destructor of a class. The Owner/Owned model of TComponent is a key example of such a design. In this case, the existing component framework design relies on many activities other than simple “resource management” happening in the destructor.

TComponent.Notification() is a key example of such a thing. In this case, the proper way to “dispose” of a component is to use DisposeOf. A TComponent derivative isn’t usually a transient instance; rather, it is a longer-lived object which is also surrounded by a whole system of other component instances that make up things such as forms, frames and datamodules. In this instance, using DisposeOf is appropriate.
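
For example (a sketch; TTimer merely stands in for any TComponent descendant):

Timer := TTimer.Create(OwnerForm);  { the owner’s component list now also references it }
{ ... later, when the component must go away deterministically ... }
Timer.DisposeOf;  { the destructor runs immediately: Notification fires and the owner’s
                    list is updated; the memory itself is reclaimed only once every
                    remaining reference has been released }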

For class instances that are used transiently, there is no need for any explicit management of the instance. The ARC system will handle it. Even if you have legacy code using the create..try..finally..free pattern, in the vast majority of cases you can leave that pattern in place and the code will continue to function as expected. If you wanted to write more ARC-aware code, you could remove the try..finally altogether and rely on the function epilogue to manage that instance.
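
An ARC-aware version of the earlier example might read simply (TMyObj is, again, illustrative):

procedure UseFoo;
var
  Foo: TMyObj;
begin
  Foo := TMyObj.Create;
  {... work with Foo here}
end;  { the compiler-generated epilogue releases the last reference and Foo is destroyed }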

Feel the power of the ARC side.


“Ok, so what is the point? So you removed Free, added DisposeOf… big deal. My code was working just fine so why not just keep the status quo?” Fair question. There is a multi-layered answer to that. One part of the answer involves the continuing evolution of the language. As new and more modern programming styles and techniques are introduced, there should be no reason the Delphi language cannot adopt such things as well. In many cases, these new techniques rely on the existence of some sort of automatic resource/memory management of class instances.

One such feature is operator overloading. By allowing operators on instances, an expression may end up creating several “temporary” instances that are referenced by “temporary” compiler-created variables inaccessible to the developer. These “temporary” variables must be managed so that instances aren’t improperly “orphaned”, causing a memory leak. Relying on the same management mechanism that all other instances use keeps the compiler, the language, and the runtime clean and consistent.
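
Purely as a hypothetical illustration (operators on classes are not something you can write today; the syntax below just mirrors the existing record operator syntax):

type
  TVector = class
    X, Y: Double;
    class operator Add(const A, B: TVector): TVector;  { hypothetical: not legal on classes today }
  end;

var
  A, B, C, V: TVector;
...
  V := A + B + C;  { “A + B” would yield a temporary instance held in a compiler-created
                     variable the developer never sees; ARC would release it once the full
                     expression has been evaluated, so nothing is orphaned }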

As the language continues to evolve, we now have a firmer basis on which to build even more interesting functionality. Some things under consideration are enhancements such as fully “rooting” the type system. Under such a type system, all types are, effectively, objects which descend from a single root class. Some of this work is already under way, and some of the syntactical usages are even available today. The addition of “helper types” for non-structured types is intended to give the “feeling” of a rooted type system, where expressions such as “42.ToString();” are valid. When a fully rooted type system is introduced, such an expression will continue to work as expected; however, there will then be, effectively, a real class representing the “42” integer type. Fully rooting the type system will enable many other things that may not be obvious, such as making generics and type constraints even more powerful.
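
That helper-based syntax already works today (a quick sketch; in recent releases System.SysUtils provides the helper for Integer):

uses
  System.SysUtils;  { brings the record helper for Integer into scope }

var
  S: string;
begin
  S := 42.ToString;  { valid today via a helper type; no object is allocated for the 42 }
end;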

Other possibilities include adding more “functional programming” or “declarative programming” elements to the language. LINQ is a prime example of functional and declarative programming elements added to an imperative language.

Another very common thing to do with a rooted type system is to actively move simple intrinsic type values to the heap using a concept called “boxing”. This entails allocating a “wrapper” instance, which happens to be the object that represents the intrinsic type. The “value” is assigned to this wrapper, and now you can pass this reference around like any old object (usually as the “root” class, think TObject here). This allows anything that can reference an object to also reference a simple type. “Unboxing” is the process by which this is reversed and the previously “boxed” value is extracted from the heap-allocated instance.
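
Conceptually, that would look something like this (purely speculative syntax for a feature that does not exist in Delphi today):

var
  I: Integer;
  O: TObject;
begin
  I := 42;
  O := I;           { “boxing”: a wrapper instance would be allocated on the heap to hold the value }
  I := Integer(O);  { “unboxing”: the value would be copied back out of the wrapper }
end;                { the wrapper is released once the last reference to it goes away }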

What ARC would enable here is that you can still interact with the “boxed” value as if it were still a mere value without worrying about who is managing the life-cycle of the instance on the heap. Once the value is “unboxed” and all references to the containing heap-instance are released, the memory is then returned to the heap for reuse.

The key thing to remember here: this is all possible with a natively compiled language without the need for any kind of virtual machine or runtime compilation. It is still strongly and statically typed. In short, the power and advantages gained by “giving in to the ARC side” will become more apparent as we continue to work on the Delphi language… and I’ve not even mentioned some of the things we’re working on for C++…

53 comments:

  1. Hi Allen, thanks for your post.

    When I read something like "a real class representing the “42” integer type", I got really worried.
    I really hope that you at Embarcadero keep in mind that PERFORMANCE is one of the areas where Delphi is still better than many other languages. We don't need, nor want, a full object to represent integer types. We want the native integer type with native performance, not a type that allows me to write something like 42.toString() instead of IntToStr(42). Please, don't kill Delphi performance by copying other languages just to look cool.

  2. I was just thinking the same thing. Please don't "box" native types unnecessarily just because you can. They need to stay native value types. The only time "boxing" is really needed is when assigning an integer where an object is expected, and vice versa. And even then, there are times when you WANT the integer to be assigned as-is to a pointer. Without "boxing" it.

  3. Alexandre --

    Did you miss the part where Delphi has been using ARC since 1996? Who is copying whom?

  4. Values will never be "boxed" unnecessarily... Boxing is only done when needed or explicitly asked for. If (when) we decide to add this to the language, I'll explain how it will work. Fundamental types will *remain* as efficient as they've always been. Even in other rooted languages such as Java and C#, fundamental types are allocated using their "natural" size as "value types." The "objectness" of variables of these types is interesting in how it is accomplished... In fact, the manner in which we implemented helpers for these fundamental types is a clue to how it can be done in fully rooted systems. The "objectness" of these types can even have a virtual method table and override methods from the single base class (TObject). Until the value is "boxed" there is no VMT pointer, yet there is still a VMT for that class! :-)

    To be honest, I wrestled with that whole question for a very long time... How can you have a rooted type system *and* ensure the fundamental types remain as efficient as they've always been? Especially since you are likely to have a VMT and overridden methods on these types. While working on some of the language evolution with the compiler team, I had an epiphany and was finally able to "reason it all out".

    Aside from the implementation of helpers for fundamental types, I also added support to the existing compiler that will be another key to enabling a rooted type system. The compiler can, in many instances, convert virtual method calls to non-virtual direct calls.

    Alexandre - I think you've fallen into the Delphi Trap. There are a lot of antiquated, annoying, stale, boilerplate aspects to the language. Sometimes we tell ourselves there's some special magical point to it all just to make ourselves feel better and not be jealous of everyone else. The reality is a human being can't perceive the nanoseconds of difference involved. It's not about being cool; it's about employing the state of the art in computer science today.

    Let's embrace the present if not the future and stop being the Grandpa language everyone makes fun of. The state of the art in computer science says that everything being an object provides a great deal of power to the developer. Let's not be afraid of accepting that power and the wonderful new things we'd be able to do with it. In fact, let's go beyond integer objects and make types themselves objects too. :-)

  6. "In fact, let’s go beyond integer objects and make types themselves objects too."

    Uh… they are… Even now. In Delphi. While not "objects" in the same sense of using them to create instances, they *are* still objects in the sense that the act of declaring them creates an "instance."

    In fact, since Delphi 1, the form designer has treated the form class being designed and its type as a true, memory-representable type.

    In other languages, they’re also referred to as "meta-types" or simply types that describe other types…

    And before you ask… yep, "it’s turtles all the way down." (http://en.wikipedia.org/wiki/Turtles_all_the_way_down)…

    Interestingly enough, the changes do not do what you say, as they break code in two ways: code that needs to call the destructor needs to be updated by changing Free to DisposeOf, and memory leaks/ballooned-up memory usage.

    The first requires analysis and is error prone; the second happens whenever the destructor is what is breaking the dependency chain, and is a sibling of the first. Interestingly enough, your own code is hit hard by both issues in XE4. Interestingly enough as well, this will make maintaining code that has to run under different Delphi versions quite complicated, making any upward transition of a code base problematic.

    The error in the .Net implementation wasn't using the dispose pattern; it was using it as a magic wand. The error this time is using the renaming of Free as a magic wand. In both cases, you should be looking at preserving not just compatibility, but also at being forward-looking. For .Net you didn't have the "forward looking" portion; for ARC, you have only crummy versions of both: old code will compile fine, but behave differently and leak until updated. New code will introduce reliance on DisposeOf and weak reference tagging. Migrated code will also not be free of Free and its try..finally.

    Also you shouldn't be laughing at GCs when the alternative you have involves a global lock for weak references management...

  8. "Interestingly enough the changes does not do what you say, as it breaks code in two ways: codes that needs to call the destructor needs to be updated by changing Free to DisposeOf, and memory leaks/ballooned up memory usage."

    Uh, no... Where did you miss that *most* code will not need to be changed? Where did you get this notion? I take it you've actually used this and are working with it right now?

    "The error in the .Net implementation wasn’t using the dispose pattern, it was using it as a magic wand, the error this time is using the renaming of Free as a magic wand."

    So, in other words, we didn't do it the "way you would have" therefore it's wrong. Got it. And Free was *not* renamed...

    Eric, I really, really think you are blowing this way out of proportion. There are *some* of what you mention, but in my not so humble, extremely experienced opinion, the benefits gained far outweigh any of the downsides that you mention.

    I have many other developers with whom I work on a daily basis and whose opinion I deeply respect. Many of them have more experience in many other areas than I do. If these issues were nearly as egregious as you seem to be saying, I’d have heard about it long before we shipped… or even started down this path.

    As far as a "global lock" in weak reference management… there’s a bloody global lock when hitting the memory manager in many cases!

    Where did I "laugh at GC's?" I did give a little a dig at the C++ folks who were manually calling AddRef/Release... But I was never laughing at GCs...

  9. "As far as a "global lock" in weak reference management… there’s a bloody global lock when hitting the memory manager in many cases!"

    That is your justification for that, seriously? You are basically saying that it is OK to have another global lock because another global lock already exists.

    Did you do heavy multithreading tests? Take OTL, do tests without the lock and with the lock, and use cases where the lock actually hinders performance. Then tell us the results. If they are good then I will not argue about it anymore.

    Oh, and by the way, who is to say that the memory manager is not also becoming a bottleneck in a heavily multithreaded environment.

    Parallelism is going to be more and more important, not less.

  11. "As far as a "global lock" in weak reference management… there’s a bloody global lock when hitting the memory manager in many cases!"

    I never expected this from you.

  12. "As far as a "global lock" in weak reference management… there’s a bloody global lock when hitting the memory manager in many cases!"

    GC runs in a separate thread, so a delay in the execution of the GC process won't slow down your app, but in ARC it is done in the same thread. If the Delphi ARC model uses a global lock then it is definitely slow (at least on Windows, because of the Monitor).

    Please don't dismiss Eric's arguments by pulling rank. Nobody is questioning your or your coworkers' experience and knowledge. If Eric can show a way to incorporate ARC with 100% backward compatibility, no weak attributes, full benefits of future plans, better leak detection, less impact on global locks and more predictable release of resources, this should not be taken lightly. I can only see a win-win-win-win situation here.

    I like the plans for a fully rooted type system and the concept of boxing. In order for this to fully work, all intrinsic functions/procedures must be implemented as class operators. I mean that it is necessary to keep common Delphi language idioms like Length, High, Low etc. as part of the language.

    I think you are misinterpreting the discussion around Free, and let's put aside the possible bottleneck in handling weak references.

    It is not that people don't want to use ARC; it is about old code that is bound by the memory management of the classic compiler and cannot use ARC in its full glory because it has to maintain compatibility with classic compilers too.

    All language improvements that you mention for the future would still be possible even with DisposeOf functionality inside Free. One has nothing to do with the other.

    I don't want to question your experience and knowledge, but frankly I think you have dropped the ball here. Nothing you can say can make up for that. We are maybe blowing this issue out of proportion, but you can expect that for any breaking changes that are not necessary and bring zero benefits to us users. Please keep in mind that no matter how trivial they may look, all changes cost us time and money.

    I would suggest we calm down and talk about measurable facts. For my feeling, a proper "heavy multithreaded test case (with and without ARC)" running on a real multicore server would help a lot, just to get a feeling for whether we are talking about a real problem or not. Perhaps it would make sense to improve the memory manager as well for that area. The multicore story is now getting more and more important to properly scale performance.

    What I really miss is ARC support on win32/win64 and a strategy to move the bigger 3rd party suppliers in this direction within a short time. Otherwise I have a nice new theoretical feature that I cannot use for my daily work.

    I really need Embarcadero to speak more about RUNTIME performance.
    Without good performance, really, what use is it?

  17. Kudos Allen!!!

    Great wander through Delphi and EMBT's internal requirements to explain what-is!

    Q

  18. The problem with saying that DisposeOf vs. Free doesn't break things 90% of the time is that the math doesn't add up. If you have millions of lines of code at your disposal, and 90% of that still works fine when you make a potentially breaking change, that still means you end up with hundreds of thousands of lines of broken code. To those of us who have to maintain that code, it really is that simple.

    First of all, I find it interesting that folks immediately jump in and start talking about this or that being *wrong* with what we've done. That certainly is no way to start a conversation. All it does is make those areas that do need to be addressed seem far more of a problem than they are in practice.

    As far as the "global lock" comment, I was simply trying to highlight that the presence of a global lock within the memory manager doesn't cause huge, massive slow downs when looked at from the proper perspective. Only when you are exercising the memory manager among many threads/CPUs will you begin to see an issue. The same is true for [Weak] references. They are only an issue if you are dealing with a lot of [Weak] references across many threads. I would say that, in practice, that is not as likely... even less likely than hitting the memory manager.

    So let's talk parallelism for a second; No matter how you slice it, working with a global resource (and memory is a global resource) within a bunch of parallel tasks is going to be an issue no matter what. In most cases, the best practice for doing large numbers of parallel tasks on sets of data is to ensure that all the memory needed is acquired upfront. Then dispatch different portions of that data to the various parallelized tasks. The more isolation among tasks, the far better your performance is going to be. The more inter-dependencies among tasks, the worse your performance will be. That should seem axiomatic when talking parallel processing.

    Of course that isn't meant to be a justification for using a global lock or sticking with the existing implementation. Yes, there are better approaches.. but many times we have to balance many things among all the other work needed to be done. For instance, the [Weak] reference implementation is certainly on the list of things to address in future releases.

    If we want to discuss runtime performance of our implementation, I can say with all certainty that we beat Apple's, hands-down. Apple maintains the reference count for instances in a, are you ready?... a *hash table*... with locks! The individual reference counts for different instances are also much more likely to live on the same cache-line. So even the atomic (LL/SC is used on ARM, http://en.wikipedia.org/wiki/Load-link/store-conditional) increment/decrement can suffer from cross-thread cache stalls. We only use that technique for the more rarely used [Weak] reference management, and not for reference counts.

    Finally, moving to ARC is not going to be without its costs. I get that. That is *precisely* why we decided to introduce it on the mobile platforms first. Most folks are not going to be moving their multimillion line application to a mobile device. They're going to look at moving only a subset of that application and only what makes sense for mobile. That's going to mean less code they'll need to deal with when moving to the new platform. Also, moving to mobile is simply not going to be a load-it-up, recompile and go affair. You are likely going to have to rethink and inspect a *lot* of your existing code *anyway*, regardless of the move to ARC.

    What we did was try to balance the introduction of the new ARC model with enough things to make the move as smooth as possible. We were not going to go out of our way to actively try to hide ARC or somehow mask its presence.

  20. Francisco J Ruiz Nuñez, June 15, 2013 at 7:43 AM

    Good post Allen. Thank you for the details.

  21. Thanks Allen !
    I think that's exactly what the community needs - discussing with someone who knows what he is talking about! Don't get discouraged by negative feedback. I for one really appreciate your explanations about WHY you have done things this way or that way.

  22. Thanks, good read, and good explanation.

    I'm very happy to see that the language and the compiler keep evolving.

  23. I also appreciate the explanations given and the discussion here. In order to be constructive there are some more things I'd like to know:

    1. There was talk about some algorithm which would do away with the problem of cycles in ARC without needing a [weak] declaration. I just don't remember who brought this up first, either Eric Grange or Joseph Mitzen I think. Wouldn't using that one in the next version be helpful? (means deprecating [weak] already)

    2. What about optional support for ARC on the win32/win64/OSX platforms? Any plans for this one? And I'm specifically asking about optional support! ;-)

    3. What about this one?
    qc.embarcadero.com/wc/qcmain.aspx?d=116514
    It doesn't need to be the 1:1 implementation of this suggestion, but just give us something to help find out which classes allocate non-memory references in a way that makes calls to DisposeOf useful/necessary. Not everybody has the time to look through all those RTL/VCL/FMX classes to find out, and for some 3rd party stuff there might not even be source code available.

  24. Thanks for this excellent article! One wish I had was something like an attribute or $opt- to skip ARC completely (for perf critical code) without [weak] overhead. (ie. in gpu canvas textlayout.render lists, generates lots of references)

  25. ARC is going to be a great addition to Delphi. Primarily because it is going to make code easier to write and easier to read, increasing developer productivity.

    Great post, very informative.

    >Ok, I’ll rephrase my request as "let’s make types objects that we can actually use". :-) Or perhaps "let’s make a type type".

    Class references are "type types"... In a rooted type system, they will, by necessity, become even more fully featured.

    >theType := Real;
    >x := theType(z);

    What does that even mean? I don't know what you're trying to say here. Where would this be used?

    Your "ParseFunc()" example could be done with a rooted type system... it could also be done today with the use of the TypeInfo() standard function. Your example would be easier to accomplish with a rooted type system and be syntactically cleaner... however with a little bit of work, it could also be done with the language today (ARC or not).

    > Only true if it’s open source.

    I vehemently disagree with that assertion... So in your mind, only if it's open-source do people need to be civil and treat other developers with respect. Didn't know that... I just assumed civility and mutual respect were supposed to be universal.

    > 2) you’ve already gone on record asking people not to
    > submit patches with bug reports.

    What? Where did I say that? The closest I've ever come to saying anything remotely like that was when I expressed some frustration when *our* developers blindly apply some submitted patch without fully vetting it. I most assuredly never told customers to not submit patches.

    > For many years whenever it was observed that development
    > tool X had some feature that Delphi didn’t,
    > apologists/marketers/dev relations told users "But we’re
    > FASTER."

    What did you expect folks to say? You always talk about your strengths! Are you not doing the same thing by espousing the *strengths* of Python (which I really like, BTW)? Criticizing a business for doing marketing is a little odd. I fully *admit* to doing that as well... Do you go into a prospective employer and immediately talk about all the things you failed at and are least qualified to do? You always build on your strengths, while at the same time working to minimize the weaknesses.

    > I’m watching the same thing happen all over again with
    > the new marketing mantra that "non-native is evil".

    I can assure you that we've never done anything to characterize it in that manner... or is our stance on "native" code somehow threatening to your position on Python ;-) ?...

    We can play this game all day... but I will tell you that there is a place for native and a place for non-native JIT and interpreted. Those will forever be in tension over where that line is drawn. And that is a good thing, because it will make *both* styles better for having been at odds.

    From my perspective, this whole thing is rather interesting because for a while folks were saying "You guys need to add, this or that feature to the language!" or, even specifically, "When are we going to get LINQ in Delphi!?"... now we're actually looking at doing some of those things, and now we're getting "Oh... I didn't mean like *that*!" or "Why are you copying other language XX?!"... How are we supposed to respond?

  27. Eric Fleming Bonilha, June 16, 2013 at 10:55 AM

    Allen
    I was against ARC before reading your post, but now I'm in favor of giving it a try. I just need to learn a little bit more about where I should use [Weak] references, but your explanations are great.

    You should write more often; I truly respect you and I personally like to read articles by the master brains that actually make everything happen.

    Keep on the excellent work and never leave us!

    What are "zombie" instances useful for? Frankly, I do not understand why instances that cannot be used anymore should stay in memory and not be freed immediately. They can hold a lot of memory (depending on the class design), and IMHO they could also be a security issue and lead to some "use-after-free" (or better, "use-after-disposeof") vulnerabilities. Frankly, I hope ARC will be only an option in Win32/64, or it will be a reason to move all of our server code to C/C++ (using GCC and VS; sorry, we abandoned C++ Builder years ago because of compatibility and performance). Maybe we will keep Delphi only for GUIs.

  29. Reply to Luigi:

    Zombie instances are not too useful. But they avoid the situation where somebody else, who didn't yet get the information that this instance is no longer valid, tries to call something on it and gets an access violation, or, if that memory is already being used by something else, even erroneously calls something absolutely unwanted (a call to PostMessage(0, wm_quit, 0, 0) comes to mind... ;-) (not sure about the order of params off the top of my head)

    But what happens if I call a method of a zombie instance without first checking whether it is one? Is every call automatically checked, or not? I understand DisposeOf in the context of a GC, far less in the ARC context.

  31. Allen: "As far as a "global lock" in weak reference management… there’s a bloody global lock when hitting the memory manager in many cases! ....

    "the best practice for doing large numbers of parallel tasks on sets of data is to ensure that all the memory needed is acquired upfront. Then dispatch different portions of that data to the various parallelized tasks. The more isolation among tasks, the far better your performance is going to be."

    Guenther Schoch: "Perhaps it would make sense to improve the memory manager as well for that area. The multicore story gets now more and more important to properly scale the performance."

    This is unannounced at the moment, but I am working on a new memory manager which does not have a global lock, and is designed for multithreaded usage, including cases where memory is allocated in one thread and freed in another, and many threads are allocating and freeing at once. It also uses a more secure design than FastMM4, which may be important for world-facing code, eg web servers. It's a personal project which I have not yet announced, but if you are interested (Allen, Guenther, others) please feel free to contact me at vintagedave@gmail.com.

  32. QUOTE: the best practice for doing large numbers of parallel tasks on sets of data is to ensure that all the memory needed is acquired upfront. Then dispatch different portions of that data to the various parallelized tasks. The more isolation among tasks, the far better your performance is going to be

    Do you realize that by doing it this way you are increasing the memory requirements of your application, as you need to duplicate some of the data?
    Do you think you would get much performance gain if that data is rapidly changing? No you won't! Why? Because every time the data changes you will need to synchronize this change throughout your threads, and by doing this you will be doing large data copying which would still suffer due to the global memory lock.

  33. "Do you realize that by doing this way you are increasing memory requirements of your application as you need to be duplicating some of the data."

    Huh? That's not implied at all by what I said. I was talking about isolation between parallel tasks. If there are inter-dependencies between tasks, that is going to affect performance. The more you can isolate the tasks and their data, the better your performance. Yes, that does mean you sometimes gain performance at the expense of increased memory usage. That's *always* been a tradeoff and continues to be. As soon as you must "synchronize" data changes between parallel tasks, you've lost that isolation.

    What about the suggestion of getting rid of Weak by using Bacon's algorithm described here?
    http://en.wikipedia.org/wiki/Reference_counting#Dealing_with_reference_cycles

  35. Some well known poster named Rudy just provided a link to some IBM research paper about that algorithm which might provide more detail about it:

    http://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon03Pure.pdf

  36. re: Bacon.

    That's very interesting... Mainly because something very close to that algorithm has been bouncing around my head for the last few months. One key difference is when to place objects onto the "roots" list. I was looking at actually detecting real "roots" by where they are physically located in memory. This algorithm treats all non-zero decrements as potential "roots".

    Another interesting thing about this algorithm is the use of "color" to determine the state of the object. This is also done in the current ARC implementation... Objects are marked "white" when they are in the process of being freed (see __MarkDestroying()). Likewise, when objects are "Disposed" (see __SetDisposed) they are marked (loosely analogous to) "black". In fact, the algorithm indicates that the RC(T), Color(T), and Buffered(T) all occupy the same "word" in the object instance. The FRefCount field contains the destroying(T), disposed(T) and RC(T) flags, similarly.

    As I've already stated, this is certainly something I want to continue to look into...

  37. Allen, I'm not sure I understand your comment: "the best practice for doing large numbers of parallel tasks on sets of data is to ensure that all the memory needed is acquired upfront. Then dispatch different portions of that data to the various parallelized tasks. The more isolation among tasks, the far better your performance is going to be"

    I don't disagree that the more isolated tasks are, the better. But why allocate all memory up-front? A task will do its work and it's perfectly valid for a task to, at some point, allocate the memory it needs. In the context of your other comments about the memory manager global lock, it seems this might be a workaround for a contended resource (the lock), not a contended resource (memory in general).

    /If/ allocating memory in different threads has little contention, what reason is there to ensure all the memory that tasks need is acquired upfront?

    Because memory is a global resource, not only from a process perspective but also from an OS-level perspective. The more you exercise the global memory manager, the more chance for contention. Even if you have a new thread-aware memory manager that minimizes contention, chances are that memory manager will reduce its contention by grabbing larger blocks of memory from the OS at once, then handing out that memory to each thread independently.

    My point is that whether your code pre-allocates the memory or the memory manager does it, it's likely to happen at some level in order to reduce contention, even if that is happening down at the OS level. You can rely on the memory manager to be efficient, or you can do the work in your own code, which has a better picture of what the memory footprint and access patterns are likely to be, to ensure you're not beholden to whatever the memory manager is doing.

    There are also locality-of-reference issues that can damage performance. Unless you can be assured that the memory manager is handing out memory from different regions to each requesting thread, you run the risk of two or more threads sharing memory on the same cache-line... of course a simple "fix" for that is to make sure all allocations are aligned on a cache-line boundary. For lots of tiny allocations, that can be a *less* efficient use of memory and might increase your memory pressure. Cache-line sizes can range from 32 to 256 bytes, depending on the hardware and the CPU involved.

  39. What about ARC and BASM? Will the compiler be able to spot reference changes in ASM code, especially when references are held in registers? Or has it to be managed manually?

  40. > and I’ve not even mentioned some of the things we’re working on for C++…

    looking forward to more horror stories on bcb front

  41. @Vladimir Ulchenko lol! ;)

  42. I have only one opinion:
    Give us a chance to choose ARC or NOT ARC, just like the relationship between the object and class keywords.

    If we have no choice, we will feel that something has been forced on us, no matter how many wonderful things you give us.

    By the way, most things have their advantages and disadvantages. Why not let both of them be there,
    and let users determine how to use them.

  44. So how would a library that is intended to be used with ARC work with other libraries that are not? Let's follow that logic in a thought-experiment...

    Option 1: Compiler switch.

    How would Unit A built with ARC off properly handle an object from Unit B built with ARC on? Wouldn't that really mess up the lifetime and reference counts of that object? What if you don't have the source to either or one of the units? (sorry, merely saying that you don't buy components without source doesn't solve the issue because many customers *do* buy components without source). Let's go even further and assume that each object carries some "flag" indicating that it is ARC enabled... Now *all* code must test this flag and decide whether the particular instance is ARC-enabled.

    Option 2: ARC enabled base class

    How is this a choice? Suppose Unit A's classes are built with the ARC base class and you want them to interact with Unit B's functionality which expects objects descending from the non-ARC base. Since they descend from separate bases, they're not going to be type-compatible.
    At what level in the hierarchy should ARC be enabled? Should there now be two TObject-like base classes which share no common base? If they do share a common base, is that base ARC-on or ARC-off?

    I mean NO ARC and ARC existing at the same time.
    A compiler switch cannot achieve this goal.

    We can regard class as NO ARC and regard interface as ARC.
    Just like string (ARC) and pchar (NO ARC). That's very good.

    I can choose by myself according to the different requirements.

    You had mentioned that string is ARC.
    OK, it is ARC. But pchar worked well with string and
    they were not exclusive in the old days.

    Why do we have to choose one and kill the other today?

  46. Ok, I see... you want the status quo. That would mean that things like operator overloads on classes would not be possible.

    Can we think about operator overloads in C++?

    Class objects on the stack and copy constructors?

    Class objects on the stack will be more efficient.

  49. [...] and unboxed as needed to store it inside a TObject.  However, Delphi is not a rooted language, at least not yet.   Well, the technique is so useful I was unwilling to give [...]

    We all know that C++ has operator overloads.
    C++ has CComPtr.
    ARC and operator overloads are not exclusive.

    I'm very confused about how you are positioning Delphi now.
    The C language is not lesser because of its fewer language features.
    The C++ language is not lesser because of its complex language features.
    What's the matter?

    If Delphi is still focused on native code, Delphi should
    learn more from C/C++, not from a dynamic language.

    ARC is wonderful. But ARC is not the whole world. Just like class. Now, we can write functions in record types.
    This also proves that class is not the whole world. The
    world is diverse; we should have a variety of ways to describe the world.
    If you still think ARC is the only choice,
    I beg you to let records support virtual functions and let records override.

    Thanks.

  51. [...] Sender parameter. Yes, the new Android compiler is using ARC, just like the iOS compiler. Check out Allen’s blog post on ARC for a lot more detail on how it [...]

  52. Sorry to post here but i got lost and didn't yet find other way but you had some information about Ford J3 diy adapter what you did do? Please contact via email.


Please keep your comments related to the post on which you are commenting. No spam, personal attacks, or general nastiness. I will be watching and will delete comments I find irrelevant, offensive and unnecessary.