String instances are thread-safe. String variables are not.
Let's define what I mean by "instances" and "variables":
Instances are the string entities that live out on the heap and represent a string value.
Variables are the things you declare, whether a local, a global, or a field of a class or record.
Recently I noticed there was some confusion about this, and in fact a test case surfaced that purported to demonstrate that strings are not thread-safe. The problem was that the demonstration merely showed exactly what I stated above: string variables are not thread-safe.
Because multiple string variables may reference the same instance, the chance of that instance being accessed across threads is greater than zero. Even if the variables are properly guarded against simultaneous cross-thread access, that does nothing to protect the values out on the heap. This is why the internal implementation of strings uses atomic operations to manage the reference count, which is key to ensuring the count is always correct and consistent.
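As a small illustration of the distinction, the console sketch below has two variables refer to one instance. It assumes Delphi XE2 or later (for the `System.SysUtils` unit name) and uses `StringRefCount` from the System unit; the program and variable names are made up for the example.

```delphi
program InstanceVsVariable;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  A, B: string;  // two variables (hypothetical names)
begin
  // IntToStr builds its result at run time, so the value lives on the heap
  // with a real reference count. A string literal would not serve here:
  // literals are constants and are not reference counted.
  A := IntToStr(12345);
  Writeln('References after A is assigned: ', StringRefCount(A));

  // Assigning one string variable to another copies the reference, not the
  // characters; the runtime adjusts the instance's reference count.
  B := A;
  Writeln('References after B := A: ', StringRefCount(A));
  Writeln('Same instance: ', Pointer(A) = Pointer(B));
end.
```

The count reported by the second call should be higher than the first, since one instance is now shared by two variables. That shared count is the piece the runtime updates atomically.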
Like any other variable that is read from or written to by many threads, a string variable requires the same level of protection you would give a simple Integer variable.
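One way to provide that protection is sketched below, assuming Delphi XE2 or later and a `TCriticalSection` from `System.SyncObjs`. The names `Lock`, `SharedText`, `WriteShared` and `ReadShared` are hypothetical, and a critical section is only one of several possible guards.

```delphi
program GuardedStringVariable;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

var
  Lock: TCriticalSection;  // hypothetical guard for SharedText
  SharedText: string;      // the variable both threads touch

procedure WriteShared(const Value: string);
begin
  Lock.Acquire;
  try
    SharedText := Value;   // the assignment to the variable is what needs guarding
  finally
    Lock.Release;
  end;
end;

function ReadShared: string;
begin
  Lock.Acquire;
  try
    Result := SharedText;  // take a private reference while holding the lock
  finally
    Lock.Release;
  end;
end;

var
  Writer: TThread;
  Local: string;
  I: Integer;
begin
  Lock := TCriticalSection.Create;
  try
    Writer := TThread.CreateAnonymousThread(
      procedure
      var
        N: Integer;
      begin
        for N := 1 to 100000 do
          WriteShared('value ' + IntToStr(N));
      end);
    Writer.FreeOnTerminate := False;  // we wait for and free the thread ourselves
    Writer.Start;

    for I := 1 to 100000 do
      Local := ReadShared;  // Local belongs to this thread only; using it needs no lock

    Writer.WaitFor;
    Writer.Free;
    Writeln('Last value read: ', Local);
  finally
    Lock.Free;
  end;
end.
```

Once `ReadShared` returns, `Local` holds its own reference to the instance, so the reader can use it freely. Without the lock, a read of `SharedText` could overlap a write to it, and the reader could pick up a reference to an instance that the writer is releasing at that moment.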
I suspect the confusion about this stems from the fact that strings are treated like "value objects" while they are implemented as "reference objects". The other bit of confusion likely comes from the fact that the internal implementation details of strings are visible for all to see. When folks looked into it, they saw a LOCK INC XXXX instruction or an AtomicIncrement() standard function and thought, "Aha! That is there for thread-safety, so therefore strings are 'thread-safe'".
So the point here is that you should treat a string variable like any other variable in threaded code. Don't worry about the string instances; the runtime has that part taken care of.
Hi Allen,
I was also initially confused about the extent of string thread safety. What threw me was the xchg instructions in LStrAsg and LStrLAsg in system.pas. With a memory operand those have an implied lock, and since they're operating on the variable itself it gave me the impression that it was an attempt to make the variables themselves thread-safe too. (The PurePascal code which was added later does not have that lock.) I filed a QC for it some time ago:
http://qc.embarcadero.com/wc/qcmain.aspx?d=50564
Andreas has incorporated the suggestion into his IDE Fix Pack. Back when I filed the report it made a noticeable difference in performance, but I have not tested it on more modern CPUs since.
Best regards,
Pierre le Riche
The IDE Fix Pack doesn't contain that patch anymore, because it caused a lot of random crashes with other IDE plugins. I guess those have code that assumes that string variables are thread-safe.
Hi Andreas,
That's good to know. Removal of those implied locks certainly widens the window for bad things to happen, but one could argue it's better that way since it makes such bugs easier to reproduce.
I would suggest removing the locks in beta versions of the RTL, but leaving them in in the final release until such time as everyone has had a chance to fix their packages. It's certainly an issue that will surface again if the IDE ever becomes 64-bit (since it is only the x86 implementation that has those extra locks).
Hi Allen,
So the purpose of LOCK INC here is to protect the "Reference Counter", not the string itself.
Yes. The manner in which the compiler and runtime operate further protects the whole content.