Monday, March 22, 2010

Simple question… very hard answer… Talk amongst yourselves…

I’m going to try a completely different approach to this post. I’ll post a question and simply let the discussion ensue. I would even encourage the discussion to spill over to the public newsgroups/forums. Question for today is:

How can you effectively unit-test synchronization primitives for correctness or more generally, how would you test a concurrency library?

Let’s see how far we can get down this rabbit hole ;-).


  1. I would put it to work concurrently and see how it performs...

  2. Performance is a separate issue. I said *correctness*, not performance. Yes it is a critical issue, but without correctness, performance is irrelevant.

  3. I would first define tests:-) What do we want to test? "Library" should be something reusable across a number of projects. What kinds of projects are these?
    I would start from use-cases of using such a library. In other words: cases of apps using the library being tested.

  4. Off-topic I know, but *you* don't need to test primitives because the OS vendor does that. If you really want to know how they are tested, then I'd look at the tests for the Linux kernel.

    Starting from scratch I think the first thing you need to do is to enumerate how your primitive could fail and then try to devise a way to make that failure manifest. Easy to say, not so easy to do. Since these primitives will very likely be quite hardware dependent you need a lab with a lot of different hardware!

  5. I am not an expert in the field, but Helgrind and the UPPAAL project might be relevant to the discussion.

    Many hard real-time systems research groups deal with these kinds of problems every day; they probably have a lot to offer in this area.

    The traditional "academic" approach would probably be to formalize some invariants and show that they hold in every possible situation. I am not sure this is always feasible.

  6. For primitives, I usually write a test containing as many opportunities as possible for a data race to cause an observable side effect. For example, there is the test I wrote for TMultiReadExclusiveWriteSynchronizer (I hereby release it to the Public Domain, so feel free to look at it and use it in any way).
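    The linked test is not reproduced here, but the idea (cram the test with chances for a race to produce an observable side effect) can be sketched in Python; everything below, including the `hammer` function, is a hypothetical illustration rather than the commenter's actual test:

    ```python
    import threading

    def hammer(use_lock: bool, threads: int = 8, iterations: int = 20_000) -> int:
        """Have many threads perform a read-modify-write on a shared counter,
        maximizing the chances for a lost update to become observable."""
        state = {"count": 0}
        lock = threading.Lock()

        def worker():
            for _ in range(iterations):
                if use_lock:
                    with lock:
                        state["count"] += 1
                else:
                    state["count"] += 1   # unsynchronized: updates may be lost

        ts = [threading.Thread(target=worker) for _ in range(threads)]
        for t in ts:
            t.start()
        for t in ts:
            t.join()
        return state["count"]

    # With the lock held, the bad side effect (a lost update) can never occur:
    assert hammer(use_lock=True) == 8 * 20_000
    ```

    The exact final count is the observable side effect: any deviation from `threads * iterations` proves a race, which is far easier to check than trying to catch the race in the act.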

  7. Is TMultiReadExclusiveWriteSynchronizer a primitive?

    In fact Allen, what do you mean by primitive?

  8. Synchronisation primitives are pieces of functionality that do nothing by themselves, save for providing the means to synchronise multiple threads of execution.

    There are obviously simple (mutex, semaphore, ...) and more complex (barrier, reader/writer locks, ...) primitives, and many primitives can be expressed/implemented using other primitives (semaphore using a mutex, barrier using mutexes or semaphores, ...).

  9. 1. synchronization + deadlock
    2. speed - compare to using an array of TThread(s)
    3. memory leaks?

  10. I would not use the Windows standard synchronization primitives, but use something similar to instead

  11. Give the problem to mathematicians, they can prove or disprove this kind of things :) Seriously, I would do that.

  12. I would use Microsoft's CHESS.

  13. In this case, I think the term "integration test" is more appropriate than "unit test".

    I assume you already have a good handle on the idea of TDD and have at least a basic framework of tests in place. If not, that's a whole other discussion.

    For this kind of fundamental core functionality, that HAS to work reliably every time, I think you need to re-engineer some of these into long-running tests and run them on as many different hardware/OS combinations as you can get your hands on, for extended periods of time. Basically, a dedicated testing lab. And/or a USB-key-launchable suite that a random co-worker can run on their PC overnight.

    No matter how many ways I anticipate how something will fail, something unexpected almost always comes up.

  14. I remember seeing a video on Channel 9. Microsoft was working on a tool which could test multithreaded code by finding out all possible sequences in which the code can execute. The tool could then execute all these paths.

  15. Make them run "slow", and throw execution test cases at these slow scenarios.

    Run those primitives in debugger mode, instruction by instruction, and at each step, have a variety of tests run against that primitive's state, so that your primitive isn't just tested at each of its high-level states, but at each of its actual CPU-level states (which is usually where hard to spot trouble lives).

    For instance, for an atomic assignment, this is intended to spot that there isn't an intermediate state where the assignment isn't atomic, even on instructions that *usually* last no more than a fraction of a cycle (thanks to pipelining, thus potentially resulting in extremely infrequent issues).

    If you don't have that exhaustive approach, but go for traditional unit tests in threads, odds are your correctness will only be statistical, and potentially merely circumstantial.

    That or going for a mathematical proof of course, as already said.

    Also, I think performance should be part of the unit test framework, not so much for pure performance's sake, but to spot race conditions and implicit serializations (which would render your primitives entirely pointless, if it isn't serialization primitives you're testing!)

  16. This sounds like a perfect Stack Overflow question!

    This line from the Wikipedia entry on unit testing is pertinent: "Like all forms of software testing, unit tests can only show the presence of errors; they cannot show the absence of errors."

    I'd start by unit-testing it like any other module. Have simple unit tests for the various use cases, add new tests as appropriate to catch 'normal' bugs that are found during the unit's life.

    But, as you know, this isn't really going to catch many of the concurrency edge cases, except by accident, so there are a couple of different techniques to add.

    a: eyeballs. There is really no substitute for major, frequent code reviews on this kind of stuff, with as many good engineers as you can find. If your tests (or 'live' use) do throw up a problem, you'll probably have to do this anyway to find it.

    b: soak testing. Set up a machine (or several) running a serious multithreaded application (it probably doesn't matter what, but you should extend it according to the bugs you find), and run it 24/7. No VMs - this needs real metal.

    c: Unask the question. Instead, ask "why am I reinventing the wheel?" I've been round this myself too often to mention in years gone by, and would rather eat catfood than write/debug another set of synchronization primitives.

    I have the dubious honor of having found/fixed a fairly ugly BCB2009 RTL unicode string concurrency bug in the not-too-distant past. Found by eyeballing the code.

  17. It's a null question. Without knowing what functionality the "Concurrency Library" is supposed to provide it's impossible to devise tests or envisage what tests might be necessary.

    "How would you test a Data Access library?"
    "How would you test a Mathematics library?"
    "How would you test a Graphics library?"
    "How would you test a Language library?"

    You can't test a name/label, you can only test functionality.

    So, reverting to the more specific question about synch primitives, still the question remains, what are the synch primitives involved? What are they supposed to do? How are they supposed to work?

    Once armed with that information, the tests should be quite straightforward to devise I should think.

  18. One obvious piece of this equation is how to execute the tests. The test bed, once written, needs to be run across a range of single and dual+ core systems with various speeds and capabilities, along with a combination of OS releases.
    (A single core running XP will behave a lot differently than a dual quad-core Windows Server 2008 machine.)

    As far as the test bed goes, I would lean towards massive stress testing. Something on the order of the MS web stress tester where you vary the number of threads (or processes depending on the type of access required) and workers in which some workers are set to continuously write to different types of shared value(s), some workers are set to continuously read from these shared values, and some set to randomly read/write these shared values. Randomized wait lengths introduced before and during a read or write block. Enabled workers would need to log start and successful completion utilizing some sort of non-synchronized logging mechanism which can be analyzed later for success rate.

    $0.02 Good luck, sounds fun.
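    One concrete shape for such a test bed can be sketched in Python (all names hypothetical): writers update two fields under a lock, readers verify the invariant that the fields always agree, and the worker count exceeds the core count so the scheduler preempts threads inside interesting regions:

    ```python
    import os
    import random
    import threading
    import time

    def stress(duration_s: float = 0.5) -> int:
        """Writers update two fields under a lock; readers count violations of
        the invariant that the fields agree. Returns the violation count."""
        lock = threading.Lock()
        shared = {"a": 0, "b": 0}
        violations = [0]
        stop = threading.Event()

        def writer():
            while not stop.is_set():
                with lock:
                    n = random.randint(0, 1000)
                    shared["a"] = n
                    time.sleep(0)          # invite a context switch mid-update
                    shared["b"] = n

        def reader():
            while not stop.is_set():
                with lock:
                    if shared["a"] != shared["b"]:
                        violations[0] += 1

        n_readers = (os.cpu_count() or 2) + 1   # more workers than cores
        workers = ([threading.Thread(target=writer) for _ in range(2)]
                   + [threading.Thread(target=reader) for _ in range(n_readers)])
        for t in workers:
            t.start()
        time.sleep(duration_s)
        stop.set()
        for t in workers:
            t.join()
        return violations[0]

    # A correct lock never lets a reader observe a torn update:
    assert stress() == 0
    ```

    Swap the lock for the primitive under test, and log violations instead of merely counting them, and this becomes the kind of analyzable stress run described above.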

  19. Release it. Ship some time-bombed DCUs or use a similar delivery method. The biggest problems in any library are the design problems. Don't ignore them.

    Also, in order to "effectively test" you must envisage real-world cases. But your 'real world' is very different from ours. And this is natural. Hence one more reason to have a public beta.

    Have a look here:

    ...and take ideas.


  20. In OmniThreadLibrary, I'm stress-testing all lock-free code. The code runs with different numbers of threads, from one up to twice the number of cores, and repeats all tests "indefinitely".

    Typically, all bugs were detected in less than 15 minutes. However, I'm running all such tests overnight, for at least 12 hours.

  21. Forgot to add: It is important to stress-test with more threads than there are cores in the computer, because that causes threads to be paused at different places for a (comparatively) long time.

  22. I would follow the advice given by Brian Goetz in "Java concurrency in practice". A recommended book for all concurrency issues, and a great reference to the Java concurrency design, which is very clever, IMO.

    I haven't read that part of the book, yet, myself. But it seems to give a thorough review to various testing methods and principles.

    A short quote:

    "Most tests of concurrent classes fall into one or both of the classic categories of _safety_ and _liveness_. In Ch.1 we defined safety as "nothing bad ever happens" and liveness as "something good eventually happens"."
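    The safety category can be made concrete with a small invariant check; this Python sketch (hypothetical, not taken from the book) tests the semaphore guarantee that no more than N threads ever hold a permit at once:

    ```python
    import threading
    import time

    def max_concurrent_holders(permits: int = 3, threads: int = 10) -> int:
        """Safety test: record how many threads hold the semaphore at once;
        'nothing bad ever happens' means the maximum never exceeds permits."""
        sem = threading.Semaphore(permits)
        lock = threading.Lock()
        current = [0]
        observed_max = [0]

        def worker():
            for _ in range(50):
                with sem:
                    with lock:
                        current[0] += 1
                        observed_max[0] = max(observed_max[0], current[0])
                    time.sleep(0)          # encourage holders to overlap
                    with lock:
                        current[0] -= 1

        ts = [threading.Thread(target=worker) for _ in range(threads)]
        for t in ts:
            t.start()
        for t in ts:
            t.join()
        return observed_max[0]

    # The safety invariant of a correct semaphore:
    assert max_concurrent_holders() <= 3
    ```

    A liveness test would be the mirror image: assert that every waiting thread eventually acquires a permit, typically by failing the test if an acquisition does not complete within a generous timeout.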

  23. Besides stress testing, the obvious solution would be to write a basic emulator of a multi-core system for your test. Most importantly, that would potentially give you complete control over the time factor, insofar as you could emulate any specific instructions being carried out both concurrently and in a clearly defined order. Problems would have to be detected when evaluating the test data output.

  24. I wrote a Delphi threading library for the company I work for. Instead of regular unit testing, I do as Primoz did: test the library's internals with varying numbers of threads, and run the tests in different ways, for long periods of time. Generally my experience was the same as his: problems were found very quickly.


Please keep your comments related to the post on which you are commenting. No spam, personal attacks, or general nastiness. I will be watching and will delete comments I find irrelevant, offensive and unnecessary.