Friday, March 26, 2010

Another installment of “Talk Amongst Yourselves”

Let’s start thinking about thread pools. How do you manage a general-purpose thread pool in the face of not-so-well-written code? For instance, a task dispatched into the thread pool may never return, which prevents that thread from ever being recycled. How do you monitor this? How long do you wait before spinning up a new thread? Do you keep a “monitor thread” that periodically checks whether a thread has been running longer than some (tunable) value? What are the various techniques for addressing this problem?
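One possible answer to the monitor-thread question is sketched here in Python; the post itself contains no code, and this is not presented as the author's design. A hypothetical `WatchdogPool` records when each worker started its current task. A monitor thread wakes every `check_interval` seconds and compares that start time against a tunable `max_runtime`. An overdue task is reported once through an `on_overdue` callback, and one extra worker is started so that queued work keeps moving. The stuck thread itself is left alone, because Python offers no safe way to kill a thread. All names and defaults are assumptions for illustration.

```python
import queue
import threading
import time

class WatchdogPool:
    """Hypothetical thread pool whose monitor thread flags long-running tasks."""

    def __init__(self, workers=2, max_runtime=1.0, check_interval=0.1, on_overdue=None):
        self._tasks = queue.Queue()
        self._max_runtime = max_runtime        # tunable threshold, in seconds
        self._check_interval = check_interval  # how often the monitor wakes up
        self._on_overdue = on_overdue or (lambda name, elapsed: None)
        self._lock = threading.Lock()
        self._started = {}     # worker name -> monotonic start time of its current task
        self._flagged = set()  # workers already reported for their current task
        self._stop = threading.Event()
        self._count = 0
        for _ in range(workers):
            self._spawn_worker()
        threading.Thread(target=self._monitor, daemon=True).start()

    def _spawn_worker(self):
        with self._lock:
            self._count += 1
            name = f"worker-{self._count}"
        threading.Thread(target=self._work, name=name, daemon=True).start()

    def submit(self, fn, *args):
        self._tasks.put((fn, args))

    def _work(self):
        name = threading.current_thread().name
        while not self._stop.is_set():
            try:
                fn, args = self._tasks.get(timeout=0.1)
            except queue.Empty:
                continue
            with self._lock:
                self._started[name] = time.monotonic()
            try:
                fn(*args)
            except Exception as exc:  # keep the worker alive; a real pool would log properly
                print(f"{name}: task failed: {exc!r}")
            finally:
                with self._lock:
                    self._started.pop(name, None)
                    self._flagged.discard(name)
                self._tasks.task_done()

    def _monitor(self):
        # Event.wait returns False on timeout, True once shutdown() sets the event.
        while not self._stop.wait(self._check_interval):
            now = time.monotonic()
            with self._lock:
                overdue = [(n, now - t) for n, t in self._started.items()
                           if now - t > self._max_runtime and n not in self._flagged]
                self._flagged.update(n for n, _ in overdue)
            for name, elapsed in overdue:
                self._on_overdue(name, elapsed)  # report only; the stuck thread is not killed
                self._spawn_worker()             # add capacity so queued work keeps flowing

    def shutdown(self):
        self._stop.set()
```

The obvious cost of this design is that a pool filled with never-returning tasks grows without bound. A real pool would probably cap the number of replacement workers, or only report and leave the decision to the caller.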

So, there you go... Talk amongst yourselves.

7 comments:

  1. Well, you probably already have parameters that determine when a new thread is started. New threads will be needed from time to time, when the others are busy. You don't have to do anything specifically for not-so-well-written code. That's the "end developer's" problem.

  2. If a task dispatched into the thread pool never returns, your library should assume the application developer intended this to happen.

  3. I tend to agree with Henrick: essentially, a service writer can be over-protective. As Henrick points out, you can't really be sure that what you're seeing is an error, and no service code can protect itself against every possible misuse anyway. You can be easy to use, but 'foolproof' is neither a worthwhile nor an attainable goal.
    You can, however, make suspicious behavior easy to find, perhaps by enabling a debugging thread that reports thread usage on pool shutdown: thread names and lifetime statistics, names of any threads that never ended, etc.
    bobD

  4. Functionality to kill hanging jobs on shutdown, or by calling a function to do it, would be nice. Or maybe you could add a MaxTime property for a single job, but the only thing that should happen when MaxTime is reached is that an OnJobMaxTimeReached event fires.

  5. I like the idea of the ThreadPool monitoring the states of the threads in the pool. There could be a thread timeout property on the pool that causes an event to fire if the thread doesn't return within the timeout. The timeout, of course, should be overridable, so that the pool can ignore timeouts or individual threads can signal the ThreadPool that they need more time.

  6. I made no attempt to implement this in any of my thread pools. I think that as long as it's possible to enumerate the threads from outside the thread pool (in a thread-safe fashion), you can leave that one to users of your library. Using a monitor thread is overkill in use cases that need no such support. Something I would like to implement that is tangentially related: asynchronous thread aborts that correctly support try/finally semantics. If you figure out how to implement that in the compiler, it would be incredibly useful! It may not be possible, though. I don't think Win32 supports it.

  7. Allen,

    Finally I can insert carriage returns! (The welcome page view in the Delphi IDE doesn't seem to allow that.)

    Perhaps you are thinking about the wrong solution to the problem. A timeout doesn't seem to me to be a robust way to decide that a thread has stalled. What timeout value would be correct? Even an end-user/programmer of the thread pool would have trouble deciding which value to use. Also, what action could the pool monitor possibly take if a thread exceeded this threshold? Terminating the thread is not an option: the thread could be holding a resource that may not be released upon termination. Program state could certainly be corrupted by such an operation.

    What is the most likely reason a thread would fail to finish its task and return to the pool? A thread blocked on a resource owned by another thread: a deadlock. Perhaps the worst cause of deadlocks is cyclical lock acquisition. An example:

    Thread A acquires Lock 1
    Thread B acquires Lock 2
    Thread A tries to acquire Lock 2 and waits
    Thread B tries to acquire Lock 1 and waits - deadlock!

    In complex systems, cyclical deadlocks are a constant concern, especially with "not-so-well-written code". :)

    I think it's preferable to raise an exception when a deadlock occurs rather than have the program silently stop working. At least an exception can generate a stack trace, and of course the caller can handle it and retry the operation from the beginning.

    The deadlock detection algorithm maintains a list of locks and their owners, and checks that list before attempting to acquire a locked resource. Doing this properly requires building a wait graph. To minimize the performance cost, the wait graph isn't built until a TryEnter operation has failed after several attempts.

    A very good explanation of how that works can be found here:

    http://msdn.microsoft.com/en-us/magazine/cc163618.aspx

    I am planning to modify the critical section and mutex objects that I use here to do this.
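The lock-owner bookkeeping and wait-graph check described in the last comment can be sketched in Python. This is a minimal illustration under stated assumptions. It is not the commenter's Delphi implementation, and it is not the algorithm from the linked MSDN article. `DetectingLock`, `DeadlockError`, `spin_attempts` and the 50 ms re-check interval are all hypothetical.

Each lock records its owning thread. A thread that cannot get a lock after a few non-blocking attempts (the analogue of a failed TryEnter) records which lock it is waiting on. It then follows the chain lock → owner → lock that owner is waiting on → .... If that chain leads back to the thread itself, it raises instead of blocking.

```python
import threading
import time

class DeadlockError(RuntimeError):
    """Raised instead of blocking forever when a wait would close a cycle."""

# Global wait-graph bookkeeping (hypothetical design), guarded by one lock.
_registry_lock = threading.Lock()
_owner = {}       # DetectingLock -> ident of the owning thread
_waiting_on = {}  # thread ident -> DetectingLock that thread is blocked on

class DetectingLock:
    """Non-reentrant lock that checks the wait graph before blocking."""

    def __init__(self, name, spin_attempts=3):
        self.name = name
        self._lock = threading.Lock()
        self._spin_attempts = spin_attempts

    def _would_deadlock(self, me):
        # Walk lock -> owner -> lock that owner waits on; reaching `me` means a cycle.
        lock, seen = self, set()
        while lock is not None:
            owner = _owner.get(lock)
            if owner is None:
                return False
            if owner == me:
                return True
            if owner in seen:
                return False  # a cycle that does not involve this thread
            seen.add(owner)
            lock = _waiting_on.get(owner)
        return False

    def acquire(self):
        me = threading.get_ident()
        # Cheap path first: a few non-blocking attempts, no graph analysis.
        for _ in range(self._spin_attempts):
            if self._lock.acquire(blocking=False):
                break
            time.sleep(0)
        else:
            # Slow path: check the wait graph, register as a waiter, wait briefly, repeat.
            while True:
                with _registry_lock:
                    if self._would_deadlock(me):
                        _waiting_on.pop(me, None)
                        raise DeadlockError(f"acquiring {self.name} would deadlock")
                    _waiting_on[me] = self
                if self._lock.acquire(timeout=0.05):
                    break
        with _registry_lock:
            _waiting_on.pop(me, None)
            _owner[self] = me

    def release(self):
        with _registry_lock:
            _owner.pop(self, None)
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
```

Consider the Thread A / Thread B example, with each thread holding its first lock inside a `with` block. The thread whose wait would close the cycle raises `DeadlockError`. Its `with` block then releases the lock it already holds, and the other thread proceeds. A single global registry lock is the simplest way to keep the check consistent, at the cost of some contention.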
