Thursday, September 2, 2004

100% Defect free vs. Reality.

Is it possible to create 100% bug-free software? Theoretically, I suppose it is possible. But then again, how would you know if you've actually achieved it? Design, specs, pre-conditions, post-conditions, unit-testing, test cases, metrics, and audits are all designed to mitigate the introduction of software defects. Unit-testing is designed to break down a problem into manageable pieces with test cases that are tailored to that one little bit of code. The idea is that if you have well-designed test cases that thoroughly cover a particular domain and all those tests pass, then when you begin putting all these pieces together, you run less risk of introducing more defects because each “unit” is solid. This philosophy is core to XP (Extreme Programming).

I'm not going to try and burst any bubble surrounding XP, as I agree with many of its tenets. What I am going to do, however, is try to at least poke at the myth that it is possible to achieve 100% defect-free software. Given enough time and resources, I think you may be able to reach 99.999999...%, but there is no way to really know whether or not you've reached the holy grail of 100%. I'm talking on a macro level here, folks. Systems that are reasonably sized, not some dinky program that does nothing but spit out “Hello World” and terminate. Let's say a normal application has upwards of 1000 “units” (I'm not talking “units” in the Delphi sense, but the minimum testable block of actual code). Complex applications may be 100 times that size. As you begin to assemble these “units” you are multiplying the permutations of both input data and output results. There are also the unforeseen interactions that take place when two pieces are brought together. Just like some chemicals react violently when mixed, you can have these various “units” come together and create some pretty weird and wonderful smells and colors. Design and architecture are most definitely needed in order to mitigate the chances of a meltdown. But then again, it is those pesky humans who dream up these designs and architectures. This is also where another recent methodology, patterns, is gaining favor... but that is a topic for another time...
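To put a rough number on how quickly those interactions pile up, here is a minimal Python sketch. It assumes, purely for illustration, that only pairs of units can interact; the function name `pairwise_interactions` is made up for this example. Real systems also have three-way and larger interactions, so if anything this undercounts.

```python
from math import comb


def pairwise_interactions(units: int) -> int:
    """Count the distinct pairs of units that could, in principle, interact."""
    return comb(units, 2)


# The two sizes mentioned above: a "normal" application and one 100 times larger.
for units in (1_000, 100_000):
    pairs = pairwise_interactions(units)
    print(f"{units:,} units -> {pairs:,} possible pairs")
```

Even under this simplified assumption, growing the unit count by a factor of 100 grows the number of possible pairs by roughly a factor of 10,000.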

I think that when folks tout the 100% defect-free mantra, they completely miss the human factor that is involved. The bottom line is this: humans are not perfect and will make mistakes. They also miss the practicality and marketability of such a system. There is a delicate balancing act. The customer says, “I want the software I use to solve a problem and operate as specified or as I expect.” Those same folks also want as many features as possible to make their lives easier when using the software. Where the balancing act comes in is with that one thing all of us have the same amount of: time. The software producer wants to minimize the amount of time it takes to get their goods to market in order to gain or maintain a competitive advantage. The customer also wants a timely release of the software, because the sooner they can use it, the sooner they themselves can begin to become competitive. Schedules, feature lists, deadlines, etc. all serve to push a product out the door, whereas defects are the opposing force that pushes the delivery in the opposite direction. The difficult part is knowing when to press harder on the accelerator and when to put on the brakes.

This is where experience, intuition (yes, I said intuition...), and solid metrics all come together to help you make those critical decisions. One factor in determining which defects to focus on fixing and which ones to let slide comes from understanding your customers in a broad sense. Given a particular defect, we can use the previous tools (experience, intuition, metrics, etc.) to determine a defect's “surface area.” The “surface area” of a defect is a somewhat subjective combination of the severity of the defect, its location, its frequency, and a smattering of other metrics thrown in, including a subjective risk metric and the current point in the delivery cycle. Suppose in your testing phase you encountered a defect that occurred each time you opened a file. Suppose opening and processing files is one of the primary functions of this application. It is very likely that nearly all users of the application will encounter this defect. Suppose this defect is that it injects random data into the file when it is opened. The location, the severity, and the frequency all combine to create a very high “surface area” defect. On the opposite end of the scale, suppose there is a defect that only occurs when certain rare operations are performed on a file, and only if that file contains a sequence that is encountered in a statistically minuscule number of files. One could easily conclude that this defect, while it may have a very high severity (say, it causes file corruption), has a relatively small “surface area” overall. No, we don't actually grab our calculators and begin feeding values into some magic formula; rather, it is a mental exercise that usually takes place within a small group of two to ten people. Making any change to the code also carries a certain degree of risk. Changing a commonly used routine can carry a very high risk because of the potential for wide-ranging destabilization. This risk also rises, roughly geometrically, as the time remaining on the schedule shrinks.
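Even though no real formula is involved, a toy Python sketch can make the two examples above easier to compare. Everything here is an assumption made for illustration: the `Defect` class, the three 0-to-1 scores, the numbers assigned to each example, and the choice to simply multiply them together. It is not how the actual triage is done.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    """Hypothetical record of a defect; every field is a subjective 0..1 estimate."""
    name: str
    severity: float   # how bad the outcome is when the defect is hit
    reach: float      # fraction of users who exercise the affected code path
    frequency: float  # fraction of those uses that actually trigger the defect

    def surface_area(self) -> float:
        # Illustrative assumption: multiply the factors so that a tiny value
        # in any one of them shrinks the overall result.
        return self.severity * self.reach * self.frequency


# Made-up scores for the two examples described above.
open_file_bug = Defect("random data injected on file open",
                       severity=0.9, reach=1.0, frequency=1.0)
rare_sequence_bug = Defect("corruption on a rare operation and rare byte sequence",
                           severity=1.0, reach=0.05, frequency=0.001)

# Rank the defects from largest to smallest "surface area".
ranked = sorted([rare_sequence_bug, open_file_bug],
                key=Defect.surface_area, reverse=True)
for defect in ranked:
    print(f"{defect.name}: {defect.surface_area():.5f}")
```

With these made-up scores, the rare defect has the higher severity but ranks far below the file-open defect, which matches the conclusion reached above by judgment alone.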

“OK Bauer! Where are you going with all this?” There have been a lot of interesting discussions out on the borland.public.delphi.non-technical newsgroup about the disposition of all the defects reported in Quality Central. They range from “Just fix everything reported in QC” to “Let's vote on the ones that hurt us the most” to “Borland ignores QC.” First of all, we do frequently “mine” Quality Central for defects. Many of them are transferred to our internal database (yes, they remain tethered). All this data is used to help us determine the relative “surface area” of a given defect. If we notice that it came from QC, that metric is used in these mental calculations. Believe it or not, we have even been known to factor in data gleaned from just lurking through the various Borland newsgroups. In fact, this should be evident from Steve Trefethen's recent request for someone to gather together a list of defects from QC. He even posted a detailed feedback message stating that some of the bugs have now been fixed and will be available in the next Delphi release (shameless BorCon plug here...).

“You bozo! That doesn't help me now!” Sadly, you're right. But then again, if you've followed along to this point, you'll have seen that there are a lot of factors that go into deciding what we fix, what we don't, and why. Then there is also the fact that we can't fix what we don't know about. Also, QC is not the only metric we use in determining what features to implement and what defects to fix. Those are the drawbacks to having a publicly available database. There also tends to be a mob mentality. The positive side of a publicly available database, especially one that allows commenting and voting, is that it tends to self-regulate. The community as a whole can help weed out all the random cruft that will inevitably fill the database.

“I don't want excuses, I just want my bug fixed!” This may all sound like I'm standing on my soapbox and telling you all how rough we have it... OK, maybe I am a little, but hey, it's my blog after all ;-). I just see that there seems to be a small vocal subset of people who like to cast stones. Perspective plays a huge role here. To many folks, it is just this one little bug; it shouldn't take that long, right? Many times, researching the impact of a bug fix takes much more time than applying the fix itself. We have to evaluate a fix not just in terms of a given test case, but also figure out whether there are other similar cases that this fix can address, or whether it will have a net negative effect on the product.

It has also been commented that the QC bugs should take a higher priority than internally reported bugs. While it seems that would be the best move politically, many times it would simply steal time away from other more critical, higher “surface area” defects. “Then just delay your ship dates.” Here's a very sticky one... While there have been releases in the past where ship dates were dictated to the team and we had to make those dates and clean up the bodies later, that hasn't been the case recently. While we are still given guidelines and target quarters, we for the most part control our own schedule as a team. Make no mistake, there is still a schedule that is communicated to many other teams. They must know this information in order to align their groups' work to match our team's schedule as closely as possible. Groups such as marketing and sales need to know with a high degree of certainty when we'll release the product. All of these are factors that go into determining the “surface area” of a defect. We could argue for days on the relative weight of each factor in the “surface area” calculation. But the bottom line is that we have a lot of very experienced people who understand the product down to the individual lines of code and who know how to make these determinations. Do we make mistakes? You bet we do.

You can take away what you will from the above meandering drivel, but I'd like to think that most developers who use our products to produce market-driven software (internally developed and used software also has a “market”) have had to make these same determinations regarding a defect's “surface area.” You've also had to factor in schedules and deadlines when determining this information.

Again, you should really try and attend this year's Borland Conference, where you'll hear about the next Delphi release, codenamed Diamondback. You'll also begin to hear about our [Borland's] roadmap. You can bet that Delphi is going to be a critical part of Borland's roadmap.