On Software and Languages: Some Observations on Software Estimation

Saturday, 11 December 2010

Some Observations on Software Estimation

In my previous post I introduced the "real life factor" frw for software estimation, and conjectured it to be as high as 3. I remembered that as I read James Iry's short blog post. Let me quickly restate its contents. He recollects the course of a software project and the estimates at each point of its history. I summed that up and added the reason for each estimate failure:

             | Project | 
   Estimate  |  Phase  |   Error Reason 
  -----------+---------+-----------------
    2 weeks  |    0%   |  initial guess
             |         |  
    4 weeks  |   50%   |  too optimistic!
             |         |  
    6 weeks  |   90%   |  hidden complexity
             |         |  
    8 weeks  |   99%   |  forgotten the tests,
             |         |  docs and review
             |         |  
    8 weeks  |  100%   |  not really ready, but 
             |         |  has had enough!

So there it is: when estimating we are too optimistic from the start, and then we tend to forget a lot of things. So using this error "model" the real life factor would be frw=4 (8 weeks in the end against an initial guess of 2)! I'd say rather high. Maybe we need a better model? And what if we want to be really careful?
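As a quick sanity check of that number, here is a minimal sketch that recomputes the factor from the table above. The function name `real_life_factor` and the list layout are my own, purely for illustration; the weeks are the ones from the table.

```python
# Estimates (in weeks) at each project phase, copied from the table above.
estimates_by_phase = [
    ("0%", 2),
    ("50%", 4),
    ("90%", 6),
    ("99%", 8),
    ("100%", 8),
]


def real_life_factor(initial_estimate, final_duration):
    """Ratio of the duration we ended up with to the initial guess."""
    return final_duration / initial_estimate


initial = estimates_by_phase[0][1]   # the 2-week initial guess
final = estimates_by_phase[-1][1]    # the 8 weeks at "100%"
frw = real_life_factor(initial, final)
print(f"frw = {frw:g}")
```

Note that this treats the last estimate as the true duration, which is generous given the "not really ready" remark in the table.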


When speaking about serious software estimation you will usually end up with Steve McConnell's "Software Estimation: Demystifying the Black Art" (an excellent book indeed, read it!), and will probably learn and use the PERT technique, where you have to give 3 estimates: the best case, the normal (most likely) case, and the worst case. Then you plug them into a formula and get the expected project duration.
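The formula is not spelled out here, so the sketch below assumes the classic PERT weighted average, (best + 4 * likely + worst) / 6, together with its usual standard deviation approximation, (worst - best) / 6. The function names and the example numbers (2, 4 and 8 weeks) are hypothetical, chosen only for illustration.

```python
def pert_expected(best, likely, worst):
    """Classic PERT weighted average of the three estimates."""
    return (best + 4 * likely + worst) / 6


def pert_std_dev(best, worst):
    """Usual PERT approximation of the standard deviation."""
    return (worst - best) / 6


# Hypothetical three-point estimate, in weeks.
best, likely, worst = 2, 4, 8

expected = pert_expected(best, likely, worst)
spread = pert_std_dev(best, worst)
print(f"expected: {expected:.2f} weeks, std dev: {spread:.2f} weeks")
```

With these made-up inputs the expected value lands only a little above the "normal" case, because that case carries four of the six weights.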

That's all very well and good, but when using this technique, we still tend to be overly optimistic and to forget a lot of things, exactly as in the above example!

The following blogpost hits the nail on the head:

Our experience has taught us that simply asking developers to make two estimates (“low” and “high”) doesn’t really result in much more information than a single point estimate. Developers tend to be optimistic when it comes to individual estimates. So without a little structure, the “low” estimate usually means the absolute smallest amount of time the task could take – an event with a very low probability of coming true. The “high” estimate means what it’s most likely to take to get done.

Given that data, let us do a small, back-of-the-envelope calculation for the probabilities of the 3 PERT estimates. Thus:

  low estimate => rather improbable => say 20% probability
  high estimate => most likely => i.e. 50% probability

then

  medium estimate => will lie in between => ~30% probability

So the real life factor for the medium (i.e. realistic) estimate here is frw=3.3 (that is, 1/0.3), not that different from the above, unscientific guess! PERT will correct this somewhat, but not by much. So should we really always multiply our carefully crafted estimates by 3 (or 2), thus admitting that we cannot do it? That's scary.

But wait, we don't need 100% certainty, that would be sandbagging! Let us say 80% is enough (see Pareto), but that's still 2.7 times our "realistic" estimate or 1.6 times the "worst case" estimate. I don't want to draw any conclusions though; these are only some observations, written down because it makes me curious that the value of 3 seems to resurface rather often!
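For the curious, the back-of-the-envelope arithmetic can be sketched as follows. It assumes my reading of the calculation: the factor is simply the target confidence divided by the guessed probability of the estimate. The name `scaling_factor` is made up for this sketch, and the probabilities are the rough guesses from above, not measured data.

```python
def scaling_factor(estimate_probability, target_confidence=1.0):
    """Factor by which to scale an estimate to reach the target confidence.

    Assumption: confidence grows linearly with the time allowed, so the
    factor is just target_confidence / estimate_probability.
    """
    if not 0 < estimate_probability <= 1:
        raise ValueError("estimate_probability must be in (0, 1]")
    return target_confidence / estimate_probability


# Rough probabilities guessed in the text above.
probabilities = {"low": 0.20, "medium": 0.30, "high": 0.50}

frw_medium = scaling_factor(probabilities["medium"])            # 100% target
medium_at_80 = scaling_factor(probabilities["medium"], 0.80)    # 80% target
high_at_80 = scaling_factor(probabilities["high"], 0.80)        # 80% target

print(f"medium at 100%: {frw_medium:.1f}")
print(f"medium at 80%:  {medium_at_80:.1f}")
print(f"high at 80%:    {high_at_80:.1f}")
```

The linear scaling is of course the crudest possible model; a real duration distribution would have a long tail, so take the numbers as an illustration only.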