Almost 20 years ago, Chris Schofield and I published a paper entitled “Estimating software project effort using analogies” that described the idea of using case-based (or analogical) reasoning to predict software project development effort from the outcomes of previous projects. We tested the ideas on nine different data sets, used stepwise regression as a benchmark, and reported that in “all cases analogy outperforms” the benchmark.
Today (27.10.2017) is a landmark in that not only is our paper 20 years old, it has (according to Google Scholar) 1000 citations. So I thought it appropriate to take stock and offer some reflections.
Why has the paper been widely cited?
I think the citations are for four reasons. First, the paper proposed a relatively new approach to an important but tough problem. Actually, the idea wasn’t new, but the application was. Second, we tried to be thorough in the experimental evaluation and to provide a meaningful comparator. Third, the publication venue, the IEEE Transactions on Software Engineering, is highly visible to the community. Finally, there is an element of luck in the citation ‘game’. Timing is all-important, and once a paper becomes well known it garners citations simply because other writers can recall it more easily than alternatives that might be more recent or relevant but less well known.
What ideas have endured?
I see three aspects of our paper that I think remain important. First, we used meaningful benchmarks with which to compare our prediction approach. We chose stepwise regression because it’s well understood, simple and requires little effort. If analogy-based prediction cannot ‘beat’ regression then it’s not a competitive technique. I think having such benchmarks is important; otherwise, showing that an elaborate technique is better than a slightly less elaborate technique isn’t of much practical use. At its extreme, Steve MacDonell and I showed that yet another study using regression to the mean and analogy was actually worse than guessing; however, I hadn’t realised this because at the time I hadn’t used meaningful benchmarks.
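The ‘worse than guessing’ floor can be made concrete. As a sketch (the function name and effort values below are hypothetical, not taken from either paper), the expected error of random guessing — predicting each project’s effort with the effort of some other project chosen at random — can be computed analytically, giving a baseline that any serious technique ought to beat:

```python
def guessing_mae(efforts):
    """Expected mean absolute error of random guessing: each project's
    effort is 'predicted' by another project's effort, averaged over
    all possible choices of that other project."""
    n = len(efforts)
    total = sum(abs(a - b)
                for i, a in enumerate(efforts)
                for j, b in enumerate(efforts)
                if i != j)
    return total / (n * (n - 1))

# Hypothetical effort values (person-months)
baseline = guessing_mae([10.0, 24.0, 30.0, 27.0])
```

A prediction technique whose mean absolute error exceeds this baseline is, on this data, literally worse than guessing.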
Second, we used a cross-validation procedure, specifically leave-one-out cross-validation (LOOCV). Although cross-validation is a complex topic, the underlying idea of trying to simulate predicting unseen cases (or projects, in our study) is important.
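The LOOCV idea can be sketched in a few lines. Everything below is illustrative: the feature vectors, effort values, unweighted Euclidean distance and the simple mean-of-neighbours prediction are assumptions for the example, not the original ANGEL configuration:

```python
def analogy_predict(target, cases, k=1):
    """Predict effort as the mean effort of the k most similar cases,
    ranked by (squared) Euclidean distance over the feature vectors."""
    ranked = sorted(
        cases,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(target, c[0])),
    )
    neighbours = ranked[:k]
    return sum(effort for _, effort in neighbours) / len(neighbours)

def loocv(projects, k=1):
    """Leave each project out in turn and predict it from the rest,
    returning the absolute error for every held-out project."""
    errors = []
    for i, (features, actual) in enumerate(projects):
        rest = projects[:i] + projects[i + 1:]
        errors.append(abs(analogy_predict(features, rest, k) - actual))
    return errors

# Hypothetical projects: (feature vector, actual effort in person-months)
projects = [
    ((10.0, 3.0), 24.0),
    ((12.0, 3.5), 30.0),
    ((5.0, 1.0), 10.0),
    ((11.0, 3.2), 27.0),
]
errors = loocv(projects, k=1)
```

Each project is predicted as if it were unseen, which is the whole point of the procedure.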
Third, in terms of realism, we also noted that data sets grow one project at a time, so in that sense LOOCV is an unrealistic validation procedure. Unfortunately our data did not include start and end dates, so we were unable to properly explore this question except through simple simulation.
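A more realistic, time-aware alternative can be sketched by ordering projects chronologically and predicting each one only from the projects that preceded it. Again, the data and the single-nearest-neighbour predictor below are hypothetical assumptions for the sake of the example:

```python
def nearest_effort(target, history):
    """Effort of the single most similar past project (squared
    Euclidean distance over the feature vectors)."""
    _, effort = min(
        history,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(target, c[0])),
    )
    return effort

def growing_validation(projects, min_history=2):
    """Predict each project from only the projects completed before it,
    mimicking a data set that grows one project at a time."""
    errors = []
    for i in range(min_history, len(projects)):
        features, actual = projects[i]
        errors.append(abs(nearest_effort(features, projects[:i]) - actual))
    return errors

# Hypothetical projects in chronological (completion-date) order
projects = [
    ((5.0, 1.0), 10.0),
    ((10.0, 3.0), 24.0),
    ((12.0, 3.5), 30.0),
    ((11.0, 3.2), 27.0),
]
errors = growing_validation(projects)
```

Early predictions are made from very little history, which is exactly the hardship a real estimator faces.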
What would I do differently if we were to rewrite the paper today?
There are three areas that I would definitely try to improve if I were to re-do this study. The first, and it’s quite embarrassing, is the fact that the results cannot be exactly reproduced. This is mainly because the analogy software was written in Visual Basic and ran on Windows NT. It also used some paid-for VBX components. We no longer have access to this environment and so cannot run exactly the same software. Likewise, the exact settings for the stepwise regression modelling are now lost, and I can only generate close, but not identical, results. A clear lesson is to properly archive scripts, raw data and intermediate results. However, this would still not address the problem of no longer being able to execute this early version of our Analogy tool (ANGEL).
Second, the evaluation was biased in that we optimised settings for the analogy-based predictions by exploring different values for k (the number of neighbours) but the regression modelling was taken straight out of the box.
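That tuning step looks something like the sketch below, where k is chosen by an inner leave-one-out search; a fairer design would grant the regression benchmark an equivalent tuning budget. All data and function names here are illustrative assumptions, not the original study’s settings:

```python
def knn_effort(target, cases, k):
    """Mean effort of the k most similar cases (squared Euclidean
    distance over the feature vectors)."""
    ranked = sorted(
        cases,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(target, c[0])),
    )
    return sum(effort for _, effort in ranked[:k]) / k

def loocv_mae(projects, k):
    """Mean absolute error of leave-one-out predictions for a given k."""
    errors = [
        abs(knn_effort(f, projects[:i] + projects[i + 1:], k) - actual)
        for i, (f, actual) in enumerate(projects)
    ]
    return sum(errors) / len(errors)

# Hypothetical projects: (feature vector, actual effort in person-months)
projects = [
    ((10.0, 3.0), 24.0),
    ((12.0, 3.5), 30.0),
    ((5.0, 1.0), 10.0),
    ((11.0, 3.2), 27.0),
]
# Optimise k for the analogy side only -- the asymmetry described above
best_k = min(range(1, 4), key=lambda k: loocv_mae(projects, k))
```

Giving one technique a search over its settings while running the other out of the box stacks the comparison in the tuned technique’s favour.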
Third and finally, we reported predictive performance in terms of problematic measures such as MMRE and pred(25). We did not consider effect size or the variability of the results. Subsequent developments in this area have greatly improved research practice.
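For reference, the two measures can be defined in a few lines. MRE is the magnitude of relative error, |actual − predicted| / actual; the effort values below are made up for illustration:

```python
def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error over paired actual/predicted
    effort values."""
    mres = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return sum(mres) / len(mres)

def pred(level, actuals, predictions):
    """Proportion of predictions whose MRE is within `level` per cent
    of the actual value, e.g. pred(25)."""
    hits = [abs(a - p) / a <= level / 100
            for a, p in zip(actuals, predictions)]
    return sum(hits) / len(hits)

# Hypothetical actual and predicted efforts (person-months)
actuals = [100.0, 40.0, 60.0, 80.0]
predictions = [110.0, 30.0, 62.0, 50.0]
```

Because MRE divides by the actual value, both measures are asymmetric and can be biased towards techniques that under-estimate, which is one reason they are now considered problematic.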