It Will Never Work in Theory

The IROP paper

Posted Oct 11, 2011 by Jorge Aranda

| Mining | Noticed |

(Re-posted from my blog, Catenary --Jorge Aranda)

If you keep track of recent developments in empirical software engineering, you may have already heard of the fantastic IROP study. I was too busy writing a paper to blog about it when Andreas Zeller presented it at PROMISE 2011, but here I go, in case you haven't read it.

Basically, Zeller, Thomas Zimmermann, and Christian Bird did what I'm afraid some researchers in our field do on a regular basis: take some mining tools and some data, and then go nuts with them, abusing them in the most absurd ways imaginable. Luckily, Zeller, Zimmermann, and Bird did it on purpose and as a parody.

Here's what they did: take Eclipse data on code and errors, and correlate the two to find good predictors of bugs. Sounds sensible. But they did the correlation at the ASCII character level. So it turns out that, for Eclipse 3.0, the characters most highly correlated with errors are the letters 'i', 'r', 'o', and 'p'. What is a sensible researcher to do when facing these findings? Well, take those letters off the keyboard, of course! Problem solved:

[Image: The IROP keyboard]
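
For readers who want to see how little it takes to produce a "finding" like this, here is a minimal sketch of a character-level correlation. It is not the authors' code: the function names (`pearson`, `char_defect_correlations`), the use of Pearson correlation, and the three toy "files" with their defect counts are all assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def char_defect_correlations(files):
    """Correlate per-file character counts with per-file defect counts.

    `files` is a list of (source_text, defect_count) pairs.
    Returns a dict mapping each character to its correlation with defects.
    """
    counts = [Counter(text) for text, _ in files]
    defects = [defect_count for _, defect_count in files]
    alphabet = set().union(*counts)
    return {
        ch: pearson([c[ch] for c in counts], defects)
        for ch in alphabet
    }


# Invented toy data: (source text, number of defects). Not real Eclipse data.
toy_files = [
    ("int i = 0;", 3),
    ("x = y;", 0),
    ("print(i)", 2),
]

correlations = char_defect_correlations(toy_files)
# Characters ranked from most to least "error-prone".
ranking = sorted(correlations, key=correlations.get, reverse=True)
```

With enough characters and few enough files, something will always come out on top of `ranking`, which is rather the point of the parody.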

They then describe a purported, half-baked validation study with three interns, who reported great success in adapting to a life without 'i', 'r', 'o', and 'p' on their keyboards. Trial feedback:

We can shun these set majuscules, and the text stays just as swell as antecedently. Let us just ban them!

Near the end, the authors go over everything that's wrong with their approach (lack of theoretical grounding, dishonest use of statistics, and a long et cetera). It's a fun read, and instructive. Research, in general, needs more parodies. If you like this one, some of my other favourites are: