FaceTAV 2017 Workshop at Facebook

On Monday and Tuesday (6-7th Nov, 2017) I was privileged to attend a two-day workshop on testing and verification at Facebook, London, organised by Mark Harman and Peter O’Hearn, along with about 90 other people. Unusually for these kinds of events, it was a very balanced mix of industry and academia.
The theme was testing and verification and some kind of rapprochement between the two groups. As it turned out this was hardly contentious as everyone seemed to agree that these two areas are mutually supportive.
As ever it seems a bit invidious to pick out the key lessons, but for me, I was most struck by:
  1. Google has ~2 billion LOC.  That is hard to appreciate!
  2. Practitioners strongly dislike false positives (i.e. predicted errors that do not actually exist); it was suggested that a false-positive rate above 10% is problematic.
  3. Testing multiple interleaved threads is complex, and very unexpected behaviours can occasionally be observed.
  4. Testing costs are a key concern for developers, but not something researchers yet have a good handle on.
  5. The quality and sophistication of some automated testing tools, e.g. DiffBlue and Sapienz (see the paper by Ke Mao, Mark Harman and Yue Jia), and their ability to scale are seriously impressive!
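On point 3, even a tiny program can misbehave under thread interleaving. As a minimal sketch (my own illustration, not an example from any workshop talk), here is a classic lost-update race in Python: an unsynchronised read-modify-write on a shared counter, contrasted with a lock-protected version.

```python
import threading

N_THREADS, N_ITERS = 4, 100_000

unsafe_total = 0
safe_total = 0
lock = threading.Lock()

def unsafe_increment():
    # Read-modify-write with no synchronisation: another thread can
    # interleave between the read and the write, so updates may be lost.
    global unsafe_total
    for _ in range(N_ITERS):
        tmp = unsafe_total      # read
        unsafe_total = tmp + 1  # write, possibly overwriting a concurrent update

def safe_increment():
    # Holding a lock around the read-modify-write makes it effectively atomic.
    global safe_total
    for _ in range(N_ITERS):
        with lock:
            safe_total += 1

for target in (unsafe_increment, safe_increment):
    threads = [threading.Thread(target=target) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(unsafe_total)  # may be less than 400000, and can vary between runs
print(safe_total)    # always 400000
```

The unsynchronised total depends on how the scheduler happens to interleave the threads, which is exactly why such bugs are so hard to reproduce and test for.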
So thank you speakers and organisers!

Why I disagree with double blind reviewing

As both an author and a member of a number of programme committees, I’ve been reflecting on the recent decision of various academic conferences, including ICSE 2018, to opt for double blinding of the review process. Essentially this means both the identity of the reviewer and the author are hidden; in the case of triple blinding, which has been mooted, even the identity of one’s fellow reviewers is hidden.

But there remain many potentially revealing factors, e.g. the use of British or US English, the use of a comma or period as a decimal indicator, the choice of word processor, and the need to cite and build upon one’s own work. Why not quadruple or even quintuple blinding?!! Should the editor or programme chair be known? What about the dangers of well-established researchers declining to serve on PCs for second-tier conferences? Perhaps there should just be a pool of anonymous papers randomly assigned to anonymous reviewers that will be randomly allocated to conferences?

Personally, I’m strongly opposed to any blinding in the review process. And here’s why.

Instinctively I feel that openness and transparency lead to better outcomes than hiding behind anonymity. Be that as it may, let’s try to be a little more analytical. First, what kinds of bias are we trying to address? There seem to be five types of bias derived from:

– personal animosity
– characteristics of the author e.g. leading to misogynist or racist bias
– the alignment or proximity of the work to the reviewer’s research beliefs and values
– citation(s) of the reviewer’s work
– the reviewer’s narcissism and the need for self-aggrandisement

It seems that only the first two biases could be addressed through blinding, and this of course assumes that the blinding is successful, which in small fields may be difficult. Although I would never seek to actively discover the identity of the authors of a blinded paper, in many cases I am pretty certain as to who the authors are. And it doesn’t matter.

In my opinion, double blinding is a distraction, but one with some negative side effects. The blinding process harms the paper. As an author, I’m asked to withhold supplementary data and scripts because they might reveal who I am. Furthermore, sections must be written in an extremely convoluted fashion so that I don’t refer to my previous work, reference work under review, or do any of the other perfectly natural parts of positioning a new study. It promotes the idea that each piece of research is in some sense atomic.

Why not do the opposite and make reviewers accountable for their opinions by requiring them to disclose who they are? Journal editors and programme chairs are known, so why not the reviewers too?

Open reviewing would reduce negative and destructive reviews. It might also help deal with the situation where the reviewer demands additional references be added, all of which seem tangential to the paper but, coincidentally, are authored by the reviewer! The only danger I foresee is that reviews might become more anodyne, as reviewers do not wish to be publicly controversial. But this supposes that I, as a reviewer, wish to have a reputation as a bland yes-person. I would be unlikely to want this, so I’m unconvinced by this argument.

So whilst I accept that the motivation for double-blind reviewing is good, and that I seem to be in a minority (see the excellent investigation of attitudes in software engineering by Lutz Prechelt, Daniel Graziotin and Daniel Fernández), I still think it’s unfortunate.