Tuesday 1 August 2017

My major disappointments with peer-review (so far)

I always tell my PhD students that the academic peer-review system is bad, but it is the best we've got. Every time one of them feels unfairly treated by an editor's decision, or perceives that reviewers didn't really put in the effort to understand their paper, I have to tell them not to take it personally, and to accept the flaws of the system because we haven't yet figured out a better way to improve the quality of our collective scientific output. And I always tell them a story from my personal experience, to show them that they are not the only ones facing disappointment, and that they will live to tell others that things are not so bad.

The story I've been using over the past few years was the rejection of a paper Borislav Nikolic, myself and Stefan Petters submitted to the NOCS 2014 conference (which we subsequently uploaded to arXiv, as it turned out its contribution was significant, but more on that later). The paper had received fairly good reviews, but one of the reviewers stated that our paper tried to fix exactly the same problem as another paper we did not know of, published a few months earlier at RTAS 2013, and therefore gave us the lowest possible score. Verbatim from the review: "...the authors have replicated the results of Kashif et al. in their RTAS 2013 paper". That's fair enough: a tough reviewer can take the stance that our paper was worthless without a clear comparison with the state-of-the-art, which in this case clearly should have included Kashif et al.'s paper. The punchline of this story, however, is that the reviewer was not really criticising our lack of diligence in reviewing the state-of-the-art, but implying that we were trying to copy the approach of that paper. Again, verbatim: "As can be seen, the goals are identical. Now coming to the contributions, the current paper and the RTAS paper fix exactly the same pessimism in the model of Burns" (referring to a previous paper by Zheng Shi and Alan Burns published at NOCS in 2008). And here is my favourite part of that review:

"Also, please look at Fig.1 in the current paper and Fig. 3.1 on page 20 of the thesis (ref. [8] of the RTAS paper) to be found here: https://uwspace.uwaterloo.ca/bitstream/handle/10012/6906/Gholamian_Sina.pdf?sequence=1

I know that this is basic stuff, but the figures are too similar for this to be merely a coincidence."

That's it: the reviewer accused us of plagiarism! Of using in our paper, without credit or reference, a figure published in the thesis of one of the authors of that RTAS paper. Immediately after reading that review, I went to check which figure the reviewer was referring to, first in our paper and then following the link to the PDF of that thesis. I could clearly see that the figures were the same, except for some cosmetic changes (different font, line thickness, colour). But that was a figure I had created several years earlier for a paper I wrote with Alan Burns and Zheng Shi in 2009 (and published in 2010 in the IJERTCS journal). I had used the same figure in other documents afterwards, and also in our NOCS 2014 submission. So what really happened is that the thesis author had plagiarised me. And I was accused of, effectively, plagiarising myself.

We did contact the programme chairs of the NOCS conference, Joerg Henkel and Sudha Yalamanchili, to explain the situation, and they in turn contacted the anonymous reviewer. The reviewer replied that the plagiarism claim was not the reason for the low score, and that he/she "would have given the paper a score of 1 even if figure 1 did not exist". The programme chairs decided they could do nothing else. And we never received an apology from the reviewer or the programme chairs for being falsely accused of plagiarism. They all pretended that falsely accusing someone was normal and unimportant.

Unfortunately, I now have a better (or worse) story to tell my students about the flaws of peer-review. This new story started in May 2016, when I was made aware of the work of Xiong et al. on the real-time analysis of priority-preemptive wormhole Networks-on-Chip, which had been presented at the GLSVLSI conference. Their paper presented a scenario showing that the Shi and Burns analysis from NOCS 2008 was optimistic. At this stage, I'd like to clarify to those who are not familiar with worst-case analysis that a pessimistic analysis can be useful: if a system is designed to cope with a pessimistic worst case, it can also cope with the actual worst case, which will be less severe. It can be improved upon, of course, to avoid unnecessary over-dimensioning of a system, and that was the goal of both Kashif et al.'s RTAS 2013 paper and our rejected NOCS 2014 submission mentioned above. What Xiong et al. showed, however, is that the analysis by Shi and Burns was optimistic, meaning that there are possible cases that are worse than the worst case predicted by their analysis, making it unsafe. That was a significant game changer, because it showed a flaw in an important piece of research (by that time, the original Shi and Burns paper already had more than 100 citations). More than that: each and every approach that tried to reduce the pessimism of that work, i.e. had a stricter view of what the worst case was, was also flawed. I must say at this point that the optimism in the analysis was not uncovered by the identification of flaws in the proofs and theorems given in any of the previously published papers, but by finding a counter-example through simulation: Xiong et al. found some results in their simulations that were worse than what was predicted by all state-of-the-art analyses, and then tried to identify the likely cause. Their paper then explained that cause, showed the optimism of the previous analyses in small examples, and proposed a new analysis to fix that problem.
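
To make this concrete for readers who have never seen this kind of analysis: all the analyses discussed here compute, for each traffic flow, an upper bound on its worst-case network latency as a fixed point over the interference from higher-priority flows sharing links with it. The sketch below is my own simplification for illustration only; it is not the actual analysis from any of the papers mentioned, and the real analyses add further terms for jitter, indirect interference, buffering and backpressure. All flow parameters are hypothetical.

    import math

    # A deliberately simplified sketch of the general shape of these
    # response-time analyses. A flow is described by its no-contention
    # latency 'c' and its period 't', both in cycles.
    def worst_case_latency(flow, higher_priority_flows):
        r = flow["c"]  # start from the no-contention latency
        while True:
            # each higher-priority flow sharing a link can preempt up to
            # ceil(r / t) times, each time for up to its own latency c
            interference = sum(math.ceil(r / j["t"]) * j["c"]
                               for j in higher_priority_flows)
            r_new = flow["c"] + interference
            if r_new == r:         # fixed point reached: r is the bound
                return r
            if r_new > flow["t"]:  # bound exceeds the period: give up
                return None
            r = r_new

    hp_flows = [{"c": 100, "t": 1000}, {"c": 150, "t": 2000}]
    print(worst_case_latency({"c": 200, "t": 5000}, hp_flows))  # prints 450

If the interference term over-counts what the network can actually exhibit, the analysis is pessimistic but safe; if it under-counts it, as turned out to be the case here, the real worst case can exceed the bound and the analysis is unsafe.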

My goal here is not to explain the exact nature of the problem they uncovered, or the details of their fix. But you can imagine how surprised I was when I found out about their paper. Many researchers, including myself, had worked for many years on that precise topic, and nobody had seen that problem before. The first paper addressing the problem was published in the early 90s! Theorems and proofs were written, but the complex nature of such systems prevents such proofs from being perfect. They are in reality proof sketches, which try to convince the reader with some intuition and logic that they are correct. Proving them wrong, however, is a straightforward job: simply show a scenario where the real behaviour of the system is worse than the worst case predicted by the analysis. Such scenarios may be extremely hard to find, though. But in this case, one such scenario was well described in the GLSVLSI paper, and I was keen to understand it in depth. It took me a few days to get used to the notation used in that paper to describe the system, to familiarise myself with the scenarios under which the previous analysis would break, and to understand the analysis they proposed. Once I understood how they made it safe even under the corner cases that broke the previous analyses, I was convinced I could do a better job. Over the following days, I worked on improving the analysis on my own, exploiting buffering and backpressure effects differently from all previous analyses. Springtime was particularly pleasant in York that year, so I did most of the work outdoors (on campus, or on the Millennium Bridge).

Once I had convinced myself that my improved analysis was correct, I wrote a draft paper about it and shared it with Alan Burns and Borislav Nikolic, the two people I had previously worked with on NoC analysis. Unlike most researchers in this area, I prefer to write my papers without any proof sketches, relying only on intuitions and experimental work to get my point across. As I mentioned before, proof sketches are not fail-proof, and I find that they often make it harder to spot issues because of awkward notation. With input from Alan and Borislav, I was convinced that the new analysis attacked the most significant source of pessimism in Xiong et al.'s analysis, and that it also opened the path to tackling other sources of pessimism (one of them using the techniques Borislav and I had explored in the paper with the "self-plagiarism", but that was left as future work). And finally, after discussing the weaknesses of Xiong et al.'s analysis in detail with Borislav, he came up with a counter-example showing that their analysis was also optimistic! I then updated the paper draft with the details of that counter-example, with additional examples, and with large-scale experiments comparing the pessimism of the new analysis against Xiong et al.'s. I uploaded that paper to arXiv on June 9th.
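
In code, that disproof procedure is nothing more than the check sketched below. This is a hypothetical illustration: simulate_latency stands in for a cycle-accurate NoC simulator and analytical_bound for whichever analysis is under test; neither is a real tool named in any of the papers.

    # Hypothetical sketch of disproving a worst-case analysis by
    # counter-example: any single observation above the predicted bound
    # proves the analysis unsafe, while finding none proves nothing.
    def find_counter_example(flows, scenarios, simulate_latency, analytical_bound):
        for scenario in scenarios:
            for flow in flows:
                observed = simulate_latency(flow, scenario)
                predicted = analytical_bound(flow)
                if observed > predicted:
                    # real behaviour worse than the predicted "worst case"
                    return flow, scenario, observed, predicted
        return None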

Up to this point, this had been one of the best experiences of my life as a researcher: within a few weeks I had been made aware of a new research problem that was hiding in plain sight, I had understood the state-of-the-art solution to that problem, and through compelling collaborations with great colleagues I had found a solution that was better than the state-of-the-art.

Once I published the paper on arXiv, we contacted all authors of the GLSVLSI paper to notify them of our findings that their analysis was flawed and unnecessarily pessimistic. We also invited them to be co-authors of a journal submission based on our arXiv paper, as we felt that we would not have done that work without the trigger caused by their GLSVLSI paper. Our choice of journal was the IEEE Transactions on Computers (ToC), which had previously published a paper by Kashif et al. (actually an extended version of their RTAS 2013 paper mentioned above) that was the state-of-the-art on priority-preemptive wormhole analysis at the time. This resulted in an exchange of emails with two of the authors of the GLSVLSI paper, Qin Xiong and Zhonghai Lu, who informed us that they would be really glad to have a joint paper with us on that topic. They also mentioned that they had already submitted an article to IEEE ToC with an in-depth description of their analysis, showing that their counter-example invalidated the analyses published by Shi and Burns at NOCS 2008 as well as the latest work by Kashif et al. in that same journal. They recognised the flaw we had pointed out in their GLSVLSI paper, and told us that it also appeared in their ToC submission, which was under review at that point. They therefore proposed to work on the joint paper with our improved analysis, and then decide what to do once they had a response on their submission.

So I carried on improving our draft, with further input from Alan, Borislav, Qin and Zhonghai. When our draft was nearly ready for submission, Qin informed us that their ToC submission had been returned to them for major revisions. On behalf of his co-authors, he then proposed to include our fix to their analysis in the next revision of that paper, and to add Alan, Borislav and myself as co-authors. We felt that it would be confusing to have two papers under review at the same time, in the same journal, with the same authors, one of them fixing and improving the other. Furthermore, we preferred to have the fix of their flawed analysis as a contribution of our new submission, and would not feel comfortable publishing a paper with a flawed analysis. So we replied that we would prefer to carry on with the submission of our paper including the fix and the improvement on the analysis (i.e. based on the arXiv paper), that we would decline their offer to be co-authors on their current submission, and that it would then be best to separate the authors of the two contributions. They insisted on having two joint papers, perhaps in two different journals, but we were keen on having our paper published in the same venue that had published the latest state-of-the-art in this area. As we couldn't find a solution that was completely acceptable to both sides, and as we could not really ask them to withdraw the paper with the flawed analysis, we decided to detach ourselves completely from that submission and to carry on with our new submission, including all of us as co-authors as initially proposed. That paper was submitted to ToC on October 10th. Around the same time, Qin informed us that he would carry on with the major revisions of their submission, that he wanted to include our fix as part of the revision, and once more offered us co-authorship. We once more declined, and simply asked for a citation to our arXiv paper as the source of the corrected analysis. Despite such a complex situation, the discussions and exchanges were conducted in a very respectful manner, and all parties tried to be as fair as possible with each other and to give academic credit where it was due. In an ideal world, both papers would be published by the journal, and the whole community would be aware of the successive extensions to the state-of-the-art.

However, the world is not ideal, and this is when the disappointments started. On January 9th 2017, we received an email with a decision about our ToC submission, based on three reviews. The first reviewer was positive about the contributions of our work, pointed out a few minor corrections, and stated that a theorem definitively proving that the proposed analysis is safe for all possible scenarios would be desirable and should be proposed as an open problem for future work. The second reviewer went deeper into the technical details and provided a number of requests for more details, clarifications and overall improvements (most of them very pertinent and useful). The third reviewer was a complete disappointment, providing a two-sentence review (the second sentence mostly irrelevant to our submission): "Authors proposed an analytical method to analyze the latency of a priority-preemptive wormhole network. However, they just provided the experimental result for 4x4 and 8x8 network with XY router under an uniform random traffic, which is not sufficient to support the fidelity of their model." But the most disappointing part of that email was actually the decision by the editor-in-chief Paolo Montuschi that the paper should undergo a reduction to a Transactions Brief as part of the requested major revisions. A Brief can have only 8 double-column pages, as opposed to a regular contribution, which can have up to 16. That decision seemed completely arbitrary, given that our work claimed to supersede an article (by Kashif et al.) that was published as a regular contribution, and to supersede and fix another article (by Xiong et al.) that was being considered as a regular contribution. Furthermore, the reviewers of our submission had requested clarifications and more details about the approach, which would increase rather than reduce the number of pages of our contribution. We nevertheless spent several weeks preparing a revision that addressed the comments of all three reviewers, which led us to improve the analysis and the overall paper organisation, and to redo all experiments over a wider range of scenarios. To satisfy the third reviewer, we also performed a whole new series of experiments with different network topologies and routing algorithms, despite not believing those cases would change the experimental conclusions of our original submission (and, as expected, they didn't).

In the meantime, we were notified by Qin and Zhonghai that their submission had been accepted as a regular contribution to ToC. We then had to remove from our paper all the parts that showed the flaw in their original GLSVLSI paper, and our proposed fix to that analysis, as those would no longer contribute to the state-of-the-art once published in their paper. With the additional material requested by the reviewers, and with the removal of the material related to the fix of the flawed GLSVLSI analysis, the overall page count of the revised paper was slightly reduced, to 10.5 pages. We then submitted that revision on May 8th, with an extensive letter to the reviewers explaining how we addressed all their comments, and respectfully asking the editor-in-chief to consider our submission despite it being above the 8-page limit of a Transactions Brief. One day later, we received a decision by Paolo Montuschi rejecting the paper without sending it to the reviewers. He argued "that the request to reduce the paper to a Transaction Brief (...) is not just an issue related to paper's length but also of impact of the proposed research, according to current TC quality standards". The fact that our paper fixed and superseded two papers published in that same journal as regular contributions, and that it effectively re-established the state-of-the-art in that area, was apparently not impactful enough for him. After discussing with my co-authors, I decided that it was pointless to argue with the editor-in-chief and to try to make him see that it is absurd to blindly apply rules related to page limits in a world of digital publications. I had to accept that their bureaucratic approach to managing a journal would make it harder for us to establish the state-of-the-art in our area, but that I should persevere nonetheless. That was not easy at all.

In the same decision email, we were given the chance to make a new submission as a Transactions Brief, i.e. reducing the paper to a maximum of 8 pages. I then invested a few days in doing that: I removed some of the literature review, reorganised the background work and problem description, redrew all figures to occupy less space, reformatted tables and captions, abbreviated references, removed all the additional experiments we had added to respond to the third reviewer, reduced the number of examples explaining our contribution, and significantly cut the summary of conclusions, the comparison to related work and the discussion of potential future work. I also contacted Qin and Zhonghai, stating my disappointment with the rejection of our joint paper, acknowledging that their contribution to the work had already been published in their ToC paper, and declaring that I would try to get our work published with the original set of authors (i.e. Alan, Borislav and myself). With all those changes, and with a reduced author list, I managed to produce a reduced-scope version of the paper within the 8-page budget. I then submitted it as a Transactions Brief on May 16th.

We received a decision notification on July 27th, and again the editor-in-chief decided to reject the paper. This time, we got two reviews. One of them was a three-liner: "The paper provides a tighter worst-case latency analysis for pre-emptive wormhole networks. A proof or explanation to confirm the safety of their analysis would be a plus. Section II, III and IV provide a good background and problem description, however, they are too lengthy compared by the contribution part of this paper." The second reviewer went a little further and raised a number of minor points, but they were either irrelevant (e.g. our choice of operating frequency for the network routers in the experiments), based on a lack of familiarity with real-time networks (e.g. criticising the lack of support for adaptive routing), or had been addressed in our previous submission and removed to comply with the page budget for Briefs (e.g. definitions of "non-zero critical instant" and "sub-route interference"). Again, there was no convincing argument to justify the rejection of a contribution that superseded two articles recently published in that journal. The criticism that our work did not include a proof of its correctness is valid, but it is not a fair reason for rejection: all previously published papers (including those published by ToC) had such a proof, yet they were nonetheless proven wrong by the counter-examples described above. We cannot guarantee that there isn't a counter-example that could prove our analysis wrong, but if we are not allowed to publish it, the community will not know where to look for such a counter-example (as we did when we learned about the previous ones).

To summarise, we had to submit three versions of our work: the first received two useful reviews (one more useful than the other) and a two-liner; the second was rejected right away, and was never read or reviewed; the third received a three-liner and an uninformed review. This is way below the level of quality and engagement I expected from a top journal: poor reviewing and, above all, poor editorial work, as it is the job of the editor to choose the reviewers and to accept their reviews. As a result, more than one year has passed since we proposed an analysis that fixed the flaws and significantly reduced the pessimism of the work published by Xiong et al., but that work is still the published state-of-the-art in this area. That's an incredibly disappointing situation.

As a final statement, I'd like to reiterate how glad and proud I am of the way all the active researchers behaved in the situations reported here, and that my disappointment is restricted to the low quality of the reviewing and editorial work we experienced. Working with Alan and Borislav was a delight; and dealing with Zhonghai and Qin, on behalf of their co-authors, was always a very positive experience. I'm sure that the impact we made on each other's work greatly outweighs the fact that we were denied a joint publication giving fair credit to our contributions to a very challenging research problem. During this period I also interacted with Hiren Patel (one of the authors of the Kashif et al. papers), talking about the "plagiarism" claims we faced, asking for more details about their analysis and sharing with him the new developments we achieved. This was also a very positive interaction, including some good feedback on our arXiv paper.