Numerous proposals and experiments have addressed the stresses resulting from computer science's shift from a journal to a conference publication focus, discussed in over two dozen commentaries in Communications, three panels at CRA Snowbird conferences, the Workshop on Organizing Workshops, Conferences and Symposia for Computer Systems (WOWCS'08), and a recent Dagstuhl workshop.a We focus here on recent efforts to blend features of conferences and journals, highlighting a conference that incorporated a revision cycle without increasing the overall reviewer workload. We also survey a range of approaches to improving conference reviewing and management.
Proposals fall into three principal categories:
- Return to the journal orientation that has long served the sciences and engineering well.
- Develop hybrid approaches that combine features of journals and conferences.
- Improve conference reviewing and management in other ways.
1. Return to the traditional journal focus of the sciences. Rolling back the clock is attractive but probably not feasible. It would require a unified effort in a famously decentralized discipline. It would be resisted by established researchers who built their careers on conference publication. It would have to overcome the forces that motivated the shift to conference publication in the first place.4
2. Combine journal and conference elements. Conference proceedings have usurped two key journal functions: They are now archived and widely available. The boundary is blurred further when conferences increase reviewing rigor and journals reduce reviewing time. Article lengths are converging as reduced production costs let conferences relax or drop length limits and as demands on reader attention push journals to decrease article length.b The most significant remaining distinctions are: journals encourage more revision and are less deadline driven; conferences promote informal interaction and other community-building activities.
Most calls for change are related to these two distinctions. Conference program committees evaluate papers on different dimensions (originality, technical rigor, audience engagement, and so forth) and make binary, in-or-out quality determinations under time pressure on first drafts. Conferences struggle to foster a sense of community when rejecting the great majority of submissions. Authors perceive injustice, feel their careers could be affected, and are driven to attend or form other conferences. Table 1 lists approaches that seek a middle ground to deliver benefits of both conferences and journals.
Journal acceptance precedes conference presentation. Articles submitted to the online monthly journal Proceedings of the Very Large Data Bases Endowment (PVLDB) are limited to 12 pages and receive three rapid reviews. Those that have been accepted a couple months prior to the annual fall VLDB conference are eligible for presentation. Longer versions of PVLDB articles can be published in the independently managed VLDB Journal. The same process is used by the Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), which presents papers that have been accepted by the ACM Transactions on Code Optimization (TACO). Although these journals stress rapid reviewing, this undercuts the hypothesis that rate of innovation was tied to the shift to a conference focus in computer science in the U.S.
VLDB and HiPEAC no longer have separate submission and review processes. Alternatively, partial overlaps are being explored. Submissions to a special issue of the journal Theory and Practice of Logic Programming are also submissions to The International Conference on Logic Programming (ICLP), and are granted one revision cycle if necessary, as is common for journal special issues. This approach retains much of the conference deadline and program committee structure. We will describe this approach in detail as used for conferences unaffiliated with a journal.
Similarly, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) now will have a journal track comprising papers accepted by the Machine Learning and Data Mining and Knowledge Discovery journals. An editorial board is created to handle submissions for the journal track of the conference that is distinct from the journal editorial boards and from the program committee that will handle submissions to non-journal conference tracks.c
These ensure the preeminence of a journal. They do not provide the conference's traditional role of building community by providing authors with feedback on less polished work that is intended for subsequent journal submission: The adage that a computer science conference is "a journal that meets in a hotel" is literally true in this case. Although best suited to a small research community, the practice has been embedded in larger conferences. The Computer-Human Interaction (CHI) conferences' 1,500 to 2,000 submissions are more than a journal review process could cope with, but ACM Transactions on Computer-Human Interaction articles comprise about 5% of CHI presentations. Similarly, 23% of Intelligent User Interfaces (IUI) 2011 presentations were of papers from ACM Transactions on Interactive Intelligent Systems. The Association for Computational Linguistics (ACL) also presents papers published in the Transactions of the ACL.
Shepherded conference papers become journal articles. SIGGRAPH and Infovis proceedings become issues of ACM Transactions on Graphics (TOG) and IEEE Transactions on Visualization and Computer Graphics, respectively. IJCAI'13 will have a track in the Journal of AI Research. Following a conditional acceptance, a committee member may be assigned to check each revision. In practice, the committee has already expressed an inclination to accept and time is limited, so acceptance is all but certain. Such papers are rarely if ever rejected. Shepherding is not a journal-style revision and re-review.
Concerned that this practice will lower journal standards, ACM has a new policy2 that forbids rechristening of conference papers as journal articles (exempting SIGGRAPH, although many authors list SIGGRAPH papers as peer-reviewed conference papers rather than TOG journal articles on their CVs). ACM may be fighting a tide that is eroding the conference-journal distinction. Outside ACM, boundaries continue to blur: the Journal of Machine Learning Research publishes a volume of workshop and conference proceedings, a useful collection of related work.
Conferences without a journal affiliation that incorporate a revision cycle. Aspect-Oriented Software Development (AOSD) offers multiple submission deadlines. For the March 2012 conference, authors could submit in April, July, or September of 2011. The April and July review processes yield accepts, rejects, and revise and resubmits. Submissions in September are either accepted or rejected. The Asian Conference on Machine Learning (ACML 2012) had two deadlines and the same model.
The ACM Computer Supported Cooperative Work 2012 (CSCW 2012) conference employed a single submission date and a five-week revision period. This is essentially a rapid version of a journal special issue process: a submission deadline followed by reviewing and one round of revision that is fully examined by the original reviewers. This process is discussed below. It has now been used by ACM Interactive Tabletops and Surfaces 2012, SIGMOD 2013, and CSCW 2013.
Adding a revision cycle resembles a time-compressed variant of the usual practice, where a rejected paper can be resubmitted in a year, but the dynamic is different. A reviewer does not have to make an often stressful, binary, in-or-out recommendation. Reviewers have more incentive to provide constructive, complete reviews, rather than primarily identify flaws that justify rejection. Authors who will get the same reviewers have more incentive to respond to suggestions than when they resubmit a rejected paper to a different conference with a new set of reviewers. Reviewers have to examine some submissions twice, but most find the second review easier and are rewarded by seeing authors take their suggestions seriously. When a rejected paper is resubmitted elsewhere, the overall reviewing burden for the community is greater and less rewarding.
The CSCW 2012 experiment. As program chairs for this large annual conference, we were asked to move the submission deadline forward two months to reduce the reviewing overlap with the larger CHI conference. To turn this lemon into lemonade, we inserted the revision cycle.
Computer Supported Cooperative Work conferences averaged 256 submissions and 56 acceptances (22%) from 2000 to 2011. In recent years, most submissions were reviewed by two program committee members (called Associate Chairs or ACs) and three external reviewers, followed by a face-to-face meeting of the ACs.
The 2012 conference received a record 415 submissions. In the first round, each submission was reviewed by an AC and two external reviewers. Forty-five percent were rejected and authors notified. The authors of the remaining 55% had five weeks to upload a revision and a document describing their responses to reviewer comments. A few withdrew; the others revised, often extensively. The revisions were reviewed by the original reviewers. After online discussion only 26 remained unresolved. The face-to-face program committee meeting may have been unnecessary, but it had been scheduled. Final decisions were made there.
In the second round, 27% of the revisions were rejected. Overall, 39.5% of the original submissions were judged to have cleared the quality bar. The traditional process would have accepted approximately 22% 'as is'; this year all papers were revised. The consensus was that a high-quality conference had become larger and much stronger.
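The percentages above can be turned into approximate paper counts. The following is a minimal sketch that uses only the rounded figures quoted in the text; the function and key names are illustrative, and the results are estimates rather than official conference statistics. It suggests that about 228 papers were invited to revise, that at most about 167 could have been accepted if every invited paper had been resubmitted, that about 164 were accepted overall, and that this is about 73 more than a 22% acceptance rate would have produced.

```python
# Rounded figures quoted in the text for CSCW 2012.
SUBMISSIONS = 415
FIRST_ROUND_REJECT_RATE = 0.45   # rejected after the first round
SECOND_ROUND_REJECT_RATE = 0.27  # share of revisions rejected in round two
OVERALL_ACCEPT_RATE = 0.395      # reported overall acceptance rate
HISTORICAL_ACCEPT_RATE = 0.22    # 2000 to 2011 average


def acceptance_funnel():
    """Estimate paper counts at each stage from the rounded percentages."""
    invited = SUBMISSIONS * (1 - FIRST_ROUND_REJECT_RATE)
    # Upper bound: assumes every invited paper was revised and resubmitted.
    # The text notes that a few were withdrawn, so the real count is lower.
    max_accepts = invited * (1 - SECOND_ROUND_REJECT_RATE)
    reported_accepts = SUBMISSIONS * OVERALL_ACCEPT_RATE
    historical_accepts = SUBMISSIONS * HISTORICAL_ACCEPT_RATE
    return {
        "invited_to_revise": invited,
        "max_accepts_if_all_resubmitted": max_accepts,
        "reported_accepts": reported_accepts,
        "extra_accepts_vs_historical_rate": reported_accepts - historical_accepts,
    }


if __name__ == "__main__":
    for stage, count in acceptance_funnel().items():
        print(f"{stage}: {count:.0f}")
```

The small gap between the upper bound and the reported total is consistent with a few authors withdrawing rather than revising.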
Analysis of the CSCW 2011 data enabled us to focus reviewing where it was needed; we did not increase the overall reviewing burden. In 2011, 60% of submissions found no advocate in the early reviews and not one of those was ultimately accepted. Yet each received a summary review by one AC, was read by a second AC, and was given a rebuttal opportunity. For CSCW 2012, 45% had no advocate in the first round and were summarily but politely rejected. The average workload per submission was reduced despite each "revise and resubmit" paper receiving eight reviews by four reviewers over the two rounds.
The number of reviews or serious considerations fell from an average of 5.5 per submission the previous year to 4.6, a reduction of roughly 400 reviews in total. And 35% of all reviews were of revisions, aided by the authors' documents describing the changes. The rebuttal process was not needed; dropping it led to no objections. Finally, the 60 to 80 papers that were accepted because of the revision process would have been prime candidates for resubmission elsewhere, so the community was spared hundreds of additional reviews downstream. (Many conferences already have more streamlined review processes than we started with, but could still realize a net reduction in effort by adopting a revision cycle.)
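The workload comparison is simple arithmetic on the per-submission averages. Here is a minimal sketch, assuming both averages are applied to the 415 submissions received in 2012; the function names are illustrative. With the rounded averages quoted in the text, the saving comes to about 370 reviews, the same order as the figure given above, which presumably rests on unrounded data.

```python
# Rounded figures quoted in the text.
SUBMISSIONS_2012 = 415
REVIEWS_PER_SUBMISSION_2011 = 5.5
REVIEWS_PER_SUBMISSION_2012 = 4.6
REVISION_SHARE_2012 = 0.35  # share of all 2012 reviews that were of revisions


def reviews_saved(submissions, before, after):
    """Reviews avoided when the per-submission average drops."""
    return submissions * (before - after)


def total_reviews(submissions, per_submission):
    """Total reviews implied by a per-submission average."""
    return submissions * per_submission


def revision_reviews(submissions, per_submission, revision_share):
    """Portion of the total that went to reviewing revised papers."""
    return total_reviews(submissions, per_submission) * revision_share


if __name__ == "__main__":
    saved = reviews_saved(
        SUBMISSIONS_2012, REVIEWS_PER_SUBMISSION_2011, REVIEWS_PER_SUBMISSION_2012
    )
    total = total_reviews(SUBMISSIONS_2012, REVIEWS_PER_SUBMISSION_2012)
    second_round = revision_reviews(
        SUBMISSIONS_2012, REVIEWS_PER_SUBMISSION_2012, REVISION_SHARE_2012
    )
    print(f"reviews saved: {saved:.0f}")
    print(f"total reviews in 2012: {total:.0f}")
    print(f"reviews of revisions: {second_round:.0f}")
```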
Positive outcomes. First the good news. Overall the reports were very positive. The revise-and-resubmit option for borderline papers in the first round reduced reviewer stress. Reviewers could focus on finding what might be of interest and formulating constructive guidance for revision, rather than identifying flaws that warrant rejection. Reviewers found it rewarding to see authors who had responded well to comments. An interesting benefit was that acceptance was not a zero-sum game. We did not have a quota, just the goal of keeping the same quality bar. Without the pressure to reject 75% to 80% of submissions, a four-person review team could iterate with an author without disadvantaging other authors. Some review teams engaged in protracted online discussions. With few decisions left to make, half of the face-to-face meeting was spent discussing broader issues and planning for next year. Some program committee members remarked that for the first time, the review process left them energized rather than drained.
Conference attendance increased 80% to a record 657. Over one-third of attendees responded to a post-conference survey. Of those expressing an opinion, 94% felt the new process improved the conference. The community building that was once the raison d'être of conferences was arguably strengthened. Authors and reviewers sent many positive comments, exemplified by the postcard shown earlier in this column.
Challenges. Some reviewers found that multiple reviewing sessions for one conference were taxing, especially given that reviewing was in the summer when vacations were scheduled. An operational challenge was that despite good intentions, PC members and reviewers with strongly entrenched habits born of years of committee service could find it difficult to adjust to new ways of working.
The greatest benefit of the revise-and-resubmit approach is that rather than rejecting many papers that need to be polished or fixed, reviewers can coach the authors to produce higher-quality versions fit for publication. Ironically, a challenge to employing this approach is the perception that a higher acceptance rate signals low conference quality. Despite a widespread view that quality had risen, some researchers fear that external referees will look unfavorably on the acceptance rate. Many senior researchers discount the selectivity = quality equation, but it is built into the assessment practices of some universities and has traction among junior researchers who like to think that decisions are based on visible, objective data. Acceptance rate as a signifier of quality has a sound historical basis.
In 1999, some peer-reviewed conferences were formally recognized as sources of high-quality computer science research in the U.S.6 However, not all conferences in the U.S. and few elsewhere stressed polished research; they remained more inclusive, providing authors with feedback on work in progress toward journal publication. Acceptance rate (rejecting 75% or more of its members' proffered work) signaled that a conference emphasized quality over community-building inclusiveness.
That said, for acceptance rate to truly indicate quality across different conferences, three things must be true: the submissions must represent the same quality mix; the process used to reject submissions must accurately measure merit; and the same assessment process must be used by the conferences being compared. Unfortunately, none of these are reliably true. Conferences vary in the quality of the submissions they attract. Thomas Anderson has argued eloquently that beyond the top handful of strong papers, there is little ability for reviewers to reliably differentiate between a large fraction of submissions.1 And process changes such as ours will raise the proportion of papers that attain a high level of quality.
Eventually, citation or download numbers could indicate the impact of a process change. CSCW 2012 is accumulating citations far more rapidly than its predecessors, but it is too early for definitive judgment. For now, to see the problem with the selectivity = quality equation, consider this analogy: Two organizations each admit six applicants for a probationary year. One gives them minimal attention, tests them at year's end, and retains the two who managed to learn on their own. The second invests in training and at year's end keeps four: The two who would have survived on their own are, by virtue of the training, rock stars, and two others passed. The first organization had a 33% acceptance rate, the second a 66% acceptance rate, based on the same raw materials. Low acceptance rate accompanied lower quality. The same is true with journals: those that work patiently with authors over multiple versions raise both quality and acceptance rate. Process matters.
Another concern is that many people enjoy single-track conferences. Raising the acceptance rate requires adding tracks, lengthening a conference, or allocating less presentation time to papers. CSCW had already moved on from being single-track. As the field grew and specializations developed, trade-offs arose. With more submissions and more varied submissions, rejection rates were pushed up, reviewing became more stressful and less uniform, and incremental advances competed for space with new ideas. CSCW shifted to multiple tracks and a tiered program committee. Other conferences have maintained a single track and driven acceptance rates into single or low double digits. Many reports of conference stress published in Communicationsd come from these fields. High rejection rates foster disaffection and drive authors to other conferences, dispersing the literature and undermining the sense of community that the single track initially created.
Finally, there is the perennial large-conference challenge of matching reviewers to submissions. We devised a comprehensive set of keyword/topic areas. We let associate chairs bid on submissions. Nevertheless, a match based on topic often proves to be poor due to differences in preferred method, theoretical orientation, or other factors. Some reviewers, who might be called 'annihilators,' consistently rate papers lower than others who handle the same submissions. Some reviewers are 'Santa Clauses.' Statistical normalization does not fully compensate: a submission does not get the essential advocate by adjusting the scores of annihilators. Others see some merit and some weakness in everything and rate everything borderline. Add Anderson's observation that differences in quality are very small over a broad range of submissions, and luck in reviewer assignment can be a larger factor in outcomes than submission quality. Next, we review other proposals and experiments to improve conference management.
3. Improve existing review processes. Approaches listed in Table 3 have been noted in Communications commentaries. We are not endorsing them all. Some conflict. Many could be used together. Some are in regular use in some conferences; others have been tried but did not take root.
Conference size is an important variable in assessing utility. Some prestigious conferences attract fewer than 150 submissions and might have a flat program committee. Some attract more submissions and enlist a second tier of external (or 'light') reviewers. The largest can have three levels, with tracks or subcommittees. Potential comparisons increase non-linearly with submission number; approaches that work for one size may not scale up or down.
Tracking submission histories. The International Conference on Functional Programming allows authors of papers previously rejected (by ICFP or other conferences) to append the review history and comments. Eleven percent of ICFP 2011 authors declared a history. Half of those provided annotated reviews. Half were re-reviewed by one of the original reviewers, and all except one were accepted. Although mandating that authors report a paper's prior submission history would face practical and ethical challenges, it can be a win-win for authors and reviewers when authors opt to do it.
Streamlining. In phased reviewing, submissions first get two or three reviews, then some are rejected and reviewers added to the others. Often all results are announced together, but EuroSys starting in 2009 has notified the authors of rejected papers following the first phase, as CSCW 2012 did after the first round, enabling authors to quickly resume work. Conferences have also experimented with various methods of ordering papers for discussion in the committee: randomly, periodically inserting highly rated or low-rated papers, starting at the top, starting at the bottom. No consensus has emerged, although some report that papers that are first discussed late in the day tend to fare poorly whatever their rating.
Double-blind reviewing. Evidence indicates that author anonymity is fairer, and this practice has spread. To anonymize is sometimes awkward for authors. Because anonymity can inhibit program committee members from finding duplication or extreme incrementalism, some two-tier committees only blind the less influential external reviewers.
Clarifying review criteria. Reviewers have been asked to rate papers on diverse dimensions: originality, technical completeness, audience interest, strengths and weaknesses, and so on. In our experience, committees working under time pressure focus on the overall rating; writing quality gets some attention and nothing else does once conferences reach a moderate size.
Matching reviewers to papers. In general, the smaller the conference, the more easily reviewer assignments can be tuned. Keyword or topic area lists are common. Matching semantic analyses of a reviewer's work to submissions has been tried. Some conferences let reviewers bid on submissions based on titles and abstracts. CSCW 2013 authors could nominate ACs for their papers; although no promises were made, their choices were helpful. IUI twice let program committee members choose which submissions to review; an absence of reviewer interest was a factor in the final decision.
Normalizing to control for consistently negative or positive reviewers. Since 2006, Neural Information Processing Systems (NIPS) has calculated a statistical normalization to offset consistently high or low reviewer biases. Other conferences have tried this, usually just once. It does not counter biases directed at particular topics or methods, the occasional reviewer who gives only top and bottom ratings "to make a real difference," or those who uniformly give middling reviews. It does not produce an advocate: knowing that reviewers were inherently negative does not replace harsh critiques with positive points. Normalization may be more useful for smaller conferences with fewer papers to discuss. Another approach tried once by SIGCOMM (2006), SOSP (2009), and other conferences had reviewers rank submissions. Although the rankings were used primarily to create a discussion order, relative judgments could counter a reviewer's overall positive or negative bias.
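To make the idea of normalization concrete, here is a minimal sketch of one common form, per-reviewer z-scoring. This is an assumption chosen for illustration; the text does not say which calculation NIPS or any other conference uses, and the function name and data layout are hypothetical. Each score is re-expressed relative to the mean and spread of the scores given by the same reviewer, so a harsh reviewer's best paper and a generous reviewer's best paper end up with the same normalized score.

```python
from statistics import mean, pstdev


def normalize_by_reviewer(ratings):
    """Re-express each score relative to its reviewer's own mean and spread.

    ratings: list of (reviewer, paper, score) tuples.
    Returns a list of (reviewer, paper, z_score) tuples in the same order.
    """
    scores_by_reviewer = {}
    for reviewer, _paper, score in ratings:
        scores_by_reviewer.setdefault(reviewer, []).append(score)

    stats = {
        reviewer: (mean(scores), pstdev(scores))
        for reviewer, scores in scores_by_reviewer.items()
    }

    normalized = []
    for reviewer, paper, score in ratings:
        mu, sigma = stats[reviewer]
        # A reviewer who gives every paper the same score carries no signal.
        z = 0.0 if sigma == 0 else (score - mu) / sigma
        normalized.append((reviewer, paper, z))
    return normalized


if __name__ == "__main__":
    # Hypothetical ratings on a 1-to-5 scale.
    ratings = [
        ("harsh", "A", 1), ("harsh", "B", 2), ("harsh", "C", 3),
        ("generous", "D", 3), ("generous", "E", 4), ("generous", "F", 5),
        ("middling", "A", 3), ("middling", "D", 3),
    ]
    for reviewer, paper, z in normalize_by_reviewer(ratings):
        print(f"{reviewer:9s} {paper} {z:+.2f}")
```

The sketch also shows the limits noted above: a reviewer who rates everything the same contributes nothing after normalization, and rescaling a harsh reviewer's numbers does not turn the written critique into advocacy.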
More or less constructive reviews? A high-pressure binary decision process often yields reviews that focus on identifying a fatal flaw. Some people suggest that providing less feedback could reduce reviewer load and discourage incomplete submissions, but calls for more balanced and constructive appraisal are more often heard.
Rebuttals. Authors are generally prohibited from promising to make changes, but many appreciate the opportunity to express themselves. This practice has spread after being introduced over a decade ago. How rebuttals affect outcomes is unknown. The art of writing an effective rebuttal must be mastered, which disadvantages the uninitiated.
Shadow PCs. Several conferences (NSDI, SIGCOMM, SOSP, and EuroSys) have staged full-blown mirror events for training purposes.3,5 Shadow PC output does not inform PC decisions. Large differences in acceptance decisions do suggest that younger and older researchers have different orientations and support Anderson's hypotheses about imprecision in conference reviewing.1
Mentor or shepherd authors of tentatively accepted papers. Assigning mentors to papers in the preparation phase can help authors, but is stressful for the mentors of struggling submissions. This practice has been tried but not taken root. More frequently, a shepherd is assigned to guide a tentative acceptance into final form. Some authors find shepherding helpful, others report perfunctory or absentee shepherds. Assigning a shepherd is often a face-saving means to calm down a program committee member who has reservations. Shepherded papers virtually always make it into the corral. The 2011 Internet Measurement Conference gave authors a choice of a shepherd or a 'soft' open review alternative (publishing the paper with its reviews and the author's descriptions of changes). Most chose the latter.
Publish reviews. Reviews of accepted HotNets 2004 and SIGCOMM 2006 papers were posted publicly. Neither conference continued the practice, perhaps because of the extra effort that reviewers reported. Similar experiments are under way.
Improve presentations. ICME 2011 required authors of accepted papers to submit lecture videos. A subset was selected for oral presentation.
Other member support efforts include offering a free registration and a five-minute 'boaster' presentation to finishing graduate students at Innovations in Theoretical Computer Science. Publicly honoring exemplary reviewers, a practice of some journals, has been encouraged for conferences.
Conclusion: Change Is Probably Inevitable
In computer science especially, conferences and journals compete to communicate and archive results. Journal articles grow shorter and reviewing time decreases. Conference reviewing rigor increases and proceedings are more polished. Measures of impact now cover both. There are stresses, but is there a need for a major adjustment?
We think so. The wealth of proposals and experiments signals dissatisfaction with the status quo. Some involve bringing conferences and journals closer through direct ties or shared features. Adding a revision cycle led to more acceptances, but also shorter presentation times, more parallel sessions, and a shift from acceptance rates to citations and downloads as measures of impact.
At risk with conference-journal hybrids is the community building and community maintenance that conferences once provided. Many conferences decline in size even as the researchers and practitioners in the field increase in number. The popularity of workshops that accompany conferences reveals a need for member support and a sense of community, but a set of disjoint workshops does not signify a thriving community. Indeed, successful workshops often spin off to become stand-alone conferences.
Other changes may be coming. Globalism has made geographically anchored conferences more expensive. As real-time audio and video become more reliable, travel becomes more uncomfortable, and concern for our carbon footprint grows, community activity may move online, perhaps suddenly. We cannot predict the future, but we do know the future will not resemble the present or the past.
2. Blockeel, H., Kersting, K., Nijssen, S. and Zelezný, F. A revised publication model for ECML PKDD. Computing research repository, 2012; http://arxiv.org/pdf/1207.6324v1.pdf
3. Feldmann, A. Experiences from the SIGCOMM 2005 European shadow PC experiment. ACM SIGCOMM Computer Communication Review 35, 3 (2005), 97–102; http://dl.acm.org/citation.cfm?id=1070889
5. Isaacs, R. Report on the 2007 SOSP shadow program committee. ACM SIGOPS Operating Systems Review 42, 3 (2008), 127–131; http://dl.acm.org/citation.cfm?id=1368524
a. Links to Communications commentaries, Snowbird panels, WOWCS'08 notes, and the Dagstuhl workshop are at http://research.microsoft.com/~jgrudin/CACMviews.pdf.
b. Haslam, N. Bite-size science: Relative impact of short article formats. Perspectives on Psychological Science 5, 3 (2010), 263–264; http://pps.sagepub.com/content/5/3/263.abstract. SIGGRAPH dropped submission length limits in the 1990s, followed by UIST and CSCW more recently. Few papers exceed 10 or 12 pages.
d. Links to Communications commentaries, Snowbird panels, WOWCS'08 notes, and the Dagstuhl workshop are at http://research.microsoft.com/~jgrudin/CACMviews.pdf.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2013 ACM, Inc.