Many computer science academics have written lately about how our publication procedures have failed to scale as the field has grown (for example, see references 2, 3, 4, 8, and 9). While others have focused on trying to shift CS from conferences back to journals, it is worthwhile to understand exactly which problems we are trying to solve.
Acceptance Rates. The top conferences, where publication can make or break a career, may publish 10% of the submitted papers (see footnote a). Submission rates have grown in the past decade with acceptance rates either flat or dropping, despite an increasing absolute number of papers accepted. What happens to the rejects? Realistically, there are three categories. First, there are the "bubble" papers. If, for whatever reason, the conference were to double its acceptance rate, these would be published, but they were rejected either because they were seen as too narrow or uninteresting, or they were considered to have significant flaws. Next are "second tier" papers that could well be publishable at area-specific workshops or less competitive conferences. Also in the "second tier" category would be "least publishable unit" (LPU) papers, where an author advances their own work by the smallest possible amount and the program committee wants more. Finally, there are "noncompetitive" papers, where the paper would have no chance at publication in any respectable venue.
Overloaded Reviewers. As submission rates have gone up, program committees must choose between imposing huge workloads on each reviewer and adding PC members to the point where most members are completely disconnected from most papers' discussions. Either choice appears to increase the degree of randomness in whether a paper is accepted or rejected.
Resubmission. What happens with all these rejections? Many are inevitably resubmitted. Major conferences try to coordinate their accept/reject announcement dates with subsequent conferences' submission dates, but these can still be quite tight. (For example, there was only one week, in early 2010, between IEEE Security & Privacy's notification date and the USENIX Security Symposium's submission date.) Consequently, similar content is reviewed again and again.
Journal Latencies. In many other fields, conferences only take short papers and the "real" work is submitted to journals. Journals offer the benefit of having the same set of reviewers through each phase of a paper's life cycle. The reviewers can insist on improvements and can then agree that the authors satisfied their requirements. In computer science, much work is never submitted to journals, and, at least in my experience, journals often receive a large volume of noncompetitive submissions, consuming reviewers' time. Furthermore, a latency of one year from submission to publication is entirely normal, and it can be far longer.
Short Incremental Work. Our current system of promotion and tenure strongly incentivizes authors to collect as many publications as possible, resulting in many different papers for any one given idea. Many current academics bemoan the good old days when you could have a group working on an involved, complex project, and have only one or a small number of groundbreaking papers. Given all these disparate concerns, can we evolve or redesign our way to a better structure for our academic processes?
A Clean-Slate Solution
Here, I describe the high-level design of a clean-slate solution, called CSPub (clean-slate publication, or perhaps, ambitiously, computer science publication). CSPub is, at its core, a mashup of conference submission and review management software with technical report archiving services like arXiv and with bibliographic management and tracking and search services like DBLP and Google Scholar.
Technical Reports on Steroids. Today, computer scientists have three different mechanisms for disseminating technical reports: their own personal or laboratory home pages on the Web, their departmental "official" technical report services, or centralized services like arXiv. Efficient publication and dissemination is the first and most fundamental service that we would have in CSPub. Ultimately, every paper published in our field can and should be available via this one mechanism, regardless of whether it is a "technical report," a "preprint," a "conference" paper, or a "journal" paper. Furthermore, when a paper is submitted to a conference or journal, the mechanism should be that the paper is submitted to CSPub, for all the world to see, and it should be flagged for the target conference or journal. CSPub could easily support a variety of submission mechanisms, including double-blind manuscripts for a conference linked to an optional public copy with the authors' full affiliation.
If all of academic computer science's scholarship were available in CSPub, a variety of new features would become feasible and relevant. First and most obviously, search engines could efficiently compute simple citation counts and h-indices as well as more sophisticated PageRank-like metrics. CSPub's ranking function(s) should be public and well documented, giving us a clear understanding of the incentives to adjust our publishing behavior. (The field of "bibliometrics" is dedicated to the design and evaluation of such systems, and those scholars would, no doubt, participate in the design of CSPub's metrics. Nature recently published a special issue on the topic; see reference 1.) Alternatively, CSPub could publish enough raw metadata that third-party services could apply their own metrics. If we can agree on metrics that favor smaller numbers of better publications, CSPub can help incentivize the community to change its publishing behaviors.
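To make the kinds of metrics mentioned here concrete, the following is a minimal Python sketch of an h-index and a PageRank-like score computed over a citation graph. It is only an illustration: the function names, the damping factor, the iteration count, and the dictionary-based input format are assumptions made for this example, not part of any CSPub design.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def pagerank(citations, damping=0.85, iterations=50):
    """Compute a PageRank-like score for each paper.

    `citations` maps a paper id to the list of paper ids it cites
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every paper that cites or is cited.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            outgoing = citations.get(p, [])
            if outgoing:
                # A paper passes its score, evenly split, to the papers it cites.
                share = damping * rank[p] / len(outgoing)
                for q in outgoing:
                    new_rank[q] += share
            else:
                # A paper that cites nothing spreads its score over all papers.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank
```

Publishing a ranking function in this form would let authors see exactly which behaviors it rewards; a metric that favors fewer, better papers would replace or reweight these functions rather than change the surrounding data.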
Publication status and awards (including "best paper" awards given by a conference or even "test of time" awards given in retrospect for papers that have had a significant impact) are metadata that could be provided either by the author or by the conference steering committee. Such annotations help when somebody is searching CSPub for a paper to cite on a particular topic, and they may also contribute directly to how a paper is ranked. Professional curation would be necessary to prevent authors from falsely giving themselves awards and to deal with related issues, including plagiarism, as they arise. This could be aided by CSPub users "reporting" spam as well as by automated antispam and anti-plagiarism systems.
With CSPub, other helpful features become easy. If Alice releases "version 2" of her paper, she could add bidirectional links, so "version 1" states that it has been superseded by "version 2", and "version 2" links to the earlier edition. This would allow ranking accumulated by preprints or tech reports to be applied to later conference and journal publications. Similarly, if Alice's paper has made a splash and she gets invited to give talks at a number of universities, those invited talks are, in effect, additional metadata links that are endorsed by the institution that invited Alice to give a talk, and which will improve Alice's ranking. (In CSPub, a university's colloquium series could be represented much like a journal, linking to existing papers.)
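The bidirectional version links described above can be sketched as a small data structure. This is a hypothetical illustration in Python: the `Paper` and `Archive` classes, their field names, and the choice to sum raw citation counts across versions are all assumptions made for the example, not a description of an existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    paper_id: str
    title: str
    citations: int = 0
    supersedes: Optional[str] = None      # id of the earlier version, if any
    superseded_by: Optional[str] = None   # id of the later version, if any


class Archive:
    """A toy in-memory archive keyed by paper id."""

    def __init__(self):
        self.papers = {}

    def add(self, paper):
        self.papers[paper.paper_id] = paper

    def link_versions(self, old_id, new_id):
        # Record the link in both directions.
        self.papers[old_id].superseded_by = new_id
        self.papers[new_id].supersedes = old_id

    def latest(self, paper_id):
        # Follow forward links to the newest version, guarding against cycles.
        seen = set()
        current = self.papers[paper_id]
        while current.superseded_by is not None and current.paper_id not in seen:
            seen.add(current.paper_id)
            current = self.papers[current.superseded_by]
        return current

    def accumulated_citations(self, paper_id):
        # Sum citations over the newest version and every earlier version.
        current = self.latest(paper_id)
        total = 0
        seen = set()
        while current is not None and current.paper_id not in seen:
            seen.add(current.paper_id)
            total += current.citations
            current = self.papers.get(current.supersedes) if current.supersedes else None
        return total
```

With links stored this way, citations earned by a tech report remain attached to the later conference or journal version, whichever version a reader happens to look up.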
Problem Solving. With CSPub, the top papers will still get in, as always. "Bubble" papers will, at the very least, get the proper priority date of their initial submission and will start getting citations. Conferences might introduce "accepted without presentation" distinctions, allowing more papers to be recognized and avoiding the need for subsequent resubmission. Today, authors of rejected papers must choose whether to edit and resubmit to a top conference, resubmit to a lower-ranked conference, or abandon a paper. With CSPub, these decisions can be delayed. If the paper turns out to be popular and starts gaining citations, then its authors will be motivated to update it and resubmit it. If the paper turns out to be poorly received, its authors can rationally abandon it and move on. With lower resubmission rates, conference program committees will have fewer papers to consider and can do a better job.
When a paper is rejected, and the author receives feedback from the rejecting conference, that feedback would also be in CSPub, presumably (but not necessarily) private to the author. This creates an opportunity for the author to choose to give this feedback to a subsequent program committee, along with a statement about how the previous committees' comments were addressed. This moves the treatment of the manuscript closer to the consistent handling available through the journal process, yet with the speed of the conference process. An anonymous reviewer suggested that rejected papers might be indelibly tagged as such, in public, as a disincentive to authors submitting poor work to conferences. The idea of "negative" metadata, permanently associated with one's name and reputation, would be seen as offensive by many researchers. Certainly, CSPub could support such features if they were desired.
CSPub can also enable new models for how conferences operate. "Unpublished" papers would be easy for program committees to discover on their own and "pull" into a conference. One-time workshops might be built purely around thematically linked, previously unpublished papers.
Journals. In CSPub, a journal is nothing more than an organization that adds metadata notations to papers in the system. As such, anybody can start their own journal for almost no cost. Some journals would have calls for papers, as conferences do, and authors would indicate a submission in their metadata when posting a manuscript. Other "journals" would be nothing more than collections of thematically related papers, perhaps put together by graduate students as part of their related work search. Of course, if a senior academic puts together a collection with a catchy title ("Alice's List of Seminal Papers in Blah-Blah Theory"), and Alice is a highly ranked professor, the collection would help increase the included manuscripts' rankings, both directly, due to Alice's strong personal ranking, and indirectly, by leading more academics to read and cite the papers on Alice's list.
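The idea that a journal is only a source of metadata annotations can be sketched in a few lines of Python. Everything named here (`Annotation`, `PaperRecord`, `Venue`, `papers_in`, and the annotation kinds) is hypothetical and chosen for illustration; the point is that a refereed journal, a colloquium series, and a personal reading list could all share one mechanism.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    venue: str       # who made the annotation
    kind: str        # e.g. "submitted", "accepted", "listed", "award"
    note: str = ""


@dataclass
class PaperRecord:
    paper_id: str
    title: str
    annotations: list = field(default_factory=list)


class Venue:
    """A journal, conference, or curated list: it only annotates papers."""

    def __init__(self, name):
        self.name = name

    def annotate(self, paper, kind, note=""):
        annotation = Annotation(self.name, kind, note)
        paper.annotations.append(annotation)
        return annotation


def papers_in(venue_name, papers, kind=None):
    """Return the papers carrying an annotation from the given venue."""
    return [
        p for p in papers
        if any(a.venue == venue_name and (kind is None or a.kind == kind)
               for a in p.annotations)
    ]
```

Under this model, starting a new "journal" costs no more than creating a `Venue` and annotating existing papers, and a ranking function could weight each annotation by the standing of whoever made it.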
CSPub offers a number of improvements to the journal latency problem. It allows early drafts to be seen and cited, while simultaneously being under review, and it completely eliminates printing latencies. Accepted papers "appear" immediately. CSPub also trivially supports new models, such as the hybrid journal/conference approach being taken by VLDB (see reference 7).
Without a doubt, the biggest challenge of CSPub is getting the ball rolling. Computer science scholarship is published under a variety of professional organizations including the ACM, IEEE, AAAI, USENIX, ISOC, IACR, and many more. It is enough of a challenge to imagine any one of these organizations moving to the wholesale adoption of a new publication model, much less all of them at once.
The only feasible path is for one organization to develop CSPub for itself and start using it one conference at a time. Initially, anybody could submit a paper, as in arXiv or the Crypto ePrint server, and for the pioneering conferences, this would be the exclusive mechanism for submitting a paper to be considered for inclusion. By making this mandatory, at least for the authors at the pioneer conferences, the system will be populated by those papers and will have its initial users. CSPub's initial implementation could certainly build on the existing arXiv service, which already hosts many CS papers in its Computing Research Repository (CoRR; see reference 6). According to Joseph Halpern, who runs CoRR, CoRR grew 35%–40% per year for several years running; he anticipates 10% growth (over 7,000 new papers) for 2011. CoRR is increasingly hosting archival papers from major journals and conferences. Elsevier also now allows authors to post accepted journal paper "pre-prints" on the Web and on CoRR. However, Elsevier does not allow authors to redistribute the final, camera-ready version of their work.
Already, many conferences and journals provide free, open access to their publications. All USENIX conference publications are available freely on USENIX's Web site. Similarly, Logical Methods in Computer Science (see http://www.lmcs-online.org) is a paperless journal published under the auspices of the International Federation of Computational Logic, with no cost to publishers or readers. Authors retain their copyright while agreeing to have their work distributed by the journal under a Creative Commons license. LMCS also distributes its publications through CoRR.
Authors who presently serve PDF files of their papers from personal or lab Web pages could incrementally migrate to using CSPub instead. CSPub could easily generate dynamic HTML that can be included in personal Web pages, research group pages, and so forth. By providing such convenient services, academic authors may well upload all of their papers to take advantage of CSPub's features and increase their work's visibility. Inevitably, the switch will occur one research area at a time as CSPub matures and individual communities adopt it.
The largest concern with CSPub would be the loss of revenue from journal subscriptions and digital libraries hosted by our existing professional societies. Regardless, virtually any current manuscript can be found on one of its co-authors' home pages, free of charge for the reader. "Paywalls" between our papers and their readers will inevitably go away. CSPub, by virtue of institutionalizing this practice, would require the ACM, IEEE, and so forth to forgo this income as their authors adopt it. Consequently, conference registration fees will inevitably go up.b However, if we save our institutional libraries from the costs of journal subscriptions, that money could be redirected in many ways including scanning and entering old work into CSPub. Given that the bulk of U.S. computer science research is supported by the National Science Foundation, it is not unreasonable that the NSF could underwrite CSPub.
A related issue is ownership. New manuscripts can adopt a Creative Commons-style license where authors grant CSPub nonexclusive rights to redistribute their work. Older manuscripts often have their copyrights assigned to legacy publishers, who will certainly be reluctant to give up their lucrative franchises. We control our professional societies; we can vote for new policies. Other publishers may well put up a fight or go out of business as their authors abandon them. Ultimately, academic authors are incentivized to have their papers widely read and cited. Cost-free open access to our manuscripts, whether through CSPub or any other mechanism, is the obvious way to accomplish this.
This Viewpoint began through informal discussions with many of my peers. I would like to thank Drew Dean, Joseph Halpern, Mike Herf, Peter Honeyman, Carol Hutchins, Chris Jermaine, Dave Johnson, Chris Kelty, Eric Rescorla, Moshe Vardi, Suresh Venkatasubramanian, Ellie Young, and the anonymous Communications reviewers for their feedback and commentary. A longer version of this Viewpoint, with detailed citations, appears at http://www.cs.rice.edu/~dwallach/pub/reboot-2010-06-14.pdf.
1. Abbott, A. et al. Metrics: Do metrics matter? Nature 465 (June 2010), 860–862; http://www.nature.com/news/2010/100616/full/465860a.html.
2. Birman, K. and Schneider, F.B. Program committee overload in systems. Commun. ACM 52, 5 (May 2009), 34–37; http://cacm.acm.org/magazines/2009/5/24644-program-committee-overload-in-systems/fulltext.
3. Crowcroft, J., Keshav, S., and McKeown, N. Scaling the academic publication process to Internet scale. In Workshop on Organizing Workshops, Conferences, and Symposia for Computer Science (WOWCS '08), (San Francisco, CA, Apr. 2008); http://www.usenix.org/events/wowcs08/tech/full_papers/crowcroft/crowcrofthtml/, also reprinted in Commun. ACM 52, 1 (Jan. 2009).
4. Fortnow, L. Time for computer science to grow up. Commun. ACM 52, 8 (Aug. 2009); http://cacm.acm.org/magazines/2009/8/34492-viewpoint-time-for-computer-science-to-grow-up/fulltext.
6. Halpern, J.Y. and Lagoze, C. The computing research repository: Promoting the rapid dissemination and archiving of computer science research. In Proceedings of the Fourth ACM International Conference on Digital Libraries (Berkeley, CA, Aug. 1999); http://arxiv.org/ftp/cs/papers/9812/9812020.pdf.
7. Jagadish, H.V. The conference reviewing crisis and a proposed solution. SIGMOD Record 37, 3 (2008), 40–45; http://portal.acm.org/citation.cfm?id=1462582.
8. Korth, H.F. et al. Paper and proposal reviews: Is the process flawed? SIGMOD Record 37, 3 (2008), 36–39; http://doi.acm.org/10.1145/1462571.1462581.
9. Vardi, M.Y. Revisiting the publication culture in computing research. Commun. ACM 53, 3 (Mar. 2010), 5; http://cacm.acm.org/magazines/2010/3/76297-revisiting-the-publication-culture-in-computing-research/fulltext.
a. Networking conference statistics are tracked by Kevin Almeroth (http://www.cs.ucsb.edu/~almeroth/conf/stats/), who links to statistics for other disciplines as well.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.