The education and research enterprise is leveraging opportunities to accelerate science and discovery offered by computational and data-enabled technologies, often broadly referred to as data science. Ten years ago, we wrote that an "accurate image [of a scientific researcher] depicts a computer jockey working at all hours to launch experiments on computer servers."8 Since then, the use of data and computation has exploded in academic and industry research, and interest in data science is widespread in universities and institutions. Two key questions emerge for the research enterprise. First, how should the next generation of researchers and scientists be trained in the deeply computational and data-driven research methods and processes they will need and use? Second, how can the use of these methods and processes be supported to advance research and discovery across disparate disciplines and, in turn, define data science as a scientific discipline in its own right? An identifiable discipline of data science would encourage and reward research that fosters the continued development of computational and data-enabled methods and their successful integration into research and dissemination pipelines, and it would accelerate the generation of reliable knowledge from data science.
This article offers an intellectual framing, called the Data Science Life Cycle, to address these two key questions. It is intended to aid decision makers in institutions, policy makers, and funding agency leadership, as well as data science researchers and curriculum developers. The Data Science Life Cycle introduced here can be used as a framing principle to guide decision making in a variety of educational settings, pointing the way on topics such as: whether to develop new data science courses (and which ones), rely on existing course offerings, or use a mix of both; whether to design data science curricula across existing degree-granting units or work within them; how to relate new degrees and programmatic initiatives to ongoing research in data science and encourage the development of a recognized research area in data science itself; and how to prioritize support for data science research across a variety of disciplinary domains. These can be difficult questions from an implementation point of view, since university governance structures typically separate disciplines into effective silos, with self-contained evaluation, degree-granting, and decision-making authority. Data science presents as a cross-cutting methodological effort with the needs of a full-fledged science, including: communities for idea sharing, review, and assessment; standards for reproducibility and replicability; journals and/or conferences; vehicles for disciplinary leadership and advancement; an understanding of its scope; and broadly agreed-upon core curricula and subjects for training the next generation of researchers and educators.