Data science is a new interdisciplinary field of research that focuses on extracting value from data, integrating knowledge and methods from computer science, mathematics and statistics, and an application domain. Machine learning is the field created at the intersection of computer science and statistics, and it has many applications in data science when the application domain is taken into consideration.
From a historical perspective, machine learning has been considered, for the past 50 years or so, part of artificial intelligence. It was taught mainly in computer science departments to scientists and engineers, and the focus was placed, accordingly, on the mathematical and algorithmic aspects of machine learning, regardless of the application domain. Thus, although machine learning also deals with statistics, which focuses on data and does consider the application domain, until recently most machine learning activities took place in the context of computer science, where the field began and which traditionally focuses on algorithms.
Two processes, however, have taken place in parallel to the accelerated growth of data science in the last decade. First, machine learning, as a sub-field of data science, flourished and began to be implemented and used in a variety of disciplines. As a result, researchers realized that the application domain cannot be neglected and that it should be considered in any data science problem-solving situation. For example, it is essential to know the meaning of the data in the context of the application domain in order to prepare the data for the training phase and to evaluate the algorithm's performance based on the meaning of the results in the real world. Second, a variety of populations began taking machine learning courses: people for whom, as experts in their own disciplines, it is inherent and essential to consider the application domain in data science problem-solving processes.
Teaching machine learning to such a vast population while neglecting the application domain, as is traditionally done in computer science departments, is misleading. Such a teaching approach guides learners to ignore the application domain even when it is relevant for the modeling phase of data science, in which machine learning is largely used. In other words, when students learn machine learning without considering the application domain, they may get the impression that machine learning should be applied this way and become accustomed to ignoring the application domain. This habit of mind may, in turn, influence their future professional decision-making processes.
For example, consider a researcher in the discipline of social work who took a machine learning course but was not educated to consider the application domain when interpreting the results of a data analysis. The researcher is now asked to recommend an intervention program. Without that education, he or she may ignore crucial factors in this examination and rely only on the recommendation of the machine learning algorithm.
Other examples are education and transportation, fields that everyone feels they understand. As a result of a machine learning education that does not consider the application domain, non-experts may assume that they know enough about these fields, and may not understand the crucial role that professional knowledge plays in decision-making processes based on the output of machine learning algorithms. This phenomenon is further highlighted when medical doctors or food engineers, for example, are not trained in machine learning courses to critique the results of machine learning algorithms based on their professional expertise in medicine and food engineering, respectively.
We therefore propose to stop teaching machine learning courses to populations whose core discipline is neither computer science nor mathematics and statistics. Instead, these populations should learn machine learning only in the context of data science, which repeatedly highlights the relevance of the application domain in each stage of the data science lifecycle and, specifically, in the modeling phase in which machine learning plays an important role.
If our suggestion, to offer machine learning courses in a variety of disciplines only in the context of data science, is accepted, not only will the interdisciplinarity of data science be highlighted, but the realization that the application domain cannot be neglected in data science problem-solving processes will also be further illuminated.
Don't teach machine learning! Teach data science!
Orit Hazzan is a professor in the Technion's Department of Education in Science and Technology; her research focuses on computer science, software engineering, and data science education. Koby Mike is a Ph.D. student at the Technion's Department of Education in Science and Technology; his research focuses on data science education.