One of the cool things about being in academia and research is how every now and then you discover how your approaches to new ideas and topics resonate with those of other researchers in the field. This post was inspired by the recent webinar series hosted by the Raspberry Pi Foundation on AI and Machine Learning (AI/ML) in school education. Our approaches to introducing children to AI/ML indeed resonate with those shared by Schulte, Tedre, and others in the RPF seminar series.
Context: This past summer, we embarked on an effort funded by the SaTC-Edu program of the National Science Foundation as part of a new series of projects that explore cybersecurity education in the age of AI. In our project, we are exploring and innovating on how we can teach AI and Machine Learning (ML) to 13- to 15-year-olds through situations and issues set in the context of cybersecurity, in ways that "lift the hood" on how ML models are designed, how they work, and the impact of human decisions in this process.
The goal of our exploratory research is to innovate on learning design and pedagogy to bring together AI and cybersecurity topics, and integrate them in systematic and cogent ways that are accessible to early teen learners. We hope to push the boundaries of AI education in K-12 through developing code examples, abstractions, and coding experiences that help make fundamental ML concepts accessible without requiring mastery of the underlying (often complex) mathematical concepts. Instead of simply playing with AI models, we want early teen learners to get a real sense of the "secret sauce" inside ML models and be able to examine the underlying algorithms in order to build deeper understandings and intuitions of how AI/ML works. It is in these ways that we believe our curriculum goes beyond several current AI learning experiences where students play with pre-built models using Google or other platforms, or simply use a library or API that trains a classifier. Useful though such experiences are, they don’t necessarily demystify exactly how the machine "learned" to classify an image, a bot, or an anomalous occurrence.
Here, we share key ideas and intuitions that we believe are foundational to AI/ML learning. In our curricular design, these ideas are a throughline in the progression of activities designed to expose students to a variety of techniques such as rule-based AI, decision trees, classification, case-based reasoning, and generative adversarial networks (GANs). Students build these intuitions through examples set in the context of cybersecurity issues such as DDoS attacks (detection and prevention), Twitter bots, cyberbullying, credit card fraud, anomaly detection, and deepfakes.
Data, its "features"/"labels," and how the data and these chosen features impact the training of models. In our work, we use NetsBlox, which Brian Broll created as part of his doctoral dissertation at Vanderbilt University. NetsBlox is a blocks-based programming environment based on Snap!, designed to make distributed computing and other complex concepts accessible to young learners. One of the coolest aspects of NetsBlox is its extensibility: web services and APIs can be integrated into the easy-to-use, blocks-based programming environment of Snap!. This enables students to explore exciting new capabilities through integrations like RoboScape (for cybersecurity education) and PhoneIoT (for IoT education), or to investigate topical datasets such as COVID-19 rates provided by Johns Hopkins University and climate data from NOAA. NetsBlox also supports integration with the Common Online Data Analysis Platform (CODAP), which helps learners gain insight into data through the powerful dynamic data exploration techniques that CODAP provides. This is particularly valuable for understanding the impact of data on the training of machine learning models. An example of how we use CODAP to allow learners to explore the canonical iris dataset is shown below. NetsBlox not only embeds the CODAP UI but also allows selected tables to be pulled into the programming environment as multidimensional lists for subsequent coding.
Dynamic data exploration helps the students explore features in a dataset and gain hands-on experience with concepts like linear separability. Easily separable classes, like the orange class shown above, provide an opportunity to teach some of the fundamental machine learning concepts such as feature selection and classification. More challenging classes enable us to introduce probabilistic concepts such as model confidence in classification.
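Once a table has been pulled into the programming environment as a list of rows, the separability check students do visually in CODAP can also be written as code. The sketch below is a minimal version in Python rather than NetsBlox blocks. Its rows are illustrative values shaped like iris records (sepal length, sepal width, petal length, petal width, species) and are not the actual dataset. In this sketch, a feature separates a class when that class's values do not overlap those of the other classes, so a single threshold can split them.

```python
# Illustrative rows shaped like iris records (not the real dataset):
# (sepal_length, sepal_width, petal_length, petal_width, species)
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (5.8, 4.0, 1.2, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (4.9, 2.4, 3.3, 1.0, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def separating_features(rows, target):
    """Return the features on which the `target` class does not overlap
    any other class, i.e. one threshold on that feature splits them."""
    result = []
    for i, name in enumerate(FEATURES):
        inside = [row[i] for row in rows if row[-1] == target]
        outside = [row[i] for row in rows if row[-1] != target]
        if max(inside) < min(outside) or min(inside) > max(outside):
            result.append(name)
    return result


print(separating_features(ROWS, "setosa"))      # ['petal_length', 'petal_width']
print(separating_features(ROWS, "versicolor"))  # []
```

In these rows, versicolor overlaps the other classes on every feature, so no single threshold isolates it. Overlapping classes like this are where discussions of model confidence become useful.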
Exploring the possible features around which the data can be classified allows students to then build a classifier using techniques such as the decision trees described below.
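One way to turn that feature exploration into a classifier is the greedy step at the root of a decision tree. The learner tries every feature and candidate threshold and keeps the split that misclassifies the fewest training rows. The Python sketch below assumes illustrative petal measurements, not the real iris data. For simplicity it scores splits by raw error count, whereas common decision-tree learners typically use impurity measures such as Gini impurity or entropy.

```python
# Illustrative rows (not the real iris data): (petal_length, petal_width, species)
ROWS = [
    (4.7, 1.4, "versicolor"),
    (4.5, 1.5, "versicolor"),
    (3.3, 1.0, "versicolor"),
    (4.9, 1.5, "versicolor"),
    (6.0, 2.5, "virginica"),
    (5.1, 1.9, "virginica"),
    (5.9, 2.1, "virginica"),
    (4.5, 1.7, "virginica"),
]
FEATURES = ["petal_length", "petal_width"]


def best_split(rows, positive):
    """Try every feature and every midpoint between its sorted values.
    Return (feature, threshold, errors) for the rule
    'value > threshold -> positive' with the fewest misclassified rows."""
    best = None
    for i, name in enumerate(FEATURES):
        values = sorted({row[i] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            errors = sum((row[i] > threshold) != (row[-1] == positive) for row in rows)
            if best is None or errors < best[2]:
                best = (name, threshold, errors)
    return best


name, threshold, errors = best_split(ROWS, "virginica")
print(f"{name} > {threshold:.2f} -> virginica ({errors} errors)")
```

On these rows the search prints `petal_width > 1.60 -> virginica (0 errors)`. A full tree would repeat the same search on each side of the split until the leaves are pure or a depth limit is reached.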
Machine "Learning" is essentially an optimization problem. It is important for learners to understand the "learning" problem in machine learning as essentially an optimization problem, where an objective, fitness, or error function is defined and the goal of the algorithm is to either maximize or minimize the given function. Although students of this age/ability are not in a position to code complex optimization, we believe they _can_ develop intuitions through carefully selected activities, games, or code examples that bring home the fundamental ways in which optimization works. For example, if abstractions are presented to make gradients accessible without requiring mastery of calculus, gradient descent is a relatively simple concept that can be made accessible to early teen students.
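The following Python sketch shows the core loop of gradient descent, which repeatedly takes a step "downhill." It estimates the slope by evaluating the function a tiny distance on either side of the current point. This is one possible abstraction for using gradients without calculus, and not necessarily the one in our activity. The function, starting point, learning rate, and step count are illustrative assumptions.

```python
def f(x):
    """A toy 'error' function whose minimum is at x = 3."""
    return (x - 3) ** 2


def slope(func, x, h=1e-4):
    """Estimate the slope at x by peeking a tiny step to each side
    (a central finite difference), so no calculus is needed."""
    return (func(x + h) - func(x - h)) / (2 * h)


def gradient_descent(func, x, learning_rate=0.1, steps=100):
    """Repeatedly move x against the slope, i.e. downhill."""
    for _ in range(steps):
        x = x - learning_rate * slope(func, x)
    return x


x_min = gradient_descent(f, x=10.0)
print(round(x_min, 3))  # 3.0
```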
We have designed one such game-like activity in NetsBlox to help students build intuition about gradient descent. After completing the activity manually, students can write code to automate their search process and discover how the learning rate and the gradient together determine each step in gradient descent algorithms. We believe the intuition gained from this type of hands-on experience should provide a foundation for rich discussion, critical thinking, and extension into questions such as: What would optimization mean if the function represented the cost of a product over time? What if it was the error of a machine learning model?
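Automating the search also makes the role of the learning rate visible, because each update moves by learning rate × gradient. The Python sketch below minimizes the error of a tiny model, y = w·x, on four made-up points whose best fit is w = 2. It compares three illustrative learning rates. The gradient formula in the comment is derived for this particular mean-squared-error function only.

```python
XS = [1.0, 2.0, 3.0, 4.0]
YS = [2.0, 4.0, 6.0, 8.0]  # made-up data: the best fit is w = 2


def error(w):
    """Mean squared error of the model y = w * x on the data."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def error_gradient(w):
    """d(error)/dw, which for this error function is the mean of 2 * x * (w * x - y)."""
    return sum(2 * x * (w * x - y) for x, y in zip(XS, YS)) / len(XS)


def train(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        w -= learning_rate * error_gradient(w)  # step = rate * gradient
    return w


for rate in (0.001, 0.05, 0.2):
    w = train(rate)
    print(f"learning rate {rate}: w = {w:.3f}, error = {error(w):.3g}")
```

The three rates behave very differently. The smallest rate is still far from 2 after 20 steps. The middle rate lands essentially on 2. The largest rate overshoots by more on every step until the error explodes. For this data, each update multiplies the distance from w = 2 by (1 − 15 × rate), so any rate above 2/15 diverges.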
Understanding the relationship between optimization and machine learning empowers students to think critically about AI and bias in ML models (see below), and, importantly, avoid falling prey to widespread anthropomorphization of machine learning and AI.
Adversarial thinking is a useful skill in both cybersecurity and AI/ML. Thinking about how models can be fooled through adversarial examples is one rich application of this skill, one that challenges students to critically examine an ML model for vulnerabilities from a cybersecurity perspective. Interpretable models, like decision trees, provide an easy introduction to the topic, and building intuition through hands-on experience can be relatively simple. In our curriculum, we extend the Twitter bot classification exercise, which classifies a Twitter account as a Bot (B) or NotBot (NB), with a follow-up question: Can you construct an example that is classified as "NB"? From there, it is a small step to investigate how to change an existing bot so that it is misclassified.
Decision tree for classifying a Twitter account as a Bot (B) or NotBot (NB)
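The follow-up question can also be posed in code. In the Python sketch below, the tree's features (tweets per day, follower count, profile photo) and thresholds are hypothetical stand-ins, not the tree in the figure. The search tries a few hand-picked single-feature edits and reports the ones that flip a "B" account to "NB".

```python
def classify(account):
    """Hypothetical hand-written bot classifier; the features and
    thresholds are made up for illustration."""
    if account["tweets_per_day"] > 50:
        return "B"
    if account["followers"] < 20:
        return "NB" if account["has_profile_photo"] else "B"
    return "NB"


# Hypothetical edits an attacker might try, one feature at a time.
CANDIDATE_EDITS = {
    "tweets_per_day": [10, 49],
    "followers": [25, 100],
    "has_profile_photo": [True],
}


def evasions(account):
    """Return every single-feature edit that makes the tree answer 'NB'."""
    found = []
    for feature, values in CANDIDATE_EDITS.items():
        for value in values:
            tweaked = dict(account, **{feature: value})
            if classify(tweaked) == "NB":
                found.append((feature, value))
    return found


bot = {"tweets_per_day": 30, "followers": 5, "has_profile_photo": False}
print(classify(bot))   # B
print(evasions(bot))   # [('followers', 25), ('followers', 100), ('has_profile_photo', True)]
```

Because the tree is interpretable, students can read off why each edit works. In this toy tree, any account that posts at most 50 times a day and has at least 20 followers falls through to the NB leaf.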
Constructing adversarial examples is more difficult for differentiable, black-box models, since doing so depends on an understanding of concepts like differentiation and gradients from calculus. However, with the appropriate abstractions, we believe hands-on experience in this context is also possible with young learners. After developing intuition in simple domains, students can use the fundamental concepts to better understand complex techniques with real-world applications, such as adversarial examples crafted on images to fool self-driving cars.
Although adversarial examples are perhaps the most obvious application of adversarial thinking, the skill is useful far beyond this single topic. In online learning scenarios, users can often influence the ML model's training data, either directly or indirectly. Exploring these implications, including the model's robustness to noise in the training set and how it handles biased data, provides more opportunities for rich and topical discussion of ethics and bias in AI/ML.
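For a differentiable model, a standard technique is the fast gradient sign method (FGSM). It computes the gradient of the model's loss with respect to the input, then nudges every input feature by a small amount, epsilon, in the direction of that gradient's sign. The Python sketch below applies FGSM to a tiny logistic-regression model whose weights are invented for illustration. For this model the input gradient can be written by hand, whereas real image classifiers rely on automatic differentiation.

```python
import math

# Hypothetical, already-trained logistic-regression model (3 input features).
WEIGHTS = [2.0, -1.5, 0.5]
BIAS = -0.25


def predict_proba(x):
    """Probability that x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1 / (1 + math.exp(-z))


def input_gradient(x, label):
    """Gradient of the log loss with respect to the input features.
    For logistic regression this is (p - label) * w for each weight w."""
    p = predict_proba(x)
    return [(p - label) * w for w in WEIGHTS]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss."""
    grad = input_gradient(x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


x = [1.0, 0.5, 0.2]
x_adv = fgsm(x, label=1, epsilon=0.5)
print(round(predict_proba(x), 3), round(predict_proba(x_adv), 3))
```

Here the original input scores about 0.75 for class 1. No feature moves by more than 0.5, yet the perturbed input drops below 0.5 and is classified as class 0.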
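A small simulation can show how influence over training data shifts a model. The Python sketch below uses hypothetical requests-per-minute readings for a DDoS-style detector. Its deliberately simple learner places the cut-off halfway between the average "normal" value and the average "attack" value. The data, the learner, and the attacker's injected points are all illustrative assumptions.

```python
def learn_threshold(samples):
    """Place the cut-off halfway between the mean 'normal' value and
    the mean 'attack' value (a deliberately simple learner)."""
    normal = [v for v, label in samples if label == "normal"]
    attack = [v for v, label in samples if label == "attack"]
    return (sum(normal) / len(normal) + sum(attack) / len(attack)) / 2


# Hypothetical requests-per-minute readings.
clean = [(20, "normal"), (30, "normal"), (40, "normal"),
         (200, "attack"), (220, "attack"), (240, "attack")]
# Attacker-influenced high-traffic points that end up labeled 'normal'.
poison = [(180, "normal")] * 3

t_clean = learn_threshold(clean)
t_poisoned = learn_threshold(clean + poison)
print(t_clean, t_poisoned)  # 125.0 162.5

probe = 150  # a hypothetical new attack
print("flagged by clean model:", probe > t_clean)        # True
print("flagged by poisoned model:", probe > t_poisoned)  # False
```

The 150-requests-per-minute attack is flagged by the clean model but slips past the poisoned one. Swapping the targeted points for randomly mislabeled ones is a natural way to explore robustness to noise.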
Generalization & Overfitting/Underfitting. The ability of a model to generalize to unseen points is critical when training ML models and is complementary to adversarial examples. Models that generalize poorly are often easy to fool. This is no surprise after exploring the learned decision tree (as shown above); the individual parameters that have been learned may seem somewhat arbitrary. Although new data may not be handcrafted to try to fool the model, we certainly would like to ensure that the model is able to perform reasonably on these points. Enabling students to train both interpretable and black-box machine learning models empowers them to gain hands-on experience with over/underfitting as well as investigate the impact of different data sampling approaches on the resultant models.
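A compact way to see overfitting is to compare a model that memorizes its training data with a simpler model, using data that contains a couple of wrong labels. In the Python sketch below, the data and the two flipped labels are made up. The memorizer is a 1-nearest-neighbor classifier and the simpler model is a single-threshold stump. Both were chosen for brevity and are not taken from our curriculum.

```python
# Made-up 1-D data: the true rule is label 1 when x >= 5.
TRAIN = [(x, 1 if x >= 5 else 0) for x in range(10)]
TRAIN[2] = (2, 1)  # two flipped labels simulate noisy training data
TRAIN[7] = (7, 0)
TEST = [(x + 0.4, 1 if x + 0.4 >= 5 else 0) for x in range(10)]


def nearest_neighbor(x):
    """Memorize the training set: copy the label of the closest point."""
    return min(TRAIN, key=lambda point: abs(point[0] - x))[1]


def fit_stump(data):
    """Pick the threshold t (rule: x > t -> 1) with the fewest training errors."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: sum((x > t) != y for x, y in data))


THRESHOLD = fit_stump(TRAIN)


def stump(x):
    return 1 if x > THRESHOLD else 0


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


for name, model in [("1-nearest-neighbor", nearest_neighbor), ("stump", stump)]:
    print(f"{name}: train {accuracy(model, TRAIN):.0%}, test {accuracy(model, TEST):.0%}")
```

The memorizer scores 100% on the training data but copies the two noisy labels onto nearby test points, giving 80% test accuracy. The stump settles for 80% training accuracy and gets every test point right.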
Understanding Bias and Critical Interrogation of the Impacts of ML Models. Although AI/ML algorithms are not inherently biased, bias in a dataset, or in decisions related to feature selection or optimization, can have serious consequences when an ML model is used to make decisions that affect people and real-world situations. As AI/ML models become increasingly ubiquitous, understanding the impact of bias and what a model has actually learned has become increasingly important. Building intuition through hands-on experiences that "lift the hood" on machine learning enables students to gain a deeper understanding of the impact of the dataset and training algorithm.
When combined with the other key ideas listed above, this understanding gives students deeper insight into the potential causes and impacts of bias. For example, the first key idea prompts simple early questions about the features used to represent the data points. Learning about generalization can be a catalyst for interrogating the origins of a dataset. How might those origins lead to the under- or overrepresentation of various types of data in the dataset? How will this affect generalization? Viewing machine learning as an optimization problem raises questions about the quantity that is being optimized. How does this compare to the way the model is going to be used when deployed? If we are training the model to predict on historical data, are we sure that the past data is something we want to replicate? What if there were social or cultural issues that disenfranchised some demographic group?
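Some of these questions can be checked directly by counting. The Python sketch below audits a set of hypothetical predictions by group. For each group, it reports the share of the data the group contributes and the model's accuracy on that group. Groups, labels, and predictions are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (group, true_label, model_prediction)
RECORDS = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("A", 1, 1), ("A", 0, 1), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0),
]


def audit(records):
    """Map each group to (share of the data, accuracy within the group)."""
    counts = Counter(group for group, _, _ in records)
    correct = defaultdict(int)
    for group, label, prediction in records:
        correct[group] += int(label == prediction)
    return {g: (counts[g] / len(records), correct[g] / counts[g]) for g in counts}


overall = sum(label == pred for _, label, pred in RECORDS) / len(RECORDS)
print(f"overall accuracy: {overall:.0%}")
for group, (share, acc) in audit(RECORDS).items():
    print(f"group {group}: {share:.0%} of the data, accuracy {acc:.2f}")
```

Here the overall accuracy of 80% hides the fact that the underrepresented group B is classified correctly only half the time.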
Our project is in its early stages, and we are now preparing for a series of teacher workshops in December 2021 (some of which coincide with #CSEdWeek) that aim to get teacher input on the suitability and age-appropriateness of planned activities and projects. Derek Babb of the University of Nebraska Omaha will help lead our AI & Cybersecurity for Teens camp next summer. Meanwhile, we welcome reader feedback and opportunities to collaborate with others involved in similar work. Please visit our website, email the project team, or contact us directly ([email protected]; [email protected]).