I taught my first course in human-computer interface design last summer. We started out reading Don Norman's The Design of Everyday Things. He begins Chapter 5, "Human Error? No, Bad Design," with this quote:
Most industrial accidents are caused by human error: estimates range between 75 and 95 percent. How is it that so many people are so incompetent? Answer: They aren't. It's a design problem.
If the number of accidents blamed upon human error were 1 to 5 percent, I might believe that people were at fault. But when the percentage is so high, then clearly other factors must be involved. When something happens this frequently, there must be another underlying factor.
Is Norman right? If failure happens often, is it a design problem, or is it something else? When do we decide that our design is flawed? I’ve been thinking about these questions in two contexts: Programming languages and education.
The error rate in text-based programs written by new students is nearly 100%. Even if you ask first-time students to simply type in completely bug-free code (which is something that I always do to start), the error rate will be well over 50%. New students simply don’t see all the punctuation that is critical to programming but is overlooked in natural language. Your odds of getting working code are higher with blocks-based programs, but that still doesn’t mean that your error rate (failing to get the computer to do what you wanted it to do) is zero. When students are learning, they make errors.
At what point would you decide that your programming language design is flawed? How much error is acceptable when students are learning to program, and how much is due to flaws in the programming language design?
There was a Dagstuhl seminar held a couple of weeks ago that raised these kinds of questions: Evidence About Programmers for Programming Language Design. Andy Ko wrote up a terrific blog post summarizing the five days of meetings — see page here. Andy points out that programming language designers work mostly from mathematical theory and their intuition about programmers. There are researchers calling for stronger empirical evidence, but it’s hard to gather meaningful data about a tool as flexible and as long-lasting as a programming language. Some languages may be difficult to get started with, but are highly productive and beloved once the programmer gets past the initial learning challenges.
The problem is that our intuition about what’s difficult in programming is often wrong. The best example I have of our poor intuition about programming error is Neil Brown’s excellent paper (with Amjad Altadmri), which compared the errors that Java educators think are most common with the errors that actually occur for over 100K students using BlueJ, based on data gathered through the Blackbox project (see link to ACM DL here). There is some correlation, but educator experience didn’t make people better at estimating student errors.
The relationship between educator accord (i.e. agreement with the Blackbox data) and years of being an educator is shown in Figure 4. The Spearman’s ρ correlation was not significant at the 5% level, ρ = −0.180, p = 0.202. As a follow up analysis, we also examined whether years of experience teaching introductory programming or teaching introductory programming in Java had an effect (alpha corrected to 0.025 for multiple comparisons). The result for correlating years spent teaching introductory programming in any language with accord was not significant, ρ = −0.151, p = 0.192, and neither was the result correlating years spent teaching introductory Java programming with accord: ρ = 0.04, p = 0.972. Thus, there was no effect of educator experience (in any measure we tried) on an educator’s level of agreement with the Blackbox data.
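For readers who haven't met the statistics in that passage, here is a minimal sketch of what a Spearman rank correlation and a Bonferroni-style alpha correction compute. It is not the authors' analysis code. The function names and the sample numbers are made up for illustration, and it does not compute p-values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run [start, end] over values tied with the one at start.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni_alpha(alpha, comparisons):
    """Split the significance level evenly across the follow-up tests."""
    return alpha / comparisons


# Hypothetical data, NOT from the paper: years of teaching vs. an accord score.
years_teaching = [1, 3, 5, 8, 12, 20]
accord = [0.60, 0.40, 0.70, 0.50, 0.30, 0.55]

print("rho:", round(spearman_rho(years_teaching, accord), 3))
print("corrected alpha:", bonferroni_alpha(0.05, 2))
```

Two follow-up tests at an overall 5% level give the corrected threshold of 0.025 mentioned in the quote. A rho near zero, as in the paper, means that ranking educators by experience tells you almost nothing about how they rank on agreement with the data.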
It’s hard to design a programming language well. Some error is inevitable. We over-rely on intuition though, because even educators with many years of experience have poor intuition about what students get wrong in programming. We need a better theory for design of programming languages, and it needs to be grounded in empirical data.
We have lots of education systems with more "error" than 1 to 5 percent. One kind of education error is the failure rate, which is the rate of students not completing the course with a passing grade or without successfully mastering some end-of-course assessment. MOOCs typically have a failure rate of around 90%. The failure rate in introductory CS courses is often in the 30-50% range, as Bennedsen and Caspersen found in 2007 (see paper here). (They are currently replicating their study, and you can contribute your school’s data here.)
Why don't we critique education systems in the same way as Norman critiques user interfaces? What amount of error/failure should we tolerate in an educational system before we call the system badly designed?
Don Norman’s goals and the goals of an educator are inherently different. Norman is aiming to design computing systems that users can successfully use to achieve their goals, while educational systems have an explicit purpose to change the students and their goals. Human-computer interface designers are trying to make the user experience comfortable, predictable, and productive. Educators need to make the user’s experience effortful enough to lead to successful learning, and making more errors can be crucial for learning more.
Still, we have to decide when a course is badly designed and needs fixing. Again, we have to use empirical data. Carl Wieman has gathered data from multiple universities in multiple institutions showing that we can teach STEM better than we currently do (see review of his new book here). Leo Porter, Cynthia Lee, and Beth Simon have shown that they can halve failure rates (i.e., dramatically reduce error) using peer instruction, and their evidence comes from multiple CS courses at multiple institutions (see paper here). There are better ways to teach. The challenge is knowing when our current design is flawed.
Conclusion: Humans, not Econs
The general answer is what Norman writes in the same chapter:
"The problem with the designs of most engineers is that they are too logical. We have to accept human behavior the way it is, not the way we would wish it to be."
People who use our programming languages and take our classes are Humans, not Econs. They’re not logical. Intuition and rationality are not going to accurately predict their behavior. We have to measure what’s going on, to figure out when we need a better approach.