Last month, “Evaluating a new exam question: Parsons problems,” a 2008 paper by Paul Denny, Andrew Luxton-Reilly, and Beth Simon, received the 2024 ACM SIGCSE Test of Time Award. In this post, I’ll summarize the paper and share some thoughts.
The paper proposes adding Parsons problems — named after Dale Parsons, who co-authored this 2006 paper with Patricia Haden — to CS1 exams. In a Parsons problem, the student is given a problem and a set of code fragments, and must select and order a subset of the fragments into a correct solution. (People have studied many variants of Parsons problems; here is a survey from 2020.)
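To make this concrete, here is a small, made-up example in the spirit of a Parsons problem (it is not from the paper): the fragments below are given in scrambled order, one of them is a distractor, and the task is to select and order fragments into a method that sums the elements of an array.

```java
// Hypothetical Parsons problem (my illustration, not from the paper).
// Fragments, presented in scrambled order; one is a distractor:
//   (a) sum = sum + values[i];
//   (b) int sum = 0;
//   (c) for (int i = 0; i < values.length; i++) {
//   (d) sum = values[i];   // distractor: overwrites the total instead of accumulating
//   (e) }
//   (f) return sum;
// One correct selection and ordering is (b), (c), (a), (e), (f), which yields:
public class ParsonsExample {
    static int sum(int[] values) {
        int sum = 0;                               // (b)
        for (int i = 0; i < values.length; i++) {  // (c)
            sum = sum + values[i];                 // (a)
        }                                          // (e)
        return sum;                                // (f)
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4}));  // prints 10
    }
}
```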
Denny et al. report on two studies of Parsons problems: the first consisted of interviews with 13 students who had recently taken CS1, and the second was a quantitative analysis of 74 students’ performance on three exam questions: code writing, code tracing, and a Parsons problem.
Study 1: Interviews
The 13 students, including 6 women, were North American undergraduates “approximately four weeks into a second computing course following a Java, objects-first CS1.” One group wrote code while thinking aloud, solved Parsons problems, and wrote code again; the other group did the same, except that instead of writing code (at the beginning and end), they traced code.
Clearly, there wasn’t a ton of data, but the authors still had some observations. I won’t describe them all; here are a few that stuck out to me:
Many students, especially the ones who performed better, strongly preferred writing their own code to solving Parsons problems. They felt that Parsons problems reduced their “freedom” by forcing them down a path toward one particular solution.
Students had “a surprisingly strong uniformity of dislike towards tracing questions on exams” because tracing code gets “annoying” and “tedious.” To them, Parsons problems are “not so bad” compared to tracing code. Perhaps relatedly, they weren’t great at the code tracing problems, and the correlation between performance on those and on the Parsons problems was quite low. One student even “refused to trace the code.”
Some students, even after correctly solving a Parsons problem, did not fully understand the solution to the problem. For example, they knew that “(r+c)%2==0” is a line in the correct solution, but only because they traced an iteration or two, not because they truly understood what the line does.
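As an aside, a condition like that most often shows up in a nested loop over a grid, where it picks out a checkerboard pattern: assuming r and c are row and column indices (the full exam problem isn’t reproduced here), the expression is true exactly when r and c have the same parity. A short sketch:

```java
// Sketch of the typical use of a condition like (r + c) % 2 == 0 (assumed
// context; the exam problem itself isn't reproduced here). The expression is
// true exactly when r and c have the same parity, so it selects the squares
// of a checkerboard pattern.
public class Checkerboard {
    public static void main(String[] args) {
        int rows = 4, cols = 4;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                System.out.print((r + c) % 2 == 0 ? "X" : ".");
            }
            System.out.println();
        }
        // Prints:
        // X.X.
        // .X.X
        // X.X.
        // .X.X
    }
}
```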
So far, the case for Parsons problems does not strike me as extremely strong: students thought they were worse than code writing and better than code tracing, but that’s clearing a low bar. (One student said, “In the real world, you wouldn't trace the numbers, you'd run the program.”) And based on the authors’ observations, the Parsons problems were not effective at imparting “a full comprehension” of the code.
Study 2: CS1 Exam
As you can probably guess, the second study paints a rosier picture of Parsons problems. In it, 74 students in CS1 at the University of Auckland took a final exam worth 60% of their final grade.1 The exam had 11 questions, including one on code writing, one on code tracing, and a Parsons problem. The three questions were conceptually related, and all of them involved looping through an array. Figure 1 from the paper shows the Parsons problem from the exam.
One benefit of Parsons problems is that they help instructors categorize errors by type. For example, in the Parsons problem above, the incorrect fragment in some pairs (2, 4, 6) contains a syntax error, while in others (1, 3, 5) it contains a logic error. If instructors observe that logic errors are more common than syntax errors, that can inform their teaching. In contrast, in a solution to a code writing problem, it’s often unclear whether an error is a minor typo or something more significant.
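To illustrate the pair structure (these are made-up pairs, not the ones in Figure 1): each pair offers two nearly identical fragments, and the student must pick the usable one. A distractor with a syntax error can often be rejected on form alone, whereas a distractor with a logic error requires understanding what the code is supposed to do.

```java
// Made-up illustration of the two kinds of distractor pairs (not the pairs from
// Figure 1 of the paper), in the context of finding the maximum of an array.
//
// Pair with a syntax-error distractor: only one fragment compiles.
//   (i)  if (values[i] > max) {     // correct
//   (ii) if values[i] > max {       // syntax error: missing parentheses
//
// Pair with a logic-error distractor: both compile, but only one is correct.
//   (i)  max = values[i];           // correct: remember the new maximum value
//   (ii) max = i;                   // logic error: stores the index, not the value
class DistractorPairs {
    static int max(int[] values) {
        int max = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] > max) {   // correct choice from the syntax-error pair
                max = values[i];     // correct choice from the logic-error pair
            }
        }
        return max;
    }
}
```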
Parsons problems also allow students who struggle with code writing to demonstrate some understanding of the material. In Figure 9 from the paper, we see that students in the lower quartile (by full final exam score) scored higher on the Parsons problem than on the code writing question.
Furthermore, among (1) code writing, (2) code tracing, and (3) the Parsons problem, the two that had the highest correlation in performance were (1) and (3). To the authors, the r-squared values “suggest that code writing and Parsons problems require the application of similar skills, whereas code tracing requires a quite different set of skills.” If that’s true, then Parsons problems would be a reasonable substitute (or supplement) for code writing on exams, with the benefit that grading the former is easier and less subjective. Indeed, the authors' rubric for the Parsons problems was much simpler than their rubric for code writing, and it resulted in fewer discrepancies among the three authors after they individually graded the submissions.
The authors acknowledge that students can guess their way through a Parsons problem, but state that this effect can be ameliorated by asking multiple Parsons problems. They also point out that in their interviews, the students who couldn’t begin answering the code writing question performed better on the Parsons problem than they would by purely guessing, which again suggests that Parsons problems allow students who struggle with code writing to demonstrate some knowledge of the material.
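To see why more problems help, here is a rough back-of-the-envelope sketch (my own illustration, assuming 6 binary fragment choices per problem; this is not a calculation from the paper). It even ignores the ordering component, which makes blind guessing less likely still.

```java
// Back-of-the-envelope illustration (assumes 6 two-way fragment choices per
// problem; not a figure from the paper): the probability of a perfect score
// by pure guessing shrinks exponentially as more Parsons problems are asked.
public class GuessingOdds {
    public static void main(String[] args) {
        int pairsPerProblem = 6;  // assumed number of two-way fragment choices
        for (int problems = 1; problems <= 3; problems++) {
            double pPerfect = Math.pow(0.5, (double) pairsPerProblem * problems);
            System.out.printf("%d problem(s): P(perfect by guessing) = %.6f%n",
                    problems, pPerfect);
        }
    }
}
```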
Thoughts
The challenges addressed by Parsons problems are familiar to me. In my experience, it’s unusual for a student to make partial progress when solving a problem in mathematics or a related subject like algorithms.2 Instead, it often feels like “you either get it or you don’t,” which can discourage students and reduce their self-efficacy. Parsons problems could help mitigate these effects by giving students an additional opportunity to demonstrate some understanding.
Additionally, I’ve often felt that grading open-ended problems (e.g., “Design and analyze an algorithm that solves the following problem…”) can take quite some time and energy, and it’s impossible to make a completely objective rubric. Since Parsons problems have a smaller number of discrete outcomes, they’re easier to grade and are less prone to subjectivity.
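As a concrete illustration of what a smaller number of discrete outcomes buys you (a minimal sketch of a possible grading scheme, not the authors’ actual rubric): once a response is just a sequence of chosen fragment identifiers, scoring reduces to a mechanical comparison against a reference solution.

```java
import java.util.List;

// Minimal sketch of a Parsons-problem grader (an illustration of the idea,
// not the rubric used in the paper): a response is a sequence of fragment
// identifiers, and scoring is a mechanical comparison against a reference.
public class ParsonsGrader {
    // Award one point for each position where the response matches the reference.
    static int score(List<String> reference, List<String> response) {
        int points = 0;
        for (int i = 0; i < Math.min(reference.size(), response.size()); i++) {
            if (reference.get(i).equals(response.get(i))) {
                points++;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> reference = List.of("b", "c", "a", "e", "f");
        List<String> response  = List.of("b", "c", "d", "e", "f");  // one wrong choice
        System.out.println(score(reference, response) + " / " + reference.size());  // 4 / 5
    }
}
```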
So I think it’d be beneficial to incorporate Parsons problems in other CS courses such as algorithms. In fact, the paper suggests mathematical proofs as an “area of possible potential similarity.” Additionally, in a 2023 paper by Erickson et al. that documents their effort to “develop auto-graded scaffolding exercises for an upper-division theoretical computer science class,” the authors describe one of their components as “similar to Parsons programming problems but at the level of pseudocode instead of executable code.”
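For instance, a “proof Parsons” question for a discrete math or algorithms course might look something like the following (my own hypothetical example, not one from either paper): the steps of a short induction proof are given out of order, and the student must arrange them.

```latex
% Hypothetical "proof Parsons" fragments (my illustration, not from the papers
% cited above): order the steps to prove that 1 + 3 + 5 + \dots + (2n-1) = n^2.
\begin{enumerate}
  \item[(a)] Inductive step: assume $\sum_{k=1}^{n} (2k-1) = n^2$ for some $n \ge 1$.
  \item[(b)] Base case: for $n = 1$, the sum is $1 = 1^2$.
  \item[(c)] Then $\sum_{k=1}^{n+1} (2k-1) = n^2 + (2(n+1) - 1) = n^2 + 2n + 1 = (n+1)^2$.
  \item[(d)] By induction, the identity holds for all $n \ge 1$.
\end{enumerate}
% One correct ordering: (b), (a), (c), (d).
```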
Finally, I find the results on Parsons problems interesting at a more psychological level because they challenge the intuition that the best way to measure students’ ability at X (e.g., writing code) is to ask them to do X (e.g., on an exam). It makes me wonder whether we can similarly question the intuition that the best way to improve at X is to do X.
2. For example, the scores on Problem A2 of the 2022 Putnam Mathematical Competition were quite bimodal: out of 10 points, 114 people scored 10, 124 people scored 0, and only 37 people scored 4-6 points.