Productive Failure (PF) “is an instructional approach where students initially engage with problems targeting concepts they have not yet learned, followed by a phase of consolidation where these concepts are taught,” as put by Hussel Suriyaarachchi, Paul Denny, and Suranga Nanayakkara in their SIGCSE 2025 paper, “Investigating the Use of Productive Failure as a Design Paradigm for Learning Introductory Python Programming” (“the paper” in the rest of this post).
In contrast, under the more traditional approach of Direct Instruction (DI), students are taught a concept before they attempt to solve a problem using that concept. Of course, people debate which approach is more effective. The paper cites some evidence for PF but also notes that “considering the infancy of PF in computing education and its known potential for robust learning, there is a need for more empirical work to reliably determine its efficacy.”
The paper itself contains some experimental evidence in favor of PF, but unfortunately, the number of participants was quite small. Basically, there were 16 students taking a CS1 course, and each student was randomly assigned to either the DI group (n = 9) or the PF group (n = 7). In both groups, students did the following (see figure below):
1. Heart Rate Variability (HRV) baseline (more on that later),
2. a lesson (for the DI group: a 15-minute instructional unit followed by a 30-minute programming task (“Weather”); for the PF group: the same two parts in reverse order),
3. a programming task (“Heart-rate”) isomorphic to the one from Step 2,
4. (two weeks later) the same task from Step 3,
5. another lesson, in the format they didn’t experience in Step 2, and
6. a survey about their experiences with DI and PF.

The results are summarized in the table below:

The DI group (again, with the caveat that n is small) performed much better than the PF group on the initial task, which is unsurprising since the former received instruction before doing the task. But two weeks later, the PF group outperformed the DI group, which led the authors to posit, “By grappling with novel problem-solving tasks before receiving instruction, students may be more likely to internalise concepts and apply them effectively in the future.”
HRV and Cognitive Load
The authors also wanted to measure participants’ cognitive load since “experiencing a reduced cognitive load tends to improve student learning.” To do this, they asked participants to wear a sensor on their forearm that tracked their Heart Rate Variability (HRV) throughout the experiment.¹ According to the authors, who cite these two papers, “HRV is a reliable marker of cognitive load, with a decrease in HRV indicating an increase in cognitive load.”
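To make “a decrease in HRV” concrete, here is a minimal sketch of one common time-domain HRV metric, RMSSD (the root mean square of successive differences between heartbeat intervals). This is purely illustrative: I don’t know which HRV metric the authors actually computed, the function name `rmssd` is my own, and the interval values are made up rather than taken from the paper.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (in ms).

    RMSSD is one widely used HRV metric; it is an assumption here,
    not necessarily the metric used in the paper.
    """
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the next one.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Invented RR intervals (ms between heartbeats), NOT data from the study.
baseline = [800, 850, 790, 860, 810]   # resting: intervals vary a lot
task = [700, 705, 698, 702, 701]       # during a task: intervals barely vary

baseline_hrv = rmssd(baseline)
task_hrv = rmssd(task)

print(f"baseline RMSSD: {baseline_hrv:.2f} ms")
print(f"task RMSSD:     {task_hrv:.2f} ms")
```

With these invented numbers, the task RMSSD comes out far lower than the baseline RMSSD. Under the authors’ reading, that kind of drop in HRV would indicate an increase in cognitive load.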
Based on the HRV data, both groups (DI and PF) experienced a reduction in cognitive load when they moved from the practice task to the programming task (see figure below). But, unsurprisingly, the PF group experienced a larger reduction than the DI group because their initial cognitive load (i.e., during the practice task) was higher.

The authors’ discussion of this result is interesting:
“The lowered cognitive load induced by DI and PF in the respective tasks after instruction likely contributed to the learning outcomes discussed in Section 5.1.3. Notably, students who received instruction after PF had the highest decline in cognitive load and consequently demonstrated the best overall learning performance. As hypothesised in Kapur et al.’s theory of Productive Failure, appropriately subjecting students to an initially heavy cognitive load may be fruitful for learning [12].”
When I first heard about cognitive load, my impression was that it should be minimized at all times. But unless I’m mistaken, the authors are suggesting that increasing students’ (initial) cognitive load could be beneficial because it allows for a larger reduction in cognitive load. Surely, this doesn’t mean that teachers should introduce every topic by presenting a super complicated example filled with unnecessary details! But as they say, some extra load “may be fruitful.”
Student Perceptions
Finally, what did the students think about PF and DI? The paper states, “Despite indications supporting more robust learning with PF, sentiment was more or less evenly split between the approaches, with 9 of the 16 students choosing PF as their favoured strategy.” Some preferred the efficiency of DI (i.e., they could “reach the objective faster”) and thought it inspired confidence, while others preferred the effortful nature of PF (“try and fail was a much more effective method”) and thought it was more rewarding. Students’ preferences and teaching effectiveness are not always perfectly aligned, and I think finding an appropriate balance is a perpetual challenge.
¹ Participants wore a heart-rate sensor while doing a programming task about heart rates. Very meta!