The effect of embedded structures on cognitive load for novice learners during block-based code comprehension

Hao, Xiaoxin; Xu, Zhiyi; Guo, Mingyue; Hu, Yuzheng; Geng, Fengji

doi:10.1186/s40594-023-00432-9

Published: 15 June 2023

International Journal of STEM Education, volume 10, Article number 42 (2023)

Abstract

Background

Coding has become an integral part of STEM education. However, novice learners have difficulty processing code within embedded structures (also termed nested structures). This study aimed to investigate the cognitive mechanism underlying the processing of embedded coding structures based on hierarchical complexity theory, which suggests that embedded coding structures involve more complex hierarchies than sequential coding structures. Hierarchical processing is expected to place a heavy load on the working memory system, which must maintain, update, and manipulate information. We therefore examined the difference in cognitive load induced by embedded versus sequential structures, and the relation between this difference and working memory capacity.

Results

The results of Experiment 1 did not fully support our hypotheses, possibly because of the unexpected use of cognitive strategies and the way stimuli were presented. With these factors controlled, a new paradigm was designed for Experiment 2. Its results indicate that cognitive load, as measured by the accuracy and response times of a code comprehension task, was greater in embedded than in sequential conditions. Additionally, the extra cognitive load induced by embedded coding structures was significantly related to working memory capacity.

Conclusions

These findings suggest that processing embedded coding structures places heavy demands on the working memory system, which must maintain and manipulate hierarchical information. It is therefore important to provide scaffolding strategies that help novice learners process code across different hierarchical levels within embedded coding structures.

Introduction

Coding has become an integral part of STEM education because it not only supports the development of technical proficiency (the “T” component) but also enables interdisciplinary connections with science, engineering, and mathematics (Liu & Schunn, 2020; Tucker-Raymond et al., 2019; Ye et al., 2023). Computational thinking (CT), primarily fostered through coding education, is recognized as a transdisciplinary competency that empowers individuals to address real-life problems and confront challenges within the STEM domains (Li et al., 2020a, 2020b; Ntemngwa & Oliver, 2018; So, 2023). Recently, block-based coding languages (e.g., Scratch) have become increasingly popular because they allow students to drag and drop code commands, reducing the burden of the complex syntax involved in text-based programming (Hu et al., 2021; Weintrop, 2019; Xu et al., 2019). However, novice learners still face many challenges when learning block-based coding (Qian et al., 2020; Wang et al., 2021; Wiggins et al., 2021). One such challenge concerns nested structures, in which one control structure (e.g., repeat) is placed inside another (Fig. 1). Such structures are also referred to as embedded structures. Novice learners generally have greater difficulty using nested structures than sequential structures, which organize two or more control structures in a flat form (Fig. 1) (Bers et al., 2019; Kelleher & Hnin, 2019; Mladenović et al., 2018; Yamashita et al., 2016). To better understand and address this challenge, this study aims to explore the cognitive mechanism underlying the processing of nested coding structures.

Theoretical framework

According to hierarchical complexity theory (Commons, 2007; Commons et al., 1998), nested and sequential control structures differ in their hierarchical complexity and horizontal complexity, which can be clearly depicted using the nodes and edges of graph theory (Uddén et al., 2020; West, 2001). As shown in Fig. 1, there are two types of nodes. The first type consists of commands (e.g., “Transform color”) that explicitly specify operations (represented by ●), whereas the second type consists of commands (e.g., “Repeat 2 times”) that specify how many times each command within a loop body is executed (represented by ○). A hierarchy is formed when commands of the second type are placed at a higher level to coordinate commands of the first type at a lower level.
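One way to make the node-and-edge description concrete is to encode a block script as a tree and count the nodes at each hierarchical level. The sketch below is a minimal illustration under stated assumptions, not the authors' tooling: the classes `Command` and `Repeat`, the function `nodes_per_level`, and both example scripts are hypothetical stand-ins for the kinds of structures in Fig. 1, and reading horizontal complexity off the top-level node count is our own simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Command:
    """● node: a command that explicitly specifies an operation."""
    name: str


@dataclass
class Repeat:
    """○ node: a control command that repeats every command in its body."""
    times: int
    body: list[Node] = field(default_factory=list)


Node = Union[Command, Repeat]


def nodes_per_level(program: list[Node]) -> list[int]:
    """Count nodes at each hierarchical level; level 1 is the top of the script."""
    counts: list[int] = []
    level = program
    while level:
        counts.append(len(level))
        # The next level holds every node placed inside a Repeat body.
        level = [child for node in level if isinstance(node, Repeat)
                 for child in node.body]
    return counts


# Hypothetical scripts built from the same two repeats and two commands.
sequential = [Repeat(2, [Command("Transform color")]),
              Repeat(2, [Command("Transform shape")])]
nested = [Repeat(2, [Command("Transform color"),
                     Repeat(2, [Command("Transform shape")])])]

print(nodes_per_level(sequential))  # [2, 2]
print(nodes_per_level(nested))      # [1, 2, 1]
```

The length of each list is the number of hierarchical levels (2 versus 3), and the first entry is the number of flat, top-level nodes (2 versus 1), mirroring the contrast drawn above: the nested script has more levels, and the sequential script has more nodes side by side.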

Processing coding structures with multiple hierarchical levels is expected to generate substantial cognitive load, that is, cognitive demands imposed on learners’ working memory system (Anmarkrud et al., 2019; Paas & Van Merriënboer, 1994). Based on previous studies (Badre & Nee, 2018; O’Reilly & Frank, 2006), processing hierarchical structures involves three critical cognitive demands. The first is rapid encoding of information at the lower levels of a hierarchy (e.g., the commands represented by ● in Fig. 1). The second is robust maintenance of information at the higher levels of a hierarchy (e.g., the commands represented by ○ in Fig. 1). The third is selective updating of specific information in working memory while other information is maintained across different hierarchical levels (Murty et al., 2011). For example, when executing code within nested structures, participants need to selectively update the repetition count of the inner loop while maintaining the repetition count of the outer loop.
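The maintain-versus-update demand in the nested-loop example can be illustrated with a small interpreter that records, for each executed command, the iteration counter of every enclosing repeat. This is a hypothetical sketch: scripts are encoded as nested `("repeat", times, body)` tuples with plain strings for commands, and `execute` is our own name. The counter tuple stands in for what must be held in working memory.

```python
from __future__ import annotations

from collections.abc import Iterator


def execute(program: list,
            counters: tuple[int, ...] = ()) -> Iterator[tuple[str, tuple[int, ...]]]:
    """Yield (command, counters) in execution order.

    `counters` holds one 1-based iteration index per enclosing repeat,
    outermost first.
    """
    for node in program:
        if isinstance(node, str):          # a command
            yield node, counters
        else:                              # ("repeat", times, body)
            _, times, body = node
            for i in range(1, times + 1):
                # The enclosing indices are passed down unchanged (maintenance)
                # while this loop's own index advances (selective updating).
                yield from execute(body, counters + (i,))


# Hypothetical nested script: one inserted command before a repeat-3 loop.
nested = [("repeat", 2, ["Transform shape",
                         ("repeat", 3, ["Transform color"])])]

for command, state in execute(nested):
    print(command, state)
```

In the trace, the outer index stays at 1 while the inner index runs from 1 to 3, and only then does the outer index advance to 2.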

Nested structures possess greater hierarchical complexity than sequential ones when other factors are matched, because they organize commands across more hierarchical levels (Fig. 1). Thus, executing commands in nested structures requires more iterative switches between hierarchical levels than executing commands in sequential structures. These switches engage the working memory system to maintain and selectively update hierarchical information (e.g., the repetition counts of the inner and outer loops). In contrast, sequential structures present greater horizontal complexity than nested structures (Fig. 1). Although processing horizontal complexity also consumes working memory resources, this demand arises primarily from the rapid encoding of stimuli rather than from the maintenance and updating required for processing hierarchical complexity. Therefore, we aimed to test whether processing nested coding structures, compared with sequential structures, would result in greater cognitive load owing to their different demands on the working memory system (Prediction 1). Meanwhile, because working memory capacity varies considerably across individuals (Baddeley, 1992; Barrett et al., 2004), we also predicted that the extra cognitive load generated by processing nested versus sequential coding structures would be significantly related to individual working memory capacity (Prediction 2).
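The claim that nested scripts require more switches between hierarchical levels can be checked by expanding a script into its execution order and counting how often the depth of consecutive commands changes. Treating each change of depth as one switch is our simplifying assumption, and the tuple encoding, the functions `depth_trace` and `level_switches`, and the example scripts are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator


def depth_trace(program: list, depth: int = 0) -> Iterator[int]:
    """Yield the nesting depth of every command in execution order."""
    for node in program:
        if isinstance(node, str):          # a command
            yield depth
        else:                              # ("repeat", times, body)
            _, times, body = node
            for _ in range(times):
                yield from depth_trace(body, depth + 1)


def level_switches(program: list) -> int:
    """Count consecutive command pairs that sit at different depths."""
    depths = list(depth_trace(program))
    return sum(1 for a, b in zip(depths, depths[1:]) if a != b)


# Hypothetical scripts, each with two repeat blocks and three command nodes.
sequential = [("repeat", 3, ["Transform shape"]),
              "Transform color",
              ("repeat", 3, ["Number+1"])]
nested = [("repeat", 3, ["Transform color",
                         ("repeat", 3, ["Transform shape", "Number+1"])])]

print(level_switches(sequential))  # 2
print(level_switches(nested))      # 5
```

In the sequential script each loop is entered and left once, so the count stays at 2 however many times the loops repeat; in the nested script the inner level is re-entered on every outer iteration, giving 2 × 3 − 1 = 5 switches here.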

In addition, we examined the extra cognitive load induced by greater horizontal complexity in both nested and sequential conditions. Horizontal complexity was manipulated by inserting different numbers of code commands in each condition. Unlike hierarchical complexity, horizontal complexity mainly places demands on rapid encoding. Therefore, we predicted that increasing the number of inserted commands would lead to extra cognitive load (Prediction 3), but that this load would not be associated with working memory capacity as measured by tasks emphasizing the maintenance and selective updating of information (Prediction 4). Moreover, the rapid encoding of more inserted commands may interfere with information maintenance and updating, especially in nested conditions. Accordingly, we predicted that the difference in cognitive load between nested and sequential conditions would be modulated by the number of inserted commands (Prediction 5).

In summary, testing the aforementioned predictions would enhance our understanding of how individuals process nested structures and provide valuable insights to guide the teaching and learning of coding. Novice learners often face challenges in learning nested coding structures due to the complex hierarchical information involved. Processing such hierarchical complexity is thought to impose significant demands on the working memory system for information maintenance and manipulation, thus leading to increased cognitive load. Hence, effective teaching or learning strategies should be incorporated into educational practices to support novice learners in processing the hierarchical information involved in nested coding structures.

Literature review

Nested structures exist not only in coding, but also in many other domains, including natural language, artificial grammar, and music (Hochmann et al., 2008; Koelsch et al., 2013; Lakretz et al., 2020). Studies in these domains have widely investigated the cognitive and neural mechanisms involved in the processing of nested structures. The current study mainly drew on the findings in these fields to investigate the cognitive mechanism underlying novices’ comprehension of block-based codes within nested structures. To be consistent with the research in other domains, the term “nested structures” is replaced by the term “embedded structures” below.

Embedded structures in the domain of natural language

In the domain of natural language, many studies have focused on the difficulty of processing embedded sentences. Behavioral studies have consistently found that processing sentences with embedded structures (e.g., [The boy [the girl chased] kicked the ball]) is more difficult than processing nonembedded sentences (e.g., The boy kicked the ball on the grass), as evidenced by slower responses (Holmes et al., 1987) and lower accuracy (Opitz & Friederici, 2007). Additionally, neuroscientific studies indicated that processing embedded sentences, compared with nonembedded sentences, activated the left inferior frontal gyrus to a greater extent, suggesting increased cognitive demands (Meyer & Friederici, 2016; Shetreet et al., 2009). The cognitive effort induced by other factors involved in embedded sentences has also been examined. For example, a neuroimaging study compared the processing of embedded sentences with varying dependency distance, that is, the distance between the subject noun and its verb in the main sentence. Processing embedded sentences with a long dependency distance (e.g., “Maria who loved Hans who was good looking kissed Johann”) enhanced the functional coupling between the left inferior frontal gyrus and other brain regions compared with processing embedded sentences with a short dependency distance (e.g., “Maria who cried kissed Johann and that was yesterday night”). This finding suggests that processing embedded sentences with long dependency distances is demanding enough to require greater interaction between brain regions (Makuuchi et al., 2009).

Embedded structures in the domain of grammar learning

In the domain of artificial grammar, the neural mechanism underlying the processing of embedded structures has been widely explored using two types of symbolic sequences: nonembedded sequences following the adjacent dependency rule (AB)ⁿ (e.g., A1B1A2B2) and embedded sequences following the hierarchical dependency rule AⁿBⁿ (e.g., A1A2B2B1) (Fitch & Hauser, 2004; Levelt, 2020; Perruchet & Rey, 2005; Poletiek et al., 2021). For example, an electrophysiology study used auditory sequences organized according to the AⁿBⁿ rule to measure infants’ ability to process embedded structures with different levels of complexity: 5 versus 7 tones. Each level included standard sequences conforming to the embedded rule (e.g., A1A2CB2B1) and deviant sequences violating the embedded rule (e.g., A1A2CB1B2). The results showed that mismatch responses to deviant tones within the 7-tone embedded sequences occurred approximately 90 ms later than those within the 5-tone embedded sequences, indicating that processing embedded structures with greater complexity recruited more cognitive resources compared to those with less complexity (Winkler et al., 2018).
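The two grammars can be made concrete with a pair of recognizers for index-labelled symbols. This sketch is an illustration rather than any cited study's stimulus code: the token format (`"A1"`, `"B1"`), the treatment of the central element `C` as a filler that is simply skipped, and the function names are our assumptions. The stack in the second function shows why the embedded rule is the more demanding one: every pending A must be held until its partner B arrives, in reverse order.

```python
from __future__ import annotations


def follows_adjacent_rule(seq: list[str]) -> bool:
    """(AB)^n: every A_i is immediately followed by its partner B_i."""
    if not seq or len(seq) % 2 != 0:
        return False
    return all(a[0] == "A" and b[0] == "B" and a[1:] == b[1:]
               for a, b in zip(seq[0::2], seq[1::2]))


def follows_embedded_rule(seq: list[str]) -> bool:
    """A^nB^n with mirror pairing (A1 A2 ... B2 B1); a central "C" is skipped."""
    pending: list[str] = []   # indices of As still waiting for their B
    seen_b = False
    for token in seq:
        if token == "C":
            continue
        kind, index = token[0], token[1:]
        if kind == "A":
            if seen_b:                 # every A must precede every B
                return False
            pending.append(index)
        elif kind == "B":
            seen_b = True
            # A B must close the most recently opened A (last in, first out).
            if not pending or pending.pop() != index:
                return False
        else:
            return False
    return seen_b and not pending


print(follows_adjacent_rule(["A1", "B1", "A2", "B2"]))        # True
print(follows_embedded_rule(["A1", "A2", "C", "B2", "B1"]))   # True (standard)
print(follows_embedded_rule(["A1", "A2", "C", "B1", "B2"]))   # False (deviant)
```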

Embedded structures in the music domain

In the music domain, previous studies have also explored the cognitive complexity of embedded transposition chords (Koelsch et al., 2013; Ma et al., 2018). For example, an electrophysiology study compared the neural responses of music experts and nonexperts when processing musical chords with and without embedded transposition (Ma et al., 2018). Nonexperts exhibited larger amplitudes of the early right anterior negativity (ERAN) and the late N5 components when processing embedded chords than when processing nonembedded chords, suggesting that the difficulty in interpreting embedded structures appeared at both early (i.e., a larger ERAN) and late (i.e., a larger N5) processing stages. In contrast, experts showed significant differences between embedded and nonembedded conditions in beta activity, which has been regarded as an indicator of top-down cognitive effort (Bressler & Richter, 2015; Wang et al., 2012). These findings suggest that experts and nonexperts process embedded musical chords differently.

Embedded structures in the programming domain

In the programming domain, the difficulty of processing embedded coding structures has been repeatedly observed (Asenov et al., 2016; Cetin, 2015; Ginat, 2004; Kelleher & Hnin, 2019). For example, one common mistake students made with embedded structures was ignoring the embedded relation between the inner and outer repeats and simply executing them separately (Izu et al., 2016; Mladenović et al., 2018). Another study analyzed the code scripts generated by novice learners to solve computational problems (Chao, 2016). Novices preferred sequential over embedded control structures, and when embedded structures were involved, they debugged their code scripts more frequently, suggesting more errors and greater difficulty.

In addition, Cetin (2015) found that novice learners pass through different stages when learning nested loops. In the early stage, students tend to execute each command within a loop explicitly. In the later stage, students can conceptualize all commands within a loop as a single function or procedure, eliminating the need for step-by-step execution to obtain an output. Furthermore, given the difficulty of learning embedded coding structures, teaching and learning strategies have been proposed to help novice learners (Cetin, 2020; Yamashita et al., 2016). However, these strategies are not yet well grounded in theory because the cognitive mechanism underlying the processing of embedded coding structures remains poorly understood.

The current study

Based on hierarchical complexity theory (Commons, 2007; Commons et al., 1998) and previous studies in other domains (e.g., Makuuchi et al., 2009; Winkler et al., 2018), this study aimed to investigate the cognitive mechanism underlying the comprehension of embedded coding structures among novice learners. Two experiments were conducted in which participants were required to perform a code comprehension task that incorporated both embedded and sequential conditions, each with a different number of commands inserted. Consistent with previous studies (Brünken et al., 2010; Paas et al., 2003), the cognitive load generated in each condition was measured based on task performance, as indexed by accuracy and response times. Furthermore, we measured individual working memory capacity using a behavioral task that targets the components of maintenance and updating. In each experiment, we used these measures to test the three hypotheses derived from the predictions in the Theoretical Framework.

H1

Novice learners would exhibit slower responses and lower accuracy in embedded versus sequential conditions, as well as in the conditions with more versus fewer inserted commands.

H2

Working memory capacity would be negatively related to the differences in response times and accuracy between embedded and sequential conditions, but unrelated to the differences in response times and accuracy caused by the increasing inserted commands.

H3

The differences in response times and accuracy between embedded and sequential conditions would be larger when more commands were inserted.

Experiment 1

Method

Participants

Experiment 1 involved 73 participants (mean age = 21.570 years, SD = 1.930, 49 females) recruited from Zhejiang University, a top-ranked comprehensive university in China (https://www.topuniversities.com/university-rankings/world-university-rankings/2023). Of the 71 participants who reported their coding experience, 31 had never learned coding and the others had some coding experience (two students: < 1 month, ten: 1–3 months, eleven: 3–6 months, four: 6–12 months, ten: > 12 months). Finally, 58 participants were included in the statistical analyses after 15 were excluded (two did not report coding experience, ten had learned coding for longer than 12 months, and three failed to pass the practice). The effect of prior coding experience is examined in Additional file 1. All participants signed an informed consent form before participating and were reimbursed for their time and travel. This study was approved by the Research Ethics Committee of Zhejiang University.

Code comprehension task

In this task, we created two experimental conditions by organizing two repeat blocks in either an embedded or a sequential form. Apart from this difference, the two conditions were matched on all other dimensions (e.g., the number of inserted commands and repeat blocks). As shown in Fig. 2, the sequential structures were composed of two adjacent repeat blocks, with one or two inserted commands placed outside the repeat blocks. In contrast, the embedded structures nested one repeat block within another, with one or two commands inserted between the outer and inner repeats (Fig. 2). Each command (“Transform color”, “Transform shape”, or “Number ± 1, 2, 3”), displayed at a size of 2.8 cm × 0.5 cm, instructed a transformation in the color (red, blue, yellow, green, or purple), shape (alternation of color parts), or number (1–15) dimension.

Corresponding to the commands in each code snippet, which was displayed on the left side of a 14-in. screen with a resolution of 1920 × 1080 pixels, an initial circle and 14–16 transformed circles, each with an area of 4 cm², were displayed on the right side of the screen (Fig. 2). The transformed circles were arranged in 4 rows. Among them were 3 or 4 probe errors, in which the dimension that changed in the new circle (i.e., color, shape, or number) was inconsistent with the dimension instructed by the command. For example, when the command was “Transform color”, the transformation applied to the new circle occurred in the number dimension. Adjacent circles differed in only one dimension so that errors could not be detected without processing the commands. Additionally, probe errors were never presented consecutively. Participants were instructed to identify all probe errors by checking the boxes underneath the circles.
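The probe-error logic amounts to comparing the dimension each command instructs with the dimension that actually changed between adjacent circles. In this hypothetical sketch, a circle is a dictionary with `color`, `shape`, and `number` keys, shape is coded as an integer, the commands are assumed to be already expanded into execution order, and `find_probe_errors` is our own name, not part of the E-prime task.

```python
from __future__ import annotations


def instructed_dimension(command: str) -> str:
    """Map a command label to the dimension it should change."""
    if command == "Transform color":
        return "color"
    if command == "Transform shape":
        return "shape"
    if command.startswith("Number"):       # e.g. "Number+2", "Number-3"
        return "number"
    raise ValueError(f"unknown command: {command!r}")


def changed_dimension(prev: dict, new: dict) -> str:
    """Return the single dimension that differs between adjacent circles."""
    diffs = [key for key in prev if prev[key] != new[key]]
    if len(diffs) != 1:
        raise ValueError("adjacent circles must differ in exactly one dimension")
    return diffs[0]


def find_probe_errors(commands: list[str], circles: list[dict]) -> list[int]:
    """Indices (0-based) of transformed circles that contradict their command.

    circles[0] is the initial circle; circles[i + 1] results from commands[i].
    """
    if len(circles) != len(commands) + 1:
        raise ValueError("need exactly one transformed circle per command")
    return [i for i, command in enumerate(commands)
            if changed_dimension(circles[i], circles[i + 1])
            != instructed_dimension(command)]


commands = ["Transform color", "Number+2", "Transform shape"]
circles = [
    {"color": "red", "shape": 0, "number": 5},   # initial circle
    {"color": "blue", "shape": 0, "number": 5},  # color changed: correct
    {"color": "blue", "shape": 0, "number": 7},  # number changed: correct
    {"color": "blue", "shape": 0, "number": 8},  # number changed, shape instructed
]
print(find_probe_errors(commands, circles))  # [2]
```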

The task was programmed using E-prime 3.0 software (https://www.eprime.info/). Overall, the task used a 2 (Structure: sequential versus embedded) × 2 (Inserted Command: 1 versus 2) design, yielding 4 types of blocks. Each block included 8 trials, and the code snippet in each trial was randomly selected from 16 predesigned code snippets. Participants were asked to respond as quickly and accurately as possible within 50 s. After executing all commands in a trial, participants clicked the “Next” button to proceed to the next trial. The cognitive load in each condition was quantified by the average accuracy across all trials and the average response time of trials with no error.
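The scoring rule (mean accuracy over all trials, mean response time over error-free trials) for each cell of the 2 × 2 design can be sketched as follows. We assume each trial is scored simply as correct or incorrect; the `Trial` fields and the `score` function are hypothetical, and the example values are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from statistics import mean


@dataclass
class Trial:
    structure: str   # "sequential" or "embedded"
    inserted: int    # number of inserted commands: 1 or 2
    correct: bool    # assumption: True when the trial contained no error
    rt: float        # response time in seconds (time limit 50 s)


def score(trials: list[Trial]) -> dict[tuple[str, int], tuple[float, float]]:
    """Return {(structure, inserted): (accuracy, mean RT of correct trials)}."""
    results = {}
    for structure, inserted in product(("sequential", "embedded"), (1, 2)):
        cell = [t for t in trials
                if t.structure == structure and t.inserted == inserted]
        correct_rts = [t.rt for t in cell if t.correct]
        accuracy = sum(t.correct for t in cell) / len(cell) if cell else float("nan")
        mean_rt = mean(correct_rts) if correct_rts else float("nan")
        results[(structure, inserted)] = (accuracy, mean_rt)
    return results


# Invented data for one cell only; empty cells come back as NaN.
trials = [Trial("embedded", 2, True, 20.0), Trial("embedded", 2, False, 45.0),
          Trial("embedded", 2, True, 30.0), Trial("embedded", 2, True, 25.0)]
print(score(trials)[("embedded", 2)])  # (0.75, 25.0)
```

The incorrect 45 s trial lowers accuracy but is left out of the response-time average, as the scoring rule requires.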

The experimental procedure for the code comprehension task consisted of three phases. During the learning phase, participants watched an instructional video where the instructor illustrated the execution order of code commands within different structures and provided an example to explain the task. Then, participants practiced the code comprehension task. In total, they practiced 8 trials with 2 in each condition. Only when the practice accuracy was greater than 80% would participants enter the formal testing. Finally, participants performed the formal test with short breaks between different conditions.

Working memory task

We used the n-back task to measure working memory because it specifically measures the maintenance and selective updating of information (Gajewski et al., 2018; Rac-Lubashevsky & Kessler, 2016). In this task, a series of letters (A, B, C, D, E, F, G, and H) was presented one by one on the screen (Braver et al., 1997). Participants were asked to memorize the letters and press the space bar when the current letter was the same as the first one (i.e., n = 0) or as the one presented n trials earlier (e.g., n = 3), as shown in Fig. 3. After learning these rules, participants performed a practice test. They entered the formal test only if, at each level, they correctly identified more than 3 of 6 target letters and responded to no more than 2 nontarget letters. In the formal test, the 0-back level contained 10 blocks of 11 trials, whereas the 3-back level contained 10 blocks of 12 trials. Targets made up 27.2% of the trials in each 0-back block and 25% in each 3-back block. Each trial lasted 2000 ms, with a stimulus duration of 500 ms and an interstimulus interval of 1500 ms.
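The target rules for both levels can be written down directly. In this sketch `nback_targets` is a hypothetical helper. Following the task description, a 0-back target is any later letter that matches the first letter of the block, and we assume the first letter itself is never a target.

```python
from __future__ import annotations

from collections.abc import Sequence


def nback_targets(letters: Sequence[str], n: int) -> list[bool]:
    """Flag positions that require a space-bar press at level n."""
    if n == 0:
        # 0-back: match the first letter of the block (not itself a target).
        return [i > 0 and letter == letters[0] for i, letter in enumerate(letters)]
    # n-back: match the letter presented n positions earlier.
    return [i >= n and letter == letters[i - n] for i, letter in enumerate(letters)]


print(nback_targets("ABCAECAH", 3))
# [False, False, False, True, False, True, True, False]
```

In the 3-back example, the second "A", the second "C", and the third "A" are targets because each repeats the letter shown three positions earlier.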

Individuals’ working memory capacity was indexed by the differences in discriminability (d-prime) and response times between the 0- and 3-back conditions. For each condition, d-prime was calculated using the following formula (Haatveit et al., 2010): d-prime = Z(HIT) − Z(FA), where Z is the inverse of the standard normal cumulative distribution function. HIT is the proportion of targets that are correctly identified, whereas the false alarm rate (FA) is the proportion of nontargets that are incorrectly identified as targets. Additionally, the mean response time of correctly identified targets was calculated for each condition. Finally, working memory performance was calculated as the difference in d-prime (3-back minus 0-back) and in mean response time (0-back minus 3-back), denoted WM_dprime and WM_rt, respectively. In this study, greater WM_dprime and WM_rt values indicated better working memory capacity.
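The d-prime formula and the two difference scores can be computed with Python's standard library, where `statistics.NormalDist().inv_cdf` plays the role of Z. A raw hit or false-alarm rate of exactly 0 or 1 would make Z infinite, so the sketch applies a log-linear correction (adding 0.5 to each count and 1 to each total). That correction is our assumption; the text does not say how such rates were handled. The function names and example values are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist


def d_prime(hits: int, n_targets: int, false_alarms: int, n_nontargets: int) -> float:
    """d' = Z(HIT) - Z(FA), with Z the inverse standard normal CDF.

    Assumption: log-linear correction so that rates never reach 0 or 1.
    """
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def wm_indices(dprime_0back: float, dprime_3back: float,
               rt_0back: float, rt_3back: float) -> tuple[float, float]:
    """WM_dprime = 3-back minus 0-back; WM_rt = 0-back minus 3-back."""
    return dprime_3back - dprime_0back, rt_0back - rt_3back


# Invented values: d' drops and responses slow down from 0-back to 3-back.
print(wm_indices(3.0, 1.5, 450.0, 600.0))  # (-1.5, -150.0)
```

With these sign conventions both indices are typically negative, because performance declines from 0-back to 3-back. Values closer to zero, which are the greater values, indicate a smaller cost and thus better working memory capacity.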

Statistical analyses

To compare cognitive load between conditions, a series of linear mixed-effects models were built with performance in the code comprehension task (i.e., accuracy or response times) as the dependent variable and with Structure (sequential versus embedded) and Inserted Command (1 versus 2) as independent variables. The basic models contained only the main effects of Structure and Inserted Command. These basic models were then compared with new models that contained both the main effects and the Structure × Inserted Command interaction. A new model was chosen only if its Akaike information criterion (AIC) value was more than 2 lower than that of the corresponding basic model. For all models, the Satterthwaite approximation was used to estimate the degrees of freedom. If there was a significant Structure × Inserted Command interaction, further analyses were conducted to interpret it.
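The model-selection rule can be stated as a small function of each fitted model's maximized log-likelihood ln L and number of estimated parameters k, using AIC = 2k − 2 ln L. The function names and example values are hypothetical; in practice the log-likelihoods would come from the fitted mixed-effects models.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def select_model(basic_loglik: float, basic_k: int,
                 new_loglik: float, new_k: int, threshold: float = 2.0) -> str:
    """Keep the basic model unless the new one lowers AIC by more than `threshold`."""
    improvement = aic(basic_loglik, basic_k) - aic(new_loglik, new_k)
    return "new" if improvement > threshold else "basic"


# Invented values: adding one interaction term (k: 5 -> 6).
print(select_model(-500.0, 5, -497.5, 6))  # new   (AIC 1010 -> 1007)
print(select_model(-500.0, 5, -498.5, 6))  # basic (AIC 1010 -> 1009)
```

With one extra parameter, the interaction model must raise the log-likelihood by more than 2 units to be retained.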

Additionally, to test the relations between working memory and cognitive load, we constructed separate models for working memory as indexed by discriminability (WM_dprime) and response times (WM_rt) (see the details about these models in Additional file 5). Specifically, the basic models contained the main effects of Structure, Inserted Command, and Working Memory and the Structure × Inserted Command interaction. In addition to these variables, the new models further included the Structure × Working Memory and Inserted Command × Working Memory interactions. We selected the basic or new models based on the AIC value mentioned above. Further analyses were conducted if there was any significant interaction involving working memory. The significance level for results involving working memory was adjusted to 0.025 to reduce the Type I error rates.

In the above analyses, the subject factor was included in all models to test the random intercept effect. If this effect was not significant, the subject factor was removed, and the analyses were continued using fixed-effects models.

Results

Cognitive load in the code comprehension task

For accuracy, we selected the basic model, which contained only the main effects of Structure and Inserted Command. Accuracy did not differ significantly between embedded and sequential conditions or between conditions with 1 versus 2 inserted commands (ps > 0.379). For response times, the selected model included the main effects of Structure and Inserted Command as well as the Structure × Inserted Command interaction. Both main effects and the interaction were significant (Structure: β = −2.271, SE = 0.733, t(169) = −3.100, p < 0.01; Inserted Command: β = −3.308, SE = 0.733, t(169) = −4.515, p < 0.001; Interaction: β = 3.680, SE = 1.033, t(169) = 3.561, p < 0.001). The main effects indicated that responses were faster in sequential than in embedded conditions, and faster in conditions with 1 than with 2 inserted commands. Further analyses of the interaction showed that when there were two inserted commands, responses were faster in sequential than in embedded conditions (β = −2.214, SE = 0.642, t(56) = −3.447, p < 0.01), but this difference was reversed when there was one inserted command (β = 1.421, SE = 0.711, t(57) = 2.000, p = 0.05). Additionally, when control structures were embedded, responses were slower in conditions with 2 than with 1 inserted command (β = −3.277, SE = 0.920, t(112) = −3.562, p < 0.01). In contrast, this difference was not significant when control structures were organized sequentially (Fig. 4).

Relations between cognitive load and working memory

Because the main effects of Structure and Inserted Command on accuracy were not significant, response times were used to test the relations between cognitive load and working memory. In addition, when there was one inserted command, responses were faster in embedded than in sequential conditions, which was inconsistent with our hypothesis. Therefore, we examined the relation of working memory to the difference in cognitive load between embedded and sequential conditions only when there were two inserted commands. Accordingly, we compared the basic model, which included only the main effects of Structure and Working Memory, to the new model, which included both the main effects and the Structure × Working Memory interaction. The new model was selected, but no results involving working memory were significant (ps > 0.401).

Similarly, the difference in response times between conditions with 1 versus 2 inserted commands was significant only in the embedded condition. Therefore, we examined the relation of working memory to the difference in cognitive load induced by different numbers of inserted commands in the embedded condition. We compared the basic model, which included the main effects of Inserted Command and Working Memory, to the new model, which also included the Inserted Command × Working Memory interaction. The new model was selected, but no results involving working memory were significant (ps > 0.144).

Interim discussion

Consistent with our hypotheses, when two commands were inserted between repeat control structures, participants responded more slowly in embedded than in sequential conditions, suggesting greater cognitive load. However, when there was one inserted command, responses were slower in sequential than in embedded conditions. This finding might be associated with the use of cognitive strategies. Specifically, as participants reported, when there was only one inserted command within the embedded structure, all commands in the repeat control block could easily be memorized as a chunk containing information about the transformed dimensions and their execution order. For example, in the embedded condition with one inserted command (Fig. 2), after participants executed the commands within the inner control structure (i.e., “Transform shape”, “Transform color”, and “Number-3”) for the first time, the three commands and related information were memorized, and their subsequent execution did not require shifting attention to the code snippet on the left side of the screen. In contrast, in the sequential condition, the commands contained in the upper and lower repeat blocks were different. Therefore, participants using the chunking strategy had to chunk the two repeat blocks separately, which might have contributed to the longer response times in sequential than in embedded conditions.

In addition, the use of the chunking strategy may be affected by the number of inserted commands. Inserting more commands into the embedded condition might pose a greater challenge to the working memory system. Previous studies have indicated that such a challenge may prevent the use of a chunking strategy or reduce its benefit (Janssen & Brumby, 2010; Schorr et al., 2003). Accordingly, we speculated that the effect of the chunking strategy might be compromised when two commands were inserted between control structures in the embedded condition. This speculation was supported by the present results: when there were two inserted commands, the comprehension of embedded structures led to slower responses than that of sequential structures.

Furthermore, working memory was not significantly related to the difference in response times between embedded and sequential conditions. One possible reason is that listing all transformed circles corresponding to the commands on the screen reduced the cognitive load associated with mentally representing embedded relations. For example, in the embedded condition with two inserted commands (Fig. 2), each execution of the outer repeat included eight commands that were represented by exactly two rows of circles. Based on the number of circles already processed, participants could easily count how many times the inner and outer loops had been repeated. Therefore, we conjectured that the arrangement of circles concretized the hierarchical representation of embedded relations and reduced the cognitive cost of switching between inner and outer repeats. Another possible reason for the absence of a relation between working memory and cognitive load is that different colors were used to distinguish the inner and outer repeat control structures only in the embedded condition. This color difference may have helped participants build a representation of the hierarchical relations, reducing the cognitive load generated by processing the code commands in the embedded condition.
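The counting argument can be made explicit under a hypothetical layout. Suppose each outer iteration in this condition consists of the two inserted commands followed by an inner repeat that runs a three-command body twice (2 + 2 × 3 = 8 commands, matching the eight commands per outer repeat mentioned above), and the 16 circles are laid out four per row. In that case a circle's position on the screen is enough to recover the current loop state. The constants and the `locate` function are illustrative assumptions, not taken from the task files.

```python
CIRCLES_PER_ROW = 4                  # assumption: 16 circles in 4 rows
INSERTED = 2                         # inserted commands at the start of each outer iteration
INNER_TIMES, INNER_LEN = 2, 3        # assumption: inner repeat runs 3 commands twice
PER_OUTER = INSERTED + INNER_TIMES * INNER_LEN   # 8 commands per outer iteration


def locate(k: int) -> dict:
    """Map the k-th transformed circle (0-based) to its row and loop state."""
    outer, pos = divmod(k, PER_OUTER)
    state = {"row": k // CIRCLES_PER_ROW + 1, "outer": outer + 1}
    if pos < INSERTED:
        # One of the inserted commands: the inner loop has not started yet.
        state.update(inner=None, slot=pos)
    else:
        inner, offset = divmod(pos - INSERTED, INNER_LEN)
        state.update(inner=inner + 1, slot=INSERTED + offset)
    return state


for k in (0, 2, 7, 8, 15):
    print(k, locate(k))
```

Because PER_OUTER spans exactly two rows, rows 1–2 always belong to the first outer iteration and rows 3–4 to the second. This is the kind of external cue the paragraph suggests participants could exploit.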

To summarize, the design of the code comprehension task in Experiment 1 might lead to the unexpected use of cognitive strategies and the decrease in cognitive load associated with the processing of hierarchical relations in embedded condition. Therefore, we further conducted Experiment 2, in which the code comprehension task was redesigned to exclude the possible impacts of strategy use and stimuli presentation on cognitive processing (see details below).