How are primary school computer science curricular reforms contributing to equity? Impact on student learning, perception of the discipline, and gender gaps

Early exposure to Computer Science (CS) and Computational Thinking (CT) for all is critical to broaden participation and promote equity in the field. But how does the introduction of CS and CT into primary school curricula impact learning, perception, and gaps between groups of students? We investigate a CS-curricular reform and teacher Professional Development (PD) programme from an equity standpoint by applying hierarchical regression and structural equation modelling on student learning and perception data from three studies with, respectively, 1384, 2433 and 1644 grade 3–6 students (ages 7–11) and their 83, 142 and 95 teachers. Regarding learning, exposure to CS instruction appears to contribute to closing the performance gap between low-achieving and high-achieving students, as well as pre-existing gender gaps. Despite a lack of direct influence of what was taught on student learning, there is no impact of teachers’ demographics or motivation on student learning, with teachers’ perception of the CS-PD positively influencing learning. Regarding perception, students perceive CS and its teaching tools (robotics, tablets) positively, and even more so when they perceive a role model close to them as doing CS. Nonetheless, gender differences exist all around with boys perceiving CS more positively than girls despite access to CS education. However, access to CS-education affects boys and girls differently: larger gender gaps are closing (namely those related to robotics), while smaller gaps are increasing (namely those related to CS and tablets). This article highlights how a CS curricular reform impacts learning, perception, and equity and supports the importance of (i) early introductions to CS for all; (ii) preparing teachers to teach CS all the while removing the influence of teacher demographics and motivation on student outcomes; and (iii) having developmentally appropriate activities that signal to all groups of students.


Introducing computer science and computational thinking for all from an equity perspective
The past decades have seen a growing international consensus regarding the importance of teaching Computer Science (CS) and Computational Thinking (CT) to ensure that students are digitally literate (Webb et al., 2017). Computing is increasingly ubiquitous in today's societies, so CS is more and more often considered a subset of STEM education which must be made as available to students as mathematics or science education (Guzdial & Morrison, 2016). Introducing CS into formal education is also considered to foster Computational Thinking (CT), an essential skill for everyone in the twenty-first century (Jiang & Wong, 2022; Zhang et al., 2023) which is as important as reading, writing, and arithmetic (Wing, 2006). Teaching CT is not only considered by researchers to benefit STEM-related disciplines (Hurt et al., 2023; Swaid, 2015), but is also considered transversal. The benefits of CT are thought to go beyond CS or mathematics (Denning & Tedre, 2021; Li et al., 2020; Mannila et al., 2014; Weintrop, 2021; Weintrop et al., 2016; Zhang et al., 2023), extending to arts (Zhang et al., 2023), with new evidence even showing that young students employ CT during free play (Kotsopoulos et al., 2022), thus providing an additional lever to introduce both CS and CT to all. Although studies on K-12 CS education and CT have increased significantly in recent years (Apiola et al., 2023; Bers et al., 2022b; Hsu et al., 2018), introducing CS and CT into curricula has been a challenge internationally.
Ottenbreit-Leftwich and Yadav (2022) recently expressed the importance of a "system-wide implementation of CT" from an equity perspective to ensure that all students are introduced to CT, and not just those of a select number of teachers who choose to teach CT. This is echoed by Bers et al. (2022b) who advocate that exposure to CS and CT should happen in early foundational years (ages 3-8) "from a social equity perspective to prevent stereotypes and ensure [that] all young children receive equal opportunities to develop their digital literacy". Two key points emerge from this discourse and must be addressed to broaden participation and promote equity in these fields:
• Structural barriers are access-related and limit (early) CS and CT experiences for all, but can be addressed through curricular reforms (Ottenbreit-Leftwich & Yadav, 2022).
• Social barriers, often stereotype (and therefore gender) related, arise despite equal access and regardless of socioeconomic status (Wang & Hejazi Moghadam, 2017), but can be addressed through early exposure to mitigate the effects of existing stereotypes (Bers et al., 2022b).
The consequence of social and structural barriers is that disparities are present at multiple levels, including performance (i.e. learning) and attitudes towards CS (i.e. perception). Such disparities ultimately contribute to having under-represented groups in CS and CT-related fields, and must therefore be addressed in order to increase the likelihood that a more diverse and inclusive set of people persist in these fields. In the following sections, we delve into the literature and highlight the disparities that exist due to such barriers, particularly in terms of learning and perception.

The influence of social and structural barriers on learning-related equity
Several studies have shown that unequal access to (high-quality) CS education (Bers et al., 2022b; Wang & Hejazi Moghadam, 2017) contributes to performance gaps. In particular, a recent large-scale analysis of performance with 46,000 students from 14 countries conducted by Karpinski et al. (2021) found that socioeconomic background, and therefore access, was related to persistent gaps in CT performance. Their findings indicated that students from "less advantaged backgrounds had lower levels of computer skills [...], especially in CT" (Karpinski et al., 2021). Unfortunately, regardless of access, several studies have found that boys perform better than girls (El-Hamamsy et al., 2022c; Kong & Lai, 2022b; Polat et al., 2021; Román-González et al., 2017), even in kindergarten (Sullivan & Bers, 2016), due to the existence of stereotypes (see "The influence of social and structural barriers on equity related to the perception of the discipline").
Although access to developmentally appropriate CS & CT education can increase students' skills from a young age (Bers et al., 2014, 2022a, b; Hall & McCormick, 2022; Relkin et al., 2021), several studies suggest that perception of the discipline can also influence performance (Hinckle et al., 2020; Rachmatullah et al., 2022; Sun et al., 2022). Rachmatullah et al. (2022) for instance found that the gender-performance gap was more prevalent in countries where the "socio-cultural context" tends to promote such stereotypes and "influenc[e] gender diversity in the CS field". Their findings are corroborated by Hinckle et al. (2020) who found that student learning was not directly influenced by prior experience, but was mediated by their perception of CS. Numerous studies in higher education have also found that motivational and affective factors influence performance and participation in the field (Lishinski et al., 2022), and that they are influenced by gender and ethnicity (Lishinski et al., 2022; Warner et al., 2022). These studies confirm the importance of:
• developing CS and CT initiatives that broaden participation to all students,
• considering their impact on performance and perception to verify whether the gaps between different groups of participants are decreasing.

The influence of social and structural barriers on equity related to the perception of the discipline
Perception-related biases are considered to contribute to disparities and under-representation in CS for women (Rachmatullah et al., 2022; Wang & Hejazi Moghadam, 2017), and more generally for under-represented minorities (Lishinski et al., 2022; Warner et al., 2022), due to stereotype threat (i.e. conforming to/inducing a stereotype simply because you know it exists). Unfortunately, the developmental literature has found that basic stereotypes develop in children as young as 2-3 years old (Bers et al., 2022b). This is confirmed by multiple studies that identified CS-related stereotypes in young children (e.g. from 6 years old, Master et al., 2021, and even in kindergarten, Sullivan & Bers, 2016). The result is that when students are exposed to negative CS-stereotypes, students in the stereotyped group (e.g. girls in this case) tend to endorse those beliefs (Plante et al., 2013; Vandenberg et al., 2021) which negatively impacts their performance, motivation, and career intentions (Master & Meltzoff, 2020; Plante et al., 2013; Vandenberg et al., 2021). For instance, Cheryan et al. (2013) found that women who were presented non-stereotypical views on computer scientists were more likely to express an interest in majoring in CS. Therefore, students may make early career decisions informed by such stereotypes, contributing to an early gender gap (Wang & Hejazi Moghadam, 2017), and long-term disparities in the fields of CS and engineering (Master et al., 2021).
As gender-related stereotypes are prevalent, it is not surprising that numerous studies find that girls perceive CS more negatively than boys (El-Hamamsy et al., 2023c; Kong et al., 2018; Vandenberg et al., 2021; Witherspoon et al., 2016), contributing to a lower sense of belonging (Cheryan et al., 2013, 2017; Opps & Yadav, 2022; Vandenberg et al., 2021), self-efficacy (Beyer, 2014; Kong et al., 2018; Vandenberg et al., 2021), and interest (Beyer, 2014; Master et al., 2021). Given the importance of such factors for academic achievement and career decisions (Bandura, 1993; Beyer, 2014; Howard et al., 2021a; Olivier et al., 2019), the consequence is that CS "suffers from the lowest participation of girls than other science, technology, engineering, and mathematics (STEM) subjects (Cheryan et al., 2017)" (Hinckle et al., 2020; Jiang & Wong, 2022). As prior experience may positively affect attitudes toward CS (Hinckle et al., 2020), researchers have suggested that engaging early in CS-related activities that "signal equally to both girls and boys that they belong and can succeed" (Cheryan et al., 2017) in CS may increase girls' interest, and ultimately contribute to addressing gender equity in the field (Cheryan et al., 2017; Hinckle et al., 2020; Jiang & Wong, 2022). Therefore, in the rest of the article we refer to perception-related equity as the reduction of the influence of stereotypes around CS & CT that lead to biases between groups of people (namely gender) and may influence their motivation, engagement, participation and persistence in these fields.
How are CS and CT curricular reforms having an impact and contributing to equity in these fields?
Early CS and CT opportunities for all students are essential to address structural and social barriers, broaden CS participation, and promote equity in the field. An increasing number of initiatives have therefore sought to include CS and CT in compulsory K-12 worldwide (Balanskat & Engelhardt, 2015; Bers et al., 2022b; Bocconi et al., 2022; European Union and Education, 2019; Hubwieser et al., 2015; Voogt et al., 2015; Webb et al., 2017). In this context, it is essential to establish how such initiatives affect students (Guskey, 2002). This should extend beyond learning to include perception, and investigate how these dimensions interrelate (Hinckle et al., 2020) to ensure that expanding CS to K-12 "neither exacerbates existing equity gaps in education nor hinders efforts to diversify the field of CS" (Wang & Hejazi Moghadam, 2017). The student-level impact of widespread CS and CT curricular reforms, and professional development (PD) programmes, is however seldom evaluated. "Studies that relate student's learning achievement and teachers' capacity building are still rare in the literature of CT (Mason & Rich, 2019)" (Kong & Lai, 2022a). This is likely due to the difficulties countries face implementing CS & CT reforms, including adequately training a sufficient number of teachers to teach the new concepts (Bocconi et al., 2022; El-Hamamsy et al., 2021b). Difficulties of assessing teachers' mastery of Computational Pedagogical Content Knowledge (Hickmott & Prieto-Rodriguez, 2018), and what is implemented after PD programmes (El-Hamamsy et al., 2022a) also exist, despite their direct influence on student learning (Kong & Lai, 2022a). To the best of our knowledge, only Kong & Lai (2022a) linked 81 teachers' content knowledge with 3226 students' achievement in their evaluation of a PD programme. However, these teachers chose to participate in the PD programme and were required to teach a year-long curriculum. This differs significantly from mandatory curricular reform contexts, where the PD programme is imposed on all teachers, resulting in teachers who implement the pedagogical content to varying degrees, if at all. Since a prerequisite to achieving equity is that CS-related reforms have an impact, the lack of studies evaluating the impact of CS reforms means that there is little insight into whether these reforms are contributing to equity and reducing learning and perception gaps between different groups of students.
Since a "K-12 curriculum is a zero-sum game, where adding a subject means [removing] something" (Ottenbreit-Leftwich & Yadav, 2022), it is essential to establish the effectiveness of implementing CS & CT curricula in formal education. Evaluating a reform's effectiveness is critical given:
• the need to improve corresponding PD programmes and curricula (Hickmott & Prieto-Rodriguez, 2018),
• the objective of sustaining the reform in teachers' practices (Hubers, 2020),
• the importance of alleviating concerns of funding agencies and government bodies regarding the impact of the reform and PD programme on teachers (Hickmott & Prieto-Rodriguez, 2018) and students.
Studies evaluating the impact of reforms are even more pressing since recent findings indicate that teachers are not necessarily convinced that their students are learning as a result of teaching these novel curricula (El-Hamamsy et al., 2023b; Toh, 2016). Establishing the benefits at the student-level is therefore not only necessary to have a complete evaluation of reforms (Avry et al., 2022; El-Hamamsy et al., 2023b; Guskey, 2000), but is essential if the objective is to promote teachers' decisions to continue to implement a new practice in the long term (Howard et al., 2021b; Klingner et al., 2001).

Problem statement and research questions
The present study therefore looks to contribute to understanding the influence of CS curricular reforms on student learning and perception and determining to what extent they contribute to equity with respect to: (i) gender, i.e. reducing significant differences between boys' and girls' perception and performance; (ii) performance, i.e. reducing significant differences between initially low and high performers; and (iii) self-efficacy, i.e. reducing significant differences between students who have low or high self-efficacy. Please note that although the main focus of the article is on these equity dimensions, one must not neglect the importance of equity in terms of socioeconomic status (Vandenberg et al., 2021; Wang & Hejazi Moghadam, 2017), a dimension which we did not have access to in the present context. We propose to address the overarching question of equity in two steps: first investigating whether and how the reform significantly influences perception and learning (impact), and then how the results differ according to student populations (equity). To that effect, we investigate the impact of a mandatory CS curricular reform and teacher PD programme (see "Context: a computer science curricular reform for all to promote equity starting at primary school") to understand whether and how the primary school Computer Science curricular reform is contributing to reaching equity goals (i.e. broadening participation in the field to a larger number and a more diverse set of people). We therefore consider the following research questions:
(RQ1) How does teaching CS pedagogical content impact student learning? And how does it impact learning-related gender- and performance-equity?
(RQ2) How does teaching CS pedagogical content impact students' perception of CS and the tools used to teach it (i.e. robots and tablets)? And how does it impact perception-related self-efficacy and gender-equity?
To answer these questions, we employ data collected between January 2021 and June 2022 in the context of a mandatory primary school CS-curricular reform that is presently being deployed to all grade 1-6 teachers in the region after a piloting phase. The data stem from three studies (see Table 1), the first on student learning (RQ1), the second on perception of the discipline and performance (RQ1, RQ2), and the third on perception of the discipline (RQ2). These studies involved, respectively, n1 = 1384, n2 = 2433 and n3 = 1644 grade 3-6 students (ages 7-11) and their n1 = 83, n2 = 142 and n3 = 95 teachers. The data are analysed through hierarchical linear modelling for student learning, and Structural Equation Modelling for perception, to establish the link between teaching CS and these key outcome variables.

Context: a computer science curricular reform for all to promote equity starting at primary school
The research is part of a large-scale project seeking to introduce Digital Education (also referred to as Computing Education) as a new discipline for all students in the Canton of Vaud in Switzerland (El-Hamamsy et al., 2021b). The curricular reform relies on the collaboration between four institutions in the region (the department of education, the university of teacher education, a higher education university and the technical university) within a research practice partnership to develop the curriculum and corresponding mandatory teacher-PD programme for CS, Information and Communication Technology and Digital Citizenship. To ensure the sustainability and scalability of the reform, the project began with a piloting phase with 10 representative schools from the region (hereby referred to as CS-schools) before large-scale deployment. The CS-curriculum and teacher PD-programme were piloted for the first time and iteratively adjusted for grades 1-4 in 2018-2019, and for grades 5-6 in 2019-2020, with all the teachers from the 10 CS-schools (approximately 350 teachers for grades 1-4 and 180 for grades 5-6). This resulted in a reference manual containing pedagogical activities (for CS, 13 activities for grades 1-4 and 12 for grades 5-6) that the teachers can choose from to achieve the curricular objectives (in terms of algorithms and languages, machines and networks, information and data, and the impact of CS on society). The teachers were trained to teach these activities during a mandatory CS-PD that they participated in prior to the present study and were encouraged to teach the novel discipline which is now part of the regional study plan. They were however not required to do so. Given that in primary school there is no dedicated hour in the grid for Digital Education (and thus CS), and that the discipline is not evaluated, this leads to a large variability in both what and how much is taught. This therefore required analysing the student-level impact of the curricular reform, and the influence of being taught specific pedagogical content by teachers (which we refer to as adoption).
While the initial focus was on student learning (see study 1 in "Study 1: student learning and the link with what teachers from the CS-schools implemented"), a parallel pilot study in grade 9 (ages 13-14) in Spring 2021 indicated that there were already significant perception-related gender gaps (El-Hamamsy et al., 2023c). This led to the introduction of a student perception survey in Fall 2021 (see studies 2 and 3 in "Study 2: student perception, the link with what teachers from the CS-schools implemented, and correlations with performance" and "Study 3: student perception between CS-schools and schools where teachers were not yet trained to teach computer science") to determine when gender gaps appear and whether teaching CS contributes to closing these gaps.

Participants and data collection (study 1)
The first study follows all the grade 3-4 students from 7 CS-schools over 6 months to evaluate learning in a pre-post-test design. These students were all introduced to CS for the first time during the 2018-2019 academic year and therefore had approximately 2 years of prior CS experience. The objective of the study was therefore to see to what extent these students progressed over that time period in relation to what they were taught. Given the scale of the study, the objective was to focus on a subset of the learning objectives that could be measured in a valid and reliable way, and at a large scale, in grades 3-6. We therefore chose to focus on the CT-concepts defined by Brennan and Resnick (2012). To that effect, we employed the competent Computational Thinking test (cCTt, El-Hamamsy et al., 2022c), a 25-item CT-concepts' assessment (see example questions in Fig. 1) originally developed and validated for grades 3-4 that evaluates CS concepts of sequences, loops, if-else statements and while statements. This instrument was later validated for grades 3-6, including a Differential Item Functioning analysis which demonstrates that the cCTt is not biased towards genders (i.e. it is gender fair) and can therefore be used to measure significant differences between boys' and girls' responses (El-Hamamsy et al., 2023d). The student-learning data were complemented by data on teachers' perception of CS and the CS-PD acquired in January 2021, and data regarding what teachers taught (which we refer to as adoption) between January and June 2021 (see Table 2). The adoption data are based on the activities that the teachers were introduced to during their CS professional development programme and are collected in the form of a number of periods per activity, which we are then able to convert into boolean values and use to derive the number of CS activities taught.
Please note that the data sets include missing data due to (i) students not being present for either the full pre- and/or post-tests, (ii) teachers not administering the test, or (iii) teachers not answering the pre- and/or post-teacher survey. As the analyses combine multiple data sets, a synthesis of the number of students and teachers for which the full responses are available with respect to the data subsets considered is provided in Table 3. Finally, while it would have been interesting to have a control group to be able to infer how learning compared between students who had access to CS courses and those who did not, the administration of a performance assessment to students in non-CS-schools was not authorised due to ethical concerns. Nonetheless, given the variability in what the teachers taught, 4 grade 3 classes and 6 grade 4 classes did not receive any CS education and thus provide an interesting point of comparison. As the second data subset (test + adoption data) constitutes the core of the analysis, we provide more detailed demographics information in Appendix A.1 in Table 10.
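As an illustration only (this is not the authors' pipeline, which was run in R), the conversion of adoption data from periods per activity into boolean values and a count of activities taught could be sketched in Python as follows. The activity names, the example counts and the rule that an activity counts as taught once at least one period is reported are assumptions made for the sketch.

```python
from typing import Dict


def summarise_adoption(periods_per_activity: Dict[str, int]) -> Dict[str, object]:
    """Turn reported periods per activity into boolean flags and a count."""
    # Assumption: an activity is considered taught if at least one period was reported.
    taught = {name: periods > 0 for name, periods in periods_per_activity.items()}
    return {
        "taught": taught,                             # boolean value per activity
        "n_activities_taught": sum(taught.values()),  # number of distinct activities taught
        "any_cs_taught": any(taught.values()),        # whether the class received any CS teaching
    }


# Hypothetical teacher report: activity names and period counts are illustrative only.
report = {"activity_a": 3, "activity_b": 0, "activity_c": 1}
summary = summarise_adoption(report)
print(summary["n_activities_taught"], summary["any_cs_taught"])
```

A class whose teacher reports zero periods for every activity would be flagged as not having received any CS education, which is the kind of class used as a point of comparison in the absence of a control group.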

Analysis methodology (study 1)
The student learning data are analysed in three stages.
First, the January and June test data (n = 1319) are analysed using multiple ANOVA with Benjamini-Hochberg p-value correction to reduce the false discovery rate (study 1a). The results are reported as significant (i.e. p < 0.05) only if the minimum effect size (Cohen's D, see footnote 5) required to achieve a statistical power of 0.8 is reached with α = 0.05. Dunn's post hoc test is then applied for multiple comparisons when significant. When comparing responses between groups of students (according to the dependent variables) the delta between the average scores on the cCTt's scale is provided (Δ), in addition to the F-value, degrees of freedom, corresponding p-value and effect size using Cohen's D.

Footnote 4: Brennan and Resnick (2012)'s operational definition of CT decomposes CT into CT concepts (i.e. the concepts that computer scientists engage with), practices (i.e. the processes they employ to resolve computational problems) and perspectives (i.e. their perception of CT). Please note that at the time of the study there were no valid, reliable and scalable instruments to measure CT-practices and perspectives.

Footnote 5: Cohen's D is a means of quantifying the difference between the means of two samples ($\mu_1$, $\mu_2$) all the while accounting for their standard deviations ($sd_1$ and $sd_2$). Cohen's D is therefore computed as the difference between the two samples' means divided by the pooled standard deviation ($s_p$), i.e. $D = (\mu_1 - \mu_2) / s_p$, where $s_p$ is obtained by pooling $sd_1$ and $sd_2$. The rule of thumb to interpret Cohen's D is as follows: if around 0.2 the effect is considered small, if around 0.5 the effect is considered medium and if around 0.8 or above the effect is considered large.

Table 2 note: Cronbach's α is provided for each sub-scale. Please note that the sample size (n = 67) was too small to validate the measurement model through Confirmatory Factor Analysis.
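The effect size and the multiple-comparison correction used in this first stage can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R). The function names are hypothetical, and the pooled standard deviation is assumed here to be the equal-weight form sqrt((sd_1^2 + sd_2^2) / 2), which is one common convention; the source does not state which pooling formula was used.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def cohens_d(sample_1: Sequence[float], sample_2: Sequence[float]) -> float:
    """Difference between two sample means divided by a pooled standard deviation."""
    sd_1, sd_2 = stdev(sample_1), stdev(sample_2)
    # Assumption: equal-weight pooling of the two sample standard deviations.
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean(sample_1) - mean(sample_2)) / pooled_sd


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy values for illustration only (not data from the study).
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(round(d, 3), [round(p, 4) for p in adjusted])
```

An adjusted p-value below 0.05 would then be reported as significant only if the accompanying Cohen's D also reaches the minimum effect size required for the targeted statistical power.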
The ANOVA considers the students' scores as the dependent variable, and the interaction between the following independent variables: time (pre-test or post-test), grade (3 or 4) and gender (boy or girl as indicated on the school's records, see footnote 6). Second, the data set that introduces the adoption data (n = 989), i.e. what the teachers taught between the pre- and post-test, is analysed through hierarchical linear modelling which nests students in classes and classes in schools (study 1b).
Finally, to determine whether teacher-level variables (see Table 2) influence student learning, the third data set (study 1c) that includes teacher perception is analysed through a correlation analysis with averaged class-level student scores (n = 67), prior to hierarchical linear modelling at the student-level (n = 752). The hierarchical linear modelling done in these two stages was conducted in R (version 4.2.1, R Core Team, 2019) with nlme (version 3.1-157, Pinheiro et al., 2022; Pinheiro & Bates, 2000) and sjstats (version 0.18.2, Lüdecke, 2022).

Results: the impact of teaching CS on student learning (study 1)

Student learning and the influence of gender and when the test was taken (study 1a)
The ANOVA indicates that all independent variables and their interactions significantly influence the test score (see Appendix A.3 Table 12 for a synthesis of the effects) and the following trends emerge.
Table 3 Number of students participating in study 1 on student learning, structured according to the number of complete observations for each data subset considered: pre- (January) and post-test (June) data, teacher adoption data (at the time of the post-test, June), teacher perception data (at the time of the pre-test, January)

Footnote 6: Please note that we never asked students to report their gender throughout the data collection process to avoid biasing students' responses and performance as a result of stereotype threat. Indeed, as we could not guarantee that all students would participate in all the data collections which were conducted over multiple sessions, and therefore could not solely rely on collecting the gender information at the end of the final data collection, we relied on the gender information obtained from the school's records. This information is provided by students' parents to the schools and therefore most likely aligns with the students' sex, without a guarantee that this corresponds to up-to-date information regarding the way students identify themselves. Furthermore this gender information was provided by the schools in a binary format. Although we acknowledge that gender relates to a person's identity, differs from biological sex (Risman, 2018), and is increasingly recognised as being non-binary, this was not yet fully the case in the country where the study was conducted at the level of formal primary education and at the time of the data collection. Indeed, at the time of the data collections, gender at the level of primary school and formal education more broadly was mainly considered as a binary construct. Nonetheless, most international studies find that the proportion of people who identify as transgender is generally inferior to 1.5% (e.g. 0.6% of the population aged 13 or older in the US, Herman et al. (2022); between 0.5% and 1.3% for children, adolescents and adults according to Zucker's (2017) international review). The potential discrepancy between the gender information on the school's records and students' gender identity therefore represents at most a 1.5% error, which is below the level of significance which would affect the validity of the findings with a significance level α = 0.05. Therefore, in order to align with the current practice in the STEM education community, which often employs the terms gender and gender biases when actually gathering and analysing binary or biological sex data (e.g. Jensen et al., 2023; Sung et al., 2023; Malespina & Singh, 2023), we maintain the terms gender, gender-biases and gender-gaps when referring to our data and our analyses.

Are there gender biases and are these closing?
The results that account for the students' gender alone show that there is a significant main effect of students' gender on their performance. In particular, boys have significantly higher scores than girls overall with a small effect size (boys > girls, Δ = 0.551 pts, p = 0.0015, Cohen's D = 0.109). Considering the two-way interaction effects, we observe the following tendencies. Over all students, the gender gap is significant in the pre-test (January boys > girls, Δ = 0.664 pts, p = 0.0079, Cohen's D = 0.131) but decreases and is no longer significant by the post-test (June boys ∼ girls, Δ = 0.438 pts, p = 0.0744, Cohen's D = 0.091). Likewise, these gender differences are significant in grade 3 (grade 3 boys > girls, Δ = 0.725 pts, p = 0.004, Cohen's D = 0.145), but not in grade 4 (grade 4 boys ∼ girls, Δ = 0.469 pts, p = 0.0604, Cohen's D = 0.098). The three-way interaction between these variables thus helps shed some light on the trends observed (see Fig. 3) and to draw conclusions.
To complement these findings we consider the student learning data from study 2 (see "Study 2: student perception, the link with what teachers from the CS-schools implemented, and correlations with performance") that was conducted in November 2021 (5 months after the post-test of study 1) in the same schools and includes students from grades 3-6 (ages 7-11). This is a particularly interesting cohort of students because students in grades 3 and 4 in study 2 are the first group of students to have had access to CS education starting first grade. Analysing the student performance data confirms that students continue to progress in terms of CT-concepts when moving on to grades 5 and 6 (see Fig. 4). Indeed, the differences between grades 3 and 4 are significant (Δ = 2.87 pts, p < 0.0001, Cohen's D = 0.566), as well as those between grades 4 and 5 (Δ = 1.35 pts, p < 0.0001, Cohen's D = 0.266), although there is no significant difference between students in grades 5 and 6. This is consistent with the fact that students mature faster when they are younger (Hartshorne & Germine, 2015). As such, students in grades 3 and 4 differ more significantly in terms of their cognitive abilities than students in grades 5-6.
Evaluating the difference between boys' and girls' scores per grade in November 2021 (study 2) indicates that the results are non-significant across grades (see Fig. 4).As these students were in their 3rd or 4th year of CS education, this would appear to corroborate the previous findings: students who have had early and prolonged access to CS education are less likely to exhibit CS-performance gender-gaps.

Student learning and the influence of the CS-education received (study 1b)
To understand how teaching the CS-pedagogical content from the curriculum may have influenced student learning, we consider the data from 989 students for whom the pre- and post-tests, and teacher adoption data (i.e. what the teachers taught, see "Participants and data collection (study 1)") are available. We implemented multiple hierarchical linear models while nesting students in classes and classes within schools to account for the different ways of considering student learning and adoption.⁷ These models consistently indicated that there was no direct link between adoption and students' post-test scores. For instance, the model considering how the delta between the post- and pre-tests is influenced by the students' grade, gender and the number of CS activities taught estimates a non-significant effect of the number of CS activities taught on the progress students made, with b = 0.122, df = 45, t = 0.442, and p = 0.661 (see Table 11 in Appendix A.2). Only the pre-test score significantly predicts the progress made in the post-test, with students performing lower at the pre-test progressing more. While the lack of a significant influence of CS activities taught on learning may appear surprising, visualising the trends between teaching and not teaching CS pedagogical content, as well as according to the number of activities taught, confirms the lack of an evident trend (see Fig. 5).
Random effects: classes within schools. Please note that random effects are not the main focus of the analysis but still need to be included in the hierarchical linear model in order to account for their influence on the dependent variables. We therefore do not estimate the impact of each school or class on the outcome but rather control for them in order to avoid drawing erroneous conclusions.
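As a reading aid, the example model described above (progress regressed on grade, gender and the number of CS activities taught, with students nested in classes and classes nested in schools) can be written as a three-level random-intercept model. This is a sketch under assumptions: the symbols, the indices (student i, class j, school k) and the random-intercept-only structure are illustrative choices and are not taken from the article.

```latex
% Illustrative notation (assumed): i = student, j = class, k = school.
% Delta is the post-test score minus the pre-test score.
\begin{align*}
\Delta_{ijk} &= \beta_0 + \beta_1\,\mathrm{grade}_{ijk} + \beta_2\,\mathrm{gender}_{ijk}
              + \beta_3\,\mathrm{activities}_{jk} + u_k + v_{jk} + \varepsilon_{ijk},\\
u_k &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_{jk} \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{align*}
```

In this notation, the reported estimate b = 0.122 (p = 0.661) would correspond to the coefficient on the number of CS activities taught, while the school and class terms are controlled for rather than interpreted.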

Student learning and the influence of teacher demographics, perception and the CS-PD received (study 1c)
Given the link between access to CS education and performance, and the lack of a direct link between what the teachers taught and student learning, it would appear that there are additional factors at play when affecting learning. Therefore, in a final phase, the teachers' aggregate (i) perception of the PD programme, (ii) perception of CS, (iii) autonomous motivation to teach CS⁸ and (iv) the demographic data collected at the same time as the pre-test were put in relation to the student learning results. First, the students' results were averaged per class to obtain a class performance and correlated with the teacher-level variables. As the perception data are on a 7-point Likert scale and non-normally distributed, Spearman's rank correlation was used. All the correlations with class performance were non-significant (whether in terms of teacher demographics, prior experience or CS perception), with the exception of the training evaluation (Spearman's rho = 0.33, p = 0.007).
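The class-level correlation step described above can be illustrated with a small, self-contained sketch of Spearman's rank correlation (Pearson's correlation computed on average ranks, so that tied Likert responses are handled). The function names and the toy values below are hypothetical and are not data from the study; no significance test is included.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one value per class.
pd_evaluation = [3.5, 5.0, 6.0, 4.0, 6.5]   # teacher's rating of the PD (7-point scale)
class_progress = [0.8, 1.9, 2.4, 1.1, 2.0]  # mean post-test minus pre-test score
rho = spearman_rho(pd_evaluation, class_progress)
```

Working on ranks rather than raw values is what makes the measure suitable for ordinal, non-normally distributed responses such as Likert items.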
As adoption was found to be not significantly related to student learning (study 1b), we compared two hierarchical linear models at the student-level, one with and one without adoption variables, with both including student-level, teacher perception-level and teacher demographic-level variables. An analysis of variance between the two models indicates that the difference is non-significant (p = 0.768). The more parsimonious model, which does not include the adoption data (see Table 4) and which also relies on a larger set of complete data (i.e. 1027 vs. 752 observations), should therefore be preferred. The resulting hierarchical linear model at the student-level confirms the trend observed in the correlation analysis, and indicates that the following independent variables predict the delta between the pre- and post-test scores, with no influence of teacher demographic variables (including teaching and ICT experience):
• The pre-test score predicts the delta negatively (p < 0.0001, β = −0.35), i.e. students performing lower at the pre-test progressed more.
• The average PD programme evaluation score predicts the delta positively (p = 0.0053, β = 1.02), i.e. students of teachers who positively viewed the CS-PD progressed more.
All grade-differences are significant, except the one between grades 5 and 6, while the gender-differences per grade are non-significant.

Synthesis and limitations of study 1
The students progress in terms of CT-concepts over time, with grade 3 students achieving a year's worth of CT-development in a 6-month window (study 1a, positive impact). However, the results of the hierarchical linear modelling indicate that there is no direct effect of what was taught on the progress students made (study 1b, no impact and therefore negative for equity). The only factors that appear to influence learning are: (i) the students' scores in the pre-test, with students who have lower scores progressing more, thus contributing to performance-equity; (ii) the teachers' perception of the PD programme (study 1c, positive impact). There is additionally no influence of teachers' demographics on what the students have learnt, indicating that the PD programme helped prepare teachers to teach CS pedagogical content, irrespective of their prior teaching experience and ICT experience. This contributes once […]
There are, however, limitations due to the lack of a true control group that has never had access to CS education. Indeed, the students in the present study were not compared to students who had not done any CS education between the pre- and post-tests, or since the start of their schooling. The fact that students with lower pre-test scores progress more may also be due to the existence of a "ceiling effect" for already higher performing students (either cognitively, with respect to what the cCTt measures, or what is attainable with the pedagogical content taught). In terms of teacher and class data, while the teachers were asked what they taught and for how long, this does not indicate their mastery of the content, the implementation fidelity (i.e. to what extent they put emphasis on the CS concepts in these activities) or whether they taught other activities that were not part of the PD programme that may be linked to CS education or grid-based concepts which are also part of the maths curriculum. Finally, the assessment used:
• focuses on CT-concepts, although there are other elements of CT that may be positively affected by access to CS education which are not measured (in addition to other dimensions of the CS curricular reform including those pertaining to machines and networks, data and information and the impact of CS on society);
• is used in both the pre- and post-test due to the fact that (i) at the time of the studies there existed no valid and reliable assessment of CT-concepts in primary school for these grades; (ii) no validated assessment proposes isomorphic variants which have been proven to have the exact same difficulty and can therefore be reliably employed in the comparison of pre-post test design. To the best of our knowledge this remains true today as only Parker et al. (2022) have begun investigating how to create an isomorphic version of their instrument (the ACES test) and analysed what types of changes to the questions could truly be considered isomorphic in this context. This is important because "seemingly superficial changes in an item's context can cause students to recruit different knowledge and cognitive processes when solving a problem" (Parker et al., 2022).
This study extends the first by evaluating students' mastery of CT-concepts and their perception of the discipline. The data collection was conducted in November 2021 and involved all students from grades 3-6 in the 7 CS-schools involved in the first study (see Table 5). The students first responded to a perception survey, before being administered the cCTt (which was shown to be adapted for grades 5-6 in El-Hamamsy et al. 2023d) to assess their mastery of CT-concepts.
The perception survey (see Table 6) targeted three dimensions.
The first dimension is the students' perception of Computer Science, including who they perceive as doing CS, called "informatics" in the region, a scalable alternative to the draw-a-computer-scientist test (Pantic et al., 2018).
Students were asked whether they perceived certain role models (e.g. influencers such as parents and teachers, Wang & Hejazi Moghadam, 2017), someone else, or nobody, as doing CS. One hypothesis is that students who have access to CS-education are more likely to perceive their teachers as role models. As primary school teachers are mainly women, they can be considered female role models, an element that is key to engaging girls in the field (Cheryan et al., 2017; Kong et al., 2018). Another hypothesis is that perceiving people "close to them" as doing CS (i.e. related to the idea that CS is becoming ubiquitous and accessible to all) will contribute to improved perception of CS overall.
The second dimension is how students perceive robots, as robotics is a means of teaching CS (El-Hamamsy et al., 2021a), and CS and engineering tend to be subject to stronger stereotypes than science and maths among young students (Master et al., 2017).Interestingly, recent studies have found that there is a link between students' perception of robots and their "aspirations to pursue a career in science" (Giang et al., 2023), with introductions to educational robotics affecting their perception of robots.
The third dimension is how students perceive tablets and other digital devices which are also employed as means of teaching CS (and ICT) in the curricular reform.
For each of these dimensions (CS, robotics, tablets), the emphasis is placed on three factors that are "different but related aspects of motivation" (Master et al., 2017) and can be considered as predictors of academic achievement in general (Bandura, 1993; Howard et al., 2021a; Olivier et al., 2019), educational choices, and career decisions (Blotnicky et al., 2018; Mason & Rich, 2020; Wang et al., 2020), in addition to being the most prominent in surveys evaluating students' (at all levels of education) perception of CS, coding or STEM (Mason & Rich, 2020):
• Interest, i.e. "how much the individual likes or is interested in the activity" (Mason & Rich, 2020), is a key component of intrinsic motivation in self-determination theory (Ryan & Deci, 2020) and expectancy-value theory (Eccles & Wigfield, 2020). Several studies have found that boys are more interested in CS than girls, as in most STEM-related disciplines (Mason & Rich, 2020). Comfortingly, researchers have also found that interest increases after access to CS experiences, in particular for girls, which contributes to closing the interest gender gap (Master et al., 2017).
• Self-efficacy (Bandura, 1993; Kong et al., 2018), i.e. "a person's belief that they can complete a particular task or fulfil a particular role within a specific domain" (Mason & Rich, 2020). Similarly to interest, self-efficacy has been found to be higher for boys than girls in STEM-related domains. Self-efficacy has also been found to increase with computing experience, in some cases even contributing to closing the gender gap (Mason & Rich, 2020), whether related to programming (Gunbatar & Karalar, 2018), or robotics (Master et al., 2017). Please note that, as domain-specific self-efficacy may be related to general self-efficacy, we also consider a school-related self-efficacy variable in the survey (i.e. how well students believe they are able to perform in school in general).
• Perceived utility (Eccles & Wigfield, 2020; Wigfield & Eccles, 2000), a component of expectancy-value theory referring "to how a task fits into an individual's future plans" which is considered to "directly [influence] a person's achievement-related choices, and is influenced by a person's experiences, perceptions, goals, and self-schemata" (Mason & Rich, 2020; Wigfield & Eccles, 2000).
Given that the same survey was administered from grades 3-6 in conjunction with the test, the survey needed to be short to account for the students' age and attention span (see Table 6). Cronbach's α measurement of internal consistency of scales is provided for all Likert-type questions employing an analogue visual scale (see Fig. 6). This is complemented by a Confirmatory Factor Analysis to confirm the adequacy of the complete measurement model (see "Results: perception, the influence of what was taught since the start of the year on perception, and the link with performance (study 2)"). Finally, the student survey was accompanied by a teacher adoption survey that asked each teacher the amount of time spent teaching each of the CS, ICT and robotics activities proposed in the PD programme.
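The internal-consistency measure mentioned above can be sketched as follows. This is a minimal illustration of the standard Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / variance of the total score); the function name and the toy responses (coded on a −2 to +2 scale) are hypothetical and are not the survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per student)."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy responses: 5 students x 3 items.
toy_responses = [
    [2, 2, 1],
    [1, 1, 1],
    [0, 1, 0],
    [2, 1, 2],
    [-1, 0, -1],
]
toy_alpha = cronbach_alpha(toy_responses)
```

Note that α depends on the number of items as well as on their inter-correlations, which is one reason why very short scales tend to yield lower values.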
Please note that the survey was initially intended as a pre-post administration to be put in relation with what the students did in between (as in study 1, see "Study 1: student learning and the link with what teachers from the CS-schools implemented").However, the positively skewed results (see "Results: perception, the influence of what was taught since the start of the year on perception, and the link with performance (study 2)") indicated that the students' perception of the discipline was possibly impacted by the CS-education received in prior years.It was thus essential to compare with students who had not yet received any CS-education.Unlike administering an assessment of CT-concepts to students who had not received any CS-education, administering a perception survey to a control group was accepted from the ethical standpoint (see study 3 in "Study 3: student perception between CS-schools and schools where teachers were not yet trained to teach computer science").

Table 6 Student perception survey items translated from French
Cronbach's α (CS, Robotics, Tablets) = 0.67 for the 9 items consisting of 3 sub-scales using the 5-point Analogue Visual Scale (5PT-AVS) and is considered to be between acceptable and good (George & Mallery, 2003), and to have between moderate and high reliability (Hinton, 2004). Please note that the items pertaining to the usage of robots and tablets are not investigated in the present article.

Analysis methodology (study 2)
The analysis is conducted in three stages:
1. Descriptive analysis of students' perception of the discipline.
2. Structural Equation Modelling (SEM) to assess the impact of student demographic variables (gender, grade, general school-related self-efficacy) and class-level variables (with respect to CS-, robotics- and ICT-related education received since the start of the year) on students' perception of the discipline (see Fig. 7).
3. Introducing student performance variables into the previous SEM to see how perception of the discipline may influence performance (see Fig. 8).
To assess the models' goodness of fit, both the measurement model (CFA) and the structural models (SEM) must be validated. Hu & Bentler (1999) recommend employing multiple complementary fit indices. Therefore, we employed local and global fit indices, namely the ratio between the χ² statistic and the degrees of freedom, the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA) and the standardised root mean square residual (SRMR). While the χ² statistic should be non-significant (Alavi et al., 2020; Prudon, 2015), this is rarely the case, which is why researchers have recommended employing the ratio between the χ² statistic and the degrees of freedom (df). The ratio χ²/df should be below 5 for acceptable fit, and below 3 for good fit (Kyriazos, 2018). The CFI and TLI should be above 0.9 for acceptable fit and above 0.95 for good fit (Byrne, 1994; Schumacker & Lomax, 2004; Xia & Yang, 2019). The RMSEA, on the other hand, should be below 0.08 for acceptable fit and below 0.06 for good fit (Chen et al., 2008; Hu & Bentler, 1999; Xia & Yang, 2019). Finally, the SRMR should be below 0.08 (Hu & Bentler, 1999; Xia & Yang, 2019).
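The cutoffs listed above can be applied mechanically, as in the following sketch. The function names and the "poor" label are hypothetical conveniences; the thresholds are those stated in the text, and the example values are the fit statistics reported with Table 7.

```python
def grade_low(value, good, acceptable):
    """Grade an index for which smaller values indicate better fit."""
    if value < good:
        return "good"
    if value < acceptable:
        return "acceptable"
    return "poor"


def grade_high(value, acceptable, good):
    """Grade an index for which larger values indicate better fit."""
    if value > good:
        return "good"
    if value > acceptable:
        return "acceptable"
    return "poor"


def classify_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Apply the cutoffs cited in the text to a set of fit indices."""
    return {
        "chi2/df": grade_low(chi2 / df, good=3, acceptable=5),
        "CFI": grade_high(cfi, acceptable=0.90, good=0.95),
        "TLI": grade_high(tli, acceptable=0.90, good=0.95),
        "RMSEA": grade_low(rmsea, good=0.06, acceptable=0.08),
        # The text gives a single SRMR cutoff (0.08), so only two labels apply.
        "SRMR": "acceptable" if srmr < 0.08 else "poor",
    }


# Fit statistics reported with Table 7 (perception and background to performance SEM).
table7_fit = classify_fit(
    chi2=221.462, df=124, cfi=0.951, tli=0.923, rmsea=0.022, srmr=0.026
)
```

Reporting several indices side by side, rather than a single one, follows the recommendation of Hu & Bentler (1999) cited above, since each index is sensitive to different kinds of misfit.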

Results: perception, the influence of what was taught since the start of the year on perception, and the link with performance (study 2)
Students' perception of CS, robots and tablets is highly positive and nearly saturates (M = 1.55 ± 0.84 on the −2 to +2 scale, see Fig. 9). An ANOVA, however, indicates that there are small significant gender differences.
To gain better insight into how the student-factors interact (demographic variables, perception of CS, tablets and robots, CS role models), and are influenced by what teachers taught, we employed SEM (n = 2116, November 2021) using the robust Diagonally Weighted Least Squares estimator (WLSMVS). The lack of influence of teachers' adoption of CS pedagogical content on perception appears conjointly with a lack of influence of perception on performance. Indeed, the SEM that includes students' scores (n = 1583, see Fig. 8 in "Analysis methodology (study 2)") to see how performance is influenced by perception and demographics indicates that there is no significant link (see Table 7). The only variables that significantly influence the score are the grade (older students have higher scores) and the students' general self-efficacy (students who are more confident in their capacity to succeed in school have higher scores).

Synthesis and limitations of study 2
The students have a positive perception of the discipline, and the tools employed to teach it in schools with access to CS education. Although the results are nearly saturated, the structural equation models help identify that:
• Gender influences the way the discipline is perceived, as girls have a more negative perception of the discipline than boys (in particular where robotics is concerned), which is aligned with stereotypes in these fields (social barriers).
• Perceiving a role model close to the students as doing CS positively influences the perception of CS and the overall discipline, but those perceived as doing CS differ according to gender, since girls more often perceive the teacher, and boys more often the father, as doing CS.
• There is no influence of the CS education received from the start of the year on perception.
• There is no link between students' perception of the discipline and their performance on the assessment (positive for equity).
• Students' general school-related self-efficacy positively correlates with the perception of the discipline and with students' performance on the cCTt.
There are, however, several limitations to this study, mainly that (i) the students were at least in their third year of CS education by the time the study was conducted, (ii) their perception was positively saturated and (iii) there was no control group. It would have been interesting to have access to a pre-test prior to their first CS lecture and to compare the evolution of perception over time. Where the link between perception and what the teachers taught is concerned, as for study 1, some of the findings may be biased by the fact that teachers may be teaching CS pedagogical content that was not included in the PD programme and is not accounted for in the analyses.
In terms of the perception survey itself, while the CFA indicates that the perception survey is a short and valid instrument that can be employed to measure grades 3-6 students' perception of the discipline and the tools used to teach it, this is not without its limitations. Indeed, the survey measures interest, utility and self-efficacy concepts with only one item for each dimension (CS, robotics, tablets). Ideally, for each concept and dimension, there would be at least 3-4 items (for interest, utility, and self-efficacy) in order to improve the reliability of the instrument. This limitation stems from our requirement of being able to administer the CS perception survey to grades 3-6 students before the cCTt (and not after, to avoid having their performance bias their perception), without taking too much in-class time for both (i.e. the perception survey had to be short and take less than 20 min overall with grade 3 students). Nonetheless, researchers have investigated the reliability of single-item measures and have shown that it is possible to have reliable measures with only single items (see Hoeppner et al., 2011).

Methodology (study 3)

Participants and data collection (study 3)
To extend study 2, the perception survey (see Table 6) was administered to all students in grades 3-6 (n = 1644, see Table 8) from 3 schools with access to CS education (which we refer to as CS-schools, n ∼= 831) and 2 similar schools without access to CS education (which we refer to as non-CS-schools, n ∼= 813) which were selected to be representative of the demographics of the region. Non-CS-schools here therefore refer to schools where, at the time of the study, the teachers were neither trained to introduce the new discipline into their practice nor had access to the material resources, infrastructure or support they require to teach the discipline, an element which was confirmed by an accompanying teacher survey. The objective was to compare the students' perception of the discipline between the two conditions (CS-schools and non-CS-schools) as students in CS-schools had been in contact with the discipline for multiple years and perception was positively saturated in study 2.

Analysis methodology (study 3)
The comparison between both groups is established using Structural Equation Modelling by constraining the models to have equal factor loadings, and allowing the regression parameters to vary between the two groups (gender, grade, general self-efficacy). By comparing the intercepts of the two SEMs, it is possible to establish the effect of having received several years of CS-education on perception. By comparing the regression parameters, it is possible to establish whether there are interaction effects between the student variables (e.g. gender) and access to CS education, and thus determine if gender-related gaps are indeed closing with the introduction of the novel curriculum.
Table 7 Unstandardised regression parameters for the perception and background to performance SEM (n = 1583, November 2021, χ²(124) = 221.462, p < 0.001, χ²/df = 1.79, CFI = 0.951, TLI = 0.923, RMSEA = 0.022, 90% CI = [0.017, 0.027], SRMR = 0.026)
Please note that on the smaller sample CS utility did not correlate highly with interest and self-efficacy and had to be removed from the model. For the full set of model parameters see Table 14

Results: perception and the influence of having access to CS education on perception (study 3)
The SEM to compare the groups (CS-schools vs. non-CS-schools) constrained the loadings and thresholds, while leaving the intercepts and regression parameters free to vary between groups.⁹ We thus compare the intercepts and regression coefficients between the groups (for the full SEM, see Table 15 in Appendix C.1).
The intercepts indicate that students' responses positively saturate in both groups for nearly all CS, robotics and tablet perception items (see Fig. 13). Nonetheless, students in CS-schools appear more interested generally, and evaluate robotics generally more favourably. However, CS and tablet utility and self-efficacy are lower for students in CS-schools. Students in CS-schools perceive the teacher more often as doing CS, which is coherent with the fact that their teachers over the past few years have been teaching CS pedagogical content. On the other hand, students in CS-schools perceive their mothers and other students less often as doing CS, possibly indicating that the students have a better awareness of what it means to "do" CS (Pantic et al., 2018), and that it is not only related to using a computer or tablet.
The significant impact of general self-efficacy and gender on student perception is shown in Figs. 14 and 15.
Figure 14 shows that general school-related self-efficacy positively influences CS self-efficacy (b_CS = 0.16, p_CS = 0.001, b_no-CS = 0.2, p_no-CS < 0.001) and robotics self-efficacy (b_CS = 0.11, p_CS = 0.033, b_no-CS = 0.14, p_no-CS = 0.016) of all students. This reveals that students who consider themselves less capable of doing well in school also think that they are less able to do CS and robotics, although the influence is less pronounced when students have received CS-education. Access to CS education may thus contribute to a wider range of students considering that they are capable of doing CS and robotics. On the other hand, for tablets, while there is no significant influence of school-related self-efficacy in non-CS-schools (p_no-CS = 0.054), it is present in CS-schools (b_CS = 0.08, p_CS = 0.016), which may indicate that students realise the range of possibilities (beyond merely passive activities) and that this may require more competencies to be able to make use of. Nonetheless, general self-efficacy does not influence interest or perceived utility in CS-schools (p_CS > 0.05), contrary to non-CS-schools for CS interest (b_no-CS = 0.1, p_no-CS = 0.036), CS utility (b_no-CS = 0.14, p_no-CS = 0.001), and robotics utility (b_no-CS = 0.11, p_no-CS = 0.044). It would thus appear that access to CS-education helps reduce these biases.
Where gender is concerned (see Fig. 15), all gender gaps that are identified as significant confirm the stereotypes that boys perceive the discipline more favourably than girls. Some gender gaps are only present in CS-schools (CS and tablet interest and self-efficacy, robots utility), suggesting that access to CS-education increases these gaps which, interestingly, are initially the smaller or non-significant gaps in non-CS-schools. There are nonetheless some gaps that are smaller in CS-schools, all the while remaining present in both types of schools: robotics interest and self-efficacy, as well as perceiving the teacher as doing CS in CS-schools, which interestingly are the initially larger gaps in non-CS-schools. Only the CS-interest gap is present in both schools and stronger in CS-schools.

Synthesis and limitations of study 3
Students' perception of the discipline is highly positive and affected by gender biases (social barriers) in both schools with CS education and schools without. However, access to CS education leads to:
• Positive impacts through: increased interest in CS and the associated tools, a more positive perception of robotics on all dimensions, teachers being more often perceived as doing CS.
• Negative impacts through: lower self-efficacy with respect to CS and tablets.
• Positive outcomes for equity through: closing larger gender-gaps in terms of robotics interest and self-efficacy (gender-equity), a lesser influence of general self-efficacy on several perception dimensions (CS interest, utility, self-efficacy; robotics utility and self-efficacy).
• Negative outcomes for equity through: increasing smaller gender-gaps in terms of CS and tablets self-efficacy (gender-equity), a higher influence of general self-efficacy on tablets' self-efficacy.
As in the case of studies 1 and 2, this study has its limitations. Firstly, the sample is relatively small to do a comparison between groups (even when constraining parameters to be equal). As such, the minimum effect size that can be detected is larger than in the case of study 2. This analysis would therefore benefit from a replication at a larger scale. As mentioned for study 2, there is no view on how the perception evolves over time within these groups, and at the point where students gain access to CS education the first time. Therefore, it would be interesting to have access to a sample of students just before they began having access to CS education and then follow up over time, and compare with a group that has no access to CS education. This type of analysis has temporal constraints and must be planned for at the start of the reforms and prior to deployment to all schools if the objective is to be able to compare for an extended period of time.
Fig. 13 Comparison of the SEM intercepts between schools that had access to CS-education and schools that did not
Fig. 14 Comparison of the SEM regression coefficients for general self-efficacy between schools that had access to CS-education and schools that did not. Please note that only significant regressors are shown
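How the minimum detectable effect size depends on sample size can be sketched with a simple normal approximation for a two-group comparison of means. This is only an illustration under stated assumptions: it is not the multi-group SEM used in the article, the significance level (0.05, two-sided) and power (0.80) are assumed defaults, and the function name is hypothetical. The group sizes used in the example are those reported for study 3.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_group_1, n_group_2, alpha=0.05, power=0.80):
    """Approximate smallest standardised mean difference detectable
    in a two-sided, two-group comparison (normal approximation)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(1 / n_group_1 + 1 / n_group_2)


# Group sizes reported for study 3 (CS-schools vs. non-CS-schools).
study3_mde = min_detectable_d(831, 813)

# Hypothetical replication with twice as many students per group.
doubled_mde = min_detectable_d(2 * 831, 2 * 813)
```

Under these assumptions, doubling both group sizes shrinks the minimum detectable effect by a factor of √2, which illustrates why a replication at a larger scale would help detect the small gaps discussed above.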

Discussion
This article investigates whether a large-scale mandatory primary school CS curricular reform and accompanying PD programme has an impact, and contributes to achieving equity goals, in terms of learning and perception.
As indicated in the introduction, achieving equity goals requires addressing structural (i.e. access related) and social (i.e. stereotype related) barriers that lead to underrepresentation in the field by influencing performance and perception early on. While equity in terms of access is ensured by the fact that the reform is being deployed to all teachers in the region, two main questions drive the study:
(RQ1) How does teaching CS pedagogical content impact student learning? And how does it impact learning-related gender- and performance-equity?
(RQ2) How does teaching CS pedagogical content impact students' perception of CS? And how does it impact perception-related self-efficacy- and gender-equity?
We provide a visual synthesis of the findings in Fig. 16 based on the learning and perception data drawn from 3 studies that were conducted over 2 years and involved, respectively, n₁ = 1384, n₂ = 2433 and n₃ = 1644 grade 3-6 students (ages 7-11) and their n₁ = 83, n₂ = 142 and n₃ = 95 teachers. The findings are further discussed in the following subsections.

Impact of the curriculum reform on student learning, and learning-related performance- and gender-equity (RQ1)

Student learning impact
The findings of studies 1 and 2 indicate that the students progress in terms of CT-concepts (sequences, loops, if-else statements, while statements) over time, consistently with other studies that have found that students' algorithmic skills improve as they age (Kanaki & Kalogiannakis, 2022; Piatti et al., 2022). In particular, we observe that grade 3 students achieved a year's worth of CT-development in the 6 months that separated the pre- and post-tests (positive impact). Indeed, the grade 3 students' post-test scores were equivalent to the grade 4 students' pre-test scores. However, there is no direct link between learning and the amount of CS education received (absence of impact). There is, on the other hand, a positive influence of the teachers' perception of the PD programme (positive impact), which may be due to the Pygmalion (or Rosenthal) effect according to which a teacher's expectations may act "as a self-fulfilling prophecy" (Rosenthal, 2010). While teachers' perception of the PD-programme likely acts as a mediating variable for teachers' assimilation of the underlying CS-concepts and their appropriation of the pedagogical content, it does indicate the need to find means of motivating teachers to introduce CS into their practices (El-Hamamsy et al., 2022a) and to ensure that they see the utility of doing so (El-Hamamsy et al., 2023b).
Fig. 15 Comparison of the SEM regression coefficients for gender between schools that had access to CS-education and schools that did not. Please note that only significant regressors are shown
The lack of a direct link between what the teachers taught (i.e. adoption) and learning could be due to two main factors and their interaction: the adequacy of the content with respect to the targeted concepts, and the teachers' appropriation of the CS pedagogical content. We have synthesised the corresponding hypotheses in Table 9 depending on whether either or both of these factors are indeed at play in the present context. As a reminder: the teachers were trained to introduce the specific CS-pedagogical activities which were designed by experts in CS and pedagogy from multiple institutions. Therefore, considering conjointly these elements, and the link between student learning and the teachers' perception of the PD-programme, it appears likely that the second hypothesis is true. More specifically: the lack of direct link with adoption could be partially or entirely due to teacher-level factors (their mastery of the concepts, and how they are teaching the pedagogical activities), although we may not presently rule out the other hypotheses.
To better understand the impact of teaching CS on learning, it would be important to investigate the various hypotheses by considering teacher assessments to gain insight into their mastery of the concepts, classroom observations to gain insight into teachers' implementation fidelity, and comparisons with students in non-CS-schools. Such an approach would not only make it possible to assess each of the pedagogical activities individually, but would also give the opportunity to provide guidelines regarding how best to teach the pedagogical content to promote learning. Doing so, however, requires getting past certain barriers in the field, whether in terms of teacher reticence towards classroom and evaluation (Hickmott & Prieto-Rodriguez, 2018), or in terms of acceptability for policy and decision-makers (e.g. access to a control group for performance assessments). However, it is only by gaining such insight that it will be possible to adapt the CS curricular reform so that it is successfully implemented and sustained in teachers' practices. This could include adapting the CS pedagogical content, PD program, or even considering a different strategy to introduce CS into formal K-12 education. The latter could involve having specialised teachers, or introducing CS transversally to support other disciplines thus contributing to "build computational literacies in all students" (Peel et al., 2022), all the while accounting for time struggles (Ottenbreit-Leftwich & Yadav, 2022), which according to Fofang et al. (2020) would provide pedagogical and equity benefits (but may also run the risk of decreasing the impact of the curricular reform, Suessenbach et al., 2022). We therefore argue that a complete assessment of CS and CT curricula would benefit greatly from expanding to other dimensions of CT (e.g. CT-processes, Brennan & Resnick, 2012), and evaluating the impact that CS-pedagogical content may have on learning in other disciplines (El-Hamamsy et al., 2022b; Ottenbreit-Leftwich & Yadav, 2022), and on transversal and twenty-first century skills (Barr et al., 2011; El-Hamamsy et al., 2022b; Gretter & Yadav, 2016; Nouri et al., 2020).
Fig. 16 Visual synthesis of the study's findings and how these relate to impact and equity. Each factor considered is indicated in a rectangle in bold. For the student learning results that are based on hierarchical linear models, the identified effect of said factor on the outcome variable is indicated in plain text in the same rectangle. For the student perception results that are based on structural equation modelling, the impact of general school-related self-efficacy and gender on a given factor are indicated in the factor's rectangle in plain text. In both cases, the impact of the measured effect (or lack thereof) on equity is colour coded (blue for a positive impact, red for a negative impact, purple for a mixed impact and black for an absence of impact). Please note that we only indicate significant links/effects (i.e. p < 0.05), which does not reflect on the strength of the effect detected (for that please refer to the results section and see the effect sizes and regression coefficients)

Student learning equity
The findings of study 1 indicate that students performing lower at the pre-test progress more in the 6 months before the post-test. This indicates that the performance gap is closing and contributing to performance-equity, and is consistent with Vygotsky and Cole (1978)'s concept of the Zone of Proximal Development (ZPD). The ZPD is determined by the learning activity and its relation to what students are capable of doing alone and with specific instruction. Therefore, it would appear that the content is adapted to all students because:
• Students with low scores on the pre-test progress more, indicating that the pedagogical content is within their ZPD.
• Students with high scores on the pre-test may already master the concepts and therefore not progress more with the instruction provided.
Given the additional lack of influence of teacher demographics (including ICT experience, teaching experience and age, which have been found to impact student achievement in various contexts; Burroughs et al., 2019; Croninger et al., 2007; Kini & Podolsky, 2016; Ladd & Sorensen, 2017), their perceived utility of CS and their autonomous motivation to teach CS on student learning, this would appear to indicate that the PD-program contributes to fostering student learning, and learning equity more generally.
The findings of study 1 also indicate that a marginally significant gender gap exists in grades 3-4 (likely due to stereotypes and social barriers), and that it appears to be closing over time (positive for gender-equity). This is corroborated by the data from study 2 (from the following academic year) where students who have had more access to CS education do not exhibit gender gaps. These findings therefore confirm the importance of providing prior CS education to address performance-related gender gaps. As the study did not include grade 1-2 students, it would appear relevant to follow up on the cohort of students over multiple years (and from the start of their schooling) to see how these differences appear and evolve over time.

Table 9 Hypotheses related to the absence of direct links between CS-education and student learning
Rows: the CS-pedagogical content is adequate with respect to the targeted concepts (True/False). Columns: the teachers' appropriation of the CS-pedagogical content is aligned with the curricular objectives (True/False).
• Content adequate: True; appropriation aligned: True. H1: The students have reached the limit of their cognitive abilities and are not capable of progressing more, irrespective of the additional content and CS education received.
• Content adequate: True; appropriation aligned: False. H3: The teacher, while teaching the CS-pedagogical activities, is not teaching the CS-concepts well, either (H3.1) because they do not have sufficient mastery of the concepts themselves; or (H3.2) because they do not put the emphasis on the CS concepts while teaching and focus on other facets, such as disciplinary links (e.g. maths or verbalisation), coherently with the differences between intended, enacted and attained curricula that are present generally (van den Akker, 2003) and in the context of CS (Falkner et al., 2019). The PD program should be revised.
• Content adequate: False; appropriation aligned: True. H2: The CS-pedagogical content is either (H2.1) not developmentally appropriate (Ottenbreit-Leftwich & Yadav, 2022; Bers et al., 2022b), or (H2.2) does not go sufficiently in depth for students to progress beyond what they are acquiring without the CS-education, and should be revised.
• Content adequate: False; appropriation aligned: False. H2 + H3
The findings therefore appear to support that the CS-curricular reform contributes to achieving learning equity goals. This would align with the findings of a recent independent study conducted in Germany to evaluate the impact of the introduction of "informatics" into the curriculum throughout the country. In a longitudinal study, Suessenbach et al. (2022) found that (i) lower secondary school students' ICT competence increased with access to informatics education; (ii) the gap between students with low and high socioeconomic backgrounds decreased; (iii) gender gaps were closing with girls catching up with boys' performance; and (iv) the impact was stronger in the case of informatics as its own discipline rather than having informatics transversally integrated into other subjects.

Impact of the curricular reform on student perception and perception-related self-efficacy- and gender-equity (RQ2)
Student perception impact
Students' perception of the discipline and the tools employed to teach it is globally positive in primary school, whether in CS schools or not (studies 2, 3), as the results are positively saturated. Nonetheless, students' overall perception of the discipline is influenced by access to CS education. Indeed, access to CS education contributes to increased interest in CS and the associated tools, with a more positive perception of robotics overall (positive impact). Perceived utility and self-efficacy towards CS and tablets are however lower (negative impact). The latter may be indicative of a better understanding of what CS is, and the extent of the applications that are possible with tablets, contributing to more realistic expectations (Pantic et al., 2018), possibly addressing a key issue identified when introducing CS education in secondary school (El-Hamamsy et al., 2023c). As the results remain globally positive, they appear promising for both CS and robotics, particularly since interest, self-efficacy and perceived utility are key motivational factors that influence academic performance and career choices. Future studies should therefore (i) continue to monitor how these factors evolve and how they relate to students' decision whether or not to pursue studies in these fields (which, in the present educational system, begins at the end of 8th grade) and (ii) investigate, using qualitative methodologies, why certain trends are observed.

Student perception equity with respect to the effect of gender
Gender gaps are present already in grades 3-6 with boys having a more positive perception of the discipline than girls on nearly all criteria, coherently with Master et al. (2021)'s and Sullivan and Bers (2016)'s findings, and despite access to CS-education from grade 1 (study 2). Robotics in particular appears to be subject to the largest gender gaps (studies 2, 3). Nonetheless, the perception of CS-role models, and in particular influencers (Wang & Hejazi Moghadam, 2017) such as teachers and parents being perceived as doing CS, has a positive influence on the perception of the discipline, but is subject to gender biases (study 2). As access to CS-education contributes to more students perceiving their teachers as doing CS (study 3), and these teachers are mainly women in primary school, the introduction of CS for all in schools may contribute to addressing social perceptions and counter the creation of early gender gaps evoked by Wang and Hejazi Moghadam (2017). The model selection further indicates that the influence of gender on perception varies with access to CS education (impact on gender-equity). While for interest and self-efficacy the gender gap appears to be closing for robotics in CS-schools (positive for gender-equity), the gap is increasing for CS and tablets (negative for gender-equity). Different trends along these dimensions indicate that the CS pedagogical activities might need to be adjusted to "provid[e] students with early experiences that signal equally to both girls and boys that they belong and can succeed" (Cheryan et al., 2017) (e.g. by adopting more collaborative settings, Sullivan and Bers, 2016, or introducing social aspects). The findings further indicate that introducing robotics as a means of teaching CS may also contribute to broadening participation in STEM by reducing robotics-related gender biases. This complements a prior study that found that employing robots to teach CS benefited both CS and robotics at the teacher and PD-level (El-Hamamsy et al., 2021a). As such, using robotics to teach CS brings benefits at the PD-, teacher- and student-levels. Robotics and STEM more broadly could therefore take advantage of ongoing CS-curricular reforms worldwide to broaden participation and engage more students in these fields.

Student perception equity with respect to the effect of school-related self-efficacy
The influence of general self-efficacy varies between CS and non-CS schools (study 3). On the one hand, in CS-schools, there is a positive influence of general self-efficacy on tablet-related self-efficacy, which is not the case for non-CS-schools. Once again, this may be due to students having a better awareness of what it means to "do" CS and more creative activities with tablets (Pantic et al., 2018). Indeed, the tablet is a ubiquitous tool in the region which students easily have access to. Their traditional usage of this tool differs from the type of activities that are proposed in the curricular reform, which tend to be active and creative. The learning objectives of these tasks push students to adjust their "imagination" around this tool (Flichy, 2001). Concretely, we believe that the students are forced to reconsider the affordances and the potential of this tool, thus reevaluating their own competencies beyond the more traditional use involving playing games, texting, taking photos and watching videos.
On the other hand, CS perception (interest, self-efficacy and utility) and robots perception (self-efficacy and utility) are positively influenced by general self-efficacy in both CS and non-CS schools, but to a lesser degree when students have received CS education. Contrary to tablets, CS and robotics are novel, with students having little to no access in schools (or at home) where the CS education curricular reform has not yet occurred. Therefore, we believe that the positive influence of general self-efficacy on domain-specific self-efficacy is consistent with Bandura (1986)'s sociocognitive theory on self-evaluation: people's belief in their efficacy to do a task is developed through vicarious experience, i.e. by comparing themselves to others. However, it is also built through mastery experiences: by experiencing CS- and robotics-related activities, the students are more influenced by their own CS- and robotics-specific experiences, and less by their overall assessment of their capacity to succeed in school. Therefore, given the influence of self-efficacy on students' choices and career decisions in the long term, such experiences may ultimately contribute to broadening participation in the field to a wider range of students, and not only to those who believe they are good in school.

Student perception equity with respect to the link between performance and perception
There is no evident link between student performance and perception of the discipline (study 2, positive for equity), such as those found in other studies in middle school (Hinckle et al., 2020; Rachmatullah et al., 2022). However, student performance is related to students' general self-efficacy, with students who consider that they are better at school performing better on the test. This would suggest that there may be a link between students' performance on CT-concepts and other disciplines, irrespective of how students perceive the discipline. This may be indicative that perception is not yet biased by performance and vice versa. Nonetheless, given the role that perception (and stereotypes) has been found to play on academic and career decisions (see "Introduction and related work"), it is important to continue to monitor how students' perception evolves over time and establish at which point this may influence their sense of belonging and career decisions.
Globally, the trends observed confirm not only the importance of introducing the discipline in formal education for all, but also the complex interactions that this introduction may have on students' perception. Indeed, this introduction may not necessarily contribute to closing all perception-related gaps and may even exacerbate some. Therefore, in addition to conducting the study with a larger sample to be able to detect smaller effect sizes, it would be important to complement the results of the study with qualitative data to gain better insight into how students perceive the discipline, and how and why this differs between students with and without access to CS-education.

Conclusion
Early exposure to Computer Science (CS) and Computational Thinking (CT) for all is important to broaden participation and promote equity in the field. This is contingent on addressing structural barriers (lack of access) and social barriers (stereotypes) in order to reduce performance and perception gaps which affect sense of belonging and career decisions. Addressing these barriers requires a system-wide implementation of CS & CT curricula for all students starting in the early foundational years. That is why numerous countries are introducing CS & CT into their curricula starting in primary school. The question is therefore: are these curricular reforms contributing to learning and reducing performance gaps? Curricular reforms and professional development programmes are seldom evaluated at the student-level despite the importance of establishing their effectiveness in terms of student learning and perception. Therefore, in the present article, we evaluate the implementation of a regional CS-curricular reform in order to determine if the reform contributes to achieving equity goals. More specifically, we study how the implementation of the CS curriculum by teachers impacts and contributes to equity in terms of student learning (with respect to gender and performance gaps, RQ1) and perception (with respect to gender and self-efficacy gaps, RQ2). To answer these questions, the analysis employs hierarchical linear modelling and structural equation modelling using data from three studies involving, respectively, n1 = 1384, n2 = 2433 and n3 = 1644 grade 3-6 students (ages 7-11) and their n1 = 83, n2 = 142 and n3 = 95 teachers.
In terms of student learning impact, the students are progressing over time. There is however no direct link between what the teachers taught (i.e. adopted) over an extended period of time and student learning. Although certain studies have suggested that perception may play a mediating role on performance, this is not the case in the present study. There is however a link between student learning and how teachers perceived the CS-PD program. Teacher perception may thus be acting as a mediating variable or be confounded with other dimensions such as teachers' assimilation of Technological Pedagogical and Content Knowledge (Mishra & Koehler, 2006) obtained during the PD, their appropriation of the content and the depth of the associated change in their practice, supporting the need to gain better insight into how the content is taught. As there are known differences between intended, enacted and attained curricula (van den Akker, 2003), the findings indicate the need to investigate not just whether, but how teaching the discipline, and individual pedagogical content, influences learning. In terms of student learning equity, the findings indicate that (i) the performance gap between lower and higher achieving students is closing and that (ii) pre-existing gender gaps appear to be closing. Whether in terms of impact or equity, it would be important to expand to other dimensions of learning that may be influenced, whether to have a more complete evaluation of CT (by including practices and perspectives, Brennan & Resnick, 2012), or by looking more generally into the impact on learning in other disciplines, or in terms of transversal competences.
Where student perception is concerned, in terms of impact, the results are relatively straightforward: students in both CS and non-CS schools perceive CS and the tools involved with teaching CS positively. Interest in the discipline and perception of robotics are nonetheless more positive in schools with access to CS-education, which may contribute to broadening participation in the field. The findings in terms of equity indicate that there are gender gaps, with boys having a better perception of the discipline than girls. However, whether in schools with access to CS education or not, the perception of role models close to them as doing CS contributes to students' positive perception of the discipline. As teachers are mainly women in primary school, introducing CS as a discipline taught by all teachers contributes to teachers being more often perceived as doing CS, and may ultimately contribute to gender-equity. Comparing students in schools with and without access to CS education indicates that there are differences in how the discipline is perceived in both types of schools and that there are interaction effects with gender: initially smaller gender gaps are widening (e.g. CS and tablet interest and self-efficacy, robots utility) while initially larger gender gaps are closing (e.g. robotics interest and self-efficacy, perceiving teachers as doing CS in CS-schools) with access to CS-education. Teaching CS thus has a complex influence on perception which requires investigating more deeply why students perceive the discipline the way they do and how it is influenced by access to CS-education. Monitoring this perception is also critical in order to understand how it evolves over time and influences long-term career decisions.
Answering the overarching question "how does the curricular reform impact student-level outcomes and equity in the field?" is therefore not as straightforward as it seems. On the one hand, introducing CS for all in the curriculum and being taught CS has a positive impact and affects equity by:
• Promoting student learning and contributing to performance-equity by reducing (i) differences between initially high and low performing students; (ii) the performance gender gap; and (iii) the impact of teacher demographics on student learning.
• Contributing to perception gender-equity by reducing the largest gender-related perception gaps (namely those pertaining to robotics).
On the other hand, the curricular reform does not automatically lead to improvements on all fronts. The impact is neither direct, as shown by the student learning results which lack a direct link between what was taught and learning; nor straightforward, as shown by the fact that there is an interaction effect between gender and access to CS education, with initially smaller (or not initially present) gender gaps increasing.
The findings of the study therefore demonstrate that the following elements are important to achieve equity and broaden participation in the field:
• Introducing CS for all students starting in the first years of formal education.
• Preparing the teachers to teach CS, removing the influence of teacher demographic and teacher motivational factors on student learning.
• Having activities that signal to girls and boys equally and that are in students' Zone of Proximal Development in order to help all achieve the desired learning objectives.
• Investigating the impact of CS curricular reform and PD program implementations at the student level, and including teacher-level insight, all the while considering the complex dynamics that may be involved in CS-education.

Fig. 1 Two cCTt question formats: grid on the left and canvas on the right (figure taken from El-Hamamsy et al. 2022c)
use digital technologies (1 = after most of my colleagues, 2 = at the same time as most of my colleagues, 3 = before most of my colleagues, I am an early adopter, 4 = before anybody else, I am an innovator)
• In grade 3 there is a small marginally significant gap in the pre-test (grade 3 pre-test boys ∼ girls, Δ = 0.764 pts, p = 0.0526, Cohen's D = 0.161) and a small significant gap in the post-test (grade 3 post-test boys > girls, Δ = 0.687 pts, p = 0.0422, Cohen's D = 0.139), with the effect sizes indicating that the gap is getting smaller, but has not yet closed.
• In grade 4 there are small marginally significant differences in the pre-test (grade 4 pre-test boys ∼ girls, Δ = 0.727 pts, p = 0.0624, Cohen's D = 0.151) and no significant differences observed in the post-test (grade 4 post-test boys ∼ girls, Δ = 0.211 pts, p = 0.5046, Cohen's D = 0.046), indicating that the gender gap has closed.
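To make the effect sizes quoted for the grade 3 and grade 4 gender gaps easier to interpret, the following is a minimal sketch of Cohen's d computed with a pooled standard deviation. This is one common definition and is assumed here; the study's exact computation is not specified in this section. The function name `cohens_d` and the `boys` and `girls` score lists are hypothetical and serve only as illustration. The last lines divide the reported point differences by the reported Cohen's D values, which, under the same assumption, gives a rough idea of the pooled standard deviation implied by the grade 3 figures.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed definition)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical scores, for illustration only (not data from the study).
boys = [12.0, 14.0, 16.0, 18.0]
girls = [11.0, 13.0, 15.0, 17.0]
d_example = cohens_d(boys, girls)

# Back-of-the-envelope check on the reported grade 3 values:
# if d = difference / pooled SD, then the implied pooled SD = difference / d.
implied_sd_pre = 0.764 / 0.161
implied_sd_post = 0.687 / 0.139
```

Under this reading, a gap of well under one point on a test whose scores spread over several points corresponds to the "small" effects described in the text.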

Fig. 2 Student performance distribution according to grade and whether in the pre- or post-test

Fig. 3 Student performance distribution according to grade, gender and whether in the pre- or post-test

Fig. 4 Student performance distribution according to grade and gender using data from the second study (n = 2226, November 2021). All grade-differences are significant, except the one between grades 5 and 6, while the gender-differences per grade are non-significant

Fig. 5 Student normalised change distribution according to grade, access to CS-education (left) and the number of CS-activities taught (centre for grade 3, right for grade 4). A two-way ANOVA between the grade and what was taught does not identify any significant differences between groups in terms of access to CS education (F(2) = 1.05, p = 0.35). A one-way ANOVA per grade did not identify any significant differences according to the number of activities taught (grade 3: F(1) = 0.13, p = 0.72; grade 4: F(1) = 0.89, p = 0.35)
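The normalised change shown in Fig. 5 is not defined in this part of the text. As a hedged illustration, the sketch below follows the widely used definition of normalised change by Marx and Cummings (2007), which we assume (without confirmation from this section) is the one intended: gains are scaled by the room left to improve, losses by the room left to regress. The function name `normalised_change` and the `max_score` parameter are hypothetical, and the example values are not taken from the study.

```python
from typing import Optional


def normalised_change(pre: float, post: float, max_score: float) -> Optional[float]:
    """Normalised change between a pre- and a post-test score.

    Assumed definition (Marx & Cummings, 2007):
      gain:  (post - pre) / (max_score - pre)
      loss:  (post - pre) / pre
      equal: 0, except at floor or ceiling, where the student is dropped.
    """
    if pre == post:
        # No information can be gained from a student stuck at 0 or at the maximum.
        if pre in (0, max_score):
            return None
        return 0.0
    if post > pre:
        return (post - pre) / (max_score - pre)
    return (post - pre) / pre


# Hypothetical scores on a hypothetical 25-point test.
gain = normalised_change(pre=10, post=15, max_score=25)
loss = normalised_change(pre=15, post=10, max_score=25)
ceiling = normalised_change(pre=25, post=25, max_score=25)
```

One consequence of this scaling is that the metric stays within [-1, 1] whatever the starting score, which makes students with different pre-test scores comparable.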

Fig. 6 Analogue Visual Scale employed for the student survey's Likert questions. The labels in French (original survey language) were established with teachers and validated in a pilot run with two classrooms

Fig. 9 Students' perception in schools that had been teaching CS for three years (n = 2433, November 2021)

Fig. 11 Students' perception of who does CS in schools that had been teaching CS for three years (n = 2433)

Fig. 12 Perception SEM (n = 2116, November 2021) path diagram with standardised variables for the measurement model that meets the requirements for adequate fit, displaying only significant links in the model. Please note that all standardised factor loadings are above 0.3

Appendix B: Appendix for study 2
B.1: Structural equation model for the effect of student-related variables on their perception of the discipline (study 2)
Table 12 ANOVA of student learning data with Benjamini-Hochberg p-value correction and minimum effect size (Cohen's D) that can be detected with the sample

Table 1 Synthesis of the three studies evaluating the impact of the CS-curricular reform at the student-

Table 4 Hierarchical linear model for student learning with respect to student- and teacher-level variables (dependent variable: delta between pre- and post-test scores, n = 1027 students in 57 classes in 6 schools)
Significant variables are highlighted in bold. R² = 0.279, AIC = 5386, BIC = 5474, RMSE = 3.04. Random effects: σ² = 9.72, τ_class = 0.57, τ_school = 1.37. Please note that (i) the classes had an average of 18 ± 2 students per class (minimum 14, maximum 22); (ii) the schools had an average of 8 ± 5 classes (i.e. 8, 8, 1, 17, 6, 7, 8 classes) who participated in the three data collections required for this analysis. These numbers are coherent with the relative sizes of the schools, with the exception of the third where the majority of teachers chose not to participate in the data collection
more to equity by ensuring that all students have access to quality CS education, irrespective of the teachers' background (structural barriers). Finally, the findings indicate the existence of gender gaps (study 1a, likely due to social barriers) but that these get smaller the longer students are in contact with CS education (positive for gender-equity).
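The random-effect estimates reported with Table 4 (σ² = 9.72, τ for class = 0.57, τ for school = 1.37) can be read as a variance decomposition. The sketch below is a back-of-the-envelope calculation, assuming that the τ values are variances (not standard deviations) of the class-level and school-level random intercepts and that σ² is the residual student-level variance. The function name `variance_shares` is hypothetical and this is not part of the original analysis.

```python
def variance_shares(residual_var, class_var, school_var):
    """Share of the total unexplained variance attributed to each level."""
    total = residual_var + class_var + school_var
    return {
        "student": residual_var / total,
        "class": class_var / total,
        "school": school_var / total,
    }


# Values reported with Table 4; their interpretation as variances is an assumption.
shares = variance_shares(residual_var=9.72, class_var=0.57, school_var=1.37)

for level, share in shares.items():
    print(f"{level}: {share:.1%}")
```

Under these assumptions, most of the remaining variance in learning gains lies between students within a class, with comparatively little attributable to classes or schools.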

Table 5 Number of students participating in the first perception survey and the third test (study 2, November 2021) and their intersection with the teacher adoption survey in Appendix B.2

Table 8 Number of participants in the second student perception survey (study 3, May 2021)