Building science through questions in Content and Language Integrated Learning (CLIL) classrooms

Background: The growing population of students who are learning science through a Content and Language Integrated Learning (CLIL) approach has led to concerns about these students’ ability to fully participate in a rich classroom discourse to develop content knowledge. A lack of information about science development through classroom discourse in CLIL settings brought us to focus on the questions teachers ask in upper secondary CLIL biology classrooms. Our aim was to understand how these questions affect science content learning opportunities. A multiple-case study design was implemented to examine and understand the complexities of teacher-student interactions. Data were collected in three case studies, each located in a different school (two in Germany and one in Italy), where CLIL senior secondary science classrooms were observed and audio-recorded. Classroom talk transcripts were analyzed using a mixed methods approach to discourse analysis.
Results: Findings suggest that a teacher’s strategic use of questions has the potential to promote both science understanding and science language development. Questioning contingent on students’ answers was observed both to promote content understanding and to lessen the linguistic demand on CLIL students by splitting both reasoning processes and language production into more manageable units. In addition, a higher level of cognitive engagement was present only when students managed to participate in the classroom discourse with answers longer than single utterances. To allow students to actively participate in the classroom discourse, teachers were observed adopting and promoting translanguaging practices, that is, the flexible use of more than one linguistic code. Furthermore, teachers asked language-related questions that promoted both understanding and use of disciplinary language.
Conclusions: The questioning practices observed in this study offer both practitioners and researchers ways of understanding issues of content access in upper secondary CLIL science classrooms. We recommend that STEM teachers in CLIL settings not oversimplify the linguistic demand on students, as this leads to an oversimplification of content. To lessen the linguistic barriers, practical strategies are recommended to support both cognitively and linguistically productive questioning in STEM CLIL classrooms.


Introduction
Since its introduction in the 1990s, the pedagogical approach called Content and Language Integrated Learning, or CLIL, has been rapidly spreading all over Europe. CLIL consists of the teaching of any non-linguistic subject through the medium of a foreign language at primary level, secondary level, or both. Originally applauded for promoting foreign language learning and plurilingualism (Marsh et al., 2001; Milán-Maillo & Pladevall-Ballester, 2019), CLIL is also viewed as a catalyst for didactic innovation and transdisciplinary collaboration by many educators (Darvin et al., 2020).
Science is a popular choice of subject to be taught through a CLIL approach (Meyerhöffer & Dreesmann, 2019). Consequently, science content acquisition is likely to be strongly affected by this pedagogical approach. However, current evidence on content acquisition tends to be inconsistent and inconclusive (Bonnet & Dalton-Puffer, 2013; Nawrot-Lis, 2019). Existing studies tend to contradict each other by claiming either positive results (e.g., in Surmont et al., 2016) or negative effects on content learning (see, for instance, Hartmannsgruber, 2014). Although there are studies that attempt to gauge CLIL effects on general science acquisition (see, for instance, Hughes & Madrid, 2020; Virdia, 2020), on specific aspects of STEM education such as the fostering of scientific citizenship (Garzón-Díaz, 2021), or on students' motivation (Vlasenko et al., 2020), science understanding, in particular, is a research area that needs to be further explored (Bonnet, 2015; Piesche et al., 2016).
Students learning science are required to understand science concepts but also to develop an ability to actively participate in the classroom discourse (Tang, 2020). Nevertheless, understanding and using the language of science is a difficult task per se. In a CLIL setting, the use of the spoken format is further challenged by the use of a vehicular language, and the consequent difficulties have been observed to profoundly affect classroom discourse (Wu et al., 2018). For instance, there is evidence that at tertiary level of education the use of a vehicular language leads teachers to ask lower order questions and to prefer monologic interactions (Pun & Macaro, 2019). What is of interest to us is what happens in secondary level science classrooms, where CLIL is widely implemented but under-researched in relation to the questions teachers ask and, more generally, in relation to the science discourse developed. In particular, the study examines the following research questions:
(1) What type of questions do teachers ask in science classrooms when a CLIL approach is adopted?
(2) What type of student cognitive engagement does a teacher's questioning evoke?
(3) How do teachers' questions facilitate science learning and science talking in the CLIL classroom?

Theoretical background
This study is underpinned by both a sociocultural approach to learning (Vygotsky, 1978) and Sfard's theory (2008) maintaining that interpersonal communication and cognitive processes are complementary, and that learning is mediated by language (Vygotsky, 1986). This perspective on learning is reflected in the current emphasis placed on the spoken language in STEM education (National Research Council, 2012) and, in particular, in school science curricula and science standards (e.g., in the NGSS, 2013). Accordingly, classroom discourse plays a crucial role in STEM education (Hora & Ferrare, 2013) and specifically in science education (Tang & Danielsson, 2018). This is reflected in the abundant research concerned with science classroom discourse (mostly in monolingual environments) that has been produced during the last decades (e.g., Juuti et al., 2020; Lemke, 1990). Within classroom discourse, interactions triggered by a teacher's questions play a central role. Teachers' use of questioning is well known to dominate the science classroom discourse and to elicit, guide, and shape students' thinking while constructing science knowledge (Chin, 2007; Kawalkar & Vijapurkar, 2013; Sedova et al., 2019). However, participation in science classroom discourse when using a second language is extremely challenging (Lee et al., 2013). It has been observed that CLIL students often do not have the linguistic resources to effectively participate and interact in productive and cognitively engaging classroom discourse (e.g., in Ó Ceallaigh et al., 2017). These observations resonate with Cummins' (1976) threshold hypothesis, which suggests that a certain level of linguistic competence is necessary for academic and cognitive functioning in bilingual contexts. This issue was observed by Coyle (2007) in CLIL classrooms, which are often characterized by a mismatch between language level and cognitive level.
Therefore, assuring fair opportunities for students enrolled in CLIL programs to learn science includes making cognitively demanding tasks linguistically accessible (cf. Lemmi et al., 2019). This paper explores how classroom interactions, and questioning in particular, can generate and develop science knowledge and science understanding.

Theoretical frameworks for analyzing questions
Several approaches and theoretical stances have been adopted to investigate teachers' questioning. Social semiotics in particular has contributed to a better understanding of questioning in classroom discourse (Halliday, 1978). In a sociocultural context, questions are investigated as elements that operate within an interaction frame. One of the most popular classroom interaction frames is that of Sinclair and Coulthard (1975), the exchange composed of Initiation-Response-Feedback or Evaluation (IRF/E), later renamed by Lemke (1990) as the "triadic dialogue" (p. 8). In this frame, questions represent the move that initiates the dialogue. More than four decades after Sinclair and Coulthard's study was published, science education scholars still agree that the triadic dialogue pattern is not only ubiquitous but also dominant (Geelan, 2012). It has been criticized for its inherent lack of authenticity (e.g., in Lemke, 1990), and for the perception that the teacher is merely following a "recitation" script with pre-established answers (Cazden, 2001, p. 31). For other authors, and consistent with our views, IRF questions serve to stimulate, challenge, monitor, and guide students' learning (e.g., in Bansal, 2018).
From a constructivist-cognitive perspective, questioning in science classrooms has been examined for understanding how it helps construct meaning-making (DePierro et al., 2003), supports inquiry teaching (Hiltunen et al., 2020), or how it fosters understanding (Rahayu et al., 2019). Depending on the level of cognitive engagement they elicit, questions can be categorized into two well-known types, lower order thinking (LOT) and higher order thinking (HOT) questions (Bloom et al., 1956). In science education, LOT questions are usually related to descriptions of events, scientific terms, definitions, facts, or statements and do not usually change already existing knowledge structures. By contrast, HOT questions are praised for guiding and scaffolding the building of understanding in the science classroom (Kawalkar & Vijapurkar, 2013; Rahayu et al., 2019) and, ultimately, for fostering deeper thinking (Yip, 2004). HOT questions rely on a rather sophisticated use of language. In this regard, Pun and Macaro (2019) demonstrated that the use of a vehicular language leads teachers to ask LOT questions. A different categorization considers the functions that questions serve: they can be used to initiate and control the classroom dialogue (Sahin et al., 2002), to check retention of or recall previous knowledge (Ernst-Slavit & Pratt, 2017), but also to extend and probe students' thinking (Ong et al., 2016; Soysal, 2020) and to support students in developing their understanding (Rahayu et al., 2019).
When researching primary science classrooms with English language learners, Boyd and Rubin (2006) found that when questions are responsive to students' contributions and build on or extend them, they can elicit elaborate interactions. Similarly, Chin (2006) observed that when teachers build on students' earlier contributions, they can "promote productive talk activity" (p. 1343). In science education, this kind of contingent questioning was first studied by van Zee and Minstrell (1997), who dubbed it "reflective toss" (p. 227). In a reflective toss, responsibility for thinking is "thrown" back to the students in the form of a request for clarification, for rationale, or for verification. In line with this, Chin (2006) proposed a questioning-based discourse analytical framework to investigate how teachers use questions in the science classroom; Chin argues that teachers do indeed orchestrate classroom discourse and that a responsive approach to students' answers is a key element in promoting higher order thinking skills.
Overall, teachers have a key role to play in scaffolding thinking skills that are conducive to science learning. The questions that teachers ask in order to structure classroom interactions represent an important part of this scaffolding (Rahayu et al., 2019). The kind of questions that are asked, the reasons they are asked for, and the way they are asked affect the structuring of students' thinking for building scientific understanding (Chin, 2007) and students' participation in the classroom discourse (Carlsen, 2010). This study is designed to address these aspects of classroom questioning in CLIL settings, where verbal participation for building content knowledge is a very delicate teaching goal.

Methods
The present study examines questioning within a CLIL science classroom discourse and its potential to generate opportunities for learning science. An instrumental multiple-case study design was considered the most appropriate research design to capture the rich and detailed data required to examine the teacher's discursive interactions. Case studies facilitate the investigation of events over which the researcher has little control (Yin, 2009), such as the events taking place in a classroom setting, in particular when the focus of the investigation is on the process of learning rather than the outcome. The term instrumental reflects the selection of exemplar cases to provide insight into the phenomenon of interest (Stake, 2006). The use of a multiple-case study, instead of a single-case study, provides more extensive and compelling explanations of the phenomenon under scrutiny (Merriam, 1997).

Participating cases
Three case studies were selected and examined as the object of this study. Each case study was in a different secondary school: two Gymnasien in Germany (Hamburg, case study 1 (CS1), and Berlin, case study 2 (CS2)), and one Liceo in Northern Italy (Trento, case study 3 (CS3)). Each case study was populated by a biology teacher (n = 3) and their senior classes (students aged 15 to 17). A total of 10 class groups (n = 175) participated in this study. A brief profile of each case study is provided in Table 1. These three cases were selected using a purposive sampling strategy. Purposive sampling facilitates the selection of information-rich cases and allows researchers to make the most effective use of limited resources (Patton, 2002). The criteria for selecting the case studies were (a) the accessibility of samples, (b) the level of insight the samples were expected to bring to the study, and (c) a certain level of homogeneity between the cases, meaning that all student participants attended an upper secondary biology course and were taught biology through English throughout the whole school year. Furthermore, the three teachers all had extended experience in teaching biology and some experience in teaching through a CLIL approach.

Data collection
The data collected for this study consisted of naturalistic classroom observations and audio recordings of a total of 34 science class periods. Data were collected in each case study location over a period of 2 weeks. Observations were conducted by one of the authors of this study and included the recording of information about everything relevant to the research questions that could not be captured on an audio track (cf. Mac Mahon, 2014), such as information on gestures, tools, and activities. The observation schedule was semi-structured because it respected a set agenda of intents and issues to look at, but there was no predetermined list of categories to consider (Cohen et al., 2011), which allowed the researchers to capture the variability and unpredictability of human interactions. To preserve the naturalistic approach to data collection, the researchers did not influence the observed environment and acted as nonparticipants in the classroom. During whole-class talk, one audio-recorder was positioned in the back of the classroom. The recorder captured all teacher talk and many of the students' comments during whole-class discussions. The parts of the audio-recordings that were relevant to the study were then transcribed and analyzed. The collection of classroom discourse as a source of information was chosen for its relevance in the building of science knowledge in a situated sociocultural perspective of learning (Lemke, 1990; Mercer, 2004).
Finally, collecting data in different countries poses ethical challenges. First, the study received full ethical approval from the Research Ethics Committee of the Irish University where the researchers were based. In addition, access to the German and Italian schools was specifically approved by the local departments of education. For this purpose, complementary ethics approvals were requested and obtained in Hamburg, Berlin, and Trento. In addition, all informed consent forms and information sheets were translated into German and Italian for participants to ensure full understanding of the requirements of the project.

Data analysis
The analysis was conducted on a corpus of approximately 34 h of observational data. During the early steps of the analysis, observational data were time-stamped and integrated with the corresponding classroom discourse transcriptions. This operation also allowed us to cross-check these two sources of information. These data were then analyzed using a mixed methods approach to discourse analysis, where an interpretive analysis of transcribed talk was integrated with quantitative analysis (Mercer, 2010). Transcripts were first coded, then extracts of transcripts were integrated in a descriptive and interpretative narrative focused on answering the research questions. Adhering as closely to the raw data as possible was a key consideration in the transcription and the coding processes. In addition, the coding process on the same data fragment was repeated at least twice, with relatively long lapses of time between each coding session, to ensure accuracy. Finally, frequencies of categories or codes were used to enhance the understanding of findings (cf. Mercer, 2004). The use of NVivo software supported this part of the process. This quantitative approach to data analysis was particularly useful for comparing occurrences of codes or categories across cases or across the whole data set. Transcripts were coded mainly at utterance level. The reliability of the analysis of data was enhanced by adopting a framework for analyzing classroom discourse interactions (Table 2). This analytical framework was inspired by the questioning-based discourse analytical framework developed by science educator and discourse analyst Chin (2006, 2007). In particular, the deductive coding scheme proposed by Chin (2006) was modified into a more flexible coding approach (partly deductive and partly inductive) and adapted to capture the linguistically sensitive dimension of CLIL classrooms.
In particular, a teacher's questions were classified according to their cognitive demand (as theorized by Anderson & Krathwohl, 2001), and pedagogical functions, the type of knowledge they evoked in the students and the oral production they elicited. The general assumption is that different kinds of questions can stimulate the mind differently. In this study, teachers' questions were grouped into three broad categories: (1) lower order thinking (LOT) questions, (2) higher order thinking (HOT) questions, and (3) language-related questions (LQ). LOT questions refer to questions that request students to recall or retrieve information, describe, and recognize (Anderson & Krathwohl, 2001). HOT questions are those questions that ask students to use higher order thinking skills in order to respond, such as reasoning skills, argumentation, use of evidence, critical thinking, and metacognition abilities (Anderson & Krathwohl, 2001).
Students' answers were analyzed and classified using a coding scheme (Shavelson et al., 2003) that captures evidence of the use of three types of knowledge: declarative (knowing that), schematic (knowing why), and strategic (knowing when). These codes were designed to describe what type of knowledge is used by a student for replying to a teacher's question. Both schematic and strategic knowledge rank high in the scale of cognitive engagement. Wells (2010) describes schematic knowledge in science education as "a student's ability to explain and predict natural phenomena, and to use reasoning in their evaluation of scientific claims regarding those phenomena" (p. 201). Strategic knowledge relies on a student's ability to transfer knowledge in solving new problems and is considered to be the highest order learning level among the cognitive demands (Li et al., 2006). The second aspect under scrutiny when analyzing students' answers was oral production, which refers here to the students' ability to express themselves in the spoken format for participating in classroom discourse.
Overall, the integration of the above-described methods for analyzing classroom discourse allowed us to understand how science learning is interactionally constituted and made visible in CLIL classrooms. Other analysis frameworks specific to CLIL environments were considered, but their implementation would not have allowed us to keep the focus on science understanding (e.g., Dalton-Puffer, 2013).
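The coding scheme described in this section can be pictured as a small data structure. What follows is a minimal sketch in Python, not the NVivo workflow used in the study: the class names, field names, and the three example records are hypothetical and serve illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class QuestionCategory(Enum):
    """The three broad categories of teacher questions named in the text."""
    LOT = "lower order thinking"
    HOT = "higher order thinking"
    LQ = "language-related"


class Knowledge(Enum):
    """The three knowledge types used to code student answers."""
    DECLARATIVE = "knowing that"
    SCHEMATIC = "knowing why"
    STRATEGIC = "knowing when"


@dataclass(frozen=True)
class CodedExchange:
    """One teacher question paired with the student answer it prompted."""
    case: str                   # "CS1", "CS2", or "CS3"
    category: QuestionCategory  # code assigned to the teacher's question
    knowledge: Knowledge        # knowledge type evidenced in the answer
    word_level: bool            # True for a one-word or very brief answer


def category_frequencies(exchanges):
    """Count how often each question category occurs."""
    return Counter(e.category for e in exchanges)


# Invented records, for illustration only (not data from the study).
sample = [
    CodedExchange("CS2", QuestionCategory.LOT, Knowledge.DECLARATIVE, False),
    CodedExchange("CS2", QuestionCategory.HOT, Knowledge.SCHEMATIC, False),
    CodedExchange("CS3", QuestionCategory.LOT, Knowledge.DECLARATIVE, True),
]
frequencies = category_frequencies(sample)
```

Keeping the question code, the knowledge code, and the oral production code on the same record is one possible design choice; it makes it straightforward to compare codes across cases or across the whole data set.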

LOT questions
The frequency and distribution of the categories and codes resulting from the analysis of the teachers' questions and students' answers are presented in Table 3. Throughout the 34 class periods examined, LOT questions were ubiquitous across the case studies and represented an important part of the total number of questions teachers asked for building science learning. These questions were used to check retention, to recall previously stated information, or to go over previous lessons for connecting them to the current discourse. Apart from these recall questions, LOT questions were also asked to recognize parts and to describe objects and processes (here coded as recognizing and describing) or to forge connections between old and new knowledge (prior knowledge questions). In this study, prior knowledge questions were occasional and confined to when new material was introduced.
The last type of LOT questions found in the data set were labeled as guess what teacher thinks, or GWTT questions. With these questions, students are expected to guess what the teacher has in his/her mind (Young, 1992). Young (1992) notes that this approach to questioning is more an "invitation for conformity […] rather than a provocation to the exploration of a question" (p. 102) and that this pattern of questioning has no educational justification, even though it is very common in many classrooms. This category of questions was exclusively found in CS3 (Table 3). These types of questions always evoked declarative knowledge, which confirms Young's (1992) claim that they do not add much value to understanding. In addition, the GWTT questions were always answered either at a one-word-level or chorally. Choral answers keep students engaged and may alleviate pressure as students do not take responsibility for their own personal answers. However, they also hinder the opportunity to learn science dialogically, as it is the teacher that dominates the dialogue.
Furthermore, teachers' questions were cross-checked with the cognitive engagement of the students' answers they prompted across the whole data set (Table 4). It was observed that, generally, LOT questions score low on cognitive engagement, as they nearly always evoked declarative knowledge (cf. Ernst-Slavit & Pratt, 2017; Smart & Marshall, 2013) and their use in the science classroom has been associated with what Li et al. (2006) call "rote-learning" (p. 303).
However, a finer-grained analysis of the use of these questions reveals a more complex reality. In particular, the transcript reported in Table 5 suggests that LOT questions in a CLIL science context have some important learning potential. In this excerpt, the sequence of questions and answers is initiated by a student, Maia. The teacher, in this case Emma (CS2), takes the lead and tosses the question back to her students. The questions the teacher asks are either recall or recognizing and describing questions. The answers the students provide all employ declarative knowledge. First, Markus defines a scientific term as it appears in the textbook (line 481). Solicited by the teacher, Markus is asked to further elaborate. In both of Markus' answers, the cognitive demand resonates with what Li et al. (2006) describe as the "rote-learning of declarative knowledge" (p. 303). Markus, however, only answers half the question (the part on alleles). A little later (line 490), the teacher asks Alvin about the other half of the question (i.e., about antigens). Again, Alvin makes use of declarative knowledge by answering two recall questions (lines 493-495). That the student is merely remembering is also apparent from his use of the metadiscourse "you said" (line 493). At lines 496 and 497, the teacher follows up on Alvin's statements by recasting them. Some lines below, the teacher goes back to Maia's original question. Evidently, something has happened between line 463 and line 510, because this time Maia is able to answer her own question. Even though Maia is basically making use of declarative knowledge, her answer is cognitively rather complex and linguistically rich. Maia sums up all of what has been said by Markus, Alvin, and the teacher by using her own words. For instance, instead of using the academic term gene she says, "information on the DNA". Maia is making use of what Lemke (1990) calls thematic items, which she connects with the right semantic relationships.
In other words, Maia is learning science. This is an example of how students make sense in real time, rather than reconstructing from previous thinking. Learning science is in this case collaboratively accomplished through the orchestrated interactions between students and teacher. Here "discourse does not only express meaning. Discourse creates meaning" (Mohan & van Naerssen, 1997, p. 2). The something that happened between line 463 and line 510 is indeed sense-making. These data show that, although LOT questions are not strictly cognitively demanding, their use does not necessarily lead to "rote-learning", as Li et al. (2006) would suggest. The results of this study show that "declarative knowledge also falls under the rubric of science education" (Good et al., 1985, p. 140). Recalling and recognizing and describing questions can lead to invaluable processes of sense-making that nurture science learning in a CLIL context. Finally, not only does Emma's skillful sequencing of questions promote the building of science knowledge but it also facilitates science language production. Emma guides her students to use only a small number of key science vocabulary words at a given time, and by doing so she supports them in using and developing a linguistic tool that is still sub-optimal. This aspect is particularly important in a CLIL setting.
Table 4 The matrix that combines types of teacher questions with type of knowledge evoked in student answers (n = 372). In the intersecting cells, frequencies of occurrence are reported. Data are presented across the whole data set because of the very little or even nonexistent differences across case studies.
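The matrix that combines types of teacher questions with the type of knowledge evoked in student answers is, in essence, a cross-tabulation of two codes. The sketch below shows one way such a matrix could be computed in Python; it is an assumption-laden illustration rather than the procedure followed in the study, the function name is hypothetical, and the five coded pairs are invented.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Build a question-type by knowledge-type frequency matrix.

    `pairs` is a list of (question_type, knowledge_type) tuples, one per
    coded question-answer exchange. The result is a nested dict in which
    combinations that never occur are reported as 0.
    """
    counts = Counter(pairs)
    question_types = sorted({q for q, _ in pairs})
    knowledge_types = sorted({k for _, k in pairs})
    return {
        q: {k: counts[(q, k)] for k in knowledge_types}
        for q in question_types
    }


# Invented coded pairs, for illustration only (not data from the study).
pairs = [
    ("LOT", "declarative"),
    ("LOT", "declarative"),
    ("LOT", "schematic"),
    ("HOT", "schematic"),
    ("HOT", "strategic"),
]
matrix = cross_tabulate(pairs)

# Print one row per question type.
for question_type, row in matrix.items():
    print(question_type, row)
```

Each inner dictionary corresponds to one row of the matrix, so reading across a row shows which knowledge types a given question type tended to evoke.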

HOT questions
In this study, HOT questions were relatively abundant, particularly in CS1 and CS2 (see Table 3). In science education, HOT questions traditionally hold a privileged position, as they are reported to foster deeper conceptual thinking (Yip, 2004). Similarly, a culture of valuing reasoning and building it through a discursive approach is also common in STEM education (Tofel-Grehl & Callahan, 2016). This approach is confirmed by this study, where HOT questions were observed to mostly elicit schematic and, occasionally, strategic knowledge (Table 4). However, in CLIL settings, the limited linguistic proficiency of students can hinder them both in responding to higher order questions (Pun & Macaro, 2019) and in understanding abstract and complex science concepts through the classroom discourse (Yip et al., 2007). Therefore, it is interesting to investigate how at least two teachers of this study managed to utilize classroom discourse in a way that promoted science understanding.
In the transcript reported in Table 6, Milo is asked to solve a problem, for which he needs domain-specific strategies. In this case, this means possessing the knowledge of how genetic inheritance works, being able to represent the current problem with a Punnett square (a genetic tool for representing Mendelian inheritance) and being creative enough to plan a simple strategy. First, Emma provides her students with extra time to discuss in pairs any possible solution (line 217) before sharing it with the rest of the class. This strategy particularly facilitates students when working in a CLIL setting. Additionally, Emma asks Milo to expand on his answer ("And…", line 221), which forces him to make his reasoning visible to all and to use language to argue his point. Table 7 provides another example where students are asked HOT questions to which they do not know the answer yet. In this excerpt, the process of understanding how antibiotic resistance works is both cognitively and socially constructed. To answer the teacher's question, Ida needs to connect prior knowledge relating to how bacteria reproduce, mutation, ecology, and competition. Ida also capitalizes on the (wrong) contributions of Amelie and Sophie and the clues provided by the teacher. As a last remark, it can also be noted that science understanding is promoted by switching to the students' mother tongue (Table 7, lines 114-115). In this extract, Alexandra switches to German to make sure everybody understands her question properly. Such examples provide evidence that HOT questions can be incorporated into the classroom discourse in CLIL settings and suggest practical ways of how this can be achieved.

Language-related questions
In CLIL classrooms, barriers to content access may be conceptual in nature and also linguistic (Kääntä & Kasper, 2018). In response to this fragility, the teachers of this study were observed engaging in a third category of questions, labeled here as language-related questions (LQ). These questions were observed in all three case studies (Table 3). Overall, LQ signal that the teacher is focusing on the language of the discipline, either for explicitly promoting its use and development or for checking students' linguistic comprehension. Accordingly, LQ comprise two types of questions, coded as (1) parlance questions, and (2) checking for lexical understanding. Parlance questions are questions that "prompt the use of genre specific ways of speaking the language of the discipline" (Ernst-Slavit & Mason, 2011, p. 4) and, in this study, always referred to academic words. Examples of these questions in this study are "Do you remember the posh word?" (where "posh" is used here by the teacher as a synonym for "academic") and "What do we call this?" The checking for lexical understanding questions were always introduced by the question "What does it mean?" Table 8 provides an example of these kinds of questions. Interestingly, Lea seems used to detecting academic terms and she answers by explicitly using metalanguage: "in biology it means" (line 25). It appears here that there is a sort of classroom culture about dealing with words that have different meanings depending on the register they are used with. Overall, the main purpose of language-related questions is to lessen linguistic barriers and to help students develop the language of the discipline. It is important to keep in mind that these science teachers are teaching in a setting where language is a limiting factor. Given that the complex nature of science language generates learning challenges for all students, we speculate that such questions could be useful if implemented in any science classroom discourse.

Questions and oral production
Another aspect that was considered in the analysis of questioning was the length of students' answers. Depending on what the teacher was asking, students produced either very short or extended answers. Two categories of students' answers were considered: one-word answers or very brief expressions (word-level answers) and longer-than-a-word answers (not word-level answers).
The oral production of students' answers was analyzed in relation to the types of questions asked. Results are reported in Table 9, where the double figures in each cell indicate how short and long answers were distributed. It is apparent that short answers dominate when LOT questions (prior knowledge and GWTT questions in particular) and language-related questions are asked (except for checking for lexical understanding questions). GWTT questions (only present in CS3) trigger both low-level cognitive engagement and very limited language production. Slightly different are other low order thinking questions, such as recall and recognize and describe questions. These questions have the potential to trigger long answers and sometimes even schematic knowledge. By contrast, HOT questions should usually only be answered through extended replies, as a single word is unlikely to answer a why question. The need to provide articulated and more extended answers raises the linguistic bar of the talk. As a result, in a CLIL science classroom, linguistic barriers can hinder the implementation of this category of questions.
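The short-long cell format used in Table 9 can be illustrated with a minimal sketch. It assumes that a word-level answer can be approximated by a word-count cut-off, which simplifies the coding described above (the study also counted "very brief expressions" as word-level). The cut-off `MAX_WORD_LEVEL_WORDS`, the function names, the question-type labels, and the sample answers are all hypothetical and are not taken from the study's data or procedure.

```python
from collections import defaultdict

# Hypothetical cut-off: answers with at most this many words count as word-level.
MAX_WORD_LEVEL_WORDS = 1


def is_word_level(answer, max_words=MAX_WORD_LEVEL_WORDS):
    """Return True if the answer has no more than max_words words."""
    return len(answer.split()) <= max_words


def tally(coded_answers):
    """Count short and long answers per question type.

    coded_answers is a list of (question_type, answer_text) pairs.
    Returns a dict mapping each question type to a "short-long" string,
    mirroring the cell format of Table 9.
    """
    counts = defaultdict(lambda: [0, 0])  # [short, long]
    for question_type, answer in coded_answers:
        index = 0 if is_word_level(answer) else 1
        counts[question_type][index] += 1
    return {q: f"{short}-{long}" for q, (short, long) in counts.items()}


# Made-up answers for illustration only; not from the study's transcripts.
sample = [
    ("recall", "Mitochondria"),
    ("recall", "It is where the cell makes energy"),
    ("why", "Because the enzyme changes shape when it gets too hot"),
    ("parlance", "Denaturation"),
]

result = tally(sample)
print(result)
```

A fixed word count is only a rough proxy for the authors' coding, so the cut-off would need adjusting to treat short multi-word expressions as word-level.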
Overall, a high-level cognitive engagement was never observed without adequate oral production. In upper secondary science classrooms, it is expected that effective learning demands high cognitive engagement, which mirrors the students' cognitive level. This type of cognitive engagement was observed both in CS1 and CS2, but rarely in CS3 (Table 3). In addition, in CS3, students nearly always answered with one-word answers: the ratio of long answers to short answers is 4.0 in CS1, 3.5 in CS2, and 0.1 in CS3. Eliciting one-word answers lowers the linguistic demand on the students. James decided to deal with the students' linguistic barriers primarily by letting his students answer chorally and by limiting the questioning to that which does not require extensive oral production. However, by doing this, James also lowers the cognitive engagement required to answer his questions. What James faces is a typical situation in many CLIL science classrooms: a mismatch between students' cognitive level and students' linguistic level. For instance, in a study on Physical Education taught in Irish through a CLIL approach, Ó Ceallaigh et al. (2017) found that "[l]ess cognitively demanding PE content was taught as a consequence of the lack of students' receptive and productive skills in Irish" (p. 77).
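As a check on the arithmetic, the ratios above can be recomputed from the short-long totals in the Total row of Table 9 (31-125 for CS1, 17-60 for CS2, and 128-11 for CS3). The sketch below assumes the ratios were rounded to one decimal place; the function name is illustrative.

```python
# Short and long answer totals per case study, from the Total row of Table 9.
# Each tuple is (short answers, long answers).
totals = {"CS1": (31, 125), "CS2": (17, 60), "CS3": (128, 11)}


def long_to_short_ratio(short, long):
    """Return long answers divided by short answers, rounded to one decimal."""
    return round(long / short, 1)


ratios = {case: long_to_short_ratio(short, long)
          for case, (short, long) in totals.items()}
print(ratios)
```

In CS1 and CS2, long answers outnumber short answers by a factor of roughly four, whereas in CS3 there is about one long answer for every twelve short ones.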

Discussion
Our study contributes to the understanding of how science learning is interactionally constituted and made visible in CLIL classrooms. This is the first study in science education that closely looks at a teacher's questioning as a tool for promoting science understanding in CLIL settings and builds on what Ernst-Slavit and Pratt (2017) achieved by investigating multilingual primary science classrooms.
Our findings illustrate how science understanding is promoted by teacher-led interactions in the CLIL science classroom, which is consistent with findings from mainstream science classrooms (Chin, 2007; Smart & Marshall, 2013; Yip, 2004) but never before identified in CLIL (or, more generally, bilingual) science classrooms. In particular, it was found that higher order thinking (HOT) questions promote both cognitive engagement (schematic and strategic knowledge) and oral production about science, which contributes to the development of science language. Additionally, the collected evidence suggests that high cognitive engagement only takes place when students are engaged in extensive oral production, as the two aspects (cognitive engagement and language use) were inseparable. This finding confirms Vygotsky's theory of thought and language (Vygotsky, 1986). Summing up, HOT questioning creates opportunities for learning science in the CLIL upper secondary classroom. However, HOT questioning only works when students verbally communicate their thoughts beyond the one-word answer.
Furthermore, the teachers in this study were observed to stimulate and support students' participation in classroom discourse by using several strategies. One of these was asking questions contingent on students' utterances. It appears that science learning in a CLIL classroom is dependent on how skillfully the questioning is crafted within the classroom discourse. This resonates with Boyd and Rubin's assertion that "it is not sufficient to look at the structure or type of question. One must inquire how the question functions within the stream of discourse" (Boyd & Rubin, 2006, p. 166). Overall, these data indicate that structured questioning can indeed support high level cognitive learning in a CLIL science classroom.
Table 9 (partial; only the rows below are recoverable)
Guess what the teacher thinks: 0-0, 0-0, 27-0, 27-0
Prior knowledge questions: 1-0, 0-2, 24-2, 25-4
(row label not recoverable): 51-12
Total: 31-125, 17-60, 128-11, 164-158
The first number in every cell represents the count of short answers (one-word) and the second number the count of long answers (longer than one-word).
Tagnin and Ní Ríordáin International Journal of STEM Education (2021) 8:34

It is important to note that CLIL students often do not have the linguistic resources to effectively participate and interact in productive and cognitively engaging classroom discourse or are not properly supported to do so (see also Pun & Macaro, 2019). In our study, this condition occurred in CS3. This may result in limiting students' opportunities for learning science, as predicted by Cummins' (1976) threshold hypothesis, according to which the positive benefits of bilingualism can be experienced only above a given language level. When language threatens to limit students' participation in classroom discourse, some teachers and students of this study were observed to switch to using their home language. This approach to discourse is referred to as translanguaging. As a concept, translanguaging indicates that "both languages are used in a dynamic and functionally integrated manner to organize and mediate mental processes in understanding, speaking, literacy, and, not least, learning" (Lewis et al., 2012, p. 655). More than this, translanguaging is also a theoretical framework that helps to explore how diverse bilingual students use language in creative and dynamic ways to communicate (Garcia & Wei, 2014) and, recently, it has been used as both a theoretical and pedagogical lens in science education research examining bilingual students (Suárez, 2020). In this study, adopting translanguaging practices also meant embracing a language-as-resource orientation (Ní Ríordáin, 2018) for supporting verbal communication in CLIL upper secondary science classrooms. This means that, when translanguaging practices become normalized, they facilitate access to the science dialogue. Translanguaging practices have been observed to create a discursive space where students can verbally elaborate their thoughts, dialogically build meanings, and overcome conceptual gaps.
Lastly, teachers were also observed to ask language-related questions. These questions signal that the teacher is focusing on the language of the discipline and promoting both science understanding and science language development. Language-related questions can lessen the linguistic barriers that limit content access in a CLIL classroom (see Kääntä & Kasper, 2018) and explicitly help students develop the language of the discipline (see also Ernst-Slavit & Pratt, 2017). In particular, parlance questions reflect the teachers' belief that appropriately naming objects and concepts is important for the sake of science. This approach to science vocabulary offers students a better opportunity to acquire key terms, on which disciplinary understanding is then built (Lemke, 1990). Checking for lexical understanding questions show that the teacher does not take the comprehension of academic language for granted. The presence of these questions highlights how science language is "symbiotic" to science content (Richardson Bruna et al., 2007, p. 50) and provides CLIL students with opportunities for learning science.

Limitations
The authors recognize that this research study has a few limitations. First, the three participating schools were carefully sampled to reasonably represent science learning with a CLIL approach at upper secondary level. As a result, the three case studies investigated included a range of students with similar academic ability and English proficiency. On the one hand, this was a sought and welcomed condition. On the other hand, this condition greatly limits the potential to generalize from the results. Similarly, the generalizability of the findings is also limited by a relatively small number of observed and audio-recorded class periods. However, the researchers consider that the findings present a good description of science learning in CLIL settings, a research area that has been under-researched to date.

Conclusions: implications and future research
Overall, the discourse practices observed in this study are significant in several ways. First, the questioning practices observed in this study offer both practitioners and researchers ways of understanding issues of content access in upper secondary CLIL science classrooms. Second, the discourse practices emphasize the importance of considering classroom discourse when addressing access to science. These findings could be used to inform in-service science teacher professional development programs relating to CLIL implementation and raise awareness of the role of spoken language in science classroom discourse for building conceptual understanding. Additionally, our results can be extended to bilingual settings other than CLIL. Considering how societies are changing because of increasing global mobility, a growing linguistic heterogeneity in mainstream classrooms is expected. Accordingly, the importance of a deeper understanding of how to facilitate rich and productive classroom interactions through effective and inclusive discourse practices extends beyond CLIL settings.
This study confirms Vygotsky's theory of thought and language (Vygotsky, 1986) in that high cognitive engagement only takes place when students verbally communicate their thoughts beyond one-word utterances. This has important practical teaching implications, as oversimplifying the linguistic task of students (e.g., by only requesting one-word answers) leads to oversimplifying the disciplinary content in terms of cognitive engagement, thereby potentially reducing upper secondary students' engagement with science concepts. Therefore, we recommend that STEM teachers in CLIL settings avoid relying on questions that elicit only one-word or choral answers. Instead, by asking higher order questions that prompt students to make their reasoning visible to the whole class, teachers can facilitate the understanding of disciplinary content and foster opportunities to make sense of concepts in real time.
However, in CLIL settings, language can be a barrier that prevents students from fully participating in such a discourse. To lessen the linguistic barriers, practical strategies are recommended to support both cognitively and linguistically productive questioning in STEM CLIL classrooms, as evidenced in the investigated classrooms (upper secondary level). For instance, STEM teachers may give their students some extra time to think and discuss their answers with a classmate before entering the classroom discourse. Additionally, teachers may find it interesting to develop and systematically implement contingent questioning strategies. In doing so, we recommend the splitting of the thinking process(es) into cognitively more manageable steps prompted by a series of teacher questions that build on students' previous contributions. As a result, the students' verbal production is also split into more manageable chunks of speech, which lessens the linguistic demand on CLIL students. As effective questioning is something that can be learnt through both experience and observation of other experienced teachers, teachers may greatly benefit from observing colleagues using questions to effectively build science knowledge in CLIL settings.
Furthermore, our findings offer suggestions for incorporating linguistic aspects into a teacher's everyday teaching. For instance, STEM teachers could actively prompt their learners to use the language of their discipline by asking dedicated questions, such as questions about the meaning of academic words, or by prompting the use of academic language. Many STEM teachers may not feel at ease explicitly teaching language. However, this study offers exemplars of how different teachers, with different professional backgrounds, smoothly incorporated linguistic aspects into their teaching without making it sound at odds with the rigor of science content and processes. These exemplars may be used in teacher education programs. Additionally, when using a foreign language becomes too overwhelming for some students, teachers may include students' first language and everyday language in the classroom dialogue without lowering the cognitive demand of the tasks.
As findings from this study on discourse practices are not generalizable because of methodological limitations, further research is needed to confirm the present findings. Future research with larger samples, set in other locations, and with learners from different age groups and with different competences in the foreign language is needed. Moreover, this study suggests that, in CLIL settings, cognitively engaging discursive processes (e.g., HOT questions) are facilitated by translanguaging practices. Further research is necessary to investigate this aspect. Finally, this study was focused on whole-class discourse practices. Further research is necessary to examine other pedagogical approaches, such as peer discourse or science writing, employed in CLIL science classrooms. In addition, a longitudinal approach to research design could monitor the development of CLIL students' language, learning skills, and confidence when engaged in science learning.