The effects of using exploratory computerized environments in grades 1 to 8 mathematics: a meta-analysis of research

The process of problem solving is difficult for students; thus, mathematics educators have made multiple attempts to seek ways of making this process more accessible to learners. The purpose of this study was to use meta-analysis to examine the effect size of utilizing exploratory computerized environments (ECEs) to support the process of word problem solving and exploration in grades 1 to 8 mathematics. The findings of 24 experimental pretest and posttest studies (24 primary effect sizes) published in peer-reviewed journals between January 1, 2000, and December 31, 2013, revealed that exploratory computerized environments produced a moderate effect size (effect size (ES) = 0.60, SE = 0.03) when compared to traditional methods of instruction. A 95% confidence interval around the overall mean (C_lower = 0.53, C_upper = 0.66) indicated a nonzero population effect and relative precision. A moderator analysis revealed differences among the effects on student achievement between traditional problem-solving approaches and ECEs, favoring the latter. The findings highlight the importance of providing students with opportunities to explore applications of mathematics concepts in the classroom, especially those supported by computers. A discussion of these findings and their potential impact on improving students’ mathematical problem-solving skills, along with implications for further research, follows.


Background
Advancement in the capabilities of varied technologies has meant that problem solving has become a domain of particular interest. While researchers have examined the use and impact of computers on presenting the content of word problems to learners (e.g., Gerofsky 2004), comparatively little research has focused on learners' use of computers to explore the relations between a given problem's variables in the attempt to mathematize and solve it. Despite a wide range of interest in improving students' problem-solving skills, the rate of progress in this domain has not been satisfactory (Forster 2006; Kim & Hannafin 2011). It is still common that 'In typical elementary schools worldwide, the teaching of early arithmetic is predominantly focused on computational proficiency' (Greer et al. 2007, p. 97).
The high interactivity of contemporary software, which allows problem contents to be represented dynamically, is not yet fully utilized in mathematics classrooms, and 'The royal road to the educational use of computers, software and communication technology within mathematics teaching and learning is still to be discovered, if it ever exists' (Laborde & Sträßer 2010, p. 125). Although more elements of explorations, such as measurement and data analysis, have received substantial attention in the newly developed common core standards (Porter et al. 2011), the process of inquiry organization in mathematics classes is not widely researched. Modern technology provides multiple opportunities for applying mathematical structures to quantify system changes (Arthur & Nance 2007; Pead et al. 2007). Moreover, 'what technology has done is to return, after an absence of maybe a hundred or two hundred years, return mathematics back into the full spectrum of science' (Pollak 2007, p. 117).
It is hypothesized that by enriching the process of solving word problems with phases such as analysis, model formulation, and model verification, applications of mathematics in real-world settings will be more accessible to students. Consequently, the shift from procedural to conceptual teaching methods in mathematics, advocated by research (e.g., Hiebert 2013), might be initiated. Exploratory computerized environments (ECEs), focusing on supporting mathematical explorations and problem solving, share many commonalities with scientific discovery. Although problem solving can integrate scientific methods, this idea remains absent in the current research. Because digital technology has been shown to help students with problem-solving techniques in science classes (e.g., see Reid et al. 2003; Stern et al. 2008), searching for ways to induce congruent ideas in a mathematics classroom appears to be a promising endeavor.

The role of technologies in mathematics teaching and learning
As emphasis in the teaching of mathematics is no longer placed on the knowledge of rules and calculation routines, but on building up mathematical competences such as problem solving, modeling, and concept formation, this shift of emphasis can be significantly promoted by the use of computers (Hußmann 2007). Technology, accessed both offline and online, encompasses a large range of devices, such as calculators, laptops, desktop computers, and interactive whiteboards, as well as a substantial range of software applications, such as graphing devices, data editors, spreadsheets, and dynamic geometry. Technology supporting computations and representations (e.g., geometric figures, graphs of functions, or animations) provides interactive tools and makes key relations for mathematical understanding more transparent and tangible. Several studies (Grouws & Cebulla 2000; Wander & Pierce 2009) suggest that students who develop a mathematical conceptual understanding develop the skills of knowledge transfer that enable them to perform successfully on mathematics applications. Furthermore, technologies enable complex computations and dynamic modeling that lead to more experimental forms of teaching and learning mathematics (Joyce et al. 2009; Li & Li 2009; Passey 2012).
Learning using technology poses certain challenges for software programmers. They must consider what needs to be maintained as visible and how to present the key ideas in a learning sequence that will engage and challenge the learners (Bos 2009). It is unlikely that technology per se will affect mathematical development in any significant way; instead, what might signify viable effects is how the technology is designed to support learning and how the mathematics tools within the technology connect with the learners. Passey (2012) formulated several domains in which technology can affect teaching and learning of mathematics: (a) devices, ranging from interactive whiteboards, to desktop computers, to iPods and iPhones; (b) learning activities, ranging from multiple-choice tests to interactive videos; (c) the pedagogy embedded, ranging from knowledge acquisition, to finding specific values; (d) the learning settings and interactions, ranging from the teacher and learner alone, to learners in a group or community; and (e) the type of assessment measures used to identify the impact of a specific program on students' learning.

Explorations and problem solving in mathematics
This section summarizes the advantages of utilizing technology, focusing on its use to enhance students' competencies in explorations and problem solving. According to the National Council of Teachers of Mathematics (NCTM 2000), 'Technology is essential in learning mathematics' (p. 3). Applying technology to enhance students' problem-solving skills is an ultimate area of interest. A problem's setup and information component expressed in word format are often difficult for students to comprehend, analyze, and solve (Ngu et al. 2014). Problems presented in this way also have a low motivational factor, which consequently affects the degree to which a learner engages in the solution process. The advancement of multimedia technology has opened up new possibilities for dynamically expressing a problem's contents and extending its analysis. The process can now be externalized and amplified through digital constructions, showing more explicit properties and structures that were previously silent. Several researchers (e.g., Chen 2010; Merrill & Gilbert 2008) have found that students' word-problem-solving skills can be significantly enhanced through the integration of computer technologies.

Explorations
Exploration is defined as the act of searching with the purpose of discovery of information or resources (Kuhn 2007). Explorations give students the opportunity to appreciate applications of mathematical tools in real-life situations (Remillard & Bryans 2004). Explorations can take many forms, ranging from analyzing observed phenomena to undertaking more abstract, open-ended investigations. English and Watters (2005) found that young children are capable of exploring situations beyond those involving simple processes of counts and measures. Furthermore, researchers (e.g., English 2004; Lai & White 2012) have recommended that children receive more exposure to situations where they explore informal notions or where they quantify information, transform quantities, and deal with quantities that cannot be observed. Flum and Kaplan (2006) claimed that explorations engage the learner with the environment through definite actions of gathering and investigating information. By inducing the use of terms that are central to scientific inquiry, like observe, identify, and analyze (Slough & Rupley 2010), explorations promote the transfer of knowledge, problem-solving skills, and scientific reasoning (Kuhn 2007). Schwarz and White (2005) advocated that learning about the nature of scientific models and engaging learners in the process of creating and testing models should be a central focus of science education. It is hypothesized that by enriching the mathematics curriculum with elements of such inquiry, students' problem-solving skills can be strengthened. These shifts, however, pose certain challenges. At the elementary school level, manipulatives or their interactive replicates have been extensively used to help build conceptual understanding of abstract ideas (Jitendra et al. 2007), and research (Kieran & Hillel 1990; Reimer & Moyer 2005) has shown their positive impact on students' mathematics achievement.
Since manipulatives are restricted to geometrical objects, their exploratory character is limited compared to explorations, which provide a far richer context for inducing and practicing more sophisticated mathematical ideas, for instance, the concept of rate of change. Viewed through this lens, interactive exploratory learning environments dominate the previously applied drill-and-practice computer applications, and their use in mathematics has gained momentum over the past decades (Neves et al. 2011). The process of explorations usually concludes with a formulation of a mathematical model. As such, multifaceted cognitive goals are achieved by learners while they undertake such activities. Bleich et al. (2006) concluded that such activities expand students' views of mathematics by integrating mathematics with other disciplines, especially sciences, and engage students in the process of mathematization of real phenomena.

Word problems and problem solving
Situations carrying open questions that challenge learners intellectually (Blum & Niss 1991) are called word problems or story problems. The general structure of word problems is centered on three components: (a) a setup component, which provides the content (for instance, the place or story problem); (b) an information component, which provides data from which to derive a mathematical model; and (c) a question component, which is the main task directed to the solver (Gerofsky 2004). A setup component of a word problem can be externalized by a static diagram, short video, computer simulation, or physical demonstration. With the exception of static diagrams, all of these means, though not yet commonly used in mathematics classes (Kim & Hannafin 2011), assist with the visualization of problem scenarios and thus help with identifying patterns and formulating their symbolic description. Word problem solving is one area of mathematics that is particularly difficult because it requires students to analyze content, transfer it into mathematical representations, and map it into mathematical structures. Therefore, it requires not only the retrieval of a particular problem-solving model from learners' long-term memory but also the creation of a novel solution (Zheng et al. 2011).
While word problems are often considered closed tasks that usually involve simplistic responses, problem solving and explorations gravitate toward mathematical modeling that requires students to analyze a given situation, build a model, and verify the model before applying it. A major contribution to the field of problem solving was made by Polya (1957), who codified four stages of the process: understanding the problem, devising a plan, carrying out the plan, and looking back. Bransford and Stein (1984) extended Polya's approach by developing a five-stage problem-solving model that encompassed identifying the problem, defining goals, exploring possible stages, anticipating outcomes, and looking back and learning. Among these phases, the phase of exploration, which leads the solver to a model formulation and validation, is of the highest importance (Arthur & Nance 2007). Once the model is validated, it can be used for forecasts, decisions, or actions determined by the problem's question component. Francisco and Maher (2005) suggested that the stage of exploration or modeling must exist in the problem-solving process for authentic mathematical problem solving to occur. A similar conclusion was previously reached by Gravemeijer and Doorman (1999), who claimed that 'the role of context problems and of symbolizing and modeling are tightly interwoven' (p. 112). The forms of the mathematical models depend on the problem content. At the elementary and middle school levels, they are often externalized by geometrical objects, ratios, and proportions (NCTM 2000). Over the past 30 years, the domain of teaching and learning mathematics applications has undergone modifications reflecting research advancements in the area, one of which is a change in the instructional approach to problem solving: from teaching problem solving, to teaching via problem solving (Lester et al. 1994).
Some of the main elements of teaching via problem solving include (a) providing students with enough information to let them establish the background of the problem, (b) encouraging students to make generalizations about the rules or concepts, and (c) reducing teachers' role to providing guidance during the solution process (Evan & Lappan 1994). Yimer and Ellerton (2009) proposed an inclusion of a prelude phase, called engagement, whose role is to increase students' motivation and, consequently, their success rate. According to Kim & Hannafin (2011), these stages represent integral elements of contemporary problem-solving methods.
While the engaging features of technologies and their role in improving student motivation have often been researched (e.g., see Lewis et al. 1998; Niss et al. 2007), their interactive features that enable the learner to hypothesize, make predictions, and verify those predictions have not yet been meta-analyzed at the elementary and middle school levels. This study sought to examine these areas and identify moderators that contribute to increasing effects on students' learning. As a result of this undertaking, we hope to formulate suggestions for a learning environment that will advance students' analytic skills and consequently improve the use of technological tools as a means of explorations in mathematics classes.

Synthesis of findings of prior meta-analytic research
The study of problem-solving methods in the domain of mathematics education has been frequently undertaken by researchers and has especially influenced mathematical practices during the past 30 years (Santos-Trigo 2007). Tall (1986) provided an insightful analysis on how computers can be used for testing mathematical concepts, claiming that 'computer programs can show not only examples of concepts, but also, through dynamic actions, they can show examples of mathematical processes' (p. 5). He questioned the formal approaches to mathematical representations used in textbooks, calling them inaccessible to students, and suggested instead using computer programs to visualize the dynamics of the processes.
Computer programs used to support problem solving were one of the moderators in a meta-analysis on methods of instructional improvement in algebra undertaken by Rakes et al. (2010). Using 82 relevant studies from 1968 through 2008, these researchers extracted five categories, of which two contained technology and computers as a medium supporting instruction and learning. Contrasting procedural and conceptual understanding of mathematics ideas, these scholars found that conceptual understanding as a separate construct, appearing initially in research in 1985, produced the highest effect size when enhanced by computer programs. The timeline of this finding corresponded with the emergence of mathematical explorations, which also exemplify mathematics conceptual understanding. In addition, Rakes et al. (2010) found that technology tools including calculators, computer programs, and java applets produced a moderate 0.30 effect size when compared to traditional methods of instruction. Another systematic review of computer technology use and its effects on K-12 students' learning in mathematics classes between 1990 and 2006 was undertaken by Li and Ma (2010). Analyzing the effects of tutorials, communication media, exploratory environments, tools, and programming language, they concluded that exploratory environments produced the highest (ES = 1.32) learning effect size. Li and Ma did not compute the effects of computer technology on mathematics cognitive domains and type of learning objectives; instead they suggested the need for another review focusing on 'the nature of the use of technology' (p. 235) on student achievement. Yet problems of implementation of pieces of (educational) software, learning environments, and use of communication technology are far from being solved (Laborde & Sträßer 2010), and many of the problems relate to improvement of students' problem-solving skills by the use of educational software. 
Artzt and Armour-Thomas (1992) reported that students' difficulties with problem solving are often attributed to their failure to initiate active monitoring and regulation of their own cognitive processes. Though several potential ways of improving students' initiation of active monitoring have already been researched (e.g., see Grouws & Cebulla 2000;Kapa 2007), in this study, we sought to uncover moderators that had been silent in the previous research. We were especially interested in learning whether extending the exploration stage of the solution process and guiding students through the phases of scientific inquiry could materialize as a construct worthy of investigation. The effect of such organized support might reduce the working memory needs and consequently free students from being overwhelmed at the start. Hart (1996) reported that students find word problems difficult because they lack motivation; thus, presenting word problems in an engaging format might increase learners' motivation factor and drive them to solve the problems. Furthermore, providing some guidance during the solution process might improve their productivity and decision-making (Stillman & Galbraith 1998). However, Blum and Niss (1991) cautioned that providing guidance in the form of ready-made software in applied problem solving may put an unintentional emphasis on routine and recipe-like procedures that neglect essential phases, such as critically analyzing and comparing models. Thus, closely examining how this concern is resolved in newly developed mathematics software was an additional focus of this meta-analysis.
Prior literature has provided many insightful conclusions about the effectiveness of exploratory computer programs on mathematics students' achievement. However, it has also led to many questions on how the content delivery methods or problem-solving settings presented by computer programs will yield the highest learning effect sizes.

Methods
A literature review can take several forms, for example: narrative, quantitative, or meta-analytic. This study took the last of these forms, using the systematic approach proposed by Glass (1976), called meta-analysis, which can further be described as an analysis of the analyses. A statistical meta-analysis integrates empirical studies investigating the same outcome, described as a mean effect size statistic. Meta-analytic techniques were selected for this study because they provide tools to assess effect size considering a pool of studies as a set of outputs collected within prescribed criteria. There are two main advantages of such investigations: (a) a large number of studies that vary substantially can be integrated, and (b) the integration is not influenced by the interpretation or use of the findings by the reviewers (Gijbels et al. 2005). Meta-analysis also allows for conducting a subgroup moderator analysis that provides tools for identifying factors that affect the magnitude of the mean effect. A subsequent moderator analysis was planned in this study to answer additional research questions.

Key term descriptions

Treatment/instrumentation
The treatment for the study was defined as an exploratory environment that was digitally delivered and displayed on a computer screen or iPod that students used to formulate and mathematize patterns or solve problems. An exploratory learning environment can be defined as a medium that engages the learner with the environment through the definite actions of gathering and investigating information (Flum & Kaplan 2006) and formulating a general pattern or finding a unique solution. The treatment can include specific software, such as Frizbi Mathematics 4, SimCalcMathWorlds, NeoGeo, or Dynamic Geometry Environment (DGE). For the purpose of identifying which type of treatment produces higher effect sizes, treatments were further classified as being focused on either explorations or problem solving.
Explorations
The purpose of explorations is to have students experiment with models and search for underlying structures. An example would be having students investigate the properties of polygons through the underlying principles of congruency and similarity.

Problem solving
The main component of problem solving is asking the learner to find a specific numerical solution (Gerofsky 2004). Problem solving in this meta-analysis encompassed processes associated with solving word problems, story problems, or statement problems that involve developing mathematical concepts and solving mathematical equations to find a specific numerical value.

Outcome variable of the research
The outcome variable whose overall effect size was sought in this meta-analysis was student achievement, defined as scores on solving various mathematical tasks or problems embedded in various mathematical structures, such as equations, ratios, proportions, and formulas, and measured by students' performance on standardized or researcher- or teacher-developed tests expressed numerically as a ratio or percent. Student achievement scores were further expressed as an effect size computed from the mean posttest scores of the experimental and control groups and the pooled standard deviation, using the Hedges (1992) formula:

$$ES = \frac{\bar{x}_1 - \bar{x}_2}{s^*}$$

where
$\bar{x}_1$ represents the posttest mean score of the treatment group,
$\bar{x}_2$ represents the posttest mean score of the control group, and
$s^*$ represents the pooled standard deviation.
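As an illustration of the effect size described above (the difference between posttest means divided by the pooled standard deviation), the following minimal Python sketch may help. The function names and the numbers in the usage note are hypothetical and are not taken from any study in the pool. The small-sample correction factor is included only as an optional step commonly associated with Hedges' estimator; the text does not state whether it was applied here.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def effect_size(mean_treat, mean_ctrl, n1, s1, n2, s2):
    """Standardized mean difference: (x1 - x2) / s*."""
    return (mean_treat - mean_ctrl) / pooled_sd(n1, s1, n2, s2)


def small_sample_correction(n1, n2):
    """Optional bias-correction factor J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2.

    Assumption: shown for completeness; not confirmed as part of the
    procedure used in this meta-analysis.
    """
    df = n1 + n2 - 2
    return 1.0 - 3.0 / (4.0 * df - 1.0)


if __name__ == "__main__":
    # Hypothetical posttest summary statistics for one study.
    es = effect_size(75.0, 70.0, 30, 10.0, 30, 10.0)
    print(round(es, 3), round(es * small_sample_correction(30, 30), 3))
```

With the hypothetical inputs above, both groups have a standard deviation of 10, so the pooled standard deviation is 10 and the uncorrected effect size is 5 / 10 = 0.5.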

Research questions
The research questions reflected the study purpose and they were dichotomized into two groups: main and supplementary. The main research question was the following: What is the magnitude and direction of the effect size of using computerized exploratory environments to support the process of problem solving and explorations when compared to conventional learning methods?
As the main question led to computing the overall effect size of using ECEs, additional questions were formulated to enrich the study objective: Are the effect sizes of student achievement dependent on grade level? Are the effect sizes of student achievement different when problem solving is contrasted with exploration? Are the effect sizes of student achievement dependent on mathematics content domain? How does the type of instructional support (teacher guided or computer based) affect student achievement when computers are used?
While the answer to the main question was assessed via interpretation of the magnitude and direction of the computed mean effect size statistic, the answers to the additional research questions were based on applied subgroup moderator analysis and interpretation of the results.
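A subgroup moderator analysis of the kind referred to above can be sketched as follows. This is an illustrative outline only: it assumes inverse-variance (fixed-effect) weighting within each subgroup and a between-groups heterogeneity statistic Q_between, which the text does not specify, and the subgroup labels and numbers are hypothetical.

```python
def subgroup_q_between(groups):
    """Compare subgroup mean effect sizes.

    groups: dict mapping a moderator level (e.g., a grade band) to a list of
            (effect_size, variance) pairs, one pair per study.
    Returns (summaries, q_between), where summaries maps each level to
    (weighted mean effect, sum of weights).
    """
    summaries = {}
    for label, studies in groups.items():
        weights = [1.0 / var for _, var in studies]
        mean = sum(w * es for w, (es, _) in zip(weights, studies)) / sum(weights)
        summaries[label] = (mean, sum(weights))

    total_weight = sum(sw for _, sw in summaries.values())
    grand_mean = sum(m * sw for m, sw in summaries.values()) / total_weight
    # Q_between is referred to a chi-square distribution with
    # (number of subgroups - 1) degrees of freedom.
    q_between = sum(sw * (m - grand_mean) ** 2 for m, sw in summaries.values())
    return summaries, q_between


if __name__ == "__main__":
    # Hypothetical data: two studies per subgroup.
    example = {
        "explorations": [(0.8, 0.04), (0.8, 0.04)],
        "problem solving": [(0.4, 0.04), (0.4, 0.04)],
    }
    print(subgroup_q_between(example))
```

A large Q_between relative to its chi-square reference distribution would suggest that the moderator levels differ in mean effect size.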

Data collection criteria and procedures
Several criteria for literature inclusion in this study were established before the search was initiated. Despite the fact that computer programs as a medium supporting learning were introduced into education several decades ago (Joyce et al. 2009), a rapid increase in this field occurred around the year 2000, which was selected as the starting point of the search timeframe. Thus, this synthesis intended to analyze and summarize the research published between January 1, 2000, and December 31, 2013, on using computerized programs to support student explorations in elementary and middle school mathematics classes in either public or private schools. The minimum sample size established in this meta-analysis was ten participants. The study included only experimental research that provided pretest-posttest mean scores, standard deviations (SD), F-ratios, t-statistics, or other quantifications necessary to compute the mean effect size. As treatment groups used computerized exploratory environments, the only control groups considered were those provided with traditional instruction, meaning the use of traditional teacher-centered methods, where the students are given problems to work and seek help when needed, as described by Pilli and Aksu (2013). This reduced the confounding of effects due to hybrid treatments.
Publication bias, which is a threat to any research attempting to use the published literature (Hedges 1992), was addressed by creating a funnel plot for the accumulated studies and by applying the Rosenthal fail-safe N test and computing the fail-safe number (Rosenthal 1979). The test, which addresses the so-called file drawer problem, estimates the number of unpublished studies required to refute significant meta-analytic means.
Studies investigating the effects of applying exploratory computerized environments and satisfying the above conditions were identified through a search of the databases ERIC, Educational Full Text (Wilson), Professional Development Collection, ProQuest Educational Journals, Science Direct, and Google Scholar. The search encompassed studies conducted globally but published in the English language. Due to the anticipated high variation in sampling methods and the study-level variance that produces an additional source of random influence (Cooper 2010), a random-effects model was anticipated for calculating the mean effect size.
Key terms were selected by the authors from the literature pertaining to this study's theoretical background and prior research. In the process of extracting the relevant literature, the following queries were used: The strings were arranged in a way that allowed maximizing the search engine capabilities. Thus, for example, explorations were disjoined from problem solving but both were combined with student achievement and various grade levels. Respectively, simulations and exploratory environments were joined with mathematics. This search returned 238 articles, out of which 14 satisfied the criteria discussed above.
In order to expand the pool, a further search, including PsycINFO and PsycARTICLES, was undertaken with broader conceptual definitions including synonyms, for instance, [('dynamic investigations' OR 'techniques of problem solving' OR 'computerized animations') AND ('learning' OR 'student achievement')]. These modifications, which allowed for the adjustment of the contexts and strengthening of the relevance of the literature (Cooper 2010), returned 107 studies. The additional search extracted a number of studies that, although very informative (e.g., Chen & Liu 2007; Eysink et al. 2009; Harter & Ku 2008), could not be included in this meta-analysis because computers were used in both the control and experimental groups. After further scrutiny, the pool was enhanced by 11 additional studies. Combining all searches, the pool contained 25 primary studies and 25 corresponding effect sizes.
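The fail-safe number mentioned above is commonly computed from the studies' standard normal deviates as N_fs = (sum of Z)^2 / z_alpha^2 - k, where k is the number of studies and z_alpha = 1.645 for a one-tailed alpha of .05. The sketch below follows that common formulation; the function name and the Z values in the usage note are hypothetical, and the exact variant used in this study is not stated in the text.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N.

    z_scores: one standard normal deviate (Z) per included study.
    z_alpha:  one-tailed critical value (1.645 for alpha = .05).
    Returns the estimated number of unpublished null-result studies needed
    to bring the combined result down to non-significance.
    """
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_alpha ** 2) - k


if __name__ == "__main__":
    # Hypothetical Z values for four studies.
    print(fail_safe_n([2.1, 1.8, 2.5, 3.0]))
```

A fail-safe number that is large relative to the size of the pool suggests that the mean effect is unlikely to be an artifact of unpublished null results.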
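A random-effects mean can be sketched as below. The text does not name the estimator of between-study variance, so this example assumes the widely used DerSimonian-Laird method; the function name and the inputs in the usage note are hypothetical.

```python
import math


def random_effects_mean(effects, variances):
    """Random-effects pooled mean (DerSimonian-Laird estimator assumed).

    effects:   per-study effect sizes.
    variances: per-study sampling variances.
    Returns (pooled mean, standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q around the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and variances for three studies.
    print(random_effects_mean([0.3, 0.6, 0.9], [0.04, 0.05, 0.03]))
```

When the studies are homogeneous, tau^2 is estimated as zero and the result coincides with the fixed-effect weighted mean.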
The adherence of the pool to established research criteria was supported by a double scrutiny process at the initial and the concluding stages of the selection process. Any discrepancies were resolved.

Coding features
The coding process was conducted in a two-phase mode reflecting the two-stage analysis. During the first stage, general characteristics of the studies, such as research authors, sample sizes, study dates, research design type, and pretest-posttest scores, were extracted to describe the study features. During the second phase, additional scrutiny took place to more accurately reflect on the stated research questions and seek moderators that might influence the strength of the effect sizes. The majority of the coding features, including study authors, study publication date, locale, and research design type, were extracted to support the study validity. The formulation of other coding, including grade level, instrumentation, and learning type, was enacted to apply moderator analysis that would lead to answering the supplemental research questions.

Descriptive parameters
Descriptive parameters encompassed the following: the grade level of the group under investigation, the locale where the study was conducted, the sample size representing the number of participants in experimental and control groups, the date of the study publication, and the time span of the research expressed in a common week metric.

Inferential parameters
Posttest mean scores of experimental and control groups and their corresponding standard deviations were extracted to compute study effect sizes. If these were not provided, F-ratios or t-statistics were recorded. Although most of the studies reported more than one effect size, for example, Kong (2007) and Guven (2012), who also reported on students' change of attitude toward computers, this study focused only on student achievement, thus reporting one effect size per study.
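When only a t-statistic or an F-ratio is reported, a standardized mean difference can be recovered with standard conversion formulas for two independent groups. The sketch below shows those conversions; the function names are hypothetical, the F conversion applies only to a two-group comparison (one numerator degree of freedom), and the text does not state the exact conversions the authors used.

```python
import math


def d_from_t(t, n1, n2):
    """Standardized mean difference from an independent-samples t-statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def d_from_f(f, n1, n2, treatment_higher=True):
    """Standardized mean difference from a two-group F-ratio (F = t^2).

    F carries no sign, so the direction of the effect must be supplied
    from the study's reported group means.
    """
    d = math.sqrt(f * (n1 + n2) / (n1 * n2))
    return d if treatment_higher else -d


if __name__ == "__main__":
    # Hypothetical statistics from a study with 50 participants per group.
    print(d_from_t(2.0, 50, 50), d_from_f(4.0, 50, 50))
```

Because F equals t squared for two groups, the two hypothetical inputs above describe the same result and yield the same effect size.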

The research authors
A complete list of authors whose studies were selected was compiled in Table 1. As the analysis of the study progressed, each research study was labeled only by the first author and the year of research conduct.

Publication bias
All studies included in this meta-analysis were peer reviewed and published as journal articles; thus, no additional category in the summaries was created to distinguish the publication mode of the studies. We also examined the authorship and author group membership as a consideration of a possible publication bias in certain studies being over-represented, yet no publication bias was found. Publication bias was quantified and justified through using Rosenthal fail-safe N test and by creating and examining a funnel plot.

Group assignment
This categorization was supported by the way the research participants were assigned to treatment and control groups, as defined by Shadish et al. (2002). During the coding process, two main categories emerged: (a) randomized, where the participants were randomly selected and assigned to the treatment or control group, and (b) quasi-experimental, where the participants were assigned by the researchers.

Type of research design
Only experimental studies that provided pretest-posttest means or other statistic parameters representing the means were utilized in this study.

Type of instructional support
Two subcategories were identified to classify and evaluate the effects of the type of instructional support: (a) teacher-guided support, where the teacher served as a source of providing support during student explorations or problem solving, or (b) computer-based support, where the primary source of support was provided by the software and was available on the computer screen.
In both of these settings, the medium of learning was digitally delivered by the computer.

Length of treatment
Three main categories were established for this moderator: short (2 weeks or less), intermediate (between 2 and 5 weeks inclusively), and long (more than 5 weeks). We were initially interested in examining the magnitude of the effect of exploratory learning environments on improving students' problem-solving skills, specifically focusing on analyzing the effects of scientific inquiry; however, we found too few research findings from which such features could be extracted. Thus, the effect of scientific empirical methods on building theoretical mathematical models could not be investigated on the scale intended. The objective of the research was then modified to focus on comparing traditional methods of teaching problem solving and explorations with methods that use digital technology as the medium.

Homogeneity verification and summary of data characteristics
The data analysis in this study was initially performed using SPSS 21 with verification of homogeneity of the study pool as suggested by Hedges (1992). A standardized mean difference effect size was calculated using the posttest means of the experimental and control groups. The individual effect sizes were then weighted (denoted by ES in this study), and an overall weighted mean effect size of the study pool was calculated. In studies including multiple independent subgroups (e.g., separate data presented for girls and boys in the experimental and control conditions), summary statistics for both conditions were first recreated, and these data were then used to calculate the effect size. If different learning methods were assessed within the same study, the effects were not combined; instead, the type of learning method that best matched the study's goals and research question was selected.
The homogeneity statistic (Q_T = 117.78, df = 23, p < 0.01) showed that the set of effect sizes varied statistically significantly. This finding supports the adoption of a random-effects model for the data analysis. If Q_T had not been statistically significant, a fixed-effect model would have been adopted for the analysis. In order to detect potential bias due to underrepresentation of studies with small samples, a funnel plot was generated (see Figure 1). The funnel plot visualizes the position of the individual effect sizes, the mean effect size, and the confidence interval for each study around the computed mean of the pool, with a 95% confidence interval around the mean. Because the study sample sizes varied widely, they were converted to a logarithmic scale.
Figure 1 Funnel plot for the data.
The visual inspection of the funnel plot (see Figure 1) shows that the results from studies with smaller sample sizes are more widely spread around the mean than those from studies with larger sample sizes, which according to Rothstein et al. (2006) minimizes the likelihood of publication bias in this meta-analysis. The funnel plot also showed some of the means located outside the area of the funnel graph (see Figure 1), indicating a lack of homogeneity of distributions within the pool, which was also reflected in the significant p value (p < 0.01). As the main purpose of a meta-analysis is to compute the overall effect size (Willson 1983), this deficiency did not undermine the validity of the calculated mean effect; rather, it clarified the characteristics of the studies, revealing that some of them, or their linear combinations, came from different distributions. For instance, Figure 1 illustrates three labeled studies whose means fell outside the funnel graph: one by Kong (2007), labeled as 1; another by Erbas and Yenmez (2011), labeled as 2; and a third by Huang et al. (2012), labeled as 3. Kong (2007) investigated the effects of digitally presented explorations on fourth graders' understanding of fraction operations, Erbas and Yenmez (2011) investigated the effects of digitally presented explorations on sixth graders' understanding of geometry concepts, and Huang et al. (2012) investigated the effects of digital explorations on second graders' problem-solving skills. These studies do not share specific common features, however. Due to their complex study designs and valuable research findings, all of them were included in this meta-analysis, and thus all contributed through their weighted effect sizes to the overall effect size. Table 1 summarizes the extracted general characteristics of the studies. The studies were further aggregated into classes to reflect the objectives of the research questions.
The calculated confidence intervals (CIs) for each study effect size (ES) consist of a range of values that act as good estimates of the unknown population ES. The selected 95% level indicates the probability that the CI captures this true population ES given a distribution of samples. It is worth noting that the ES of 17 of the studies (68%) fell within the 95% CI, indicating their high precision. The experimental treatment usually lasted for one unit lesson (45 min) with an average application frequency of twice a week. The majority of the studies (14, or 56%) were conducted quasi-experimentally, while the remaining ten (44%) were randomized. The study duration was expressed on a common scale of weeks, although some of the studies reported the duration in months or semesters. The largest sample size of 1,621 students was reported for a study conducted by Roschelle et al. (2010), and the smallest sample size of 12 participants was reported by Lai and White (2012). When the studies were analyzed from a grade-level point of view, students whose primary level was grade 4 (28%) dominated the pool. Since this study focused on gathering research on exploratory environments provided by computer programs or the Internet, the examined studies were aggregated by their focus on supporting either problem solving or explorations in mathematics. For example, the study conducted by Lai and White (2012) was classified as an exploration because students explored the space and constructed shapes without constraints, and then provided their definitions for the various quadrilaterals. Thus, the explorations led them to formulate general patterns and descriptions. Similarly, a study by Kong (2007) was also classified as an exploration because it used a general partitioning model to have students explore the subject of common fractions and fraction operations.
In contrast, problem-solving studies, such as the one conducted by Kapa (2007), followed the traditional four stages of problem solving: (1) understanding the problem, (2) making a plan, (3) executing the plan and thus finding a unique solution, and (4) reviewing the solution. The highlighted study characteristics were further aggregated into subgroups (see Table 2), and corresponding effect sizes were computed.
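The homogeneity check and the choice of a random-effects model described in this section can be illustrated with a short sketch. The paper does not name the between-study variance estimator, so the DerSimonian-Laird estimator used here is an assumption, and the function names are hypothetical. Inputs are per-study effect sizes and their sampling variances.

```python
import math


def q_statistic(effects, variances):
    """Return (Q, df, fixed-effect mean) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    mean_fixed = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - mean_fixed) ** 2 for w, e in zip(weights, effects))
    return q, len(effects) - 1, mean_fixed


def random_effects_mean(effects, variances):
    """Random-effects mean, its standard error, and tau squared.

    Assumption: tau squared is estimated with the DerSimonian-Laird method.
    """
    q, df, _ = q_statistic(effects, variances)
    weights = [1.0 / v for v in variances]
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Each study's variance is inflated by the between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(w * e for w, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2
```

Q is compared against a chi-square distribution with k - 1 degrees of freedom. For the 24 studies here (df = 23), the critical value at the 0.01 level is roughly 41.6, so the reported Q_T = 117.78 is well beyond it.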

Descriptive analysis
The analysis of the data was organized deductively. It began with a synthesis of the general features of the studies, furnished by a descriptive analysis, and then moved to an examination of the differences of the effect sizes mediated by the type of instrumentation, cognitive domain, study duration, grade level, and content domain.
The research pool encompassed 4,256 elementary and middle school students. The average sample size was 112 participants. Applied descriptive analysis provided information about the frequencies of the studies per year (see Figure 2) and the locale distribution where the studies were conducted (see Figure 3).
The majority of the studies (17, or 71%) were conducted within the past 5 years, which indicates a growing interest in using ECEs to support the learning of mathematics. In terms of research locale, Taiwan dominated the pool with seven studies (29%), followed by the United States with five studies (21%). The distributions show that applying and investigating the effects of ECEs in mathematics classrooms has attracted global interest.

Inferential analysis
Quantitative inferential analysis was performed on the primary studies to find the individual weighted effect sizes and the mean weighted effect size of the study pool. The mean effect size for the 24 primary studies (24 effect sizes) had a magnitude of 0.60 (SE = 0.03) and a positive direction, which according to Lipsey and Wilson (2001) can be classified as medium. A 95% confidence interval around the overall mean (C_lower = 0.53 and C_upper = 0.66) supported its statistical significance and its relative precision as defined by Hunter and Schmidt (1990). When applied to school practice, it indicated that the score of an average student in the experimental groups, who learned using ECEs, was 0.60 of a standard deviation above the score of an average student in the control groups, who was taught using traditional methods of instruction. In order to quantify publication bias, Rosenthal's fail-safe procedure was used. The test showed that an additional 480 unpublished null-effect studies would be required to bring the p level beyond the 0.05 threshold of significance. Further calculations show that only 130 unpublished research papers would be needed to nullify the mean effect size of 0.60 (that is, to bring it below the 0.05 level).
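The two calculations in the paragraph above, the confidence interval around the mean and Rosenthal's fail-safe N, can be sketched as follows. This is an illustration under stated assumptions: the interval uses the normal approximation with z = 1.96, the fail-safe N uses the standard formula with a one-tailed critical value of 1.645, and the per-study Z scores are inputs that the paper does not list. The function names are hypothetical.

```python
def confidence_interval(mean_es, se, z=1.96):
    """Normal-approximation confidence interval around a mean effect size."""
    return mean_es - z * se, mean_es + z * se


def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Number of unpublished null-effect studies needed to lose significance.

    Uses N = (sum of Z)^2 / z_alpha^2 - k, where k is the number of studies.
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


# Mean and standard error as reported in the text.
lower, upper = confidence_interval(0.60, 0.03)
```

With the rounded values ES = 0.60 and SE = 0.03, the interval comes out at about 0.54 to 0.66. The lower bound differs slightly from the reported 0.53, which may simply reflect rounding of the standard error.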
Combining the results of both tests, the inspection of the funnel plot (see Figure 1) and the fail-safe procedure, we claim that publication bias has a minimal effect on the mean effect size calculations in this study. Further examination of the computed effect size and incorporation of the U3 effect size metric (Cooper 2010) led to the conclusion that the average pupil who learned mathematical structures using exploratory environments scored higher on unit tests than 70% of students who learned the same concepts using traditional textbook materials. It can thus be deduced that using exploratory environments as a medium of support in the teaching of mathematics has a significant impact on students' mathematics concept understanding when compared to conventional methods of teaching. Table 2 provides a summary of the individual effect sizes of the meta-analyzed studies along with their confidence intervals. The table also contains qualitative research findings, the computer programs used as the instruments, and the reliability of measures used to compute the individual mean scores, expressed by indicating whether the test was researcher developed or standardized. Where it was available, Cronbach's alpha (α) was also listed, along with additional information provided by the primary researchers that distinguishes the given study within the pool. The majority of the studies (16, or 84%) used researcher- or teacher-developed evaluation instruments, and only one (Pilli & Aksu 2013) reported a Cronbach's α coefficient as a reliability measure. In addition, the majority of the studies (17, or 89%) reported positive effect sizes when an exploratory environment was used as a medium of learning. Only two studies, one conducted by Van Loon-Hillen et al. (2012) and one conducted by Kong (2007), reported negative effect sizes favoring traditional instruction, illustrating that exploratory environments cannot replace good teaching and that some concepts, like operations on fractions (Kong 2007), require the instructor to deliver the concept and its stages and to suggest ways of overcoming obstacles that students may face. Exploratory environments seemed to produce high effect sizes in cases where students applied already-learned mathematics concepts in new situations (e.g., Chang et al., 2006; Guven, 2012; Roschelle et al., 2010) but not when students simultaneously explored new concepts and applied them. The highest effect size on explorations was reported by Erbas and Yenmez (2011; ES = 2.36), who examined the effect of open-ended geometry investigations, and the highest effect size on problem solving was noted by Huang and colleagues (2012; ES = 3.27), who investigated the effect of embedded support during the process of problem solving. Although an influx of onscreen instructional support might work well in many classroom settings, we believe that the elements of mathematical explorations induced in the study by Erbas and Yenmez (2011) more accurately supported the objectives of this study.
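The U3 interpretation reported earlier in this section can be reproduced approximately with the standard normal cumulative distribution function. The sketch below assumes normally distributed scores with equal variances in both groups; the function name `u3` is hypothetical, and the paper itself relied on Cooper's (2010) tabulated values rather than on this computation.

```python
import math


def u3(d):
    """Share of the control group scoring below the average treated student.

    Assumption: both groups are normally distributed with equal variance,
    so U3 is the standard normal CDF evaluated at the effect size d.
    """
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


share_below = u3(0.60)
```

For d = 0.60 this gives a share of roughly 0.73, which is in line with the approximately 70% stated in the text.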

Possible moderators and analysis of their effects
Just as a mean effect size provides certain evidence for potential duplications of the study findings, subgroup analysis allows for uncovering moderators that optimize the effect. Since student mathematical achievement was the main construct under investigation, during the process of moderator formulation, attention was directed toward formulating and extracting the study features that could affect achievement when ECEs were utilized. We anticipated that through identifying such features, an optimum learning environment would emerge. By analyzing the settings of the primary research, a set of five moderators was identified: grade level, instrumentation, treatment duration, content domain, and type of learning setting. This categorization resulted in 12 subgroups whose effects were individually computed. The moderator analyses were performed under a random-effects model because this meta-analysis also sought to reveal the relative effectiveness of ECEs due to the formulated moderators, and hence it did not assume that the effect size would be the same for all studies. The computations of effect sizes followed Cooper (2010), who suggested giving more weight to effect sizes with larger sample populations (w = inverse of the variance in the effect calculations). Along with calculating subgroup effects, corresponding confidence intervals and standard errors were also computed. Table 3 displays the effect sizes according to the formulated moderators and the subgroups. In order to provide a common metric for the subgroup effect magnitude comparisons, the effect sizes were weighted by the sample sizes. Although we realized that the effects of using ECEs might be strongly influenced by the degree of interactivity of the educational programs used and the applied scaffolding necessary to have students assimilate tasks presented in contexts, such extractions were not feasible due to the limited information provided in this regard by the primary researchers.
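The subgroup computation described above (inverse-variance weights, with a standard error and confidence interval per subgroup) can be sketched as follows. This is a simplified illustration: it uses only the within-study variances as weights, whereas a full random-effects analysis would add an estimate of the between-study variance to each study's variance. The function name `subgroup_effects` and the example labels and numbers are hypothetical and are not taken from Table 3.

```python
import math
from collections import defaultdict


def subgroup_effects(studies, z=1.96):
    """Weighted mean effect size per moderator subgroup.

    studies: iterable of (subgroup_label, effect_size, variance) tuples.
    Returns {label: (mean, se, ci_lower, ci_upper)}.
    """
    groups = defaultdict(list)
    for label, es, var in studies:
        groups[label].append((es, var))

    summary = {}
    for label, items in groups.items():
        weights = [1.0 / var for _, var in items]  # w = 1 / variance
        mean = sum(w * es for w, (es, _) in zip(weights, items)) / sum(weights)
        se = math.sqrt(1.0 / sum(weights))
        summary[label] = (mean, se, mean - z * se, mean + z * se)
    return summary


# Hypothetical data for the instructional-support moderator.
example = subgroup_effects([
    ("teacher-guided", 0.8, 0.04),
    ("teacher-guided", 0.6, 0.04),
    ("computer-based", 0.5, 0.02),
])
```

Because studies with smaller variances (typically those with larger samples) receive larger weights, a single large study can dominate a subgroup that contains only a few studies.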
All of the calculated effect sizes fall within their confidence intervals, which supports the significance of the effect sizes and their relative precision (Hunter & Schmidt, 1990). Furthermore, considering, for example, teacher-guided support during explorations (ES = 0.75), one can conclude that practitioners using such an approach can be 95% confident that the effect size of students' achievement will be 60% to 93% higher than with traditional instruction. The categorization into subgroups and the descriptive analysis provided a more insightful picture of the effects of ECEs on the achievement of students in grades 1 to 8 mathematics classes and helped answer the research questions of this study, as discussed next.
Are the effect sizes of student achievement dependent on grade level?
A grade-level block was created to answer this question. Following NCTM (2000), three subgroup levels were formulated: lower elementary, which included grades 1 to 3; upper elementary, which included grades 4 and 5; and middle school, which encompassed grades 6 to 8. The computed effect sizes showed differences across grade levels, with middle school producing the highest effect size (ES = 0.65), which according to Lipsey and Wilson (2001) can be classified as moderate, followed by lower elementary school (ES = 0.61) and upper elementary (ES = 0.41). It is inferred that this result can be attributed to the fact that students at the middle school level often use manipulatives to support their mathematics concept understanding (e.g., see Jitendra et al., 2007); thus, these students' transition to ECEs occurs more spontaneously, resulting in the highest score gain. The effect sizes in the other grades also showed a moderate magnitude.
Are the effect sizes of student achievement different when problem solving is compared to exploration?
The moderator category of instrumentation was used to determine whether ECEs affect student achievement differently when supporting problem solving as opposed to exploration. Whereas explorations often led students to pattern formulations (e.g., see Panaoura, 2012; Suh & Moyer-Packenham, 2007), problem solving was usually constructed within defined stages, leading students toward finding numerical answers or unique solutions to the stated problems (e.g., see Chen & Liu, 2007; Hwang & Hu, 2013). When contrasted with problem solving, learning supported by explorations produced a higher effect size (ES = 0.62, as opposed to ES = 0.54 for problem solving). This finding generated several conjectures. As the process of exploration resonates better with students' natural curiosity (Stokoe, 2012) and their prior experiences, working on explorations might raise students' motivation and thus their achievement. Despite the fact that efforts to help students understand the solution process are multidimensional, ranging from creating schemas (Kapa, 2007) to inducing personalization (Chen & Liu, 2007; Ku & Sullivan, 2002), attempts at helping students learn the process of problem solving by embedding explorations in some of the transitioning stages are rare. With the exception of the studies conducted by Roschelle et al. (2010) and Panaoura (2012), an inquiry process is not emphasized in the accumulated pool of studies on problem solving despite strong support from other researchers (e.g., see English, 2004). The current research on problem solving gravitates toward creating and examining the effects of cognitive support or showing worked-out solutions that students can follow (e.g., see Van Loon-Hillen et al., 2012).
As illustrated by the computed effect sizes, all of these attempts seem to produce desirable positive results; however, by concentrating on simplifying or following the mechanics of the problem-solving process, the meaning of the context embedded in the problems is diminished. The question that arises here is how to convert word problems to explorations. Bonotto (2007) suggested 'Change the type of activity aimed at creating interplay between the real world and mathematics towards more realistic and less stereotyped problem situations' (p. 86) and 'change classroom culture by establishing new classroom sociomathematical norms' (p. 86). Greer et al. (2007) proposed to 'Valorize forms of answer other than single, exact numerical answers' (p. 92).
Are the effect sizes dependent on the mathematics content domain?
Two mathematics content domains dominated in this study pool: geometry and algebra. Geometry, traditionally supported by visualization, showed a higher effect size (ES = 0.67) compared to algebra (ES = 0.61). As geometric objects can also be easily externalized by their real embodiments, more effort should be placed on contextualization and visualization of other, more abstract mathematical structures such as functions. Teaching algebraic structures via exploratory environments is being practiced and researched, yet embodying algebraic structures in context-driven scenarios seems to be a challenge, as reflected in the fact that only eight (33%) such studies were located.
How does the type of instructional support (teacher guided or computer based) affect student achievement when computers are used?
There were two main categories of instructional support provided to the students in the study pool: computer-based support displayed on the computer screen or teacher-centered support provided by the instructor.
Computer-based instructional support dominated the study pool (16, or 67%) compared to teacher-based instruction (8, or 33%). When compared by learning effects, teacher-centered support produced a higher effect (ES = 0.75) than computer-based support (ES = 0.56). This result signifies the importance of the teacher's role in developing students' understanding of mathematics structures and helping them apply the structures to solve problems, and it corresponds with the finding of Li and Li (2009), who claimed that 'teacher transfers the knowledge development and justification responsibilities to students' (p. 275). A particular instance that needs further investigation is transitioning from verifying to explaining (Hähkiöniemi & Leppäaho, 2012). Merchant et al. (2014) claimed that 'It is essential that teachers are made knowledgeable about the features and situations that make feedback effective' (p. 37). Programmed tips are important and readily available to students, yet the expertise, encouragement, and support from a live person appear to have a higher impact on students' learning. Further research contrasting learning effect sizes by using as a moderator, for example, the frequency of seeking help available on the computer screen versus the frequency of seeking help from the teacher, along with the quality of answers sought, would likely shed more light on the cause of the differences.
In addition to analyzing the effects of moderators that reflected the research questions, the effect of treatment length was also computed. The analysis showed that treatment lasting between 2 and 5 weeks, called intermediate herein, produced the highest effect size (ES = 0.63). A similar conclusion was reached in a meta-analysis by Xin et al. (2005), who also found that longer treatment results in higher student achievement. The student needs to be acquainted with the mechanics of the new learning medium; thus, it is important that the first contact and experience with an ECE be absorbed into a learner's working memory. It is hypothesized that longer and more frequent exposure to the new environment allows a higher focus on task-driven objectives related to the content analysis, which consequently results in better context understanding and higher learning effects. However, as Guven (2012) and Roschelle et al. (2010) found, there is an achievement saturation level, which perhaps suggests that in order to further increase learning effects, ECEs need to interact with other factors not necessarily related to content knowledge, for instance different forms of analysis, synthesis, or evaluation as suggested by Anderson and Krathwohl (2001).
When linking the subgroups with the highest learning effects, it appears that month-long geometry explorations in grades 1 to 3 mathematics classes, guided by the teacher, would produce the highest learning effects.

Conclusions
While this study found a moderate positive effect size (ES = 0.60) associated with ECEs, this finding does not diminish the importance of good teaching. Several studies (Christmann et al. 1997; Clark, 1994; Povey & Ransom, 2000) found that using computers purely as a method of instruction does not improve students' mathematics understanding. Hence, although computers have been used in mathematics classrooms for several decades now, the question of to what extent they can impact the teaching and learning of mathematics remains open for further investigation. This meta-analysis of up-to-date literature allowed for the formulation of some inferences based on implementations of technology; however, many new questions emerged, such as the following: How do exploratory environments help students with the transfer of mathematics concepts to new situations? How can we assure that the methods of quantitative scientific modeling that students apply in their science classes are coherent with the ones used in mathematics, and vice versa? Mathematics provides tools for quantifying phenomena; thus, unifying the techniques of modeling seems to benefit the transfer of knowledge between mathematics and science and consequently to affect the learners' perception of mathematics as a subject with a high applicability range. Will such unification prompt students to increase their engagement in mathematics? More detailed studies in these domains are worthy of consideration, and the availability of ECEs will be very helpful in organizing such studies.

The impact of ECEs on students' problem-solving techniques
Problem-solving techniques are developed on the basis of understanding the context through identifying the principles of the system's behavior. However, it is a highly intertwined process that might include verbal and syntactic processing, special representation storage and retrieval in short- and long-term memory, algorithmic learning, and its most complex element, conceptual understanding (Goldin, 1992). Computerized programs that offer a basis for investigation have great potential for improving conceptual understanding of problems; however, this study shows that this area is not yet fully explored, and taking full advantage of such learning environments to examine their impact on student achievement is a possible extension of this undertaking. More specifically, enriching the problem analysis through explorations to focus learners' attention more on the underpinning principles emerges as a possible objective of such studies. Higher student achievement on explorations (ES = 0.62), compared to problem solving (ES = 0.54), encourages the design of more comprehensive research on extending an exploratory approach to solving standard textbook problems, as opposed to current schemata-driven methods. Would giving students more ownership in exploring a given system's behavior, hypothesizing a solution, testing, and proving or disproving their hypothesis be a possible moderator affecting learning? Will these types of activities help solidify a notion of and belief in the power of mathematics as a sense-making subject? There also seems to be more work needed to evaluate how learners link mathematics concepts with principles embedded in a given context and how they initiate applications of the procedures that they select.
'Only an analysis of the instrument, i.e., the interaction of the artifact and the utilization schemes of its users (teachers and students), the analysis of its instrumental genesis will help in the implementation of computers, software and communication technology in the mathematics classroom' (Laborde and Sträßer 2010, p. 131).

Limitations and suggestions for future research
This meta-analytic research has certain limitations, primarily because this study could not be conducted in an experimental fashion where ECEs constituted instrumentation provided by computer programs and a direct contrast between two different modes of learning (digital and traditional) was exploited. Furthermore, the limited number of studies available to be meta-analyzed affected the study's generalizability. Although sensitivity to smaller sample sizes was restored by the process of weighting, the mean effect would support replication of the findings more strongly if it were computed over a larger study pool.
The other factor affecting the validity of the computed effect sizes is the wide range of interactivity of the software used in the primary studies and their exploratory nature, ranging from linear equation exploration supported by an interactive balance scale (Suh & Moyer-Packenham, 2007) to investigation of rate of change supported by the SimCalc MathWorlds program (Roschelle et al. 2010). A metric for incorporating this moderator into effect size calculations could have been furnished by evaluating the designs of the interventions through the lens of the multimedia principles defined by Clark and Mayer (2011). This task, however, was not possible to accomplish due to the lack of detailed software descriptions.
The validity of the research would have been higher if the calculated homogeneity statistic had not been statistically significant. In this meta-analysis, the statistic was significant (Q_T = 117.78, df = 23, p < 0.01), which implied a random-effects model for the data analysis instead of a more precise fixed-effect model.
Another factor limiting the study findings involves the widely varied student assessment methods, ranging from traditional multiple-choice questions, mostly locally developed, to new assessment techniques such as standardized-based assessments. Although one of the studies (Pilli & Aksu, 2013) reported a Cronbach's alpha reliability coefficient, most did not, thereby leaving the reliability of the measuring instruments uncertain. The degree of diversity further extends due to obvious differences in mathematics curricula, objectives, and expectation levels in the nine countries whose research studies were represented herein. Even though control groups in the extracted pool of studies were taught traditionally, the term traditional teaching might have been interpreted differently depending on the country. For example, Lan et al. (2010) defined traditional instruction as being without the support of technological devices, whereas Papadopoulos and Dagdilelis (2008) defined traditional teaching as a paper-and-pencil environment. Both descriptions imply that technology was not being used, but the treatment applied in the control groups might have varied in terms of degree of representations and method used, or teacher qualifications, which potentially could have affected the control groups' posttest scores. We concluded, however, that these fluctuations did not affect the overall effect size in a manner that would question the validity of the computed overall effect size.
Though at first we intended to examine the effects of embedded scientific inquiry methods in exploratory environments on students' problem-solving skills, we encountered a limited number of research studies addressing this domain. Thus, we modified the study focus. We realized that exploratory environments used in both types of interventions (explorations and problem solving) contained, to a certain degree, some elements of scientific inquiry and affected students' problem-solving skills, not just students' problem-solving performance as measured by testing. Further studies focusing primarily on the effects of inducing scientific inquiry processes in mathematical modeling and problem solving would serve to extend this paper. Technology has encouraged researchers to consider not only how to best adapt tools to the learning of mathematics but also how to adapt the content of mathematics in light of new, tool-rich possibilities to enable learners to perform tasks that would not previously have been possible (Hoyles & Noss, 2009). The task of explorations seems to provide a basis for inducing such adaptations.
This meta-analysis, to a certain extent, exposed the focus of the existing primary studies on the effects of exploratory environments on problem solving in mathematics education. We advocate for searching and formulating more constructs to quantify students' problem-solving techniques with ECEs as a medium of context.