Web-based authentic inquiry experiences in large introductory classes consistently associated with significant learning gains for all students

Continuous calls for reform in science education emphasize the need to provide science experiences in lower-division courses to improve the retention of STEM majors and to develop science literacy and STEM skills for all students. Open or authentic inquiry and undergraduate research are effective science experiences that lead to multiple gains in student learning and development. Most inquiry-based learning activities, however, are implemented in laboratory classes, and the majority of them are guided inquiries. Although course-based undergraduate research experiences have significantly expanded the reach of the traditional apprentice approach, it is still challenging to provide research experiences to nonmajors and in large introductory courses. We examined student learning through a web-based authentic inquiry project implemented in a high-enrollment introductory ecology course for over a decade. Results from 10 years of student self-assessment of learning showed that the authentic inquiry experiences were consistently associated with significant gains in students' self-perceived interest in ecology and their understanding and skills related to the scientific process, for all students: majors and nonmajors, lower- and upper-division students, women and men, and URM and non-URM students. Student performance in evaluating the quality of an inquiry report, before and after the inquiry project, also showed significant learning gains for all students. The authentic inquiry experiences proved highly effective for lower-division students, nonmajors, women, and URM students, whose learning gains were similar to or greater than those of their counterparts. The experiences were particularly helpful to students who were initially less prepared to evaluate a scientific report, narrowing the performance gap.
These findings suggest that authentic inquiry experiences can serve as an effective approach for engaging students in high-enrollment, introductory science courses. They can facilitate the development of science literacy and STEM skills in all students, skills that are critical to students' personal and professional success and to informed engagement in civic life.


Introduction
Continuous calls for reform in science education emphasize the need to introduce undergraduate students to the scientific process early in their careers and to improve students' ability to apply the process of science (AAAS 2011). The "Engage to Excel" report by the President's Council of Advisors on Science and Technology also stresses the importance of improving the first two years of undergraduate STEM (Science, Technology, Engineering, and Math) education by providing STEM experiences (PCAST 2012). In the report Science Literacy: Concepts, Contexts, and Consequences, the National Academies of Sciences, Engineering, and Medicine (NASEM) articulated "understanding of scientific practices" and "understanding of science as a social process" as two of the three key aspects of science literacy, along with content knowledge (NASEM 2016). The report also argued for the necessity of developing science literacy for all citizens, based on four primary rationales: personal, economic, democratic, and cultural (NASEM 2016).
It is common in large institutions to have high-enrollment, lecture-based introductory science courses for nonmajors, or similar courses for both majors and nonmajors with separate, smaller lab courses that are primarily accessible to majors. How can we provide authentic learning experiences that help all students in these courses develop an understanding of the process of science and STEM skills? This may be particularly important for nonmajors, for many of whom the introductory science course may be their only exposure to science during college.
Undergraduate research experiences (URE), both traditional URE and course-based undergraduate research experiences (CURE) (Harrison et al. 2011), and open inquiries (Buck et al. 2008; Rowland et al. 2016) are effective learning experiences that can help undergraduate students develop understanding of the scientific process and STEM skills (Handelsman et al. 2007; Lopatto 2009; O'Donnell et al. 2015). Providing these authentic inquiry experiences in high-enrollment, lecture-based introductory science courses, however, can be logistically difficult or impractical. Web-based approaches have the potential to make authentic inquiry experiences in such settings feasible and effective. Given the dramatic and potentially lasting pedagogical changes in undergraduate education triggered by the current pandemic, there is a heightened need to implement web-based authentic inquiry experiences and assess their efficacy in developing critical science literacy and skills.

Literature review
Inquiry-based learning has a long history in K-12 science education (Buchanan et al. 2016; Khalaf 2018; Schwab 1962), and interest in it has grown in undergraduate science education over recent decades (AAAS 2011; Beck et al. 2017; Handelsman et al. 2007; Koretsky et al. 2018; Sundberg et al. 2005). In inquiry-based learning, students engage in many of the activities and thinking processes that scientists do (NRC 2000). These activities may include making observations, framing questions and hypotheses, designing and conducting scientific investigations, formulating scientific explanations and models based on evidence and logic, communicating results, and revising the explanations or revisiting the investigations based on feedback and critique from peers (NRC 2000; Weaver et al. 2008). Practices of inquiry-based learning can be grouped into four types, forming an inquiry continuum, based on the degree of student independence in the inquiry process (Bell et al. 2005; Fay et al. 2007; Weaver et al. 2008; Wheeler and Bell 2012). Traditional confirmation labs are at one extreme of the continuum: instructors provide the research question and procedure and ask students to confirm a known outcome. In structured inquiry, the research question and procedure are provided, but students explore an unknown outcome through the inquiry. In guided inquiry, only the research question is provided, and students design and conduct the investigation to answer it. In open inquiry, also referred to as authentic inquiry (Bielik and Yarden 2016; Buck et al. 2008; Rowland et al. 2016), students are given only a raw phenomenon and asked to formulate the research question and to design and conduct the investigation to answer it.
Numerous studies have shown that inquiry-based learning leads to significant learning gains in performance, attitudes, and inquiry-related skills (Gehring and Eastman 2007; Goldey et al. 2012; Howard and Miskowski 2005; Rissing and Cogan 2009; Russell and Weaver 2010; Webb et al. 2014). Although most of these studies focused on courses for science majors, nonmajors in inquiry-based introductory biology courses also showed greater improvements in science literacy and research skills (Gormally et al. 2009) and greater positive changes in attitudes toward science than students in a traditional course (Kiernan and Lotter 2019). In a study of the effects of inquiry-based ecology laboratory courses at a historically Black college and a private research university, students at both institutions had significant learning gains in confidence and scientific reasoning skills, with no institution effect, indicating a positive impact of inquiry-based learning on both student populations (Beck and Blumer 2012). A longitudinal study also showed long-term improvements in learning from learner-centered, inquiry-based courses taken by biology majors early in the curriculum (Derting and Ebert-May 2010). A review of published studies on inquiry-based teaching in undergraduate biology laboratory courses showed that most of them used guided rather than open inquiries (Beck et al. 2017). Open inquiries, however, are likely more effective for developing an understanding of scientific practices and of science as a social process. Most open inquiries are implemented in laboratory classes (Beck et al. 2017; Sundberg et al. 2005), with few in lecture-based introductory courses, which typically serve large numbers of nonmajors for whom such a course may be their only exposure to science in college.
Inquiry-based learning and undergraduate research experiences exist along a continuum (Beck et al. 2017; D'Avanzo 1996; Weaver et al. 2008), along which students' intellectual autonomy and responsibility increase from confirmation labs to apprenticeship in a research lab (Weaver et al. 2008). A defining feature that separates undergraduate research experiences from inquiry-based learning is the discovery of new knowledge (Auchincloss et al. 2014; Weaver et al. 2008). Undergraduate research has been well documented as a high-impact learning experience that contributes positively to students' understanding of the scientific process (Kuh et al. 2010; Lopatto 2009; O'Donnell et al. 2015). In their 2003-2005 web-based survey of students who participated in undergraduate research, Russell et al. (2007) reported that 88% of respondents said their understanding of how to conduct research increased a fair amount or a great deal. A 3-year study by Seymour et al. (2004) of the benefits to students of participating in undergraduate research showed multiple gains, including "thinking and working like a scientist" (p. 511). Using the SURE (Survey of Undergraduate Research Experiences), Lopatto (2004, 2007, 2009) surveyed undergraduate students from multiple institutions to examine the benefits of undergraduate research. Results from his cross-institutional, multiyear study show student gains not only in understanding the research process but also in planning for postgraduate education and in personal development, e.g., working independently.
Participation in undergraduate research also correlates with retention and persistence for all students, including underrepresented minority students (Bauer and Bennett 2003; Campbell and Skoog 2008; Finley and McNair 2013; Nagda et al. 1998). Using longitudinal data from a large research university, Jones et al. (2010) highlighted, in particular, the importance of underrepresented minority student participation in undergraduate research. Their study showed that students who participated in research during their undergraduate career were more likely to complete a baccalaureate degree, persist in biology, and perform well in biology, which could influence their interest in, and admission to, graduate school. Chang et al. (2014) conducted a longitudinal study analyzing data from the Cooperative Institutional Research Program's 2004 Freshman Survey and 2008 College Senior Survey, with a sample of 3670 students at 217 institutions. Their key findings revealed that by increasing student involvement in key academic experiences, including undergraduate research, higher education institutions can play a significant role in reducing racial disparities in STEM, which appear to result largely from unequal preparation and access to academic opportunities.
Much of the literature related to undergraduate research focuses on UREs (undergraduate research experiences), which follow an apprenticeship or internship model that typically involves a few selected students who invest time outside of class working with a faculty mentor on a research project. The challenge of this model is that it is difficult to scale up and make available to all STEM students during their first 2 years of college (Desai et al. 2008; Harrison et al. 2011; Sadler et al. 2010; Wei and Woodin 2011). Researchers have also noted that most studies of undergraduate research focus on upper-level students, a self-selected population already interested in research.
Recently, there has been a greater emphasis on CUREs (course-based undergraduate research experiences), in which whole classes are involved in scientific research (Harrison et al. 2011). The aim of these experiences is to reach students earlier in their academic careers, including those who might not have considered research as a career or educational path, by integrating research into lower-level undergraduate courses (Harrison et al. 2011). Focusing on course-based research at the introductory level, Harrison et al. (2011) discussed their experiences using the Phage Genomics course as the model. Evidence from their study shows that first-year students doing research learn the process of science, as well as how scientists practice science. Their study also showed that students' career interests, such as pursuing graduate education, changed. Shaffer et al. (2014) shared the results of their multi-campus research on a bioinformatics project (a course-based research experience) incorporated into the curricula of diverse institutions. They found that a bioinformatics project within a biology curriculum provides a mechanism for successfully engaging large numbers of students in undergraduate research, that benefits to students are achievable at a wide variety of academic institutions, and that successful implementation of course-based research experiences requires a significant investment of instructional time for students to gain the full benefit. Auchincloss et al. (2014) provided a framework that characterizes CUREs and outlined research possibilities and recommendations for educators who include research experiences in their courses. Although one benefit of CUREs is that all students enrolled in a course are involved in research, CUREs have typically focused on science majors (Bakshi et al. 2016; Harrison et al. 2011). Additionally, along with requiring a significant investment of time (Shaffer et al. 2014), CUREs face other barriers, such as class size and a lack of resources to help faculty develop and effectively teach them (Bakshi et al. 2016). Whereas the examples above highlight continued efforts to engage college students in scientific inquiry and research, these opportunities, as well as research on the benefits of participating in experiences such as CUREs that are open to all students, remain focused on science majors (Ballen et al. 2017).
Wu et al. International Journal of STEM Education (2021)

Current study and research questions
It is important to provide authentic inquiry experiences in high-enrollment, lecture-based introductory science courses because such experiences can help students develop science literacy as defined in the NASEM report (2016) and critical thinking and problem-solving skills (NSTC 2018), interpret and make sense of science in their daily lives, and engage with science in meaningful ways on issues that have personal and community relevance (Feinstein et al. 2013). Providing such experiences in these courses is challenging, however, because implementing open inquiry or course-based undergraduate research experiences in these settings is logistically difficult or impractical. A web-based authentic inquiry project, focused on understanding the process of science and developing skills rather than on content knowledge, was developed for a high-enrollment introductory ecology course (without a lab component) and successfully implemented for over a decade at a Research 1 university (Wu et al. 2016). Although related to the work of a research lab, the authentic inquiry experiences focused on science as a process rather than a product (Hester et al. 2018; Spell et al. 2014) and modeled the processes used by practicing scientists (Peffer et al. 2015; Peffer and Ramezani 2019), but they were not part of research that scientists actually carried out (Chinn and Malhotra 2002). These experiences focused on developing science literacy and STEM skills (NASEM 2016), helping students learn ideas and skills of value and use in their respective futures (Rowland et al. 2016).
The current study explored the effect of the authentic inquiry project on student learning by analyzing data from assessments built into the course, including student self-perception of learning from 2007 to 2016, and student performance in evaluating a scientific report and the quality of their own reports from 2013 to 2016. We asked the following research questions: (1) What were students' self-perceived learning gains in interest, ability, and understanding of the scientific process through the authentic inquiry experiences, for all students and for different groups of students: majors and nonmajors, lower- and upper-division students, women and men, and underrepresented minorities (URM) and other ethnicities (non-URM)? (2) Did the self-perceived learning gains differ between the groups in each pair? (3) What was the impact of the experiences on students' performance in evaluating the quality of scientific reports, for all students and for different groups of students? (4) Did the changes in performance differ between these groups?

Design of the inquiry project
A web-based ecological inquiry project using archived BearCam photos (see below for a description of the BearCam photos) was developed in 2006 to provide an authentic inquiry experience for students in a large introductory ecology course (Wu et al. 2016), through the activities of the NSF-funded Information Technology in Science (ITS) Center for Teaching and Learning at Texas A&M University (Schielack and Knight 2012). The learning goals of the inquiry project were to help students develop a better understanding of the nature and process of science and enhance their critical thinking and communication skills.
The activities of the inquiry project were structured over a 5-week period. Students conducted web-based individual research projects outside of class, with ongoing peer feedback through online group discussions as well as instructor-facilitated discussions in class. An outline of the activities over the 5 weeks follows; more detailed descriptions and discussions are available in Wu et al. (2016).
Week 1: form peer group and conduct background study. Students (a) participate in an online ice-breaker activity to get to know each other in their inquiry project discussion group (of ~10 students) in Blackboard Learn, (b) conduct an online search for background information on the biology and behavior of grizzly bears, and (c) share one interesting piece of information on bear biology or behavior that has not been mentioned by other group members. We also dedicate the first class period of week 1 to exploring with students the rationale and relevance of the project and to discussing the ecological background and the process of the inquiry project. Finally, we discuss the rubric, with 30 items organized in 10 categories (Fig. 1), which is used throughout the project to guide the inquiry process and the writing and peer review of the inquiry report.
Week 2: develop hypothesis and design investigation. Students (a) study the BearCam photos and observe patterns in bear behavior and spatial distribution, (b) formulate a hypothesis in terms of specific predictions of the pattern, (c) design the procedure for collecting relevant data, with a sufficient sample size and an appropriate sampling regimen for testing the hypothesis, and (d) share the hypothesis and design with the discussion group and provide feedback on the hypotheses and designs of at least two other group members. A portion of one class period in week 2 was used to discuss scientific hypotheses, bias, random and stratified sampling, and the effect of sample size, through mini-lectures and active learning activities. The archived photos used in the inquiry project were collected by Lawrence Griffing and his students, who archived the pictures and video from a pair of video cameras at McNeil Falls in the McNeil River State Game Sanctuary, remotely controlled from Texas A&M University, for animal behavior studies. Over 1200 archived photos, organized by hour of the day, were made available for the inquiry project, along with an aerial photo of McNeil Falls and a scaled map with a 1 m × 1 m grid for identifying the locations of the bears and estimating distances between them (http://bearcaminquiryproject.weebly.com/).
Week 3: collect data, analyze data, and develop inquiry report. Students (a) select a set of random or stratified random samples (photos with time and ID) appropriate for testing the hypothesis, (b) collect data from each selected photo and record the data in an Excel file, (c) analyze the data and generate figure(s) or table(s) to represent the results, (d) write an inquiry report following the guidelines and rubric provided and submit the completed report to the Calibrated Peer Review (CPR) system, and (e) make at least one posting to the discussion group about their data collection, analysis, and interpretations, and respond with feedback to at least two postings by other group members. CPR (cpr.molsci.ucla.edu/Home.aspx) is used as an online instructional tool to help students understand the process and value of peer review in scientific inquiry, enhance their skills in critically evaluating scientific writing, and provide and receive feedback on their inquiry reports.
Week 4: conduct calibrated peer review. Students (a) complete three calibrations, (b) review three peer reports, and (c) conduct a self-assessment of their own report. Reviews are completed in the CPR system using the rubric. The three "calibration" reports, of high, medium, and low quality, were set up and evaluated by the instructor using the rubric. Students can make up to two attempts to evaluate each calibration report and receive feedback to learn how to evaluate reports using the rubric.
Week 5: revise inquiry report based on peer review feedback. Students revise the inquiry report based on the peer review feedback and self-assessment and submit a revised report online.
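The Week 3 sampling step, drawing random or stratified random samples from photos organized by hour of the day, can be sketched in Python as below. This is a hypothetical illustration rather than part of the project materials: the `(hour, photo_id)` record layout, the function name `stratified_random_sample`, and the equal number of photos drawn per hour are all assumptions.

```python
import random
from collections import defaultdict


def stratified_random_sample(photos, per_stratum, seed=None):
    """Draw `per_stratum` photos at random, without replacement, from each hour.

    `photos` is a list of (hour, photo_id) tuples; this record layout is a
    hypothetical stand-in for the archived BearCam photos with time and ID.
    """
    rng = random.Random(seed)  # seeded generator so a sample can be reproduced
    by_hour = defaultdict(list)
    for hour, photo_id in photos:
        by_hour[hour].append(photo_id)

    sample = {}
    for hour in sorted(by_hour):
        ids = by_hour[hour]
        if len(ids) < per_stratum:
            raise ValueError(f"hour {hour} has only {len(ids)} photos")
        sample[hour] = rng.sample(ids, per_stratum)
    return sample


# Hypothetical archive: 40 photos for each hour from 06:00 to 08:00
photos = [(hour, f"IMG_{hour:02d}_{i:03d}") for hour in range(6, 9) for i in range(40)]
print(stratified_random_sample(photos, per_stratum=5, seed=42))
```

Drawing the same number of photos from every hour keeps each time of day equally represented; allocating samples in proportion to the number of photos per hour would be an equally defensible design if the hours differed greatly in coverage.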

Implementation and assessment
The inquiry project using BearCam was implemented in a high-enrollment introductory ecology course, Fundamentals of Ecology, at Texas A&M University in the Fall semesters of 2007-2016. The course had two sections with 400-500 students combined, drawn from 30 to 50 different majors; for some of these students the course was required, and for others it was an elective. Although it was a sophomore-level course, the typical distribution of classifications was approximately 65% lower-division (freshman and sophomore) and 35% upper-division (junior and senior) students. In the Fall semesters of 2007-2011, all students were required to conduct an inquiry project using BearCam; in the Fall semesters of 2012-2016, about 60-80% of the students conducted an inquiry project using BearCam, and the others conducted a different inquiry project with similar guidelines.
Throughout the 5-week period for the inquiry project, we typically used the first 5 to 15 min of each class (MWF) to discuss the inquiry project, which provided opportunities for formative assessment of the project and of student learning. We asked and answered student questions, led targeted exercises and discussions coupled with clicker questions, and gave mini-lectures as needed. Whenever possible, we related what students were doing in their inquiry project to experiences and examples from our own research. Students seemed to find this attractive and engaging, possibly because they saw that they were engaging in an authentic inquiry process similar to what researchers do, as well as learning from our experiences.

Self-perception of learning gains
An online survey was conducted at the end of the inquiry project each semester, in which students were asked to provide feedback on the inquiry project and reflect on their learning through the project (see Supplementary Information). Most of the questions remained the same over the years. With a subset of eight questions in the survey, we asked students to rate their interest in ecology, ability to formulate testable hypotheses, understanding of how ecologists conduct research, and ability to evaluate the quality of scientific reports, both before and after the inquiry project. Student responses to these questions from the Fall semesters of 2007-2016, on a five-level scale from 1 (very low) to 5 (very high), were used to assess self-perception of learning through the inquiry project.
These four pairs of questions were part of the post-project online survey, which was designed for formative assessment of the inquiry project and created by the group of faculty members (in ecology, educational psychology, and math education) who designed and implemented the project. The four pairs of questions were created based on (1) the inquiry project's focus on developing understanding and skills related to the scientific process and (2) the learning gains anticipated from the specific design of the inquiry project.
For the questions on interest in ecology before and after the inquiry project, we did not specifically define interest (Renninger and Hidi 2011; Rowland et al. 2019). What interest in ecology meant likely varied substantially among students. We were therefore interested in, and interpreted the results as, students' collective perceptions of change through the inquiry project under their various interpretations of interest in ecology, without being able to characterize the specific nature of the gains. Similarly, we did not provide specific definitions in the three pairs of questions asking students to rate their ability to formulate testable hypotheses, understanding of how ecologists conduct research, and ability to evaluate the quality of a scientific report before and after the inquiry project. We did, however, discuss in class the definitions of, and the difference between, a scientific hypothesis and a statistical or research hypothesis, and, given the limitations of the BearCam data, asked students to focus on a research hypothesis and explicitly discuss potential mechanisms or explanations in the inquiry report. During the discussions of the inquiry project at the beginning of each class, we deliberately drew parallels between what students were doing at the time and what we as researchers do in our ongoing or past ecological research projects: the reasons for the practices, the insights we gained, and the lessons we learned. Student feedback commonly included comments that they appreciated the sense of relevance they gained through these discussions. This mapping to the real world of ecological research could contribute to the authenticity of the experience (Radinsky et al. 2001; Rowland et al. 2016) and to students' understanding of the research process.
We discussed the rubric at the start of the inquiry project and used it throughout the process, in in-class discussions of sample student work and in peer reviews, which could influence, to a degree, students' interpretation of what evaluating the quality of a scientific report means.
One concern in survey studies is whether respondents correctly interpret and comprehend the survey questions (Drennan 2003). Researchers have used cognitive interviews, a "way of assessing respondents' understanding of questionnaire items" (Park et al. 2017, p. 2), to address this concern (Drennan 2003; Kaplan et al. 2018). We conducted a series of cognitive interviews to examine students' interpretations of the survey questions. Student volunteers who completed the ecology course in the Fall 2019 semester were recruited to participate. We chose this group because they would likely recall the inquiry project and survey more easily than earlier cohorts. During each interview, students were asked to read each survey question and explain its meaning in their own words.
A total of fifteen cognitive interviews were conducted. We analyzed the responses by searching for instances where a student might have interpreted the question incorrectly or misunderstood a specific word in the question. We did not encounter any situations in which a student did not comprehend the question, and all students were able to successfully explain the meaning of each survey question. Additionally, students made positive statements regarding the clarity of the questions such as, "Each statement is very clear and to the point; students would not have a difficult time understanding the meaning of each statement."

Performance
In the Fall semesters of 2013-2016, we added an assessment of students' performance in evaluating the quality of scientific reports. Before the start of the inquiry project, students were assigned to evaluate an inquiry report (the medium-quality calibration report used in CPR) using the rubric (Fig. 1). A student's performance was calculated as the percentage of rubric items on which their rating matched the instructor's. This assignment served as the pretest, and the evaluation of the same report during calibration in CPR served as the posttest, of students' ability to evaluate inquiry reports using a rubric. Because additional learning gains are likely through the CPR process and the revision of the report, this assessment provided a conservative measure of learning gains.
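The pretest and posttest score described above is a percent agreement with the instructor's ratings. A minimal Python sketch follows; it assumes the ratings for the 30 rubric items are stored as two equal-length lists in the same item order, and the function name `agreement_score` and the yes/no rating values are hypothetical.

```python
def agreement_score(student_ratings, instructor_ratings):
    """Percent of rubric items on which the student's rating equals the instructor's."""
    if len(student_ratings) != len(instructor_ratings):
        raise ValueError("both rating lists must cover the same rubric items")
    if not instructor_ratings:
        raise ValueError("the rubric must contain at least one item")
    matches = sum(s == i for s, i in zip(student_ratings, instructor_ratings))
    return 100.0 * matches / len(instructor_ratings)


# Hypothetical 30-item rubric answered "yes"/"no" by the instructor and one student
instructor = ["yes"] * 20 + ["no"] * 10
pre_ip = ["yes"] * 30                               # disagrees on all 10 "no" items
post_ip = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 5   # disagrees on 5 items
gain = agreement_score(post_ip, instructor) - agreement_score(pre_ip, instructor)
print(round(gain, 1))
```

The gain for a student is simply the posttest agreement minus the pretest agreement, which is the quantity later compared across student groups.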
Student performance on their revised inquiry reports was also used to assess learning. A total of 1222 inquiry reports were submitted and graded by the teaching assistants (TAs) using the rubric during the Fall semesters of 2013-2016. The frequency distributions of both the overall scores and the scores for individual categories were examined.

Data analyses
Each student was assigned to a "major" group (majors or nonmajors), a "classification" group (lower-division, i.e., freshman or sophomore, or upper-division, i.e., junior or senior), and an "ethnicity" group (underrepresented minorities (URM) or other ethnicities (non-URM)). The course was required for nine majors (Bioenvironmental Sciences; Environmental Studies; Horticulture; Landscape Architecture and Urban Planning; Renewable Natural Resources; Rangeland Ecology and Management; Recreation, Park and Tourism Sciences; Spatial Sciences; and Wildlife and Fisheries Sciences), and students in these majors were considered "majors." Students in all other majors, for which the course was an elective, were considered "nonmajors" for this study. The underrepresented minority group included African Americans, American Indians/Alaska Natives, and Latinos. All statistical analyses were conducted using JMP Pro 14 (SAS Institute Inc. 2018).
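The group coding rules above amount to a small lookup. This Python sketch is illustrative only: the exact strings used for majors, classifications, and ethnicity categories in the institutional records are assumptions, as is the function name `designate_groups`. Gender is left out because the text does not describe how it was coded.

```python
# The nine majors for which the course was required (listed in the text)
REQUIRED_MAJORS = frozenset({
    "Bioenvironmental Sciences",
    "Environmental Studies",
    "Horticulture",
    "Landscape Architecture and Urban Planning",
    "Renewable Natural Resources",
    "Rangeland Ecology and Management",
    "Recreation, Park and Tourism Sciences",
    "Spatial Sciences",
    "Wildlife and Fisheries Sciences",
})
# Hypothetical category labels for the three URM groups named in the text
URM_GROUPS = frozenset({"African American", "American Indian/Alaska Native", "Latino"})
LOWER_DIVISION = frozenset({"freshman", "sophomore"})
UPPER_DIVISION = frozenset({"junior", "senior"})


def designate_groups(major, classification, ethnicity):
    """Return the major, classification, and ethnicity group labels for one student."""
    level = classification.strip().lower()
    if level in LOWER_DIVISION:
        division = "lower-division"
    elif level in UPPER_DIVISION:
        division = "upper-division"
    else:
        raise ValueError(f"unknown classification: {classification!r}")
    return {
        # Any major outside the nine required ones counts as a nonmajor
        "major": "majors" if major in REQUIRED_MAJORS else "nonmajors",
        "classification": division,
        "ethnicity": "URM" if ethnicity in URM_GROUPS else "non-URM",
    }


print(designate_groups("Horticulture", "Sophomore", "Latino"))
```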

Self-perception of learning gains
Student self-reported scores, on a five-level scale from 1 (very low) to 5 (very high), were treated as ordinal data. The Wilcoxon signed-rank test for matched samples was used to test the differences between self-reported ratings before (pre-IP) and after (post-IP) the inquiry project for interest in ecology, ability to formulate testable hypotheses, understanding of how ecologists conduct research, and ability to evaluate the quality of scientific reports, for all students across all semesters (Fall 2007-2016) and by semester, major, classification, gender, and ethnicity. Cliff's delta (δ) was used as the effect size measure, with δ = 0.11, 0.28, and 0.43 interpreted as small, medium, and large effect sizes, respectively (Vargha and Delaney 2000).
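The analyses were run in JMP Pro; the Python sketch below only illustrates the two statistics and does not reproduce the JMP procedure. The Wilcoxon test here uses a normal approximation with average ranks for ties, drops zero differences, and applies no continuity correction, so its p-values can differ slightly from exact or JMP output. Computing Cliff's delta between the post-IP and pre-IP rating distributions is an assumption, since the text does not state how δ was computed for paired ratings.

```python
import math


def _average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean 0-based position, made 1-based
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Paired Wilcoxon signed-rank test (normal approximation).

    Returns (W+, z, two-sided p). Zero differences are dropped.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    tie_sizes = {}
    for d in diffs:
        tie_sizes[abs(d)] = tie_sizes.get(abs(d), 0) + 1
    # Variance of W+ under H0, reduced for tied |differences|
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes.values()) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, z, p


def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y) over all cross-sample pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def delta_magnitude(delta):
    """Label |delta| using the 0.11 / 0.28 / 0.43 thresholds cited in the text."""
    d = abs(delta)
    if d < 0.11:
        return "negligible"
    if d < 0.28:
        return "small"
    if d < 0.43:
        return "medium"
    return "large"


# Hypothetical 1-5 ratings from eight students, before and after the project
pre = [2, 3, 2, 3, 1, 2, 4, 3]
post = [4, 4, 3, 4, 3, 3, 4, 5]
w_plus, z, p = wilcoxon_signed_rank(pre, post)
delta = cliffs_delta(post, pre)
print(w_plus, round(z, 2), round(p, 4), round(delta, 2), delta_magnitude(delta))
```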
The Cochran-Mantel-Haenszel test, with semester as the stratifying variable, was used to test differences between majors and nonmajors, lower- and upper-division students, women and men, and URM and non-URM students in the gains in self-reported interest in ecology, ability to formulate testable hypotheses, understanding of how ecologists conduct research, and ability to evaluate the quality of scientific reports. Cramer's V was used as the effect size measure, with V = 0.10, 0.30, and 0.50 interpreted as small, medium, and large effect sizes, respectively (df = 1).
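For 2×2 tables stratified by semester, the Cochran-Mantel-Haenszel statistic pools observed-minus-expected counts across strata. The Python sketch below assumes that gains were dichotomized (for example, gained versus did not gain) so that each semester contributes a 2×2 group-by-gain table; that coding, the function name `cmh_2x2`, and computing Cramer's V as √(χ²/N) from the pooled statistic are assumptions, since the text does not specify them. No continuity correction is applied unless requested.

```python
import math


def cmh_2x2(strata, continuity=False):
    """Cochran-Mantel-Haenszel chi-square (df = 1) over a list of 2x2 tables.

    Each table is ((a, b), (c, d)) with rows = the two student groups and
    columns = gained / did not gain (a hypothetical coding).
    Returns (chi2, p, cramers_v).
    """
    sum_a = sum_expected = sum_var = 0.0
    total = 0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no variance
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_expected += row1 * col1 / n
        sum_var += row1 * row2 * col1 * col2 / (n**2 * (n - 1))
        total += n
    if sum_var == 0:
        raise ValueError("no stratum has variation in both rows and columns")
    deviation = abs(sum_a - sum_expected)
    if continuity:
        deviation = max(deviation - 0.5, 0.0)
    chi2 = deviation**2 / sum_var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with df = 1
    cramers_v = math.sqrt(chi2 / total)
    return chi2, p, cramers_v


# Hypothetical counts for two semesters; rows = majors, nonmajors
semesters = [((40, 20), (90, 60)), ((35, 25), (100, 50))]
print(cmh_2x2(semesters))
```

Stratifying by semester keeps semester-to-semester differences in enrollment mix from being mistaken for a group effect.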

Performance
Comparisons were made between the scores for student evaluation of an inquiry report before the inquiry project (pre-IP) and after the first submission of the inquiry project ("post-IP," i.e., before the completion of the CPR and revision of the report), for all students combined and for majors, nonmajors, lower-division, upper-division, women, men, URM, and non-URM students. The score was the percentage of rubric items that a student rated correctly. Students were also grouped by level of pre-IP performance into B, C, D, or F groups (using 90, 80, 70, and 60% break points) based on their scores for the pre-IP evaluation of the inquiry report (no student scored an A). A mixed model ANOVA, with subject (unique student code) nested within semester as random effects, was used to compare the pre-IP and post-IP scores for evaluation of inquiry reports for all students and, respectively, by major, classification, gender, ethnicity, and pre-IP performance group. Because the distribution of the scores was right skewed, the variable was log-transformed for the analyses. A Bonferroni correction was used to adjust the critical α-value (0.05/13 = 0.0038). Cohen's d was used to evaluate the effect size of the differences between the pre-IP and post-IP scores. The effects of major, classification, gender, and ethnicity on the gains (post-IP score minus pre-IP score) were evaluated using a mixed model ANOVA with the same random-effects structure (subject nested within semester).
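Two small steps in this procedure can be made concrete. The sketch below assigns the pre-IP letter groups from the stated break points, assuming a score exactly at a break point falls in the higher group. It also reproduces the Bonferroni arithmetic, assuming the 13 comparisons are all students plus the eight demographic subgroups plus the four pre-IP performance groups. That accounting matches 13 but is not spelled out in the text. The Cohen's d shown uses the pooled standard deviation of the pre-IP and post-IP scores, which is one common variant. The paper does not state which variant was used. The mixed model itself is not reproduced here.

```python
from statistics import mean, stdev


def pre_ip_group(score):
    """Letter group from the 90/80/70/60% break points (score in percent).

    Assumes a score exactly at a break point belongs to the higher group.
    """
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


# Assumed accounting of the comparisons: all students + 8 subgroups + 4 pre-IP groups.
SUBGROUPS = ["majors", "nonmajors", "lower-division", "upper-division",
             "women", "men", "URM", "non-URM"]
PRE_IP_GROUPS = ["B", "C", "D", "F"]
n_comparisons = 1 + len(SUBGROUPS) + len(PRE_IP_GROUPS)
alpha_adjusted = 0.05 / n_comparisons


def cohens_d_pooled(pre, post):
    """Cohen's d using the pooled SD of pre and post scores (one common variant)."""
    pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(post) - mean(pre)) / pooled_sd


print(n_comparisons, round(alpha_adjusted, 4))
print(pre_ip_group(74.26), cohens_d_pooled([60, 70, 80], [70, 80, 90]))
```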
The grade distributions of the overall scores for the revised report, and of the scores for the individual rubric categories, were generated. Potential effects of major, classification, gender, and ethnicity on the overall score and on the individual category scores were evaluated using a mixed model ANOVA, with subject (unique student code) nested within semester as random effects. Because the distributions of these response variables were right skewed, the variables were log-transformed for the analyses.

Self-perception of learning gains
With respect to the first research question, on students' self-perceived gains in interest, ability, and understanding of the scientific process through the authentic inquiry experiences, for all students and for different student groups, the results showed significant differences between the pre-IP and post-IP self-reported scores for interest in ecology, ability to formulate testable hypotheses, understanding how ecologists conduct research, and ability to evaluate the quality of scientific reports, for all students from the Fall semesters of 2007-2016 (Fig. 2). When each semester was evaluated separately, the differences were also significant. Although the self-reported pre-IP and post-IP scores differed significantly in all categories, the effect size varied among them: it was largest for understanding how ecologists conduct research, followed by ability to evaluate the quality of scientific reports, ability to formulate testable hypotheses, and interest in ecology (Fig. 2).
When evaluated by individual student group for the Fall semesters of 2007-2016, both combined and for each semester, the self-reported gains in interest in ecology, ability to formulate testable hypotheses, understanding how ecologists conduct research, and ability to evaluate the quality of scientific reports were significant for every group: majors, nonmajors, lower-division students, upper-division students, women, men, URM students, and non-URM students (Fig. 3).
To explore the second research question, on differences between groups, we compared the self-reported gains between each pair of student groups. There was no statistically significant difference between majors and nonmajors in any of the categories (Fig. 4a). Lower- and upper-division students differed significantly in all categories except interest in ecology, with lower-division students showing greater gains (Fig. 4b). No significant differences were found between women and men, except in the ability to evaluate the quality of scientific reports, for which women had greater gains (Fig. 4c). URM students had significantly greater gains than non-URM students in all categories except the ability to evaluate the quality of scientific reports (Fig. 4d). The effect sizes for these significant differences were small (Fig. 4b-d).

Performance
With respect to the third research question, on the impact of the experiences on students' performance in evaluating the quality of scientific reports, the post-IP average score for report evaluation was significantly higher than the pre-IP average score for all students, for both majors and nonmajors, both lower- and upper-division students, both women and men, and both URM and non-URM students (Fig. 5). When examined by pre-IP performance group (B, C, D, and F), there was no statistically significant gain for the pre-IP B group, but there were significant gains for the pre-IP C, D, and F groups, with increasing effect size (Fig. 6). Because no control group was used, these changes in the pre-IP performance groups could be attributed to regression to the mean. The significant change (8.82%) in the overall mean, however, suggests that there was a non-random change. The change in the pre-IP B group was non-significant but negative (−0.54%), which is likely due to regression to the mean. The change in the pre-IP C group was significant (4.07%). This positive change is likely underestimated because of regression to the mean, given that the pre-IP mean for this group (74.26%) was higher than the overall pre-IP mean (68.41%). The changes for the pre-IP D and F groups were large (11.29 and 23.41%, respectively) and likely overestimate the true change because of regression to the mean.
Fig. 2 Students' self-perception of learning. Distributions of self-reported levels of interest in ecology, ability to formulate testable hypotheses, understanding how ecologists conduct research, and ability to evaluate quality of scientific report, before (pre-IP) and after (post-IP) the inquiry project (n = 2714). The W statistic and p value for the Wilcoxon signed-rank test and the effect size (Cliff's delta, δ) are shown above each category.
If the change in the pre-IP B group indicates the general magnitude of the regression-to-the-mean effect, that effect is likely relatively small. Given these considerations, the general pattern of these results is likely valid.
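The regression-to-the-mean argument above can be illustrated with a small simulation of a no-effect scenario. All parameters are hypothetical. True ability is drawn from N(68, 8). Each test adds independent measurement error drawn from N(0, 8), so test reliability is 0.5. There is no treatment effect, and percent scores are left unclamped. Grouping students by their pre-test score alone still produces apparent gains in the low groups and apparent losses in the high groups.

```python
import math
import random


def letter_group(score):
    """Letter group from 90/80/70/60% break points (higher group at a break point)."""
    for cutoff, label in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return label
    return "F"


def simulate_no_effect(n=50_000, true_mean=68.0, true_sd=8.0, noise_sd=8.0, seed=1):
    """Simulate pre/post scores with NO treatment effect; return mean gains.

    Returns (overall mean gain, {pre-test group: mean gain}).
    """
    rng = random.Random(seed)
    gains_by_group = {}
    all_gains = []
    for _ in range(n):
        ability = rng.gauss(true_mean, true_sd)       # stable true ability
        pre = ability + rng.gauss(0.0, noise_sd)      # pre-test = ability + error
        post = ability + rng.gauss(0.0, noise_sd)     # post-test: same ability, new error
        gain = post - pre
        gains_by_group.setdefault(letter_group(pre), []).append(gain)
        all_gains.append(gain)
    overall = math.fsum(all_gains) / len(all_gains)
    by_group = {g: math.fsum(v) / len(v) for g, v in gains_by_group.items()}
    return overall, by_group


overall, by_group = simulate_no_effect()
print(round(overall, 2), {g: round(v, 1) for g, v in sorted(by_group.items())})
```

In this no-effect scenario the C group, whose pre-test mean lies above the overall mean, drifts downward. That is the direction of bias the text describes for the pre-IP C group. The size of the drift depends on the assumed reliability, so the simulated magnitudes should not be compared directly with the observed changes.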
For the fourth research question, on differences in performance gains between each pair of groups, there was no significant difference between the gains of majors and nonmajors, lower- and upper-division students, or women and men. However, the gain of URM students (11.43 ± 14.05%) was significantly (p = 0.0220) greater than that of non-URM students (7.96 ± 11.59%). The pre-IP performance of URM students was substantially lower than that of non-URM students, but their post-IP performance was close to that of non-URM students.
About 46.5% of the students received an A overall on their revised inquiry reports, and 29.7, 14.4, 5.8, and 3.6% received a B, C, D, and F, respectively. Each of the 10 rubric categories has three items, so the possible scores for each category range from 0 (meeting none of the three criteria) to 3 (meeting all three criteria). The percentages of students receiving each score (0, 1, 2, or 3) in each of the 10 categories are shown in Fig. 7. Between 59.5% (for Discussion) and 86.7% (for Hypothesis) of the students received full scores (3) in individual categories. There was no significant difference in the average inquiry report scores between majors and nonmajors, lower- and upper-division students, or URM and non-URM students. However, the average score for women (89.10 ± 11.10%) was significantly (p = 0.0002) higher than that of men (86.32 ± 12.80%). In the individual rubric categories, women scored significantly higher than men for Sampling (p = 0.0007), Discussion (p = 0.0038), and Writing (p = 0.0093); women also scored higher in the other categories, but those differences were not significant. In the individual categories, there was no significant difference between majors and nonmajors, lower- and upper-division students, or URM and non-URM students, with the exceptions that majors scored higher for Data Collection, lower-division students scored higher for Objective, and non-URM students scored higher for Conclusions.
Fig. 6 Improvements in report evaluation score for students at different levels of pre-inquiry project performance. Comparison of the pre-inquiry project (pre-IP) and post-inquiry project (post-IP) scores for evaluation of an inquiry report using a rubric, for each of the pre-IP performance groups (students who scored B, C, D, or F, respectively, in the pre-IP evaluation of the inquiry report; n = 123, 121, 126, 97). The F-ratio, df, and p value for the mixed model ANOVA and the effect size (Cohen's d) are shown above each group.

Discussion
Authentic inquiry experiences consistently associated with significant learning gains for all students
Although studies show that lab-based open inquiry and research experiences offer many benefits to undergraduate students, the high enrollment of many undergraduate science courses at large institutions makes these opportunities difficult to offer. The results of this study show that web-based authentic inquiry projects in high-enrollment introductory courses, such as the Authentic Inquiry Project, have the potential to address these challenges and to provide all students with valuable inquiry experience that models scientific research in the field. These authentic inquiry experiences were consistently associated with significant learning gains in both student self-perception of learning and performance, for all students regardless of major, classification, gender, or ethnicity (Figs. 2 and 3). According to Linn and colleagues (2015), much of the evidence on the benefits of undergraduate research is based on student self-report surveys or interviews. They reviewed 60 empirical studies published in the preceding 5 years and found that only four directly measured gains in research capabilities or conceptual understanding, while over half relied solely on self-report surveys, an approach they argue has serious limitations. In this study, the long-term (2007-2016) data on student self-perception of learning were used in conjunction with student performance data from the fall semesters of 2013-2016, which consistently show significant learning gains for all students (Fig. 5).
Student performance in evaluating the quality of scientific reports using a rubric pre- and post-IP showed significant gains (Fig. 5). These gains represent not only improved understanding of the process and communication of science but also improved skills in criteria-based evaluation, a higher-level thinking skill and general competency that are important goals of undergraduate STEM education (NSTC 2018). Furthermore, the authentic inquiry experiences were particularly helpful to students who were less prepared with regard to these skills: students with lower initial performance levels achieved greater gains, which narrowed the performance gap (Fig. 6). This is significant because traditional undergraduate research experiences are often provided to selected and self-selected students with better preparation and higher levels of interest (Eagan et al. 2011; Linn et al. 2015). Additionally, although there are some examples of CUREs offered in high-enrollment introductory lab courses (Brownell et al. 2013), CUREs are mostly offered to smaller classes of majors (Bangera and Brownell 2014). These authentic inquiry experiences can facilitate learning gains for large numbers of students with various prior experiences (e.g., majors and nonmajors) and levels of preparation. A high percentage of students acquired the inquiry skills identified in the rubric and received high scores on their inquiry reports (Fig. 7), reflecting their understanding of the research process. The high scores in the Hypothesis category corroborate the self-reported gains in ability to formulate testable hypotheses. Student performance on the inquiry reports also provides some evidence of written communication skills (Fig. 7). It is important to note, however, that students' actual performance may not be as high as these scores suggest.
This is because TAs were concerned about being able to adequately justify deducting points if questioned by students.

Strong learning gains for lower-division students
Research shows that students are more likely to leave STEM majors during the first 2 years of college (Seymour and Hewitt 1997) and that their experiences in first-year science courses strongly influence their decision to leave a science major (Strenta et al. 1994). The "Engage to Excel" report called for focused efforts to improve lower-division undergraduate STEM education and to engage students in STEM experiences during the first 2 years of college (PCAST 2012). In light of this research and these calls for action, the significant gain in interest in ecology among lower-division students is important to note, especially because post-IP, lower-division students reported significantly (p < 0.0084) higher levels of interest than upper-division students. Students who maintain interest in their major are more likely to persist in it. Learning gains in this area could therefore affect retention and persistence in STEM, as well as future career and/or graduate school considerations (Lopatto 2009). According to PCAST (2012), fewer than 40% of students who have an interest in science graduate with a STEM degree. Capturing and maintaining students' interest in science early in their studies could help address this concern.
It is also important to note that both lower- and upper-division students had significant gains in self-perception of ability to formulate hypotheses, understanding of how research is conducted, and ability to evaluate a scientific report. Furthermore, the gains for lower-division students were significantly higher than those for upper-division students in all three categories (Fig. 4b): lower-division students started with lower pre-IP levels and reached post-IP levels similar to those of upper-division students. Interestingly, lower- and upper-division students had similar levels of actual performance in evaluating a scientific report, both pre-IP and post-IP, with similar gains (Fig. 5), which was inconsistent with the pattern of their self-reported abilities. It appears that either lower-division students underestimated their ability to evaluate a scientific report or upper-division students overestimated theirs. It would be worthwhile to explore how well upper-division students gained these abilities through upper-level courses and whether the authentic inquiry experiences would enable lower-division students to gain more from upper-level courses.
These authentic inquiry experiences, although limited in sophistication and nuance, could serve as an early intervention for majors to promote their interest and enhance their understanding and skills of the scientific process, which could support their motivation and persistence and improve their readiness to engage in more in-depth research experiences in upper-level courses (Laursen et al. 2010; Lopatto 2007; Seymour et al. 2004). At the same time, these authentic inquiry experiences facilitate the development of science literacy and STEM skills among nonmajors, and the interactions between majors and nonmajors with diverse backgrounds and perspectives likely enrich learning for all students.

Significant learning gains for nonmajors
To respond to the need to develop science literacy for all, which includes understanding scientific practices and science as a social process in addition to content knowledge (NASEM 2016), we must provide opportunities for all students, both science/STEM majors and nonmajors, to engage in authentic experiences that help them develop science literacy. Authentic inquiry projects such as this one show promise in helping all students develop science literacy, as evidenced by the significant learning gains for both majors and nonmajors. The authentic inquiry experiences were consistently associated with significant increases in self-perception of interest in ecology for both majors and nonmajors (Fig. 3a). Majors reported higher levels of interest in ecology than nonmajors both pre- and post-IP (Fig. 3a), which is not surprising given that interest is one of the factors on which students base their choice of major. The gains in self-perceived interest in ecology, however, were similar between the two groups (Fig. 4a), suggesting that the authentic inquiry experiences were equally effective in stimulating interest in ecology for majors and nonmajors.
The self-perceived learning gains in ability to formulate testable hypotheses, understanding how ecologists conduct research, and ability to evaluate the quality of scientific reports were highly significant for both majors and nonmajors and similar between the two groups (Figs. 3 and 4a). Consistent with this pattern, students' performance in evaluating the quality of scientific reports using a rubric pre- and post-IP also showed significant and similar gains for both majors and nonmajors (Fig. 5). Research has shown a relationship between student attitudes toward science (interest being one measure of attitude) and achievement in science courses (Cook and Mulvihill 2008; Osborne et al. 2003). These findings suggest that authentic inquiry experiences through these inquiry projects are likely effective in facilitating the development of key elements of science literacy and STEM skills for nonmajors.
A recent report highlights the importance of developing STEM skills for all Americans because they enable individuals to succeed in personal, professional, and civic life and are vital to the innovations and prosperity of the Nation (NSTC 2018). For most nonmajors, the few general education lower-division science elective courses are their only exposure to science. Therefore, it is critical to provide effective learning experiences of the scientific process, such as these authentic inquiry projects, for nonmajors in these introductory science courses.

Strong learning gains for women and URM students
Given the continued challenges in the recruitment, retention, and graduation of women and URM students in STEM disciplines, and the lower graduation rates of URM students across all majors, the similar gains of women and men and the greater gains of URM than non-URM students in self-perception of interest in ecology through the authentic inquiry project (Fig. 4c, d) may be particularly salient. Both women and men self-reported significant learning gains in addition to interest in ecology. Whereas women had gains similar to those of men in self-perception of ability to formulate hypotheses and understanding of how research is conducted, they self-reported significantly greater gains than men in ability to evaluate a scientific report (Fig. 4c), although the gain in their actual performance in evaluating a scientific report in the direct assessment was similar to that of men (higher, but not significantly different). On the culminating inquiry report, women had a significantly higher overall score than men, and higher or similar scores in all rubric categories. These results highlight the importance of further investigating gender differences in outcomes and in self-reported gains compared with direct measures. Research on gender differences in students' self-evaluation of academic ability in a specific academic area or domain (referred to as academic self-concept) continues to suggest that male students typically report higher self-concepts than female students in male-stereotyped fields, such as mathematics and science, despite a lack of differences in performance (Nagy et al. 2010; Sáinz and Eccles 2012). While we cannot make a similar claim here without further investigation, the fact that women self-reported gains similar to those of men in some categories yet performed better than men on the culminating report is important to note.
Academic self-concept has been identified as an important contributor to academic achievement (Eccles et al. 1993; Marsh et al. 2005), motivation (Eccles et al. 1983), and career choice (Luzzo and McWhirter 2001). Given the importance of self-concept in learning, pre-IP self-reports, which were designed as formative assessments in this study, could in future be used more intentionally to guide pedagogical decisions related to student self-concept throughout the course.
Both URM and non-URM students reported significant learning gains in addition to self-perceived interest in ecology. URM students had greater gains than non-URM students in self-perception of ability to formulate hypotheses and understanding of how research is conducted (Fig. 4d). Although URM students reported gains similar to those of non-URM students in self-perception of ability to evaluate a scientific report (Fig. 4d), the gain in their actual performance in evaluating a scientific report in the direct assessment was significantly greater than that of non-URM students. The pre-IP performance of URM students was lower, but their post-IP performance was similar to that of non-URM students. Overall performance on the inquiry report was also similar between URM and non-URM students, which suggests that the IP helped close the gap between URM students' initial preparation and performance and those of non-URM students. The learning gains in both self-perception of learning and actual performance for women and URM students may have implications for academic success, retention in general, and retention and success in STEM. Research shows that students' belief in their ability to successfully accomplish academic tasks (i.e., their sense of self-efficacy) can affect performance and persistence in their educational pursuits (Bandura 1993; Lent et al. 2003). MacPhee et al. (2013) identify the importance of self-efficacy to student performance for women and URM students in particular. Students' confidence in their abilities can also affect their sense of belonging in their chosen field and/or in college in general. Women and URM STEM students are especially vulnerable to feeling as if they do not fit in (Stout et al. 2011), and integration and sense of belonging have been shown to be predictors of student academic success in STEM disciplines (Cole and Espinoza 2008).
Written responses from student surveys about their experiences with the inquiry project consistently highlight the value of working with peers throughout the project. Future studies of this project could investigate the specific benefits of peer learning and whether it contributed positively to students' sense of belonging and self-efficacy. Chen and Soldner (2013) reported that poor performance in STEM courses relative to non-STEM courses was associated with an increased probability of undergraduate students leaving STEM majors. They also reported that less success in STEM courses was associated with an increased probability of dropping out of college altogether. The findings of this study related to women and URM students make a strong case for providing authentic inquiry experiences in lower-division science courses, where attitudes about learning and ability in the sciences are shaped.

Limitations
A limitation of this study is that it was not a true experiment with controls. It was conducted in one large introductory course over time. The class was diverse, with 400-550 students from 30 to 50 different majors each semester (in two sections). The situational factors for the class also varied over the decade, including changes in student demographics, TAs and instructors, textbook, classroom space, learning management system, and instructor experience and teaching practices. The consistent, significant changes in both student self-perception and student performance, despite this variability over the decade, provide some confidence that the authentic inquiry experiences had a meaningful impact. The design of the study resembled, to a degree, the Recurrent Institutional Cycle Design (Campbell and Stanley 1963), in which the pretests act as a quasi-control group and repetitions of the course provide replication. If a consistent effect is found despite uncontrolled situational factors, as in this study, there is an argument for a treatment effect.
Although the eight survey questions related to self-perceptions of learning gains were evaluated positively using retroactive cognitive interviews, they were part of an online survey designed for formative assessment of the inquiry project and were not a validated instrument. Given the potential role these authentic inquiry projects can play in filling the gap in STEM experiences for all students, further work is needed to develop or adapt validated instruments to assess the impact of these authentic inquiry experiences on student learning. The survey questions related to self-perceptions of learning gains used a retrospective pretest approach, which likely overestimates gains (Taylor et al. 2009) because of motivational biases (Hill and Betz 2005). These questions were designed to facilitate student self-reflection on their learning (as a learning process), as well as to assess their self-perception of change through the inquiry project, a purpose for which a retrospective pretest is more appropriate (Hill and Betz 2005). A prospective pretest administered at the beginning of the project would yield lower gains, likely underestimated because of response-shift bias (Howard and Dailey 1979), although prospective pretests are more appropriate for estimating program effects (Hill and Betz 2005). The true effect sizes for the changes in student self-perceptions are therefore likely lower than the estimated ones. If the true effect sizes were only half of the estimated ones, we would have small to medium effect sizes for changes in self-perceived interest in ecology and medium to large effect sizes for changes in self-perceived ability to formulate testable hypotheses, understanding how ecologists conduct research, and ability to evaluate the quality of scientific reports.

Conclusions
Both scholarly research and national policies identify critical needs for reform in undergraduate science education to provide STEM experiences in lower-division courses that engage students and improve their retention, in order to develop the STEM workforce. Reforms in undergraduate science education are also critically needed to develop science literacy and STEM skills for all students, which help individuals succeed in personal, professional, and civic life and are vital to the innovations and prosperity of the Nation. Open or authentic inquiry and undergraduate research are effective science experiences leading to multiple gains in student learning and development. Most inquiry-based learning activities, however, are implemented in laboratory classes, and the majority of them are guided inquiries. Although highly effective, the traditional apprentice approach to undergraduate research can engage only a limited number of selected students, and course-based undergraduate research experiences (CUREs) have significantly expanded this reach. It remains a great challenge, however, to provide STEM research experiences to all students, especially to nonmajors and in large introductory courses. It is therefore worthwhile to explore other effective approaches to providing STEM experiences that allow students to engage in the scientific process.
We examined a web-based authentic inquiry project implemented in a high-enrollment introductory ecology course for over a decade. Our results for both student self-perception of learning and actual performance show that the authentic inquiry experiences were consistently associated with significant learning gains in interest, understanding, and skills of the scientific process for all students. The authentic inquiry experiences proved effective for lower-division students, nonmajors, and women and URM students. Lower-division students self-reported significant gains in interest in ecology and in understanding and skills of the scientific process, and they had significant gains in performance in evaluating a scientific report, at levels similar to or greater than those of upper-division students. Nonmajors had significant learning gains at a level similar to that of majors, based on both self-perception of learning and actual performance. The significant learning gains of women and URM students were similar to or greater than those of their counterparts. Furthermore, the authentic inquiry experiences were particularly helpful to students who were less prepared with regard to the actual ability to evaluate a scientific report, and they narrowed the performance gap. These findings suggest that authentic inquiry experiences can serve as an effective approach for engaging students in high-enrollment, introductory science courses. They can serve as an early intervention for majors to promote their interest, enhance their understanding and skills of the scientific process, and improve their readiness to engage in more in-depth research experiences in upper-level courses.
Wu et al. International Journal of STEM Education (2021)
At the same time, providing these authentic inquiry experiences to all students can facilitate the development of science literacy and STEM skills among nonmajors, helping them succeed in personal, professional, and civic life and contribute to the innovations and prosperity of society.