
      Call on me! Undergraduates’ perceptions of voluntarily asking and answering questions in front of large-enrollment science classes


          Abstract

Allowing students to voluntarily ask and answer questions in front of the whole class is a common teaching practice in college science courses. However, few studies have examined college science students’ perceptions of these practices, the extent to which students choose to engage in them, and what discourages students from participating. In this study, we surveyed 417 undergraduates at a research-intensive institution about their experiences asking and answering questions in large-enrollment college science courses. Specifically, students answered questions about the extent to which they perceive voluntarily asking and answering questions in large-enrollment science courses to be helpful to them and why. They also answered questions about the extent to which they engage in asking and answering questions in large-enrollment college science courses and what factors could discourage them from participating. Using binary logistic regression, we examined whether there were differences among students of different demographic groups regarding their opinions about asking and answering questions. We found that students overwhelmingly reported that other students voluntarily asking and answering instructor questions is helpful to them. Notably, compared with continuing-generation students, first-generation students were more likely to perceive other students asking questions to be helpful. Despite perceiving asking and answering questions to be helpful, over half of students reported that they never ask or answer questions in large-enrollment college science courses during a semester, and women were more likely than men to report never asking questions. We identified fear of negative evaluation, or students’ sense of dread associated with being unfavorably evaluated, as a primary factor influencing their decision to answer instructor questions. This work adds to a growing body of literature on student participation in large-enrollment college science courses and begins to uncover the underlying factors influencing student participation.
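The abstract names binary logistic regression as the method linking demographic group to survey responses. As a rough, hypothetical sketch of that kind of analysis (the CSV file and every column name below are illustrative assumptions, not the authors' materials):

```python
# Hedged sketch: a binary logistic regression of the kind described in the
# abstract. The file and column names are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey_responses.csv")  # hypothetical survey export

# Outcome: 1 if a student reported never asking a question all semester;
# predictors: self-reported demographic variables.
model = smf.logit("never_asked ~ C(gender) + C(first_generation)", data=df).fit()

print(model.summary())
print(np.exp(model.params))  # odds ratios, e.g., women vs. men never asking
```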


Most cited references (69)


          Getting Under the Hood: How and for Whom Does Increasing Course Structure Work?

INTRODUCTION

Studies across the many disciplines in science, technology, engineering, and mathematics (STEM) at the college level have shown that active learning is a more effective classroom strategy than lecture alone (reviewed in Freeman et al., 2014). Given this extensive evidence, a recent synthesis of discipline-based education research (DBER; Singer et al., 2012) suggests that it is time to move beyond simply asking whether or not active learning works to more focused questions, including how and for whom these classroom interventions work. This type of research is being referred to as second-generation education research (Eddy et al., 2013; Freeman et al., 2014) and will help refine and optimize active-learning interventions by identifying the critical elements that make an intervention effective. Identifying these elements is crucial for successful transfer of classroom strategies between instructors and institutions (Borrego et al., 2013). Using these DBER recommendations as a guide, we have replicated a course intervention (increased course structure; Freeman et al., 2011) that has been demonstrated to increase student achievement at an R1 university and explored its effectiveness when transferred to a different university with a different instructor and student population. Specifically, we expanded on the original intervention studies by exploring 1) how different student subpopulations respond to the treatment in terms of achievement and 2) course-related behaviors and perceptions. These two forms of assessment will help us both elucidate how this intervention achieves the observed increases in student achievement and identify the elements critical for the intervention's success.

Are Active-Learning Interventions Transferable?

The transferability of active-learning interventions into novel educational contexts is critical to the successful spread of active learning across universities (National Science Foundation, 2013). Unfortunately, transferability of an intervention across contexts cannot be assumed, as there is some evidence that the success of classroom interventions depends on the student populations in the classroom (Brownell et al., 2013), instructor classroom management style (Borrego et al., 2013), and the topics being taught (Andrews et al., 2011). Thus, interventions that work with one instructor at one institution in one class may not necessarily transfer into novel contexts. Yet the majority of published active-learning interventions at the college level have been tested with at best one or two instructors who are usually at the same institution. We test the transferability of the increased course structure intervention (Freeman et al., 2011), which was effective at a Pacific Northwest R1 university with a predominately white and Asian student body, in a Southern R1 university with a different instructor (who had no contact with the original authors) and a more diverse student body. Additionally, the original study was an introductory biology course for aspiring majors, while the current implementation included mostly nonmajors in a mixed-majors general education course. Thus, in this study, we test the transferability of the increased course structure intervention across three contexts: 1) different instructors, 2) different student body, and 3) different courses (majors vs. nonmajors).

Do Course Interventions Differentially Impact Achievement in Some Student Subpopulations?
There is emerging evidence that classroom interventions could have different impacts on students from different cultural contexts. For example, Asian-American students learn less when they are told to talk through problems out loud compared with when they think through them silently. White students, on the other hand, performed just as well, and in some cases better, when allowed to talk through problems (Kim, 2002, 2008). This finding has implications for a differential impact of peer instruction on Asian students relative to their white classmates. In addition to different cultural norms for learning, students from different subpopulations bring different value sets into the classroom that can influence how they learn in different classroom environments. For example, one study found that when a setting is perceived as interdependent (rather than independent), first-generation students perform better, but continuing-generation students do not differ (Stephens et al., 2012). Positive interpersonal feelings also increased the performance of Mexicans but not European Americans on a learning task (Savani et al., 2013). Thus, the classroom environment itself could have differential impacts on different students. Findings like these begin to call into question whether "one-size-fits-all" classroom interventions are possible and encourage researchers to disaggregate student response data by subpopulations (Singer et al., 2012).

Up until now, the majority of college-level program evaluations that have disaggregated student groups have done so broadly, based on students' historical presence in science (underrepresented minority [URM] vs. majority students). Also, most of these studies have explored the impact of supplemental instruction outside an actual science course on student achievement (reviewed in Tsui, 2007; Fox et al., 2009). Only a few STEM course-based curricular interventions have disaggregated student performance (physics: Etkina et al., 1999; Hitt et al., 2013; math: Hooker, 2010; physical science: Poelzer and Zeng, 2008). In biology, two course-based active-learning interventions have been shown to reduce achievement gaps between historically underrepresented students and majority students. Preszler (2009) replaced a traditional course (3 h of lecture each week) with a reformed course that combined 2 h of lecture with 1 h of peer-led workshop. This change in class format increased the grades of all participating students, and the performance of URM students and females increased disproportionately. The second intervention was the increased course structure intervention (Haak et al., 2011). This intervention decreased the achievement gap between students in the Educational Opportunities Program (students from educationally or economically disadvantaged backgrounds) and those not in the program by 45% (Haak et al., 2011).

Studies that cluster students into two categories (URM vs. majority) assume that students within these clusters respond in the same way to classroom interventions. Yet the URM label includes black, Latin@, Native American, and Hawaiian and Pacific Islander students, and the majority designation often includes both white and Asian students. Such clustering leads to conclusions that are too generalized: for example, that black students will respond to a treatment in a similar way as Latin@ students do (Carpenter et al., 2006).
Yet the different racial and ethnic groups that are included in the URM designation have very different cultures, histories, and exposure to college culture that could impact whether a particular classroom strategy is effective for them (Delpit, 2006). National trends in K–12 education, revealing different achievement patterns and trajectories for black and Latin@ students, also challenge the assumption that URMs are a homogeneous group (Reardon and Galindo, 2009). To our knowledge, only two college-level curricular interventions in STEM, and none in biology, have subdivided the URM category into more fine-grained groups to explore the effectiveness of classroom interventions for these different student populations. In these studies, students of different racial/ethnic groups responded differently to the classroom interventions (Etkina et al., 1999; Beichner et al., 2007). This was demonstrated most dramatically by Beichner et al. (2007), in whose study white and black students were the only groups to benefit significantly from an active-learning intervention. These findings highlight the need for more studies to analyze college course performance by racial/ethnic group. These smaller categories can still be problematic, as they still combine students with very different cultural backgrounds and experiences into broad categories such as white, Asian, Native American, and Latin@ (Lee, 2011; Carpenter et al., 2006), but disaggregating students to this level will provide a finer-grained picture of the classroom than has been previously reported.

A second population of concern is first-generation students. These students have limited exposure to the culture of college and are often from working-class backgrounds that may be at odds with the middle-class cultural norms of universities (e.g., the emphasis on abstract over practical knowledge and independence over interdependence; Stephens et al., 2012; Wilson and Kittleson, 2013). The differences between first- and continuing-generation students have been shown to change how they respond to "best practices" in teaching at the college level, sometimes to the extent that they respond oppositionally (Padgett et al., 2012). In biology, we are not aware of any studies that have explored the response of this population to an active-learning intervention, although there has been promising work with a psychology intervention (Harackiewicz et al., 2014).

In our study, we explored whether racial (black, white, Native American, Asian) and/or ethnic (Latin@) identity and first-generation versus continuing-generation status influenced a student's response to the increased course structure. We hypothesized that different student groups would vary in the extent to which an active-learning intervention would influence their exam performance.

How Do Active-Learning Interventions Change Course-Related Behaviors and Attitudes of Students?

Understanding how interventions change course-related behaviors and attitudes is an important next step in education research, as these behaviors and attitudes mediate how the course structure influences performance (Singer et al., 2012). Some work has already described how active learning increases achievement at the college level, although this work is lacking in the STEM disciplines and usually only looks at the student body as a whole.
Courses with more active learning are positively correlated with increased student self-reported motivation and self-efficacy (van Wyk, 2012) and a deeper approach to learning (Eley, 1992). Unfortunately, this work has only been done in active-learning classrooms, and either there is no control group (cf. Keeler and Steinhorst, 1995; Cavanagh, 2011) or the study asks students to compare the course in which they are currently enrolled with a different course with a different instructor and content (cf. Sharma et al., 2005). In our study, we examine how student attitudes and course-related behaviors change between a traditionally taught and an increased-structure course with the same content and instructor.

Reviewing the elements of successful classroom interventions suggests possible factors that could contribute to the increase in student achievement. For example, the increased course structure intervention involves the addition of three elements: graded preparatory assignments, extensive student in-class engagement, and graded review assignments (Table 1). Proponents of the increased course structure intervention have hypothesized that the additional practice led to the rise in student performance (Freeman et al., 2011). Yet providing opportunities for practice might not be enough. When and what students practice, as well as the context of and their perceptions of the practice, may influence the impact of the extra practice on learning.

Table 1. The elements of a low-, moderate-, and high-structure course

Structure | Graded preparatory assignments (example: reading quiz) | Student in-class engagement (example: clicker questions, worksheets, case studies) | Graded review assignments (example: practice exam problems)
Low (traditional lecture) | None or <1 per week | <15% of course time | None or <1 per week
Moderate | ≥1 per week(a) | 15–40% of course time | ≥1 per week(a)
High | ≥1 per week | >40% of course time | ≥1 per week

(a) Need either a preparatory or review assignment once per week, but not both.

There are many possible factors that change with the implementation of increased course structure. We focus on three candidate factors, but it is important to recognize that these factors are not mutually exclusive or exhaustive.

Factor 1. Time allocation: Increasing course structure will encourage students to spend more time each week on the course, particularly on preparation. How students allocate their out-of-class study time can greatly influence their learning and course achievement. Many students adopt the strategy of massing their study time and cramming just before exams (Michaels and Miethe, 1989; McIntyre and Munson, 2008). Yet distributed practice is a more effective method for learning, particularly for long-term retention of knowledge (Dunlosky et al., 2013). The increased course structure helps students distribute their study time for the class by assigning daily or weekly preparatory and review assignments. These assignments 1) spread out the time students spend on the course throughout the quarter (distributed practice, rather than cramming just before exams) and 2) encourage students to engage with a topic before class (preparatory assignment), then again in class (in-class activities), and again after class (review assignments). In addition, the preparatory assignments not only encourage students to read the book before class, but also have students answer questions related to the reading, which is a more effective method for learning new material than simply highlighting a text (Dunlosky et al., 2013).
We believe that the outside assignments scaffold how students spend time on the course and are one of the primary factors by which increased course structure impacts student performance. However, this idea has never been explicitly tested. In this study, we asked students to report how much time they spent outside of class on the course weekly and what they spent that time doing. We predicted that students would spend more time each week on the course and would spend more time on the parts associated with course points. These results would imply an increase in distributed practice and demonstrate that the instructor can successfully guide what students spend time on outside of class.

Factor 2. Classroom culture: Increasing course structure will encourage students to perceive the class as a community. To learn, students must feel comfortable enough to be willing to take risks and engage in challenging thinking and problem solving (Ellis, 2004). High-stakes competitive classrooms dominated by a few student voices are not environments in which many students feel safe taking risks to learn (Johnson, 2007). The increased-structure format has students work in small groups, which may help students develop a more collaborative sense of the classroom. Collaborative learning in college has been shown to increase a sense of social support in the classroom as well as the sense that students like each other (Johnson et al., 1998). This more interdependent environment also decreases anxiety and leads to increased participation in class (Fassinger, 2000) and critical thinking (Tsui, 2002). Increased participation in in-class practice alone could lead to increased performance on exams. In addition, a more interdependent environment has been shown to be particularly important for the performance of first-generation students and Mexican students (Stephens et al., 2012; Savani et al., 2013). Finally, feeling like they are part of a community increases both performance and motivation, especially for historically underrepresented groups (Walton and Cohen, 2007; Walton et al., 2012). We predicted that students in an increased-structure course would change how they viewed the classroom, specifically, that they would feel an increased sense of community relative to students in low-structure courses.

Factor 3. Course value: Increasing course structure will increase the perceived value of the course to students. In the increased-structure course, students come to class having read the book, or at least having worked through the preparatory assignment, and thus have begun the knowledge acquisition stage of learning. This shift of content acquisition from in class to before class opens up time in the classroom for the instructor to help students develop higher-order cognitive skills (Freeman et al., 2011), providing opportunities to encourage students to make connections between course content and real-world impacts and to work through challenging problems. These opportunities for practice and real-world connections are thought to be more engaging to students than traditional lecture (Handelsman et al., 2006). Thus, through increased engagement with the material (because of increased interest in it), student performance will increase (Carini et al., 2006). We predicted students in the increased-structure course would feel more engaged by the material and thus would value the course more.
We considered these three factors (time allocation, classroom culture, and course value) when surveying students about their perceptions and behaviors. We analyzed student survey responses in both the traditional and increased-structure course to identify patterns in responses that support the impact of these three factors on student performance. In summary, we test the transferability of one active-learning intervention (increased course structure; Freeman et al., 2011) into a novel educational context. We expand upon the initial studies by 1) disaggregating student performance to test the hypothesis that student subpopulations respond differently to educational interventions and 2) using student self-reported data to identify possible factors (time allocation, classroom culture, course value) through which the intervention could be influencing student achievement.

METHODS AND RESULTS

The Course and the Students

The course, offered at a large research institution in the Southeast that qualifies as a more selective, full-time, 4-yr institution with a low transfer-in rate on the Carnegie scale, is a one-semester general introduction to biology serving a mixed-majors student population. The course is offered in both Fall and Spring semesters. Course topics include general introductions to the nature of science, cell biology, genetics, evolution and ecology, and animal physiology. The class met three times a week for 50 min each period. An optional laboratory course is associated with the lecture course, but lab grades are not linked to the lecture grade. Although multiple instructors teach this course in a year, the data used in this study all come from six terms taught by the same instructor (K.A.H.). The instructor holds a PhD in pathology and laboratory medicine and had 6 yr of experience teaching this course before any of the terms used in this study. The majority of students enrolled in the course were in their first year of college (69%), but the course is open to all students. The class size for each of the six terms of the study averaged 393 students. The most common majors in the course include biology, exercise and sports science, and psychology. The combined student demographics in this course during the years of this study were: 59% white, 13.9% black, 10.3% Latin@, 7.4% Asian, 1.1% Native American, and 8% of either undeclared race, mixed descent, or international origin. In addition, 66.3% of the students identified as female, 32.1% as male, and 1.6% did not specify a gender; 24% of these students were first-generation college students.

The Intervention: Increasing Course Structure

Throughout our analyses, we compared the same course during three terms of low structure and three terms of moderate structure (Table 1). How these designations (low and moderate) were determined is explained later in the section Determining the Structure Level of the Intervention. During the low-structure terms of this study (Spring 2009, Fall 2009, Spring 2010), the course was taught in a traditional lecture format in which students participated very little in class. In addition, only three homework assignments were completed outside the classroom to help students prepare for four high-stakes exams (three semester exams and one cumulative final). In the reformed terms (Fall 2010, Spring 2011, Fall 2011), a moderate-structure format was used, with both in-class and out-of-class activities added.
The elements added (guided-reading questions, preparatory homework, and in-class activities) are detailed below, and Table 2 gives some specific examples across one topic.

Table 2. Sample question types associated with the three assignment types added during the moderate-structure terms

Example learning objective: Determine the possible combinations of characteristics produced through independent assortment and correlate this to illustrations of metaphase I of meiosis.

Example guided-reading questions (preclass, ungraded):
1. Examine Figure 8.14: why are the chromosomes colored red and blue in this figure? What does red or blue represent?
2. Describe in words and draw how independent orientation of homologues at metaphase I produces variation.

Example preparatory homework question (preclass, graded):
Independent orientation of chromosomes at metaphase I results in an increase in the number of:
a) Sex chromosomes
b) Homologous chromosomes
c) Points of crossing over
d) Possible combinations of characteristics
e) Gametes

Example in-class questions (extra credit):
Students were shown an illustration of a diploid cell in metaphase I with the genotype AaBbDd. For all questions, students were told to "ignore crossing over."
1. For this cell, what is n?
2. How many unique gametes can form? That is, how many unique combinations of chromosomes can form?
3. How many different ways in total can we draw metaphase I for this cell?
4. How many different combinations of chromosomes can you make in one of your gametes?

Guided-Reading Questions. Twice a week, students were given ungraded, instructor-designed guided-reading questions to complete while reading their textbook before class. These questions helped to teach active reading (beyond highlighting) and to give students practice with study skills, such as drawing, using the content in each chapter (Table 2; Supplemental Material, section 1). While these questions were not graded, the instructor set the expectation that the daily in-class activities would build on and refer to this content without covering it in the same format. Keys were not posted.

Preparatory Homework. Students were required to complete online graded homework associated with assigned readings before coming to class (Mastering Biology for Pearson's Campbell Biology: Concepts and Connections). The instructor used settings for the program to coach the students and help them assess their own knowledge before class. Students were given multiple opportunities to answer each question (between two and six attempts, depending on question structure) and were allowed to access hints and immediate correct/incorrect answer feedback. The questions were typically at the knowledge and comprehension levels of Bloom's taxonomy (Table 2).

In-Class Activities. As course content previously covered by lecture was moved into the guided-reading questions and preparatory homework, on average 34.5% of each class session was now devoted to activities that reinforced major concepts, study skills, and higher-order thinking skills. Students often worked in informal groups, answering questions similar to exam questions by using classroom-response software (www.polleverywhere.com) on their laptops and cell phones. Thirty-six percent of these questions required a student to apply higher-order cognitive skills such as application of concepts to novel scenarios or analysis (see Supplemental Material, section 2, for methods).
Although responses to in-class questions were not graded, students received 1–2 percentage points of extra credit on each of four exams if they participated in a defined number of in-class questions. The remaining 65.5% of class time involved the instructor setting up the activities, delivering content, and handling course logistics. These percentages are based on observations of four randomly chosen class-session videos. The course was videotaped routinely, so the instructor did not know in advance which class sessions would be scored.

Determining the Structure Level of the Intervention

Using the data from two articles by Freeman and colleagues (Freeman et al., 2007, 2011) and consulting with Scott Freeman (personal communication) and the Biology Education Research Group at the University of Washington, we identified the critical elements of low, moderate, and high structure (Table 1). Based on these elements, our intervention was a "moderate" structure course: we had weekly graded preparatory homework, students were talking on average 35% of class time, and there were no graded review assignments.

Study 1: Does the Increased Course Structure Intervention Transfer to a Novel Environment?

Total Exam Points by Course Structure. Our measure of achievement was total exam points. We chose this measure over final grade because the six terms of this course differed in the total points coming from homework (3 vs. 10%) and the opportunity for bonus points could inflate the final grade in the reformed class. Instead, we compared the total exam points earned out of the possible exam points. As total exam points varied across the six terms by 5 points (145–150), all terms were scaled to be out of 145 points in the final data set. As this study took place over 4 years, we were concerned that term-to-term variation in student academic ability and exam difficulty could confound our survey and achievement results. To be confident that any gains we observed were due to the intervention and not these other sources of variation, we controlled for both exam cognitive level (cf. Crowe et al., 2008) and student prior academic achievement (for more details, see Supplemental Material, section 2). We found that exams were similar across all six terms and that the best control for prior academic achievement was a student's combined SAT math and SAT verbal score (Table 3; Supplemental Material, section 2). We therefore used SAT scores as a control for student-level variation in our analyses and did not further control for exams.

Table 3. Regression models used to determine whether 1) increased structure can be transferred to a novel environment (study 1) and 2) student subpopulations vary in their response to increased course structure (study 2)a

Base model (student performance influenced by course structure):
Outcome ~ Term + Combined SAT scores + Gender + Course Structure

Model 2 (impact of course structure on student performance varies by race/ethnicity/nationality):
Outcome ~ Term + SAT scores + Gender + Course Structure + Race + Race × Course Structure

Model 3 (impact of course structure on student performance varies by first-generation status):
Outcome ~ Term + SAT scores + Gender + Course Structure + First-generation + First-generation × Course Structure

aThe interaction terms in models 2 and 3 (Race × Course Structure and First-generation × Course Structure) are the new additions that test the specific hypotheses that the impact of course structure will vary by student populations. The outcome variable is either student achievement on exams or student failure rates.
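As a minimal sketch of how the Table 3 models could be specified, assuming a one-row-per-student data frame (the file and all column names here are illustrative, not the authors' code): in the statsmodels formula syntax, `a * b` expands to both main effects plus their interaction, matching models 2 and 3.

```python
# Sketch of the Table 3 regressions; file and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical

# Rescale each term's exam total to the common 145-point scale described above.
df["exam_points"] = df["exam_points_raw"] * 145 / df["exam_points_possible"]

# Base model: course structure, controlling for term, SAT, and gender.
base = smf.ols(
    "exam_points ~ C(term) + sat_combined + C(gender) + C(structure)", data=df
).fit()

# Model 2: race/ethnicity main effect plus the Race x Course Structure interaction.
model2 = smf.ols(
    "exam_points ~ C(term) + sat_combined + C(gender) + C(structure) * C(race)",
    data=df,
).fit()

# Model 3: first-generation status plus its interaction with structure.
model3 = smf.ols(
    "exam_points ~ C(term) + sat_combined + C(gender) + C(structure) * C(first_gen)",
    data=df,
).fit()

print(base.summary())
```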
Course and Exam Failure Rates by Course Structure. To become a biology major, students must earn a minimum of a "C−" in this course. Thus, for the purpose of this study, we considered a grade below 72.9% to be failing, because a student earning this grade would not be able to move on to the next biology course. We measured failure rates in two ways: 1) final grade and 2) total exam points. Although the components contributing to final course grade changed across the study, this "C−" cutoff for entering the biology major remained consistent. This measure may be more pertinent to students than overall exam performance, because it determines whether or not they can continue in the major. To look more closely at whether increased student learning was occurring due to the intervention, we also looked at failure rates on the exams themselves. This measure avoids the conflation of any boost in performance due to extra credit or homework points or deviations from a traditional grading scale but is not as pertinent to retention in the major as course grade. The statistical analysis for this study is paired with that of study 2 and is described later.

Study 2: Does the Effectiveness of Increased Course Structure Vary across Different Student Populations?

In addition to identifying whether an overall increase in achievement occurred during the moderate-structure terms, we included categorical variables in our analyses to determine whether student subpopulations respond differently to the treatment. We focused on two designations: 1) student ethnic, racial, or national origin, which included the designations of Asian American, black, Latin@, mixed race/ethnicity, Native American, white, and international students; and 2) student generational status (first-generation vs. continuing-generation college student). Both of these factors were determined from student self-reported data from an in-class survey collected at the end of the term.

Statistical Analyses: Studies 1 and 2

Total Exam Points Earned by Course Structure and Student Populations. We modeled total exam points as a continuous response and used a linear regression model to determine whether moderate course structure was correlated with increased exam performance (Table 3). In our baseline model, we included student combined SAT scores, gender identity (in this case, a binary factor: 0 = male, 1 = female), and the term a student was in the course (Fall vs. Spring) as control variables. Term was included because the instructor has historically observed that students in the Spring term perform better than students in the Fall term. To test our first hypothesis, that increasing the course structure would increase performance (study 1), we included treatment (0 = low structure, 1 = moderate structure) as a binary explanatory variable. To test our second hypothesis, that students from distinct populations may differ in their response to the classroom intervention, we ran two models (Table 3) that included the four variables described above and either 1) student racial and ethnic group (a seven-level factor) or 2) student first-generation status (a binary factor: 1 = first generation, 0 = continuing generation). If any of these demographic descriptors were not available for a student, that student was not included in the study.
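For the failure-rate outcome, the same predictors can feed a logistic regression on a pass/fail flag derived from the cutoff described above; in this sketch only the 72.9% threshold and the 145-point scale come from the text, and everything else is the same hypothetical data frame as before.

```python
# Sketch: failure defined as falling below the C- cutoff needed to continue
# in the biology major. Column names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # same hypothetical data as the sketch above

df["failed"] = (df["exam_points"] / 145.0 < 0.729).astype(int)

fail_model = smf.logit(
    "failed ~ C(term) + sat_combined + C(gender) + C(structure)", data=df
).fit()

print(np.exp(fail_model.params))  # odds ratios; <1 for structure = fewer failures
```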
We ran separate regression models for race/ethnicity and generation status, because we found these terms were correlated in an initial test of correlations between our possible explanatory variables (Kruskal-Wallis χ² = 68.1, df = 5, p < 0.0001). […]

Table. Student-reported course-related behaviors and perceptions under low and moderate course structurea

Survey item (response options) | Median, low structure | Median, moderate structure | Odds ratio, course structure (95% CI) | Odds ratio, SAT (95% CI)

Factor 1. Time allocation: Increasing course structure will encourage students to spend more time each week on the course.
Hours spent on course per week (<1, 1–3, 4–7, 8–10, >10 h) | 1–3 h | 4–7 h | 2.60 (2.02–3.35) | 0.982 (0.974–0.990)
Complete readings before class (Never, Rarely, Sometimes, Often) | Rarely | Sometimes | 1.97 (1.54–2.52) | 0.994 (0.985–1.00)
Preparatory homework importance (Not at all, Somewhat, Important, Very) | Somewhat | Important | 4.6 (3.56–5.85) | 0.98 (0.97–0.98)
Review notes after class (Never, Rarely, Sometimes, Often) | Sometimes | Sometimes | 0.738 (0.583–0.933) | 0.972 (0.965–0.980)
Complete textbook review questions (Never, Rarely, Sometimes, Often) | Rarely | Rarely | 0.50 (0.400–0.645) | 0.98 (0.972–0.99)

Factor 2. Classroom culture: Increasing course structure will encourage students to perceive the class as more of a community.
Contribute to classroom discussions (Never, Rarely, Sometimes, Often) | Never | Rarely | 1.13 (0.890–1.44) | 0.99 (0.988–1.00)
Work with a classmate outside of class (Never, Rarely, Sometimes, Often) | Sometimes | Sometimes | 0.83 (0.664–1.06) | 0.984 (0.977–0.991)
Believe students in class know each other (Strongly disagree, Disagree, Neutral, Agree, Strongly agree) | Neutral | Neutral | 2.4 (1.92–3.09) | 0.996 (0.989–1.00)
Believe students in class help each other (Strongly disagree, Disagree, Neutral, Agree, Strongly agree) | Agree | Agree | 1.22 (0.948–1.57) | 1.01 (0.999–1.02)
Perceive class as a community (Strongly disagree, Disagree, Neutral, Agree, Strongly agree) | Neutral | Neutral | 1.99 (1.57–2.52) | 0.986 (0.979–0.993)

Factor 3. Course value: Increasing course structure will increase the value of the course to students.
Amount of memorization (Most, Quite a bit, Some, Very Little, None) | Some | Some | 1.07 (0.84–1.35) | 0.98 (0.982–0.997)
Attend lecture (Never, Rarely, Sometimes, Often) | Often | Often | 0.72 (0.471–1.09) | 0.984 (0.971–0.997)
Use of skills learned (Strongly disagree, Disagree, Neutral, Agree, Strongly agree) | Agree | Agree | 0.909 (0.720–1.15) | 0.991 (0.983–0.998)
Lecture importance (Not at all, Somewhat, Important, Very) | Very Important | Important | 0.57 (0.448–0.730) | 0.998 (0.991–1.01)

aThe second and third columns are the raw median responses under each structure. The fourth and fifth columns are the odds ratios from the log-odds regression including course structure and SAT scores as explanatory variables (>1 = students more likely to report a higher value; <1 = students more likely to report a lower value).

[…] (p < 0.0001). Interestingly, even with the additional investment of hours each week, a focus on preparation seemed to represent a trade-off with time spent reviewing: after we controlled for SAT math and reading scores, students were 1.4 times less likely to review their notes after class as frequently (β = −0.30 ± 0.12 SE, p = 0.011) and 1.9 times less likely to complete the practice questions at the end of each book chapter (β = −0.68 ± 0.12 SE, p < 0.0001). After we controlled for SAT math and reading scores, students also did not vary in their frequency of lecture attendance (although this could be because it was high to begin with; β = −0.32 ± 0.21 SE, p = 0.13). Student perception of the importance of the skills they learned in the class did not vary between course structures (β = −0.09 ± 0.12 SE, p = 0.42), nor did they perceive that the moderate-structure course involved more cognitive skills other than memorization (β = 0.07 ± 0.12 SE, p = 0.58).
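The odds ratios in the table above come from log-odds regressions of ordered survey responses on course structure and SAT scores. One standard way to fit such a model is a proportional-odds (ordinal logistic) regression; the sketch below uses statsmodels' OrderedModel with illustrative column names, since the text does not show the authors' exact specification.

```python
# Sketch of an ordinal (proportional-odds) regression for one survey item;
# column names are assumptions, and the authors' exact estimator may differ.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("students.csv")  # hypothetical, as in the earlier sketches

scale = pd.CategoricalDtype(["Never", "Rarely", "Sometimes", "Often"], ordered=True)
responses = df["read_before_class"].astype(scale)
predictors = df[["structure", "sat_combined"]]  # structure coded 0/1

fit = OrderedModel(responses, predictors, distr="logit").fit(method="bfgs")

# The first two parameters are the predictor coefficients; exponentiate to get
# odds ratios (>1 = more likely to report a higher response category).
print(np.exp(fit.params.iloc[:2]))
```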
Population-Specific Patterns

Black Students Demonstrate Differences in Behaviors and Perceptions among Student Populations. On the basis of the results in study 2, which demonstrated that increased course structure was most effective for black and first-generation students, we explored student survey responses to determine whether we could document what was different for these populations of students. We identified one behavior and three perception questions for which adding a binomial variable identifying whether a student was part of the black population or not increased the fit of the log-odds regression to the data. These differential responses may help us elucidate why this population responded so strongly to the increased-structure treatment. The one behavior that changed disproportionately for black students relative to other students in the class was speaking in class. Under low structure, black students were 2.3 times more likely to report a lower level of in-class participation than students of other ethnicities (β = −0.84 ± 0.35 SE, p = 0.012). The significant interaction between being black and being enrolled in the moderate-structure course (β = 0.89 ± 0.38 SE, p = 0.019) means this difference in participation completely disappears in the modified course. Perception of the course also differed for black students compared with the rest of the students in three ways. First, black students were more likely to report that the homework was important for their understanding relative to other students in the class under both low and moderate structure (β = 1.06 ± 0.31 SE, p = 0.0006). The significant interaction term between course structure and black racial identity indicates that the difference between black students and other students in the class decreases under moderate structure (Table 5; β = 1.06 ± 0.31 SE, p = 0.0006), but this seems to be due to all students reporting higher value for the homework under moderate structure. In addition, black students perceived that there was less memorization and more use of higher-order skills in the class relative to other students in the class (β = −0.39 ± 0.59 SE, p = 0.024) under both low and moderate structures. Finally, there was a trend for black students to be 1.3 times more likely to report that the skills they learned in this course would be useful for them (β = 0.29 ± 0.16 SE, p = 0.07). Unlike the clear patterns with black students, we found no significant differences in survey responses based on first-generation status.

Behaviors and Perceptions That Correlate with Success Are More Numerous under Moderate Structure. During the low-structure terms, only lecture attendance impacted exam performance (i.e., significantly improved the fit of the models to the exam performance data after we controlled for student SAT scores; F = 9.59, p < 0.0001). Specifically, students who reported attending fewer lectures performed worse on exams. Students who reported accessing the textbook website more tended to perform better on exams (F = 2.48, p = 0.060), but this difference did not significantly improve the fit of the model. In the moderate-structure terms, attending class (F = 9.59, p < 0.0001), speaking in class (F = 9.03, p < 0.0001), hours spent studying (F = 10.6, p < 0.0001), reviewing notes (F = 3.19, p = 0.023), and seeking extra help (F = 5.94, p < 0.0001) all impacted student performance on exams. Additionally, one perception was significantly correlated with performance: students with a higher sense of community performed better (F = 4.14, p = 0.0025).
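The phrase "improved the fit of the models" above corresponds to comparing nested regressions. As a sketch with illustrative names, an F-test between an SAT-only model and one that adds a reported behavior:

```python
# Sketch of the nested-model comparison: does adding a self-reported behavior
# (here, lecture attendance) improve fit beyond SAT scores? Names are assumptions.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical, as in the earlier sketches

reduced = smf.ols("exam_points ~ sat_combined", data=df).fit()
full = smf.ols("exam_points ~ sat_combined + C(attend_lecture)", data=df).fit()

# F and p for the added term, analogous to the values reported above.
print(sm.stats.anova_lm(reduced, full))
```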
DISCUSSION

With large foundation grants working toward improving STEM education, there has been a push for determining the transferability of specific educational innovations to "increase substantially the scale of these improvements within and across the higher education sector" (NSF, 2013). In this study, we provide evidence that one course intervention, increased course structure (Freeman et al., 2011), can be transferred from one university context to another. In addition to replicating the increase in student achievement across all students, we were able to elaborate on the results of prior research on increased course structure by 1) identifying which student populations benefited the most from the increased course structure and 2) beginning to tease out the factors that may lead to these increases.

The Increased-Structure Intervention Can Transfer across Different Instructors, Different Student Bodies, and Different Courses (Majors vs. Nonmajors)

One of the concerns of any classroom intervention is that the results depend on the instructor teaching the course (i.e., the intervention will work for only one person) and the students in it. We can test the independence of the intervention by replicating it with a different instructor and student body and measuring whether similar impacts on student achievement occur. The university at which this study took place is quite different from the university where the increased course structure intervention was developed (Freeman et al., 2011). Both universities are R1 institutions, but one is in the Southeast (and has a large black and Latin@ population), whereas the original university was in the Pacific Northwest (and has a high Asian population). Yet we find very similar results: in the original implementation of moderate structure in the Pacific Northwest course, the failure rate (defined as a course grade that would not allow a student to continue into the next course in the biology series) dropped from 18.2% to an average of 12.8% (a 29.7% reduction; Freeman et al., 2011). In our implementation of moderate structure, the failure rate dropped by a similar magnitude: from 26.6% to 15.6% (a 41.3% reduction). This result indicates that the impact of this increased-structure intervention may be independent of instructor and that the intervention could work with many different types of students.

Some Students Benefit More Than Others from Increased Course Structure

We found that transforming a classroom from low to moderate structure increased the exam performance of all students by 3.2%; black students experienced an additional 3.1% increase (Figure 1A), and first-generation students experienced an additional 2.5% increase relative to continuing-generation students (Figure 1B). These results align with the small body of literature at the college level that indicates classroom interventions differ in the impact they have on student subpopulations (Kim, 2002; Preszler, 2009; Haak et al., 2011). Our study is novel in that we both control for students' past academic achievement and disaggregate student racial/ethnic groups beyond the URM/non-URM binary. Our approach provides a more nuanced picture of how course structure impacts students of diverse demographic characteristics (independent of academic ability).
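The percent reductions quoted here are plain relative changes, (old - new) / old × 100; a two-line check using only the rates quoted above reproduces the figures.

```python
# Verifying the relative-change arithmetic used in these comparisons.
def percent_reduction(old: float, new: float) -> float:
    """Relative reduction, in percent."""
    return (old - new) / old * 100

print(percent_reduction(26.6, 15.6))  # ~41.4 from the rounded rates (reported as 41.3%)
print(percent_reduction(18.2, 12.8))  # ~29.7, matching Freeman et al. (2011)
```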
One of the most exciting aspects of our results is that we confirm that active-learning interventions influence the achievement of student subpopulations differentially. This finding is supported by both work in physics (Beichner et al., 2007), which found an intervention only worked for black and white students, and work in psychology, which revealed Asian-American students do not learn as well when they are told to talk through problems out loud (Kim, 2002). These studies highlight how important it is for us to disaggregate our results by student characteristics whenever possible, as overall positive results can mask actual differential outcomes present in the science classroom. Students come from a range of educational, cultural, and historical backgrounds and face different challenges in the classroom. It is not surprising that in the face of this diversity one intervention type does not fit all students equally. Comparing our results with published studies in STEM focused on historically underrepresented groups, we see that our achievement results are of a similar magnitude to other interventions. Unlike our intervention, previous interventions generally are not implemented within an existing course but are either run as separate initiatives or separate courses or are associated with a series of courses (i.e., involved supplemental instruction [SI]; cf. Maton et al., 2000; Matsui et al., 2003). These SI programs are effective but can be costly (Barlow and Villarejo, 2004), and because of the cost, they are often not sustainable. Of seven SI programs that report data on achievement and retention in the first term or first two terms of the program, and thus are directly comparable to our study results, failure rate reductions ranged from 36.3 to 77%, and achievement increased by 2.4–5.3% (Table 7). In our study, the failure rate reduction was 41.3%, and overall exam performance increased by 3.2% (6.2% for black students and 6.1% for first-generation students), which is within the range of variation for the short-term results of the SI studies. These short-term results may be an underestimate of the effectiveness of the SI programs, as some studies have shown that their effectiveness increases with time (Born et al., 2002). Yet the comparison still reveals promising results: one instructor in one course, without a large influx of money, can make a difference for students as large in magnitude as some supplemental instruction programs.

Table 7. Changes in achievement and failure rate for SI programs in the first term of their implementationa

Study | Classroom | Failure rate, non-SI | Failure rate, SI | % change, failure rate | Achievement, non-SI | Achievement, SI | % change, achievement
Fullilove and Treisman, 1990 | Calculus I | 41% | 7% | 77 | NA | NA | NA
Wischusen and Wischusen, 2007 | Biology I | 18.6% | 6.9% | 62.9 | ~85% | ~87% | 2.4
Rath et al., 2007 | Biology I | 27% | 15% | 44.4 | ~75% | ~79% | 5.3
Peterfreund et al., 2007 | Biology I | 27% | 15% | 44 | ~75% | ~79% | 5.3
Minchella et al., 2002 | Biology I and II | 30.2% | 16.9% | 44 | ~75% | ~78% | 4
Barlow and Villarejo, 2004 | General Chemistry | 44% | 28% | 36.3 | ~80% | ~83% | 3.8
Dirks and Cunningham, 2006 | Biology I | NA | NA | NA | ~81% | ~84% | 3.7

aMost achievement data were reported on the 4.0 scale, and the percentage of points earned was approximated using a conversion scale. In comparison, in the current student population, we saw a 41.3% reduction in the failure rate and a 3.2–6.3% increase in achievement, depending on which student subpopulation was the focus.

Exploring How Increased Course Structure Increases Student Performance

Survey data allowed us to explore how student course-related behaviors and attitudes changed with increased course structure.
We focused on three specific factors and found evidence that changes in time allocation contributed to increased performance and some support for changes in classroom culture also impacting learning. We did not find evidence to support the idea that the value students found in the course influenced their performance. Factor 1. Time Allocation. Under low structure, students on average spent only 1–3 h on the course outside of class, rarely came to class having read the assigned readings, and were highly dependent on the lecture for their learning. Students also placed little value on the occasional preparatory homework assignments. With the implementation of moderate structure, students increased the amount of time they spent on the course each week to 4–7 h, were twice as likely to come to class having read the assigned readings, and saw the preparatory assignments as being equally as important for their learning as the actual lecture component. These shifts in behaviors and perceptions support our hypothesis that increased course structure encourages students both to distribute their studying throughout the term and to spend more time on behaviors related to graded assignments. We believe that these changes in student behaviors and perceptions occurred because of the structure of accountability built into the moderate-structure course. Students reading before class is an outcome almost all instructors desire (based on the ubiquitous syllabus reading lists), but it is evident from our study and others that, under low structure, students were on average “rarely” meeting this expectation (see also Burchfield and Sappington, 2000). We found the dual method of assigning preparatory homework and making the reading more approachable with ungraded guided-reading questions increased the frequency of students reading before class. It seemed that course points (accountability) were necessary to invoke this change in student behavior, because we did not see a similar increase in the frequency with which students reviewed notes after class. It is possible that moving to high structure (Freeman et al., 2011), with its weekly graded review assignments, could increase the achievement of our students even more, because they would be held accountable for reviewing their notes more frequently. Factor 2. Classroom Culture. We found some evidence to support the hypothesis that increased course structure creates a community environment rather than a competitive environment. Under low structure, students did not seem to get to know the other students in the class and did not positively view the class as a community (although they did believe that students in the class tried to help one another). With increased structure, students were two times more likely to view the class as a community and 2.4 times more likely to say students in the class knew each other. This result is a critical outcome of our study, arguably as important as increased performance, because a sense of being part of a community (belonging) is crucial for retention (Hurtado and Carter, 1997; Hoffman et al., 2002) and has been correlated with increased performance for first-generation students (Stephens et al., 2012). When discussing reasons for leaving STEM, many students, particularly students of color and women, describe feelings of isolation and lack of belonging (Hewlett et al., 2008; Cheryan et al., 2009; Strayhorn, 2011). 
Because introductory courses are some of the first experiences students have in their major, these could potentially play a role in increasing retention simply by facilitating connections between students through small-group work in class. Factor 3. Course Value. We did not find support for the hypothesis that students in the moderate-structure class found the course to be more valuable than students in the low-structure course. First, there was no difference in how much students valued the skills they learned in the course, but this could be because they did not recognize that the low- and moderate-structure terms were asking them to do different things. Across both terms, students on average believed that they were doing the same amount of memorizing versus higher-order skills such as application and analysis, even though the instructor emphasized higher-order skills more in the moderate-structure terms. In addition, behaviorally, we did not see any evidence of a higher value associated with the course in terms of increased attendance. In fact there was no difference in attendance across treatments. The attendance result was surprising to us, because increased attendance has been shown to be a common result of making a classroom more active (Caldwell, 2007; Freeman et al., 2007); however, these previous interventions all assigned course points to in-class participation, whereas our interventions only gave students bonus points for participation. In a comparison of in-class attendance with and without points assigned to class participation, Freeman et al. (2007) found that attendance dropped in the class in which no points were assigned. Thus, it is possible that attendance in these classes could be increased in the future if points rather than extra credit were assigned for participation. This idea is supported by our data that it is actually the students with the highest predicted achievement (i.e., highest SAT scores) who are more likely to miss lecture. Because these students already were doing well in the course, it may be that the motivation of receiving a few bonus points for attending class was not enough encouragement. Additional evidence that changes in time allocation and classroom culture contribute to achievement comes from the correlation between survey responses and exam performance. Under moderate structure, the number of hours a student spent studying per week and a higher sense of community were both positively correlated with exam performance. The support for these two factors, time allocation and classroom culture, helps us identify potential critical elements for the implementation of the increased-structure intervention. First, students need to be made accountable for preparing before attending class. This can take multiple forms, including guided-reading questions, homework, and/or reading quizzes before class or at the start of class, but the key is that they need to be graded. Without this accountability in the low-structure terms, students were not doing the reading and were likely cramming the week before the exam instead of distributing their study time. The second critical element seems to be encouraging the students in the class to view themselves as a community through small-group work in class. Further research could explore how best to approach in-class work to develop this sense of community rather than competition. 
Changes in Achievement, Behaviors, and Perceptions Vary among Student Populations

In addition to looking at overall patterns in student behaviors and perceptions, we can also disaggregate these data to begin to understand why some groups might benefit more from the intervention. From the achievement data, we identified black and first-generation students as populations who responded most strongly to the treatment. Patterns in behaviors and attitudes were apparent for one of these populations (black students) and not the other (first-generation students). The responses of black students to our survey questions differed from those of other students in the class in three ways. First, under both classroom structures, black students were more likely to report that the homework contributed to their learning in the course, and there was a trend for black students to report valuing the skills they developed in this class more than other student groups did. Second, black students perceived the class to require more higher-order skills. These results imply that these students had a greater need for the kind of guidance provided by instructor-designed assignments. Thus, the addition of more homework and more explicit practice may have had a disproportionate impact on these students' achievement. Third, black students were significantly less likely than other students to speak up in class, but this disparity disappeared under moderate structure. We suspect that the increased sense of the classroom as a community may have contributed to this increased participation.

Although first-generation students did not differ from continuing-generation students in how they responded to survey questions, they could still differ in how valuable the changes in the course were to them. In particular, the increased sense of community that seemed to correlate with the implementation of moderate structure could have helped them disproportionately, as has been demonstrated in a previous study (Stephens et al., 2012). In addition, although students grouped in the category first generation share some characteristics, they are also very different from one another in terms of culture, background, and the barriers they face in the classroom (Orbe, 2004; Prospero et al., 2012). For example, in our university setting, 55% of first-generation students have parents with low socioeconomic status, and 50% transfer in from community colleges. This variation among students could thus obscure any patterns in their responses. Future analyses will attempt to distinguish subpopulations to identify patterns potentially hidden in our analysis.

Limitations of This Work

One of the major purposes of this article is to recognize that classroom interventions that work in one classroom may not work in others because 1) student populations differ in how they respond to classroom treatments, and 2) instructors do not always implement the critical elements of an active-learning intervention. Thus, it is important for us to note that, although we have shown that increased structure can work with both majors and nonmajors and with students from a range of racial and ethnic groups, we are still working in an R1 setting. More work needs to be done to establish the effectiveness of the increased course structure intervention in community college or comprehensive university settings (although the evidence that it works well for first-generation students is a good sign that it could transfer).
In addition, this study involved a single instructor; thus, we can now say that increased course structure has worked for two independent instructors (the instructor of the current course and the instructor of the original course; Freeman et al., 2011), but further work is necessary to establish its general transferability. This study has also suggested two factors by which increased course structure seems to work: 1) encouraging distributed practice with a focus on class preparation and 2) helping students view the class as more of a community. Yet these are only two of many possible hypotheses for how this intervention works. It is possible that assigned preparatory assignments and small-group work to encourage community are not the only elements critical for this intervention's success. Further studies could explore how best to implement activities in class or the impact of adding graded review assignments on achievement.

Implications for Instructor and Researcher Best Practices

As a result of implementing an increased course structure and examining student achievement and survey results, we identified the following elements as critical for student success and the success of future implementations:

Students are not a monolithic group. This result is not surprising. Students vary in many ways, but currently we do not know much about the impact of these differences on their experience with and approach to a college-level course. Future studies on student learning should disaggregate the students involved in the study (if possible), so instructors looking to implement an intervention can determine whether, and potentially how well, a particular intervention will work for their population of students.

Accountability is essential for changing student behaviors and possibly grades. We found that without accountability, students were not reading or spending many hours each week on the course. With weekly graded preparatory homework, students increased the frequency of both behaviors. We did not provide credit for reviewing each week, and we found that the overall frequency of this behavior decreased (even though our results demonstrate that students who did review notes performed better).

Survey questions are a useful method of identifying which behaviors an instructor might target to increase student performance. From our survey results, it seems that creating weekly review assignments might increase the frequency with which students review their notes and thus increase their grades. Without the survey, we would not have known which behaviors to target.

Overall, this work has contributed to our understanding of who is most impacted by a classroom intervention and how those impacts are achieved. By looking at the achievement of particular populations, we can begin to change our teaching methods to accommodate diverse students and possibly increase the effectiveness of active-learning interventions.

            Measurement of social-evaluative anxiety.


              Gender Gaps in Achievement and Participation in Multiple Introductory Biology Classrooms

INTRODUCTION

Women are underrepresented in undergraduate science, technology, engineering, and mathematics (STEM) majors (National Science Foundation [NSF], 2011). Even fewer women pursue graduate school and careers in STEM fields, particularly careers in academia (Handelsman, 2005; National Research Council [NRC], 2007; Beede et al., 2011; NSF, 2011). The possible reasons for the gap in the persistence of females compared with males in STEM, frequently referred to as the "leaky pipeline," are numerous and multifaceted (Clark Blickenstaff, 2005; Burke and Mattis, 2007), and despite a concentrated effort by funding agencies directed at both K–12 and colleges, the problem persists.

The one exception to this pattern of underrepresentation of females in STEM is the field of biology. Women account for more than 60% of undergraduate biology majors and approximately half of all graduate students in the biosciences (Luckenbill-Edds, 2002; Amelink, 2009), unlike other STEM disciplines such as the physical sciences, in which women make up only 43% of undergraduates (Amelink, 2009) and 20% of graduate students (Mulvey and Nicholson, 2011, 2012). Owing to the significant numbers of females pursuing biology, it is often assumed that biology is a STEM discipline that has overcome gender disparities. In fact, this assumption is so prevalent that studies in chemistry and physics sometimes use biology as a positive control for comparisons of the observed gender gaps in their fields (e.g., Ferreira, 2003; Ecklund et al., 2012). Gender inequity in biology does emerge at the postgraduate level, however, as fewer female biologists pursue postdoctoral work or positions in academia relative to males (NSF, 2011). In addition, the observed distribution of professional prestige demonstrates gender inequities: even in a female-dominated field, women are less likely to be selected to participate in symposia, especially if the symposia are organized by men (Isbell et al., 2012), and women are less likely to be first authors on biology papers as compared with other more "caring" fields (Lariviere et al., 2013). Published explanations for these differences are often based on individual-level decisions, such as work/life balance preferences or the desire to start a family (Ceci and Williams, 2010, 2011; Rosser, 2012), as opposed to systematic institutional challenges.

Although gender disparities in biology have been documented primarily at the graduate level and later in academic life, it is likely that student experiences at the K–16 levels influence these later outcomes. Exploring the potential for gender disparities early in a student's college experience seems particularly important, because early experiences and decisions, such as choosing a major or being recognized as competent by a biologist, are the first steps in the process of developing a professional biology identity (Cech et al., 2011). Although it is important to track the retention of students, this type of coarse-grained measure does not provide insights into the underlying mechanisms that may impact the experiences of female students.
In studies in education, psychology, and sociology, measures of the student experience that have been shown to be related to retention include academic achievement (Carrell et al., 2010; Kost-Smith et al., 2010), interest in the discipline (Kost-Smith et al., 2010), class participation (Holmes, 1992; Guzzetti and Williams, 1996), science identity (Meece et al., 2006), professional role confidence (Cech et al., 2011), access to resources (Jovanovic and King, 1998), self-efficacy (Meece et al., 2006), and course-related anxiety (Pomerantz et al., 2002). In our study, we focus on two measures that an instructor could easily document in his/her own course: exam achievement and participation in whole-class discussions.

The most commonly studied gap in STEM fields is achievement, which is a strong predictor of retention in STEM disciplines, particularly relative to achievement in non-STEM courses (Beasley and Fischer, 2012; Riegle-Crumb et al., 2012). Studies conducted in biology and biochemistry on gender differences in achievement offer conflicting results. Rauschenberger and Sweeder (2010) showed that females underperformed compared with males in an introductory-level biochemistry course when prior ability was controlled for, and a subsequent study showed that females also systematically scored lower than males in upper-division biology courses (Creech and Sweeder, 2012). However, Strenta (1994) failed to find a gender gap in persistence in biology or any STEM field after controlling for student ability, although there was a trend toward gender differences in confidence about ability and feelings of depression about progress, with females having lower confidence and more feelings of depression. More recently, Lauer et al. (2013) saw no difference in academic achievement between males and females in two courses: introductory biology and biochemistry. These conflicting results indicate that the question of gender gaps in academic performance at the introductory biology level remains unresolved.

A second potential gender gap that could occur in biology classrooms is a gap in participation. Although, to our knowledge, prior research has not been done on participation in undergraduate biology classrooms, there is a large body of literature on participation across disciplines at the college level. It is evident that instructors at the college level value classroom participation; instructors dedicate 2–23% of class time to student participation in a typical lecture class (Nunn, 1996), and this percentage can be even higher in active-learning classrooms (Smith et al., 2013). In addition, a student's greater level of participation in class has been linked with positive perception of the class (Crombie et al., 2003), decreased anxiety about performance and ability in the course topic (Fassinger, 2000), and increased critical thinking (Tsui, 2002). In studies at the college level, the pattern of participation by men and women in whole-class discussions is not consistent, with various studies showing a bias in either direction (more female than male participation: Howard and Henney, 1998; Fritschner, 2000; Howard et al., 2006; more male than female participation: Crombie et al., 2003; Tatum et al., 2013; no difference: Cornelius et al., 1990; Pearson and West, 1991; Brady and Eisler, 1999). Although many classrooms were observed in these studies, none were courses in STEM disciplines, nor were they conducted in classrooms the size of typical large-enrollment introductory-level STEM courses.
Two possible explanations for the conflicting achievement and participation patterns at the undergraduate level are 1) variation in student populations and 2) variation in instructors. When discussing the experience of female students in a discipline, it is important to recognize that students are not monolithic. Gender is a complicated identity based on a person's internal experience of who he or she is. Thus, individuals can vary in the degree to which they identify with their gender, the gender roles associated with their gender, and how their gender identity influences their experience in different settings such as a classroom (Nosek et al., 2002; Schmader et al., 2004; Lane et al., 2012). In addition, gender is only one of a multitude of social identities that make up who we are and how we react in certain settings. Other identities, such as a student's race/ethnicity, could modify a student's experience as a female in a classroom (Ong et al., 2011), and a few studies have looked at the interactions between race/ethnicity and gender (Anderson, 2005; Riegle-Crumb et al., 2011).

Just as all females are not the same, not all biology classrooms are the same. The classroom experience can be influenced by a multitude of factors, including teaching methods (Beichner et al., 2007), who is enrolled in the class (Theobald and Freeman, 2014), and whether the course is optional or required (Brownell et al., 2013). One classroom factor that has been found to have a specific influence on achievement and participation is instructor gender. In STEM courses, females participated more and identified more with the subject matter when instructors were female (Stout et al., 2011; Young et al., 2013). Some studies have found that instructors of the same gender, particularly instructors students perceive as competent, can improve the performance of female students (Haley et al., 2007; Hoffman and Oreopoulos, 2009; Carrell et al., 2010; Antecol et al., 2012), while other studies found no difference (Griffith, 2010; Price, 2010; Stout et al., 2011). Thus, an instructor effect in college-level STEM courses remains a contentious issue that would benefit from further exploration.

In our large, retrospective study, we tested the hypothesis that there are no gender disparities in undergraduate biology for achievement or whole-class participation, using a data set that included 23 classes, 26 instructors, and almost 5000 students across three large introductory biology courses for majors at a large research institution. We explicitly tested two additional subhypotheses: 1) whether instructor gender influences these patterns of female participation and achievement and 2) whether student racial/ethnic identity modifies the relationship between student gender and achievement.

METHODS AND RESULTS

The Classes

This study examined 23 individual offerings (called classes) of the three courses composing the introductory biology series for science majors over a recent 2-yr period at a large public R1 university on the quarter system. The first course in the series focuses on evolution and ecology; the second on molecular, cellular, and developmental biology; and the third on plant and animal physiology. Students taking the introductory biology series are predominately sophomores and biology majors. Although this is a three-course series, not all science majors are required to take all three. Individual classes ranged in size from 159 to more than 900 students, depending on the term.
Teaching methods varied between instructors: some classes were taught with exclusively passive teaching methods, while others were highly student centered and interactive. In addition, exam format varied from almost exclusively essay questions to exclusively multiple choice, with the majority of classes using short-answer exam formats. Although some classes were taught by one instructor (33.3%), most classes were cotaught by two instructors (66.7%), each teaching for 5 wk. In total, 26 different instructors taught these 23 classes. Instructor gender also varied across these classes: 33.3% were taught exclusively by either one or two male instructors, 37.5% had both a male and a female instructor, and 29.2% had either one or two female instructors. During the 2-yr period, more than 5000 students enrolled in the series. Demographic information collected by the university registrar revealed that on average 58.1% of the students in these classes identified as female, but this number ranged from 53 to 64%, depending on the specific class. In addition, on average 43.2% of students identified as white, 37.6% Asian, 2.5% black, 0.8% Hawaiian and Pacific Islander, 4.8% Latin@, 1.1% Native American, and 3.4% did not identify a racial or ethnic group. An additional 6.6% were international students.

Study 1: Is There a Gender Achievement Gap in Introductory Biology?

Methods. We collected student exam performance for the 23 introductory biology classes along with the following demographic data obtained from the registrar: student gender identity (0 = male, 1 = female), student racial/ethnic/national identity (Asian, Black, Hawaiian/Pacific Islander, Latin@, Native American, White, and International), and college grade point average (GPA) upon entry into the introductory biology series. We also recorded classroom-level variation in the gender identity of the instructors as 0 = no female instructors, 1 = half of class taught by a female instructor, 2 = all of class taught by female instructor(s). The response variable for our analysis was overall performance on exams in the class. The number of points allocated to exams varied from class to class (from 200 to 400 points), but because the focus of this project was to compare the relative position of males and females within a classroom and not to document the absolute value of their performance in the class, we standardized exam scores by transforming them into z-scores based on each classroom's mean and SD. On average, each student was represented in our analysis twice.

Accounting for Differences between Students. Students vary in many ways that could influence exam performance. We hypothesized that exam performance would be influenced by gender and ethnicity and therefore included those terms in our analyses. In addition, because this study, like many educational studies, has a quasi-random design (due to students selecting into classes rather than being randomly assigned to classes), it is possible for inherent differences in students outside of our variables of interest to bias results (Theobald and Freeman, 2014). To limit this possibility, we include two kinds of covariates in our analyses that can account for potential differences between students: 1) a measure of student performance in college and 2) a random-effect term that captures between-student variation (see Statistical Analyses below).
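To make the per-class standardization described under Methods concrete, a minimal sketch in R follows; the data frame exams and its columns points and class are hypothetical stand-ins for the registrar data, not the authors' actual code.

    # Per-class z-score standardization of raw exam totals, as described
    # under Methods; 'exams' is a hypothetical data frame with one row per
    # student per class.
    exams$z_score <- ave(
      exams$points, exams$class,
      FUN = function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    )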
To account for potential covariation between academic preparedness and our response variables (exam performance and participation), we include cumulative college GPA as a covariate in our analyses, as it has been shown to be a strong predictor of student achievement in a number of previous studies (Xie and Shauman, 2003; Freeman et al., 2007a, 2011; Riegle-Crumb and King, 2010; Haak et al., 2011; Creech and Sweeder, 2012). In addition, including a covariate that captures some aspects of academic preparedness in our models allows us to more rigorously test the effect of our variables of interest (e.g., student gender and ethnic identity) on our outcome variable (e.g., student exam performance; Lipsey et al., 2012). Therefore, when we describe our results, we are describing differences in exam performance between males and females with the same cumulative college GPAs at entry into the introductory biology series.

Accounting for Differences among Classrooms. As instructional practices (Lorenzo et al., 2006; Beichner et al., 2007) and exam difficulty/format (Dimitrov, 1999; Bell and Gafni, 2000) have been shown to influence the relative achievement of males and females, we include a variable for class as a random effect in our analyses (see Statistical Analyses below). This random effect captures the variation in performance between classes that is not related to our predictor variables and allows us to compare outcomes across different class and exam structures. The strength of this approach is that we can control for individual class variation that could be due to anything that may differ among the courses (e.g., the instructor, the students, exam format, exam difficulty, or something that we have not even considered).

Statistical Analyses. Similar to many educational studies using multiple classes or schools, the data in this study are hierarchically nested. Because students are nested in classes, we have explanatory variables at both the student level (student gender identity and cumulative college GPA) and the class level (instructor gender identity, term, and course). The hierarchical nature of this data set is important to account for, because a student's exam performance is likely to be more similar to a classmate's performance than to that of a student outside his or her class, as students in the same class share the same exams and instructional environment (Kreft and de Leeuw, 2002). In cases like this, linear regression can lead to erroneous conclusions, because the assumption of independence of observations is violated (i.e., 100 students in the same class are not 100 independent data points; Kreft and de Leeuw, 2002). A statistical method called multilevel modeling has been developed to account for this nonindependence in nested-data structures and is widely used in the fields of education, sociology, and ecology (Paterson and Goldstein, 1991; Kreft and de Leeuw, 2002; Raudenbush and Bryk, 2002; Bolker et al., 2009).

Multilevel models differ from traditional linear regression models in many ways. First, multilevel models are a type of mixed-effects model that includes fixed and random effects. Fixed effects are generally the variables of interest, and, in linear regressions, all variables are assumed to be fixed. In mixed-effects models, some variables are allowed to be random effects. Random effects are those that can be seen as drawn at random from a population. This allows for inference beyond the specific populations measured (Kreft and de Leeuw, 2002).
For example, the particular classes students are enrolled in could be considered a random effect if the subset of classes used in a study can be seen as having been chosen at random from a larger pool of possible classes. Random effects are also the variables that can account for clustering in a nested-data structure (Bolker et al., 2009). A second way multilevel models differ from linear models is in their ability to account for interactions that occur across levels of the hierarchy. For example, it is possible that the relationship between Scholastic Aptitude Test (SAT) verbal score and exam performance might differ for a student in a class with multiple-choice exams versus essay exams. Multilevel models can account for this by incorporating a random slope for SAT verbal (i.e., allowing the slope of the relationship between SAT verbal and z-score to vary from class to class).

In our model, the outcome variable was z-scores from exam performance. Student gender identity, student racial/ethnic identity, college GPA, and instructor gender identity were fixed effects, and class and student were random effects. Having student identification as a random effect allows us to account for repeated measures on the same student and avoids issues of pseudoreplication. In preliminary analyses used to develop a baseline model (cf. Zuur et al., 2009), we found that only the relationship between college GPA and z-score varied from class to class, so we used a random slope model that allowed the slope of that regression line to vary by class. These preliminary results indicate that the size of the gender gap is not a unique feature of a particular combination of course structure, exam format, or instructor. In this study, the only class-level factor we were able to isolate was instructor gender identity. It may be possible, with data collection beyond the scope of this work, to parse out the impact of specific exam formats and/or different instructional practices on student achievement; these are potential areas for future research.

Multilevel models were analyzed in R using the lme4 package (Bates et al., 2013). To identify which fixed-effect variables best explained the patterns in student exam scores, we used a powerful multimodel inference technique based on Akaike's information criterion (AIC; Akaike, 1973). This statistical method is commonly used in the fields of ecology, evolution, and behavior when data come from observational studies with a large number of possible explanatory variables, and it has begun to be used in educational studies focused on large student populations with many possible explanatory variables (Haak et al., 2011). Several authors have argued that multimodel inference is a more rigorous approach to model selection and variable selection in regression analyses than the more common method of simple significance testing (Akaike, 1974; McQuarrie and Tsai, 1998; Anderson et al., 2000; Johnson and Omland, 2004; Burnham et al., 2011; Garamszegi, 2011; Symonds and Moussalli, 2011). In addition, this type of multimodel inference avoids some of the common issues of stepwise model-selection methods, including the inconsistencies in model selection that result from different stepwise methods and criteria (reviewed in Hegyi and Garamszegi, 2011).
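As a rough sketch of how this workflow can be expressed with the packages the authors name (lme4 above, MuMIn in the next passage), assuming a hypothetical data frame d whose columns follow the variable names used in the text:

    # Fit the full random-slope model, then rank fixed-effect subsets by
    # AICc. The data frame 'd' and its columns are hypothetical; REML = FALSE
    # because models are compared by information criteria.
    library(lme4)
    library(MuMIn)

    full <- lmer(
      Z.Score ~ Cum.GPA + Stu.Gender * Ethn + Stu.Gender * Inst.Gender +
        (1 | Stu.ID) + (Cum.GPA | class),
      data = d, REML = FALSE,
      na.action = na.fail            # required by dredge()
    )

    ranked <- dredge(full)           # all fixed-effect subsets, ranked by AICc
    avg <- model.avg(ranked, subset = delta < 10)  # average informative models
    summary(avg)                     # model-averaged coefficients
    sw(avg)                          # per-term sums of Akaike weights
                                     # (relative variable importance)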
We applied this multimodel inference technique, using Akaike's information criterion corrected for small sample sizes (AICc), to linear mixed-effects regression models with a continuous outcome variable: student exam performance (z-scores). AICc estimates the likelihood that each possible model is the best model given our sample size (Akaike, 1973; Anderson, 2008). These AICc values are then used to rank the models. From the AICc values, AICc differences (Δi) and Akaike weights (ωi) are calculated. The Δi represents the strength of evidence in support of each model as the best model: the larger the Δi, the less likely the model. Models with a Δi > 10 are considered poor predictors compared with the best model and thus are not included in our analyses (Burnham and Anderson, 2004). Akaike weights (ωi) are calculations of the likelihood of the observed data given a particular model, standardized so that the weights across all models sum to one. These weights make it easier to compare models, as a model's weight is approximately the probability that it is actually the best model. AICc analyses were implemented in R using the MuMIn package (Barton, 2013).

In addition to identifying the best model, the multimodel inference approach also allows us to use information from all possible models to generate regression coefficients through model averaging (Anderson, 2008; Garamszegi, 2011). This method of calculating regression coefficients accounts for the underlying uncertainty that is always present as to which model best fits the data. Akaike weights can also be used to calculate a measure of the relative importance of an explanatory variable (Anderson, 2008; Garamszegi, 2011) by summing the Akaike weights across all the models that include that variable. This relative variable importance is the probability that a particular variable is important for explaining observed differences in exam performance.

Six potential fixed variables were initially considered as contributors to student exam performance (Z.Score): 1) cumulative college GPA upon entry into the biology series (Cum.GPA); 2) student gender identity (a factor with two levels; Stu.Gender); 3) student race/ethnicity/nationality (a factor with seven levels; Ethn); 4) an interaction between student gender identity and race/ethnicity/nationality (Stu.Gender*Ethn); 5) instructor gender identity (a factor with three levels; Inst.Gender); and 6) an interaction between student gender identity and instructor gender identity (Stu.Gender*Inst.Gender). Only students with a complete set of these variables were included in this analysis. Combinations of these variables produced a total of 26 potential models to describe our data. The total number of models tested was substantially lower than our number of observations (n = 7841 students), which justified fully exploring this set of models. Thus, we systematically explored the possible models for our data and ultimately chose the model that best fit the data according to the model-selection statistics. We also calculated the model-averaged regression coefficients for the fixed effects in our model. Our initial full model was as follows:

    Z.Score ~ Cum.GPA + Stu.Gender + Ethn + Stu.Gender*Ethn + Inst.Gender + Stu.Gender*Inst.Gender + (1|Stu.ID) + (Cum.GPA|class)

This model includes the random terms for student identity (represented by 1|Stu.ID) and the interaction between cumulative GPA and class (represented by Cum.GPA|class).

Results for Study 1: Is There a Gender Achievement Gap in Introductory Biology?
Using model selection, we found six models with reasonable support (Δi < 10) that explained the patterns in exam performance across the 23 classes (Table 1). The top two models had the majority of the support (summed ωi = 0.71). The best model included three of the six possible fixed effects (cumulative college GPA, student gender identity, and student race/ethnicity/nationality). The second-best model added the two instructor variables (instructor gender and student gender identity*instructor gender).

Table 1. Best models include student gender identity as a predictor of exam performance(a)

Rank  Model(b)                                                                                   AICc     Δi    ωi
1     Cum.GPA + Ethn + Stu.Gender                                                                18019.9  0     0.41
2     Cum.GPA + Ethn + Stu.Gender + Inst.Gender + Stu.Gender*Inst.Gender                         18020.6  0.63  0.30
3     Cum.GPA + Ethn + Stu.Gender + Inst.Gender                                                  18022.5  2.58  0.11
4     Cum.GPA + Ethn + Stu.Gender + Ethn                                                         18023.1  3.21  0.08
5     Cum.GPA + Ethn + Stu.Gender + Stu.Gender*Ethn + Inst.Gender + Stu.Gender*Inst.Gender       18023.5  1.95  0.07
6     Cum.GPA + Ethn + Stu.Gender + Inst.Gender                                                  18025.7  5.79  0.02

(a) Relative ranking (from most support to least) of the six best models for predicting student exam performance using AICc model selection. Only informative models (Δi < 10) are shown. The table shows only fixed-effect terms, but all models also include two random-effect terms: student and an interaction between cumulative college GPA and the class students were enrolled in.
(b) Cum.GPA = cumulative college GPA at start of introductory biology series; Stu.Gender = student's gender identity; Ethn = student ethnic/racial/national identity; Inst.Gender = instructor(s) gender.

The main effect of identifying as female across all our models was to decrease exam performance by ∼0.2 of an SD (β = −0.21 ± 0.04 [SE], p value < 0.0001; Table 2). The student gender identity variable had a relative variable importance of 1 and was present in all six of the best models, implying that gender had a consistent and reliable impact. That is, if two students are in the same class and have the same GPA and race/ethnicity/nationality, but one student is male and the other is female, our model predicts that the female student will score 0.2 SDs lower in the distribution of scores in the class. In classes with 400 exam points (n = 19), the average SD was 42.8 points. Thus, female students are scoring, on average, 11 points (2.8%) lower on their overall exam grades than male students with the same GPA.
Table 2. Female gender significantly decreases exam performance relative to males across 23 introductory biology classes(a)

Parameter                                        Relative variable importance           Model-averaged regression coefficient ± SE   p Value(b)
Intercept                                        NA                                     −4.10 ± 0.20                                 <0.0001
Student-level variables:
  Cumulative GPA                                 1                                      1.32 ± 0.06                                  <0.0001
  Ethnicity/Race/Nationality                     1 (reference level: White)
    Asian                                                                               −0.13 ± 0.03                                 <0.0001
    Black                                                                               −0.43 ± 0.09                                 <0.0001
    Hawaiian/Pacific Islander                                                           −0.22 ± 0.14                                 0.114
    International                                                                       −0.44 ± 0.06                                 <0.0001
    Latin@                                                                              −0.24 ± 0.07                                 0.001
    Native American                                                                     −0.24 ± 0.11                                 0.030
  Student Gender                                 1 (reference level: Male)
    Female                                                                              −0.21 ± 0.04                                 <0.0001
  Ethnicity/Race/Nationality*Student Gender      0.18 (reference levels: White*Male)
    Asian*Female                                                                        −0.01 ± 0.05                                 0.830
    Black*Female                                                                        0.17 ± 0.14                                  0.227
    Hawaiian/Pacific Islander*Female                                                    0.19 ± 0.23                                  0.412
    International*Female                                                                −0.08 ± 0.09                                 0.383
    Latin@*Female                                                                       0.22 ± 0.10                                  0.026
    Native American*Female                                                              0.13 ± 0.19                                  0.492
Classroom-level variables:
  Instructor Gender                              0.51 (reference level: Only Male)
    1 Female/1 Male                                                                     −0.08 ± 0.07                                 0.27
    Only Female                                                                         −0.01 ± 0.08                                 0.90
  Student Gender*Instructor Gender               0.37 (reference levels: Male*Only Male)
    Female Student*1 Female/1 Male Instructor                                           0.07 ± 0.04                                  0.055
    Female Student*Only Female Instructor(s)                                            0.10 ± 0.05                                  0.024

(a) Model-averaged regression coefficients and relative variable importance for all six possible fixed-effect terms. Although not shown, the model includes two random-effect terms: (1|Stu.ID) + (Cum.GPA|class).
(b) Bold p values in the original table are significant.

A main effect of race on exam points was well supported in our analyses (relative variable importance = 1; present in all six of the best models). However, the interaction between student gender identity and race/ethnicity/nationality was not supported; the interaction term had the lowest relative variable importance (0.18) of all the predictors included in the model (Table 2). It is also present only in the fifth most well-supported model, and this model does not have much support relative to the best model (ωi = 0.07 out of 1; Table 1). The model-averaged coefficients reveal that the only significant interaction between gender identity and racial/ethnic identity is within Latin@s (0.22 ± 0.1, p value = 0.026; Table 2). In a class taught exclusively by males, if a white male student, a Latina student, and a Latino student enter that class with the same cumulative college GPA, the Latina student is predicted to perform 0.23 SDs lower than the white male (−0.24*Latin@ + −0.21*StudentGenderF + 0.22*Latin@*StudentGenderF) and the Latino student is predicted to perform 0.24 SDs lower than the white male (−0.24*Latin@). Thus, although there is no difference between being male and female for Latin@s, both underperform compared with white males. This lack of a gender gap for Latin@ students could be attributed to males experiencing a greater cost for being Latino in the introductory biology classroom compared with females, or it could be attributed to Latinas experiencing less of a cost of being female than other racial groups. It is impossible to distinguish between these hypotheses with this type of observational data, but it could be interesting to further explore the experience of Latin@ students, as this pattern is unique among the racial and ethnic groups in this study.

Instructor gender and the interaction between instructor gender and student gender were present in the second-best model (Table 1) and have relative variable importances of 0.51 and 0.37, respectively (Table 2).
This indicates that there is more uncertainty about the importance of these terms than about student gender identity, student race/ethnicity/nationality, and cumulative college GPA. Using the model-averaged coefficients that incorporate this uncertainty, we find that only the interaction between student gender identity and a class taught exclusively by females has a significant positive impact on student exam performance (β = 0.10 ± 0.05, p = 0.024; Table 2). Thus, if a course were taught solely by female instructors, the achievement gap between students of different genders with the same cumulative college GPA and race/ethnicity/nationality would be reduced by 62%. This would mean the gender gap in a class with two female instructors would be reduced from 11 points (a gap of 2.8%) to 7 points (a gap of 1.7%).

Study 2: Are There Gender Gaps in Participation during Whole-Class Student–Instructor Interactions?

Methods. Over the 2-yr period, 26 instructors taught the introductory biology series. Though many instructors taught the courses more than once during this period, participation data were collected from only one quarter for each of the 26 instructors. We observed individual class sessions to determine participation rates. Kane and Staiger (2012) found that two trained individuals each observing a single 45-min session of a teacher's class achieve a reliability score of 0.67 (i.e., observations are more likely to be due to a characteristic of the teacher than to a particular observer) and that this paired observation of one session is just as reliable as having independent observations of four sessions. In our study, to be conservative and to increase the number of student–teacher interactions sampled, we randomly selected three class sessions for each instructor. These 78 videos were scored by two observers, one male and one female, who recorded 1) the ways in which students verbally interacted with the instructor during class and 2) the perceived gender of any student who spoke out during class.

In this study, we focused solely on student verbal interactions that occurred in the context of the whole class. Although there are other ways for students to interact in class (e.g., asking an instructor a question during small-group work), it was impossible for us to analyze those conversations through our whole-class video recordings. We categorized student interactions in front of the whole class in the following ways: 1) asking a spontaneous question, 2) volunteering to answer an instructor-generated question, or 3) responding to an instructor-generated question when called on by the instructor through random call. An event was coded as a spontaneous student question when a student asked an instructor an unprompted question or was only very generally prompted ("Does anyone have a question?"). Volunteer responses were characterized by students raising their hands or shouting out answers of their own volition in response to instructor questions; in these volunteer responses, only those students who chose to participate did so. Random call required students to be more accountable for participating in class than either of the two previous methods. Random call has a particular structure that is similar to cold-calling, with the instructor calling on students by name to answer questions the whole class hears.
However, random call differs from cold-calling in that the instructor does not decide whom he or she will call on. Instead, the instructor comes to class with a randomized class list and calls student names in the order they appear on this list. Observers were able to distinguish random call from volunteer responses in the videos by watching instructor behaviors: in random call, the instructor calls out student first and last names without waiting for volunteers and can often be seen referring to a list before saying a student's name.

Only instructors who had a total of five or more students participate in any one of these three types of student–instructor interactions across the three observed class sessions were included in the analysis. We chose five as a lower cutoff to be conservative, as the analysis we planned to use involved ratios; with ratios, the fewer the observations, the easier it is to see extreme values that would be classified as significant deviations from expected. Based on this criterion, only 20 of the 26 instructors qualified for analysis of student participation in whole-class interactions. The two observers also independently assigned a gender to the participating students in the videos based on the students' visual appearances and/or auditory characteristics. If observers could not identify the gender of a speaker or did not agree on the gender, the student was marked as "cannot determine." Overall, observers could not assign a gender to 7.9% of the students who spoke in front of the whole class. If more than 20% of the total number of students speaking in the three sessions could not be assigned a perceived gender, then the instructor teaching that class was not included in our analysis. This occurred for only two instructors, for whom either the camera was too far away to see any of the students who spoke or students spoke so briefly that it was impossible to identify them. Therefore, of the 20 instructors who had a total of more than five students speak out to the whole class over three class sessions, we were able to analyze participation data for 18.

We chose to work with historic video data so that we did not influence instructor behavior by sitting in and recording real-time interactions. However, the methods used in this study have several limitations. The first disadvantage of working with historic video data is that we cannot identify individual students by name in order to determine their self-reported gender identity. Perceived gender was the best proxy we could collect, but perceived gender does not always align with self-identified gender (e.g., a male student with long hair may be mistakenly identified as a female student, or a student who appears to be female based on physical characteristics may actually self-identify as male). Second, in the majority of our observed classrooms, an individual instructor used multiple student-engagement techniques (volunteers and student questions) as well as small-group work. Thus, it was not possible for us to link exam performance (i.e., academic achievement) in these classes to the interaction methods used, because multiple methods were used, and it was impossible to ascertain the independent impact of any one method on exam performance.

Statistical Analyses. Analyses were run separately for each type of student–instructor interaction (spontaneous questions, volunteer discussions, and random call) to determine whether there were gendered patterns of participation under each strategy.
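A minimal sketch of the per-class test described in the next passage, with hypothetical counts (the study's raw tallies are not reproduced in this excerpt):

    # Exact binomial goodness-of-fit test: did females account for fewer
    # whole-class responses than their share of enrollment would predict?
    # All counts and proportions here are hypothetical.
    female_responses <- 11    # responses attributed to female students
    total_responses  <- 40    # all responses with an assigned gender
    prop_female      <- 0.59  # proportion of women enrolled in the class

    binom.test(female_responses, total_responses, p = prop_female)

    # Instructor-gender effects on class-level female response rates can be
    # checked with a Kruskal-Wallis test, e.g. (hypothetical data frame):
    # kruskal.test(pct_female_heard ~ inst_gender, data = classes)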
Some instructors (n = 4) had enough student participants in two categories to be included in both sets of analyses, and a few (n = 2) exceeded the minimum number of students for all three methods. Therefore, an individual instructor could be included in the analysis of more than one type of interaction. Overall, 11 instructors were included in the analysis of spontaneous student questions, 13 in the analysis of volunteer-based discussions, and four in the analysis of random call discussions. As the number of student–instructor interactions varied widely among these 18 instructors, results are expressed as the percentage of interactions involving females. Because only a small number of students were in each instructor analysis, an exact binomial test for goodness of fit was used to compare the expected proportion of female speakers (the percentage of women enrolled in the class) with the observed percentage of female voices heard in each interaction type. To explore the gender bias in each interaction type across all instructors, a two-tailed t test was performed across all the instructors for student questions, volunteer responses, and random call individually. In addition, a nonparametric Kruskal-Wallis analysis of variance was performed to determine whether instructor gender influenced female response rates. Analyses were implemented in R (R Core Team, 2012).

Results for Study 2: Are There Gender Gaps in Participation during Whole-Class Discussions?

Across the 11 classrooms that had spontaneous student questions, there was not a significant difference (two-tailed t test: p = 0.319) between the proportion of females enrolled in a class (58.7 ± 3.5% SD) and the proportion of questions asked by females (39.9 ± 22.5%). Although the summary t test did not reveal a significant difference across the 11 classes, the exact binomial tests within each class identified five classrooms in which females asked fewer questions than expected (p < 0.03) and six classrooms for which there was no statistical difference (Figure 1). In no classroom did females ask significantly more questions than males.

Figure 1. Variation by class in the percentage of questions asked by females. Comparison of the percentage of females in a class (gray bars) with the percentage of unprompted questions in class asked by females (nested black bars). Asterisks (*) indicate that the exact binomial test was significant at the p = 0.05 level.

On the other hand, across the 13 classrooms in which there were volunteer responses, the number of responses attributed to females (36.7 ± 12.9%) was significantly lower (p = 0.042) than would be expected based on the number of females enrolled in each class (59.2 ± 3.6%). There was less variation from class to class in this result relative to the variation in spontaneous questions: nine of the 13 classrooms showed significant differences (p < 0.05) between the observed and expected numbers of female volunteer responses (Figure 2). In no classroom were females heard more than males when the instructor solicited volunteer responses.

Figure 2. Females are heard in volunteer student–instructor interactions significantly less than expected based on enrollment. Comparison of the percentage of females in a class (gray bars) with the percentage of volunteer-based student–instructor interactions that involved female students (black bars). Asterisks (*) indicate that the exact binomial test was significant at the p = 0.05 level.
Unlike spontaneous student questions or volunteer responses, there were no significant gender differences in participation when participation was based on random call (p = 0.9). In this case, 61.0 ± 0.04% of students in the class were female and 60.0 ± 11.8% of the participants in random call were female (Figure 3). This pattern was consistent across the four classrooms in which random call was used.

Figure 3. Random call extinguishes the gender gap in whole-class participation. Comparison of the percentage of females in a class (gray bars) with the percentage of females who are called on during random call (RC)-based discussions (nested black bars).

We found no evidence that instructor gender moderated any of these participation patterns (volunteer: χ2 = 0.34, df = 1, p = 0.56; student questions: χ2 = 0, df = 1, p = 1).

DISCUSSION

In our study of 23 classes at an R1 university, we found evidence of systematic gender-based gaps in both exam achievement and whole-class participation. Female students underperformed on exams compared with male peers with similar historical college performance. Furthermore, female voices were heard much less frequently than would be expected based on the gender composition of the classes. The causes and consequences of these subtle disparities are difficult to discern, but they could have lasting impacts on the development of a science identity, sense of belonging, and confidence of female science majors, which may have negative effects on the long-term retention of women in the field of biology (Wickware, 1997; Johnson, 2007; Collett et al., 2013).

Small, Yet Potentially Important Achievement Gap between Males and Females

In this study, we found that the exam performance of female students was consistently a quarter of an SD lower than that of male students with similar college GPAs, leading to an average 2.8% difference in exam scores. In addition, the main effect of gender was significant even when an interaction between gender and race/ethnicity/nationality was added. This indicates that the impact of being female in a biology classroom is consistent across the different racial/ethnic groups present in the observed classrooms. If the main-effect terms had been nonsignificant but the interaction terms between gender and race/ethnicity/nationality were significant, this would have indicated that gender had a significant impact only for some racial/ethnic groups. This was not the pattern we observed: the only group with a significant interaction term between race/ethnicity and gender was Latin@s. Replication of this difference and more detailed studies will be necessary to parse out the significance of the difference between the experience of Latin@s in the introductory biology classroom and that of other racial/ethnic groups.

We can put the small achievement gap found in these biology classes into perspective by comparing our result with 1) achievement gaps based on other social identities for the same students, 2) achievement gaps in biology courses at other institutions, and 3) other studies of achievement gaps in college-level STEM courses. These comparisons provide a sense of how the magnitude of the gender gap compares with gaps that are already of concern in biology and whether or not biology is different from other STEM fields in terms of gender performance. Social identities currently of concern in biology include first-generation status and racial/ethnic identity. We do not have data on first-generation status for our sample, but we do have racial and ethnic identity.
Racial/ethnic achievement gaps are usually established by comparing a particular group's performance with that of white students. In our study, we found the difference in performance between males and females was similar in magnitude to that between white students and Latin@s, Native Americans, and Hawaiian and Pacific Islanders in these 23 classrooms. It was less than half of the achievement gap between white and black students and between white domestic students and international students. The gender achievement gap was double that of the Asian–white achievement gap. These results reveal that the gender achievement gap is of similar magnitude to some gaps already of concern in biology, although it is smaller than others.

In contrast to our study, three studies in introductory biology classes found no significant achievement gaps between males and females (Willoughby and Metz, 2009; Creech and Sweeder, 2012; Lauer et al., 2013). However, these studies each examined only one class and thus had substantially smaller sample sizes than our study. In addition, only Creech and Sweeder (2012) controlled for student ability using college GPA as we did; they found no gap in a 200-level biology class, but in a 400-level class females underperformed by 3.5% compared with males. Overall, our study is the largest study of introductory biology and the only one to demonstrate an achievement gap. Compared with studies across STEM that also controlled for students' prior academic performance when calculating a gender achievement gap, the achievement gap in biology we observed is slightly smaller in magnitude than in most other fields, including fields thought to be much less female friendly than biology, such as physics (7.5% lower; Miyake et al., 2010) and biochemistry (3.5–4.3%; Rauschenberger and Sweeder, 2010). The smaller achievement gap observed in our study implies that, at least in our setting, biology differs from other STEM fields in terms of female students' performance. Achievement gaps in performance are only one measure, though, and more measures need to be studied (and more institutions sampled) before any definitive conclusions can be drawn.

Explanations for achievement gaps between males and females in STEM are numerous, but with our retrospective study design we cannot distinguish among them. Instead, we present two possible explanations, out of a myriad of possibilities (Hill et al., 2010), that seem plausible for our study setting. First, female students may enter introductory biology classes with a weaker biology background than males. Some evidence for this hypothesis comes from student scores on the Advanced Placement biology exam, on which males have been found to consistently outperform females (Coley, 2001). Willoughby and Metz (2009) found that females underperformed on biology concept inventories given at the beginning of an introductory biology class. Additional evidence in support of a potential gap in preparedness can be found in other STEM fields: a gap was found in a study of physics students in which females on average had taken fewer high school physics courses (Kost-Smith et al., 2010). More male high school students than female high school students have an interest in pursuing a STEM major (Ma, 2011), which could also lead to males taking more science courses in high school.
Even if males and females took the same number of science classes in high school, females could still have a weaker background in biology if they did not receive the same opportunities to participate in STEM courses in K–12 that males did. There is evidence that males in K–12 classes are more likely to manipulate laboratory equipment and more likely to offer explanations in class, depriving females of opportunities to gain skills that could be useful in college-level biology (Howe and Abedin, 2013). This difference in preparation, if present in our population, could potentially explain the achievement gap, but it needs to be further explored.

A second possible explanation for this achievement gap comes from the social psychology literature: the phenomenon of stereotype threat. Stereotype threat can be defined as fear that one's behaviors will confirm an existing stereotype of a group to which one belongs (Steele and Aronson, 1995). This phenomenon has been shown to reduce performance (Nguyen and Ryan, 2008) and is particularly strong in people who identify with the field in which they feel threatened (e.g., identifying strongly with science; Inzlicht and Schmader, 2012). Interventions to alleviate stereotype threat have been shown to increase the performance of women in math-related fields (Spencer et al., 1999; Miyake et al., 2010). Currently, we do not have data on whether women are under stereotype threat in biology, although the phenomenon is present across many other STEM fields (physics: Miyake et al., 2010; math: Spencer et al., 1999; computer science: Cheryan et al., 2009; engineering: Bell et al., 2003). Only one study has used a stereotype threat intervention in biology (Lauer et al., 2013), but this paper did not establish that there was an achievement gap between males and females before employing the intervention, making its negative result difficult to interpret; it is possible that the intervention could work in a classroom with a gender gap in achievement. In addition, there are multiple types of stereotype threat (Inzlicht and Schmader, 2012), so the failure of one intervention that addresses one type of threat does not imply that other interventions will not work. Thus, it remains a possibility that women in biology are under stereotype threat and that this phenomenon could explain our results. Further work is needed to thoroughly explore this possibility.

In summary, we found a systematic achievement gap between males and females in our study, but, because our study design was retrospective, we had limited access to the measures necessary to distinguish between different explanations for the gap we observed. Future prospective work could administer surveys that address differences in preparation and the experience of stereotype threat to distinguish among these and other possibilities.

Instructor Gender May Impact the Achievement Gap

Evidence for an instructor gender effect on gender gaps in achievement at the college level is mixed. Some studies find that instructor gender impacts the achievement of females (Haley et al., 2007; Hoffman and Oreopoulos, 2009; Carrell et al., 2010), but other studies do not support this finding (Griffith, 2010; Price, 2010; Stout et al., 2011). Our study found some evidence for a small but significant impact of instructor gender, although there was some uncertainty about the importance of these terms (relative variable importance was moderate).
Specifically, female students performed 0.1 of an SD better on exams when a course was taught exclusively by female instructors, which halved the achievement gap between males and females of the same ethnicity/race/nationality who entered the class with the same cumulative college GPA. This finding is consistent with, and of a similar magnitude of effect to, college-level STEM data showing that female students taught by female instructors in STEM courses outperformed female students taught by male instructors (Carrell et al., 2010). One limitation of our study is that we did not document whether teaching methods or exam format varied by instructor gender. Without this information, it is impossible to determine whether female instructors teach differently than male instructors and whether the instructor effect is due primarily to instructor gender. We do know anecdotally that the majority of exams across all 23 courses were short-answer format and that several of the instructors with the most student-centered classrooms were male.

Gender Gaps Exist in Whole-Class Participation

One of the novel aspects of this study is that we moved beyond simply quantifying academic achievement gaps to examining gaps in classroom participation in college-level STEM classrooms. Overall, we found that female and male students were equally likely to ask spontaneous questions in ∼50% of the classes. When students were asked to offer volunteer responses, 69% of classrooms showed a pattern of male-biased participation; across these classes, males on average spoke 63% of the time, even though they comprised 40% of the overall class. Our study corroborates findings in elementary school science classrooms showing that boys are eight times more likely to volunteer answers in class than girls (Sadker and Sadker, 1994). At the college level, studies of participation have found a range of patterns (more female than male: Howard and Henney, 1998; Fritschner, 2000; Howard et al., 2006; more male than female: Crombie et al., 2003; Tatum et al., 2013; no difference: Cornelius et al., 1990; Pearson and West, 1991; Brady and Eisler, 1999), but, to our knowledge, ours is the first observational study of college-level participation in a STEM classroom. In a STEM study using student self-reports, women reported lower participation rates in biology, engineering, and chemistry courses (Crombie et al., 2003), and we have preliminary data showing a similar pattern in two introductory biology classrooms (unpublished data).

Class participation in our study took the form of an interaction between two individuals: the instructor and the student. First, individual students decided whether or not to volunteer to answer an instructor's question, and then the instructor decided which volunteers to call on to speak. Either, or more likely both, individuals' behaviors could lead to the gender gap in participation observed in this study without anyone's conscious intent (Greenwald and Krieger, 2006). Instructors enter their classrooms with a set of perceptions about the class that may include, among many other things, what topics will interest students most, what students will already know about the subject, and who will participate the most. Some of these perceptions could include unconscious, and thus unexamined, biases about the roles of male and female students in the classroom and in science (Greenwald and Krieger, 2006).
Class participation in our study took the form of an interaction between two individuals: the instructor and the student. First, individual students decided whether or not to volunteer to answer an instructor's question, and then the instructor decided which volunteers to call on. Either, or more likely both, individuals' behaviors could lead to the gender gap in participation observed in this study without anyone's conscious intent (Greenwald and Krieger, 2006). Instructors enter their classrooms with a set of perceptions about the class that may include, among many other things, what topics will interest students most, what students will already know about the subject, and who will participate the most. Some of these perceptions could include unconscious, and thus unexamined, biases about the roles of male and female students in the classroom and in science (Greenwald and Krieger, 2006). For example, if our previous experiences in science classrooms demonstrate that male students talk more and participate more actively than females (as shown in the K–12 literature: Holmes, 1992; Guzzetti and Williams, 1996; Howe and Abedin, 2013), then, as instructors, we might unconsciously expect the same pattern in our own classrooms. Moreover, if we expect males to participate more, especially when offering answers (again seen in the K–12 literature: Altermatt et al., 1998; Burns and Myhill, 2004), then we might unconsciously facilitate this pattern by calling on males more often. Thus, perpetuating gender inequality in the classroom can be a passive process that requires only that we remain unaware of our biased expectations (Greenwald and Krieger, 2006; Hill et al., 2010).

An illuminating example of this passive unconscious bias in a science classroom comes from a study at the elementary school level in which researchers worked with science instructors to equalize student participation. Instructors involved in this process found it difficult, and one reported feeling that he was devoting 90% of class time to females when the time was in fact merely equal (Whyte, 1986). It was his unconscious expectation that females would not participate at equal rates that distorted his perception of the classroom dynamics. A more recent study demonstrating that this unconscious bias against women persists in STEM found that faculty members of all genders were more likely to hire a male undergraduate lab assistant than a female one, to pay the male assistant more, and to offer him a greater level of mentoring, even when the candidates had identical qualifications (Moss-Racusin et al., 2012).

The second factor that could contribute to the gender bias in volunteer-based classroom interactions is the student's decision to volunteer. In the K–12 literature, there is a consistent pattern wherein females speak less than males in traditionally male-dominated fields such as science. There is also extensive evidence at the K–12 level that girls are less confident than boys in their knowledge of science fields, even after controlling for actual performance (Meece et al., 2006; Micari et al., 2007; Sikora and Pokropek, 2012), and thus may not feel confident enough to offer an answer in front of a large group. Girls also appear to be much more concerned about how their instructors view them, and the fear of creating a negative impression could hold them back from participating (Pomerantz et al., 2002). At the college level, this difference in confidence between males and females has been demonstrated in several STEM disciplines, including engineering (Cech et al., 2011) and physics (Lindstrom and Sharma, 2011).

When participation in biology classes is skewed toward males, females systematically miss out on valuable practice that may lead to benefits such as achievement and/or retention in STEM. Although these are the more common benchmarks for success in education research, speaking in class can also strengthen a student's relationship with the field of biology and improve his or her sense of belonging, which could indirectly affect retention. For example, speaking and earning praise, or hearing people with a similar social identity (e.g., the same gender or race) earn praise from an authority figure, has been shown to increase a student's sense of belonging in a field (Carlone and Johnson, 2007; Ong et al., 2011).
By not being called on and not receiving the validation of an authority figure (e.g., the instructor; Sinnes and Loken, 2012), females may develop a lower sense of belonging as people who can contribute to the biology community. This incongruence between how a female views herself as a person capable of being a competent biologist and how she thinks others view her could lead to stereotype threat or to impostor syndrome, the conviction that despite her accomplishments she is still not good enough for the field. Both of these phenomena are known to decrease student performance and to contribute to attrition from STEM fields (Massey and Fischer, 2005; Freeman et al., 2007b; Collett et al., 2013).

In addition, in biology and STEM fields generally, practitioners must be comfortable offering their ideas in group settings such as meetings, conferences, and day-to-day interactions with collaborative teams. Science classrooms are an opportunity for students to practice these skills in a low-stakes environment. At first it may seem unkind to call on students who are hesitant to participate, but research has shown that students who participate in class, even if initially forced to through cold-calling, become more comfortable talking in class and even begin to volunteer on their own (Dallimore et al., 2010, 2013). This increased confidence could transfer to higher-stakes environments such as lab meetings and scientific conferences. Thus, the limited participation of females in introductory biology classrooms denies them the chance to practice science discourse skills to the same degree as males and prevents them from gaining the confidence to participate in higher-stakes environments. Furthermore, classrooms in which males dominate discussions could signal to future male scientists that underparticipation by females in biology is standard. For all these reasons, unequal class participation may have greater and more enduring consequences for equity than are easily measured.

Disparities in Whole-Class Discussions Can Be Ameliorated Using Random Call

It seems that factors at both the student and the instructor level could lead to disparities in who participates in the biology classroom. Fortunately, our results also indicate that there is at least one simple solution: using random call to structure class participation. With this interaction method, instructors call on males and females in proportion to their representation in the class, preventing gender disparity in who participates. Random call differs from volunteer-based participation because it requires the instructor to call on people from a list created before he or she enters the classroom. This list of randomized names not only prevents instructor bias from influencing who is called on but also does not allow students to opt out of participation because they are uncomfortable. Random call may sound intimidating to students, but instructors in this study alleviated this anxiety somewhat by having students discuss their answers in small groups (e.g., think–pair–share) before anyone was called on to report to the whole class. Random call is additionally useful because it spreads participation over the whole class and prevents a few students from monopolizing an instructor's time. In this study, instructors made a randomized class list in advance using Microsoft Excel, but other instructors have used a deck of cards with students' names on them that they shuffled and drew from (Tanner, 2013), and there are even apps for handheld devices designed for this task, such as Names in a Hat for iPhone.
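For instructors who prefer a script to a spreadsheet, here is a minimal sketch of the same idea; the roster names and the helper function are placeholders for illustration, not a tool used in the study.

import random

# Minimal random-call helper: shuffle the roster before class and call
# students in that order, reshuffling once everyone has been called.
roster = ["Ada", "Ben", "Carmen", "Dev", "Elena", "Farid"]  # placeholder names

def random_call_order(names, seed=None):
    """Return a randomized calling order without mutating the roster."""
    order = list(names)
    random.Random(seed).shuffle(order)
    return order

for student in random_call_order(roster):
    print(student)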
Replication with Other Student Populations Is Necessary before Drawing Conclusions across Biology

In this study, we focused on two measures of gender equity in biology across 23 classes with almost 5,000 students. Although our sample size is large, the entire sample comes from a single institution with its own identity, culture, and student demographics. Gender inequity in biology is a complex phenomenon influenced by the experiences that both students and instructors bring to the classroom. Therefore, researchers should not make assumptions about the dynamics of social identities in biology classrooms based only on the data in this paper, which represent only one institution. Rather, both researchers and instructors need to document the gender patterns in their own classes and institutions to determine the pervasiveness of gender gaps. In addition, unconscious bias and learned classroom roles are experiences that could influence a range of student outcomes, including self-efficacy, interest in science, and course-related anxiety. In this study, we investigated only two of the myriad measures that could be used to identify areas of gender inequity in biology; more work is needed to characterize the experience of females in biology across a range of outcome variables and institution types.

CONCLUSION

Although biology has been successful at closing gender gaps in attracting and retaining undergraduate and graduate students, in this study we document more subtle gaps that persist. We found that both academic achievement and class participation show evidence of systematic gender differences in introductory biology at an R1 institution, which suggests that many unexplored aspects of science identity development remain to be addressed before we can claim gender equity in biology. As the undergraduate student body at colleges and universities continues to diversify, it is increasingly important that instructors not only have deep content expertise and use evidence-based teaching practices but also be aware of the challenges facing students of different social identities in the biology classroom. Many of these barriers have already been identified by the social sciences, and researchers have developed successful interventions to help students cope with lower confidence (Aronson et al., 2002), stereotype threat (Cohen et al., 2006; Miyake et al., 2011), and other barriers faced by specific groups. As we work toward improving undergraduate biology education for all students, recognizing and challenging our own biases is an essential first step toward making undergraduate biology more equitable. The remaining challenge for all of us is to act on that awareness by modifying our teaching to maximize the learning environment for the ever-increasing diversity of students in our classrooms.
                Author and article information

Contributors
The record lists CRediT role entries for 25 contributors plus an academic editor; contributor names were not captured in this record. Roles represented include: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing – original draft; Writing – review & editing.
Journal
PLoS ONE, Public Library of Science (San Francisco, CA, USA); ISSN 1932-6203. Published 12 January 2021; 16(1): e0243731.
Affiliations
[1] The Biology Education Research Lab, Research for Inclusive STEM Education Center, School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
[2] BSC 4932: Undergraduate Biology Education Research Class, Department of Biology, University of Central Florida, Orlando, FL, United States of America
[3] Mary Lou Fulton Teachers College, Arizona State University, Tempe, AZ, United States of America
[4] Department of Biology, University of Central Florida, Orlando, FL, United States of America
Academic editor: University of Eastern Finland, Finland
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-2419-5450
                https://orcid.org/0000-0001-8427-7741
Article
Manuscript ID: PONE-D-20-24104
DOI: 10.1371/journal.pone.0243731
PMCID: PMC7802933
PMID: 33434226
© 2021 Nadile et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 2 August 2020
Accepted: 27 November 2020
                Page count
                Figures: 5, Tables: 5, Pages: 23
                Funding
                The authors received no specific funding for this work.
Categories
Research Article
Subjects: Education (schools, colleges, lectures); Instructors; Undergraduates; Human learning; Survey research (surveys); Observational studies; Fear
Data availability
All relevant data are within the manuscript and its Supporting Information files.

