INTRODUCTION In 2012, the President’s Council of Advisors on Science and Technology (PCAST) reported that there is a need for an additional one million science, technology, engineering, and mathematics (STEM) graduates in the United States over the next decade to meet U.S. economic demands (1). It was noted that even a modest increase in the persistence of STEM students in the first 2 years of their undergraduate education would alleviate much of this shortfall (1). Replacing conventional introductory laboratory courses with discovery-based research courses is a key recommendation that is expected to lead to enhanced retention. Providing authentic research experiences to undergraduate students and directing them toward careers in STEM is a priority of science education in the 21st century (1 – 4). An abundance of evidence shows that involvement of undergraduate students in authentic research experiences has strong benefits for their engagement and interest in science (5 – 7) and that this often increases student interest in STEM careers (8). It is common for undergraduate students at research colleges and universities to participate in faculty-led research programs—especially during their last 2 years—with graduate students and postdoctoral researchers participating in their mentorship (9). Research experiences promote college retention (10), but the capacity for high-quality mentored undergraduate research within faculty research programs is limited, and this route is unlikely alone to satisfy the economic demands of the coming decade. There have been many successful efforts to develop classroom undergraduate research experiences (11–14; see also http://www.sciencemag.org/site/special/ibi/ and http://www.curenet.org/), but identifying authentic research experiences that scale to larger numbers of undergraduate students often proves elusive (4). 
Bioinformatic approaches engaging substantial numbers of students at diverse institutions have been described (15, 16) and are successful in providing research experiences (14) but do not include a wet-bench laboratory component. Taking advantage of research infrastructures at research-intensive institutions to advance missions in undergraduate education is desirable, and community-oriented approaches have been developed (17, 18), although the potential is largely untapped. Some research projects are likely to be more suitable for undergraduate involvement than others, and identifying those both rich in discovery and accessible to early-career students is challenging (19). The Phage Hunters Integrating Research and Education (PHIRE) program, in which undergraduate and high school students isolate novel bacteriophages, sequence their genomes, annotate them, and analyze them from a comparative genomics perspective, is one response to this challenge (19 – 21). The approach takes advantage of the large, dynamic, old, and highly genetically diverse nature of the bacteriophage population (22, 23). Moreover, although phages play key roles in bacterial pathogenesis (24) and the global climate and ecology (25), we know remarkably little about them outside a few well-studied prototypes. Phages can be easily isolated from the environment, and their relatively small genomes (40 to 150 kbp) are readily sequenced and annotated (26). Phage isolation requires little prior expert knowledge or technical skill, providing an accessible entry point for students from all backgrounds to engage in inquiry-based science (21). Each isolated phage is new, students can name their own phage, and a sense of ownership in their discovery helps to motivate them to explore the secrets of their phage by isolating genomic DNA, determining its sequence, annotating gene predictions, and comparing the sequence to that of other known viruses (21). 
This programmatic transition from a broadly accessible and concrete introduction to sophisticated genomic analysis provides a rich and structured education platform (27), applicable to STEM and non-STEM students, including first-year undergraduates (28 – 30). To determine whether the PHIRE approach can be extended to environments beyond the expert phage-focused research laboratory, the Howard Hughes Medical Institute (HHMI), the University of Pittsburgh, and James Madison University investigated a framework enabling broad usage at diverse institutions, involving large numbers of undergraduate students and nonexpert instructors, and assessed its impact. The approach proved to be scalable (4,800 students at 73 schools over 5 years) and implementable at both research-intensive and research-poor institutions; it generated gains in phage biology research and enhanced student retention, and the student-reported gains were equivalent to those from an intense summer research experience.

RESULTS The attributes of the PHIRE program at the University of Pittsburgh demonstrate that phage discovery and genomics provide a platform that supports engagement of students in authentic research without requiring prior mastery of anything other than very basic concepts and content material (21). We therefore examined whether this could be broadly implemented at institutions with a wide spectrum of missions and demographics, without a requirement for resident expertise in bacteriophage biology. Our core hypothesis was that student participation in this research would generate new insights into phage diversity and evolution while simultaneously elevating student engagement in science, stimulating overall academic performance, and encouraging persistence in STEM fields. Below, we report the structure of the HHMI Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) course and its impacts on both research advances and student learning.
The SEA-PHAGES course. The SEA-PHAGES course (formerly called the National Genomics Research Initiative) is a yearlong research experience targeted at beginning college students. Classes typically enroll 18 to 24 students and are taught by one or two faculty members together with a student teaching assistant. In the first term, students isolate phages from locally collected soil samples using Mycobacterium smegmatis as the primary bacterial host, a nonpathogenic strain relevant to understanding Mycobacterium tuberculosis. Students purify and characterize their phages, visualize them with electron microscopy, and extract and purify the DNA. The genome of one phage isolate is sequenced between terms, and in the second term, students annotate the genome using bioinformatics tools to define putative genes, understand genomic arrangements, and predict protein functions. Sequence and annotation quality is expertly reviewed and collated on the PhagesDB database (http://www.phagesdb.org) and submitted to GenBank. The Phamerator program (31) is used to explore genome relationships, and all phage samples are archived for use by the research community. The SEA-PHAGES course curriculum aims to introduce students to research methods and approaches, experimental design, and data interpretation but does not seek to instruct students in content matter outside the immediate biological context. But, as students are direct participants in scientific discovery, the goal is to engage, excite, increase the confidence of, and draw students into a cycle of self-motivation. If successful, we predicted that this would translate into enhanced performance in other STEM classes, greater retention within STEM training, and an increase in the numbers of students seeking continued research experiences beyond their freshman year. Program faculty and teaching assistants are trained at two weeklong workshops, one for each term of the course. 
Detailed manuals are provided, and community discussions are facilitated by a wiki site. Students and faculty present their findings at an annual SEA-PHAGES Research Symposium, at regional and national meetings, and through peer-reviewed publications. In the 5 years of the program, more than 4,800 students have participated (1,800 in 2012–2013), including STEM majors, non-STEM majors, honors students, and “typical” students. The number of participating schools has grown to more than 70 institutions (see Table S1 in the supplemental material), ranging from community colleges to research universities (Table 1). As can be seen from these program design features, the educational model of the SEA-PHAGES program integrates course-based learning within a framework of scientific activity, including a real-world scientific research agenda, professional networking, and scientific dissemination of results. In this way, the cost-effectiveness of course-based learning is combined with professional science, to their mutual benefit.

TABLE 1 Diversity of institutions participating in SEA-PHAGES

Carnegie classification (a)                                  No. of schools
Research universities; very high/high research activity     30
Master’s degree-granting colleges and universities          18
Baccalaureate colleges                                       22
Associate’s degree-granting colleges                         3

(a) Schools offering the SEA-PHAGES course are organized according to their classification by the Carnegie Foundation for the Advancement of Teaching (2010).

Gains in understanding viral diversity. The contributions of the SEA-PHAGES students have been essential to our current understanding of the diversity of mycobacteriophages, demonstrating the substantial impact of the distributed approach compared to what could be accomplished by a single laboratory, and have resulted in several publications with student authors (29, 31 – 39).
Since the start of the program in 2008, SEA-PHAGES students have isolated 3,000 new phages (with global positioning system [GPS] coordinates recorded) and characterized their phages by DNA restriction analysis and electron microscopy. More than 450 mycobacteriophage genomes have been sequenced and annotated, and more than 350 sequences have been deposited in GenBank (Fig. 1). These genomes include many distinctly different types and numerous complex variants (40), and the entire genome collection codes for over 48,000 genes representing 3,780 sequence phamilies (a phamily is a group of proteins in which each member shares similarity with at least one other member above threshold BlastP and Clustal values). Correlations between genome and geography or time of isolation have been explored (35, 41), as well as the evolutionary mechanisms contributing to the pervasive genome mosaicism (33). The genomes contain numerous examples of biological intrigue, including novel inteins, introns, mobile elements, immunity systems, and regulatory schemes (33 – 35, 42 – 45), as well as potential for developing new tools for understanding tuberculosis (46 – 49).

FIG 1 SEA-PHAGES students contribute to scientific knowledge. Results are from the first 5 years of the SEA-PHAGES program isolating new phages, showing the cumulative numbers of phages isolated (blue), cumulative numbers of genomes sequenced (orange), cumulative numbers of gene phamilies (purple), and total numbers of mycobacteriophages in GenBank (green). Not all genomes sequenced and annotated in year 5 are yet available in GenBank.

The diversity of phages known to infect a single common host is remarkable; there are many thousands of potential bacterial hosts for phage isolation, and host range studies suggest that simply using a different strain of the same bacterial species will result in distinct profiles of diversity (38).
With an estimated 10³¹ phage particles in the biosphere and a population that turns over every few days (23), there is an inexhaustible reservoir for discovery.

Impacts on student education and retention. The Survey of Undergraduate Research Experience (SURE) and the Classroom Undergraduate Research Experience (CURE) surveys measure the students’ assessment of their understanding of science and scientists, confidence in their ability to perform research, and their perceived gains in skills (50). The self-perceptions of learning gains, motivation and attitude, and career aspirations of the SEA-PHAGES course participants were assessed with pre- and postcourse SURE-like surveys (see Fig. S1 in the supplemental material). Twenty of the SEA-PHAGES survey items are shared with the regular SURE and CURE surveys, allowing the comparison of the SEA-PHAGES students’ learning gains with those of students who engaged in a dedicated summer research experience (SURE) and students who completed traditional science courses with no research element (CURE) (Fig. 2). The SEA-PHAGES students scored as well as or better than the SURE students on all 20 learning gains, reflecting benefits at least equivalent to those accrued through a summer-long apprentice-based undergraduate research experience. The increase in scientific self-efficacy reported by the SEA-PHAGES students is likely to be directly related to their retention in science (51).

FIG 2 Student evaluation of learning gains. Mean learning gains for common survey items on the SURE (green diamonds), CURE (blue squares), and the SEA-PHAGES (red triangles) assessment instruments are shown.
The SURE survey data represent 2,358 students who completed summer research in 2009; the CURE survey data represent 476 students who were enrolled in science courses that were described by their instructors as without a research element (data collected for fall 2007 through spring 2009); the SEA-PHAGES data represent 121 students who evaluated their course following the academic year 2008–2009. Error bars represent 2 standard errors around the mean. To analyze the effect of the SEA-PHAGES course on student persistence, we compared retention of students enrolled in the SEA-PHAGES course (77% first-year students and 95% STEM majors) with two benchmark statistics: the retention of all students and the retention of STEM majors with the same number of years of college experience and enrolled at the same school (Fig. 3A), important parameters given the typical rates for student attrition between first- and second-year STEM undergraduates (52). Data were from 27 comparisons from 20 institutions and show clearly that SEA-PHAGES students matriculated into the second year at significantly higher rates than did either benchmark group. Thus, early engagement in a research experience improves student retention into the second year. The positive impacts of this course-based research experience are similar to what has been reported for apprentice-based research experiences (5, 53), represent an effective response to the call to action in the National Science Foundation (NSF) Vision and Change and PCAST reports (1, 4), and provide validation for this educational model on a larger scale. FIG 3 (A) Retention of SEA-PHAGES participants (red) compared to other students at the same institution (blue), year 1 to year 2 of their college experience. Retention data were gathered from 20 institutions, with some institutions contributing data from multiple years, resulting in 27 sets of comparison data. 
Retention data were analyzed with a between-group analysis of variance with 3 levels of the independent variable (all majors, STEM majors, and SEA-PHAGES students) for 171 reports. The result was interpreted as significant at the 0.05 level. (B) SEA-PHAGES students (red) perform better than peers (blue) in traditional laboratory sections in the introductory lecture course. Results are for 127 SEA-PHAGES students and 1,120 students in the traditional laboratory course from six institutions. In the lecture course, SEA-PHAGES students averaged 2.95 on a 4.0 scale, compared to the 2.58 average of students in traditional lab sections. This difference was significant (t = 2.64; P < 0.05).

Anticipating that research-stimulated motivation would influence student performance in other courses, we selected six schools that substituted the SEA-PHAGES course for a regular biology laboratory and compared the grades of participating students in the accompanying biology lecture course (Fig. 3B). We limited this analysis to schools that enrolled “typical” students into the SEA-PHAGES lab sections rather than those aimed at honors students or students at academic risk. The biology lecture course grades of SEA-PHAGES students were compared directly to those of peers enrolled in the same lecture course but in the regular biology laboratory. As is the case with most applied research, students were not randomly assigned to conditions, and even among these “typical” students, there may have been some self-selection for registration in the SEA-PHAGES course. We observed substantial differences in both the average grades and the grade distribution of SEA-PHAGES students relative to those of students in traditional lab sections (Fig. 3B), and although these data are preliminary and warrant further study, they suggest that there could be broad educational benefits to the SEA-PHAGES experience.
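The two statistical comparisons reported for Fig. 3 (a between-group analysis of variance on retention rates and a t test on lecture course grades) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the analysis code used in the study: all data values are invented, the function names are hypothetical, and the t statistic uses Welch's unequal-variance form because the text does not state which variant of the t test was applied.

```python
from math import sqrt
from statistics import mean, variance

# Letter grades mapped to numerical values from 4 (grade A) to 0 (grade F),
# following the scheme described in Materials and Methods.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def grade_points(letters):
    """Convert a list of letter grades to numerical grade points."""
    return [GRADE_POINTS[letter] for letter in letters]


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples.

    ASSUMPTION: the unequal-variance form is used here for illustration;
    the paper does not specify the variant of the t test.
    """
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def one_way_anova_f(groups):
    """F statistic for a one-way between-group analysis of variance."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)       # number of levels of the independent variable
    n = len(all_values)   # total number of observations (reports)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# INVENTED lecture course grades, for illustration only.
sea_phages_grades = grade_points(["A", "A", "B", "B", "C"])
traditional_grades = grade_points(["B", "C", "C", "D", "F"])
t_statistic = welch_t(sea_phages_grades, traditional_grades)

# INVENTED retention rates (percent), one value per institutional report.
all_majors = [80, 82, 78]
stem_majors = [84, 86, 82]
sea_phages = [92, 94, 90]
f_statistic = one_way_anova_f([all_majors, stem_majors, sea_phages])

print(f"t = {t_statistic:.2f}")
print(f"F = {f_statistic:.2f}")
```

In the study itself, the analysis of variance covered 171 institutional reports across the three groups, and the grade comparison covered 127 SEA-PHAGES and 1,120 comparison students; converting either statistic to a P value additionally requires the corresponding F or t distribution, which this sketch does not compute.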
Because of the concern that SEA-PHAGES students might suffer from lack of exposure to a broader coverage of subject matter in the regular laboratory course, we developed a 25-item pre- and postcourse survey of biological concepts (see Fig. S2 in the supplemental material) which was administered to students before and after the laboratory courses. There was no significant difference in performances on the test between SEA-PHAGES students and the comparison group of students (see Fig. S3). Both groups improved from pretest to posttest, and there was no significant difference between the groups in terms of the extent of their improvement. The lack of exposure to additional topics in the SEA-PHAGES course thus had no obvious detrimental effect. DISCUSSION The HHMI SEA-PHAGES program provides a general model for accomplishing improvements in the persistence of students in science by transforming a small-scale scientific inquiry into a cross-institution education platform that engages first-year students. The outcomes are consistent and robust, benefitting diverse groups of students across a variety of institutions. The materials costs are similar to those of other inquiry-based courses, and many institutions have implemented the course without external support, other than assistance with sequencing costs and programmatic and scientific support from HHMI and the University of Pittsburgh (some schools received direct external support for materials during their first 3 years in the program). The size and diversity of the phage population provide an inexhaustible wealth of biological novelty that imposes no obvious limit on the number of students who can participate. Future opportunities include further broadening the implementation of the SEA-PHAGES course as well as extending the model to development of similar projects in which scientific discovery, project ownership, and simple entry points can be implemented at the first-year college level. 
Meeting these opportunities will lead to a broad and sustainable enhancement of undergraduate science education, an advancement of scientific knowledge, and an increase of student persistence in science. MATERIALS AND METHODS Participants. The study was conducted with SEA-PHAGES faculty and students in the United States and the Commonwealth of Puerto Rico. David Lopatto and participant institutions obtained appropriate institutional review board (IRB) approval. SEA-PHAGES faculty are trained in a weeklong workshop focusing on in situ procedures and pedagogy in preparation for the fall semester and a weeklong workshop focusing on in silico bioinformatics tools in preparation for the spring semester. Faculty and students are invited to a SEA-PHAGES National Symposium to present their scientific findings. The SEA office conducts annual site visits and provides continuous technical support for institutions year-round. The SEA Wiki maintains an up-to-date depository for announcements, communication forums for faculty and students, curriculum resources, instructional materials, and research archives. SEA-PHAGES faculty members recruited comparison group students on a volunteer basis to enhance the validity of statistical analysis. The comparison group students were recruited among students taking introductory laboratory courses. Except for the student grade analysis, comparison group students cannot be matched to SEA-PHAGES students on each campus, so statistical analysis was limited to quasiexperimental analysis based on a nonequivalent comparison group. Systemic Research sent out invitations to all consenting students’ e-mail addresses individually. Analysis. During academic year 2009–2010, different aspects of the SEA-PHAGES and comparison group were measured. White/Caucasian students made up the majority of each group, 66% of SEA-PHAGES students and 76% of comparison group students. 
The majority of both groups lived in suburban communities (66% SEA-PHAGES and 64% comparison group students), attended public high schools (83% SEA-PHAGES and 83% comparison group students), and were in their first year in college (SEA-PHAGES, 77% first-year students, 18% sophomores; comparison group, 70% first-year students, 20% sophomores). There was a higher percentage of male students in the SEA-PHAGES course (38%) than in the comparison group (29%), but in both groups, female students were the clear majority.

Retention rates. The Institutional Annual Survey measures student retention rates by tracking full-time, first-time entering students who are seeking bachelor’s degrees. The Institutional Annual Survey was conducted among institutions from November to December. Retention rates were calculated for students returning in fall 2008 and fall 2009. An analysis of variance was performed over 3 groups (all majors, STEM majors, and SEA-PHAGES students). The data were reported by institution and category, including 63 reports for all majors, 43 reports for STEM majors, and 65 reports for SEA-PHAGES students.

The SEA CURE survey. The Classroom Undergraduate Research Experience (CURE) survey was specially adapted to the SEA-PHAGES program by David Lopatto (Grinnell College, Grinnell, IA). The CURE survey consists of multiple sections, including institution, class, demographics, science-related activities, major and minor concentration, postgraduate academic goals, experiences in laboratory course elements, experience in research, engagement in activities or endeavors, course benefit, learning experience in laboratory experiments and tools, overall course evaluation, and opinions about science. Systemic Research added a few questions to the postcourse CURE survey to collect data regarding students’ SEA-PHAGES course satisfaction, SEA Wiki access and utilization, SEA-PHAGES research paper and presentation experience, and general comments.
The survey was administered twice a year: the presurvey at the beginning of the fall semester and the postsurvey at the end of the spring semester. As with the Biological Concepts Survey (BCS), Systemic Research developed the online survey forms using the Vovici EFM Community Professional website. The pre- and postcourse survey invitations were e-mailed to individual students according to their academic calendars. Using Vovici’s survey follow-up feature, three reminder e-mails were sent after the initial invitations. The collected survey responses were securely saved in a dedicated Vovici HHMI website and Systemic Research’s NGRI student database. The SURE survey data represent 2,358 students who completed summer research in 2009; the CURE survey data represent 476 students who evaluated science courses that were described by their instructors as without a research element (data collected fall 2007 through spring 2009); the SEA-PHAGES data represent 121 students who evaluated their course following the academic year 2008–2009. Mean learning gains were calculated for each category of the 20 items common to both the CURE and SURE surveys. Grades. Eleven institutions submitted their SEA-PHAGES students’ laboratory and introductory biology course performance data for fall 2008 and spring 2009 in the academic year 2008–2009 and fall 2009 and spring 2010 in the academic year 2009–2010. Letter grade distributions for both SEA-PHAGES and comparison students were collected. Six institutions had matched data that were utilized in the analysis, with 127 SEA-PHAGES and 1,120 comparison student grades. For statistical analysis, the letter grades were assigned numerical values from 4 (grade A) to 0 (grade F). t tests were performed comparing the mean grades received by SEA-PHAGES students and comparison group students in the biology lecture course. Biological methods. 
Mycobacteriophage isolation was performed using Mycobacterium smegmatis mc²155 as a host, and phages were identified as PFU either by direct plating on bacterial lawns or after enrichment in the presence of M. smegmatis. Following purification and amplification, DNA was isolated and sequenced using Sanger, 454, or Ion Torrent technologies, using a shotgun approach followed by targeted sequencing to resolve ambiguities and determine genome ends. Genome annotations were performed using various software platforms, including GBrowse (54), Apollo (55), DNAMaster (http://cobamide2.bio.pitt.edu/), Glimmer (56), GeneMark (57), and analysis programs available at the National Center for Biotechnology Information (NCBI). Comparative genomics used Phamerator (31) and Gepard (58). Assembled genome sequences and genome annotations were subjected to expert review prior to submission to GenBank. Detailed methods for phage isolation, sequencing, and analysis are available on PhagesDB (http://phagesdb.org).

SUPPLEMENTAL MATERIAL
Figure S1 SEA-CURE postsurvey. PDF file, 0.2 MB.
Figure S2 Biological Concepts Survey (BCS) questions. PDF file, 0.3 MB.
Figure S3 Biological Concepts Survey results. PDF file, 0.1 MB.
Table S1 Institutions offering the SEA-PHAGES course from 2008 to 2013. PDF file, 0.1 MB.