      Is Open Access

      Viral coinfection analysis using a MinHash toolkit



          Human papillomavirus (HPV) is a common sexually transmitted infection associated with cervical cancer that frequently occurs as a coinfection of types and subtypes. Highly similar sublineages that show over 100-fold differences in cancer risk are not distinguishable in coinfections with current typing methods.


          We describe an efficient set of computational tools, rkmh, for analyzing complex mixed infections of related viruses based on sequence data. rkmh makes extensive use of MinHash similarity measures, and includes utilities for removing host DNA and classifying reads by type, lineage, and sublineage. We show that rkmh is capable of assigning reads to their HPV type as well as HPV16 lineage and sublineages.
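          The read-classification idea described above can be illustrated with a minimal sketch. This is not rkmh's actual code or interface: the function names, the toy sequences, the k-mer length, the sketch size, and the choice of blake2b as the hash are all assumptions made for illustration. Each sequence is reduced to the smallest hashes of its k-mers (a bottom-s MinHash sketch), and a read is assigned to the reference whose sketch shares the most hashes with the read's sketch.

```python
import hashlib

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hash_kmer(kmer):
    """Deterministic 64-bit hash of a k-mer (blake2b is an arbitrary choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch(seq, k=5, size=1000):
    """Bottom-`size` MinHash sketch: the `size` smallest k-mer hashes.

    When `size` is at least the number of distinct k-mers (as in the toy
    data below), the sketch keeps every hash and the comparison is exact;
    smaller sizes subsample and give an approximation.
    """
    return set(sorted(hash_kmer(km) for km in kmers(seq, k))[:size])

def classify(read, references, k=5, size=1000):
    """Return (best_label, scores), scoring by shared sketch hashes."""
    read_sketch = sketch(read, k, size)
    scores = {
        label: len(read_sketch & sketch(ref, k, size))
        for label, ref in references.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy references, not real HPV sequences.
references = {
    "type_A": "ACGTACGTTAGCCGATAGGCTTAACGGATCCA",
    "type_B": "TTGACCATGGCAATCGTTAGGACCTGAAGTCC",
}
read = references["type_A"][8:28]  # a 20-base "read" drawn from type_A
label, scores = classify(read, references)
print(label, scores)
```

          In practice the k-mer length and sketch size trade sensitivity against speed and memory, and closely related lineages share most of their k-mers, so a real tool needs additional logic to separate them.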


          Accurate read classification enables estimates of percent composition when there are multiple infecting lineages or sublineages. While we demonstrate rkmh for HPV with multiple sequencing technologies, it is also applicable to other mixtures of related sequences.
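          As a rough illustration of how per-read labels translate into percent composition, the sketch below simply counts labels and normalizes. The function name and the example sublineage labels are hypothetical, and this naive read-count proportion is an assumption for illustration rather than the estimator used by rkmh; it ignores unclassified reads, read length, and coverage bias.

```python
from collections import Counter

def percent_composition(labels):
    """Map each label to its share of classified reads, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no classified reads, nothing to estimate
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical labels for four classified reads.
example = percent_composition(["A1", "A1", "A1", "D3"])
print(example)
```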

          Electronic supplementary material

          The online version of this article (10.1186/s12859-019-2918-y) contains supplementary material, which is available to authorized users.

          Most cited references 9

          Human papillomavirus genome variants.

          Amongst the human papillomaviruses (HPVs), the genus Alphapapillomavirus contains HPV types that are uniquely pathogenic. They can be classified into species and types based on genetic distances between viral genomes. Current circulating infectious HPVs constitute a set of viral genomes that have evolved with the rapid expansion of the human population. Viral variants were initially identified through restriction enzyme polymorphisms and more recently through sequence determination of viral fragments. Using partial sequence information, the history of variants, and the association of HPV variants with disease will be discussed with the main focus on the recent utilization of full genome sequence information for variant analyses. The use of multiple sequence alignments of complete viral genomes and phylogenetic analyses have begun to define variant lineages and sublineages using empirically defined differences of 1.0-10.0% and 0.5-1.0%, respectively. These studies provide the basis to define the genetics of HPV pathogenesis. © 2013 Elsevier Inc. All rights reserved.
            Human papillomavirus infection with multiple types: pattern of coinfection and risk of cervical disease.

            We investigated coinfection patterns for 25 human papillomavirus (HPV) types and assessed the risk conferred by multiple HPV types toward cervical disease. Sexually active women (n=5,871) in the NCI-sponsored Costa Rica HPV Vaccine Trial's prevaccination enrollment visit were analyzed. Genotyping for 25 HPVs was performed using SPF(10)/LiPA(25). We calculated odds ratios (ORs) to assess coinfection patterns for each genotype with 24 other genotypes. These ORs were pooled and compared with pair-specific ORs to identify genotype combinations that deviated from the pooled OR. We compared risk of CIN2+/HSIL+ between multiple and single infections and assessed additive statistical interactions. Of the 2,478 HPV-positive women, 1,070 (43.2%) were infected with multiple types. Multiple infections occurred significantly more frequently than predicted by chance. However, this affinity to be involved in a coinfection (pooled OR for 300 type-type combinations=2.2; 95% confidence interval [CI]=2.1-2.4) was not different across HPV type-type combinations. Compared with single infections, coinfection with multiple α9 species was associated with significantly increased risk of CIN2+ (OR=2.2; 95% CI=1.1-4.6) and HSIL+ (OR=1.6; 95% CI=1.1-2.4). However, disease risk was similar to the sum of estimated risk from individual types, with little evidence for synergistic interactions. Coinfecting HPV genotypes occur at random and lead to cervical disease independently.

              HPV16 Sublineage Associations With Histology-Specific Cancer Risk Using HPV Whole-Genome Sequences in 3200 Women.

              HPV16 is a common sexually transmitted infection although few infections lead to cervical precancer/cancer; we cannot distinguish nor mechanistically explain why only certain infections progress. HPV16 can be classified into four main evolutionary-derived variant lineages (A, B, C, D) that have been previously suggested to have varying disease risks.

                Author and article information

                BMC Bioinformatics
                BioMed Central (London)
                12 July 2019
                Volume 20
                [1] ISNI 0000 0004 1936 8075, GRID grid.48336.3a, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
                [2] ISNI 0000 0001 2188 5934, GRID grid.5335.0, Department of Genetics, University of Cambridge, Cambridge, UK
                [3] ISNI 0000 0004 4665 8158, GRID grid.419407.f, Cancer Genomics Research Laboratory, Leidos Biomedical Research Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, USA
                [4] ISNI 0000 0004 0606 5382, GRID grid.10306.34, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
                [5] ISNI 0000 0000 9957 7758, GRID grid.280062.e, Women’s Health Research Institute, Kaiser Permanente Northern California, Oakland, California, USA
                [6] ISNI 0000 0000 9957 7758, GRID grid.280062.e, Regional Laboratory, Kaiser Permanente Northern California, Oakland, California, USA
                [7] ISNI 0000 0001 2179 1997, GRID grid.251993.5, Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
                © The Author(s) 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

                Funded by: FundRef, Wellcome Trust;
                Award ID: WT207492
                Funded by: FundRef, Wellcome Trust;
                Award ID: WT206194
                Funded by: FundRef, National Institutes of Health;
                Award ID: HHSN261200800001E
                Funded by: FundRef, National Cancer Institute;
                Award ID: intramural research program of the Division of Cancer Epidemiology and Genetics