
      Strobe sequence design for haplotype assembly

      research-article
      BMC Bioinformatics
      BioMed Central
      The Ninth Asia Pacific Bioinformatics Conference (APBC 2011)
      11–14 January 2011


          Abstract

          Background

          Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. Because the two chromosomes of a pair are mostly identical, linking together the alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In haplotype assembly, this linking is done by sequenced fragments that overlap at least two heterozygous sites. While much research has gone into correcting errors to obtain accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments that yield long haplotypes. Here, we describe the design parameters that can be adjusted with next-generation and upcoming sequencing technologies, and study the impact of design choices on haplotype length.
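          To make the linking step concrete, here is a minimal Python sketch, assuming each sequenced fragment has been reduced to the sorted list of heterozygous-site indices it covers (a hypothetical representation). It ignores sequencing errors and conflicting alleles, which real haplotype-assembly tools must resolve, and only asks which sites end up phased together: any fragment covering two or more sites links them, so the phased blocks are the connected components of the resulting linkage graph, found here with union-find. The function name `haplotype_blocks` is illustrative.

```python
from typing import Dict, List, Sequence


def haplotype_blocks(num_sites: int, fragments: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return phased blocks (lists of het-site indices) implied by fragments.

    Each fragment is the sorted list of heterozygous-site indices it covers.
    Allele conflicts and sequencing errors are deliberately ignored.
    """
    parent = list(range(num_sites))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for frag in fragments:
        # Consecutive covered sites are linked; a fragment covering
        # fewer than two sites contributes no links.
        for a, b in zip(frag, frag[1:]):
            parent[find(b)] = find(a)

    blocks: Dict[int, List[int]] = {}
    for site in range(num_sites):
        blocks.setdefault(find(site), []).append(site)
    # Each block is already in increasing order; sort blocks by first site.
    return sorted(blocks.values())


# Sites 0-2 are linked through overlapping fragments; site 3 stays unphased.
print(haplotype_blocks(6, [[0, 1], [1, 2], [4, 5], [3]]))  # prints [[0, 1, 2], [3], [4, 5]]
```

          Haplotype length in base pairs would then be the distance between the positions of the first and last site in each block.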

          Results

          We show that a number of parameters influence haplotype length, the most significant being the advance length (the distance between two fragments of a clone). For technologies such as strobe sequencing that allow large variation in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order-of-magnitude improvement in median haplotype length.
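          The design question above can be explored with a toy simulation. The Python sketch below is not the authors' implementation; it assumes a simplified strobe model in which each clone yields `n_frags` fragments of `frag_len` bp, the advance length is taken to be the unsequenced gap between consecutive fragments of a clone and is drawn from a discrete distribution over a few candidate values, all sites read by any fragment of the same clone are linked (they come from one molecule), and haplotype length is scored as the median, over heterozygous sites, of the span of the block containing each site. Simulated annealing then searches over probability vectors on the candidate advance lengths. All function names, parameters, and the scoring metric are hypothetical.

```python
import bisect
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Clone = List[Tuple[int, int]]  # fragments of one clone, half-open [start, end)


def simulate_clones(rng: random.Random, region_len: int, n_clones: int,
                    frag_len: int, n_frags: int, advances: Sequence[int],
                    probs: Sequence[float]) -> List[Clone]:
    """Toy strobe model: advance lengths drawn from `advances` with `probs`."""
    clones: List[Clone] = []
    for _ in range(n_clones):
        pos = rng.randrange(region_len)
        clone: Clone = []
        for _ in range(n_frags):
            clone.append((pos, pos + frag_len))
            pos += frag_len + rng.choices(advances, weights=probs)[0]
        clones.append(clone)
    return clones


def median_site_span(positions: Sequence[int], clones: List[Clone]) -> float:
    """Median over sorted het sites of the span (bp) of each site's block."""
    parent = list(range(len(positions)))

    def find(x: int) -> int:
        while parent[x] != x:  # union-find with path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for clone in clones:
        sites: List[int] = []
        for start, end in clone:  # het sites with start <= position < end
            sites.extend(range(bisect.bisect_left(positions, start),
                               bisect.bisect_left(positions, end)))
        for a, b in zip(sites, sites[1:]):  # one molecule: link all its sites
            parent[find(b)] = find(a)

    first: Dict[int, int] = {}
    last: Dict[int, int] = {}
    for i, p in enumerate(positions):  # ascending, so first/last are block ends
        root = find(i)
        first.setdefault(root, p)
        last[root] = p
    return statistics.median([last[find(i)] - first[find(i)]
                              for i in range(len(positions))])


def score(probs: Sequence[float], positions: Sequence[int],
          advances: Sequence[int], region_len: int, n_clones: int,
          frag_len: int, n_frags: int, seed: int = 0) -> float:
    # Same seed for every candidate (common random numbers) to reduce noise
    # when comparing advance-length distributions.
    rng = random.Random(seed)
    clones = simulate_clones(rng, region_len, n_clones, frag_len, n_frags,
                             advances, probs)
    return median_site_span(positions, clones)


def anneal(positions: Sequence[int], advances: Sequence[int], region_len: int,
           n_clones: int, frag_len: int, n_frags: int, steps: int = 200,
           t0: float = 1000.0, seed: int = 1) -> Tuple[List[float], float]:
    """Simulated annealing over probability vectors on `advances`."""
    rng = random.Random(seed)
    sim = (positions, advances, region_len, n_clones, frag_len, n_frags)
    probs = [1.0 / len(advances)] * len(advances)
    cur = score(probs, *sim)
    best, best_score = list(probs), cur
    for step in range(steps):
        temp = t0 * (1 - step / steps)  # linear cooling, stays > 0
        # Gaussian perturbation, clipped positive and renormalized.
        cand = [max(1e-6, p + rng.gauss(0.0, 0.1)) for p in probs]
        total = sum(cand)
        cand = [p / total for p in cand]
        s = score(cand, *sim)
        # Metropolis rule: keep improvements; accept worse moves w.p. exp(dS/T).
        if s >= cur or rng.random() < math.exp((s - cur) / temp):
            probs, cur = cand, s
            if cur > best_score:
                best, best_score = list(probs), cur
    return best, best_score


if __name__ == "__main__":
    het = sorted(random.Random(42).sample(range(1_000_000), 1_000))  # synthetic
    dist, med = anneal(het, [0, 2_000, 10_000, 50_000], 1_000_000,
                       n_clones=300, frag_len=500, n_frags=3, steps=50)
    print([round(p, 3) for p in dist], med)
```

          Tracking the best distribution seen, rather than only the final annealing state, guarantees the result scores no worse than the uniform starting distribution under the fixed simulation seed. With real data, the synthetic sites would be replaced by an individual's heterozygous positions.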

          Conclusions

          Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.


                Author and article information

                Conference
                BMC Bioinformatics (BioMed Central), ISSN 1471-2105
                Published: 15 February 2011
                Volume 12, Suppl 1: S24
                Affiliations
                [1] Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
                [2] Pacific Biosciences, 1505 Adams Drive, Menlo Park, CA 94025, USA
                [3] Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, CA 92037, USA
                Article ID: 1471-2105-12-S1-S24
                DOI: 10.1186/1471-2105-12-S1-S24
                PMCID: PMC3044279
                PMID: 21342554
                Record ID: 7a9a6369-0f27-4e17-a059-ef2575cb9807
                Copyright ©2011 Lo et al; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                The Ninth Asia Pacific Bioinformatics Conference (APBC 2011)
                Inchon, Korea
                11–14 January 2011
                Categories
                Research

                Bioinformatics & Computational biology
