Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Genome research

Software; Algorithms; Bayes Theorem; Chromosome Mapping (statistics & numerical data); Computer Simulation; DNA (genetics); DNA, Bacterial; Diploidy; Genome, Bacterial; Genome, Human; Humans; Polymorphism, Single Nucleotide; Reproducibility of Results; Salmonella paratyphi A; Sequence Alignment; Sequence Analysis, DNA

      Abstract

      New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
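      The idea of a mapping quality can be illustrated with a small sketch. It is not the MAQ implementation: it assumes that every candidate position for a read is known, that all positions are equally likely a priori, and that the likelihood of a read at a position depends only on the base qualities of its mismatched bases. The function name `mapping_qualities`, its parameters, and the cap of 99 are hypothetical choices made for this example. The score is Phred-scaled, i.e. Q = -10 log10(probability that the read is wrongly placed).

```python
import math


def mapping_qualities(mismatch_quality_sums, cap=99):
    """Return a Phred-scaled mapping quality for each candidate position.

    mismatch_quality_sums[i] is the sum of the base quality scores at the
    mismatched bases when the read is aligned to candidate position i.
    Assumption (illustrative only): uniform prior over candidate positions,
    and matching bases contribute a factor of 1 to the likelihood.
    """
    # A mismatch at a base with quality q has probability 10^(-q/10),
    # so the product over mismatches is 10^(-sum_of_qualities/10).
    likelihoods = [10 ** (-s / 10) for s in mismatch_quality_sums]
    total = sum(likelihoods)

    qualities = []
    for lik in likelihoods:
        # Posterior probability that this position is NOT the true origin.
        p_wrong = 1.0 - lik / total
        if p_wrong <= 0.0:
            # Only one candidate: no evidence of ambiguity, so use the cap.
            qualities.append(cap)
        else:
            qualities.append(min(cap, round(-10 * math.log10(p_wrong))))
    return qualities


if __name__ == "__main__":
    # One perfect hit and one hit with mismatches totalling quality 30.
    print(mapping_qualities([0, 30]))
    # Two equally good hits: the read is ambiguous at both positions.
    print(mapping_qualities([30, 30]))
```

      Under these assumptions, a read with two equally good placements gets a low score at both positions, while a read with one clearly better placement gets a high score there. This is the kind of ambiguity that the abstract says mapping quality is meant to capture.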

      Author and article information

      Journal: Genome research
      PMID: 18714091
      PMCID: PMC2577856
      DOI: 10.1101/gr.078212.108
