      Analysing high-throughput sequencing data in Python with HTSeq 2.0

      brief-report

          Abstract

          Summary

          HTSeq 2.0 provides a more extensive application programming interface including a new representation for sparse genomic data, enhancements for htseq-count to suit single-cell omics, a new script for data using cell and molecular barcodes, improved documentation, testing and deployment, bug fixes and Python 3 support.

          Availability and implementation

          HTSeq 2.0 is released as open-source software under the GNU General Public License and is available from the Python Package Index at https://pypi.python.org/pypi/HTSeq. The source code is available on GitHub at https://github.com/htseq/htseq.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                15 May 2022
                21 March 2022
                21 March 2022
                Volume: 38
                Issue: 10
                Pages: 2943-2945
                Affiliations
                School of Clinical Medicine, University of New South Wales , Sydney, NSW 2033, Australia
                Adult Cancer Program, Lowy Cancer Research Centre, University of New South Wales , Sydney, NSW 2033, Australia
                Bioquant Center, University of Heidelberg , 69120 Heidelberg, Germany
                Division of Surgery, Oncology and Pathology, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University , Lund, Sweden
                School of Clinical Medicine, University of New South Wales , Sydney, NSW 2033, Australia
                Adult Cancer Program, Lowy Cancer Research Centre, University of New South Wales , Sydney, NSW 2033, Australia
                Department of Pathology, School of Medical Sciences, University of New South Wales , Sydney, NSW 2052, Australia
                Department of Haematology, The Prince of Wales Hospital , Sydney, NSW 2031, Australia
                School of Clinical Medicine, University of New South Wales , Sydney, NSW 2033, Australia
                Adult Cancer Program, Lowy Cancer Research Centre, University of New South Wales , Sydney, NSW 2033, Australia
                Cellular Genomics Futures Institute, University of New South Wales , Sydney, NSW 2033, Australia
                Author notes
                To whom correspondence should be addressed. fabio.zanini@unsw.edu.au
                Author information
                https://orcid.org/0000-0002-7399-8014
                https://orcid.org/0000-0003-4868-1805
                https://orcid.org/0000-0002-7651-883X
                https://orcid.org/0000-0002-0509-8962
                https://orcid.org/0000-0001-7097-8539
                Article
                btac166
                10.1093/bioinformatics/btac166
                9113351
                35561197
                39c44083-1de8-463e-adc6-0e8db02360d6
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 17 November 2021
                : 04 February 2022
                : 14 March 2022
                : 17 March 2022
                : 31 March 2022
                Page count
                Pages: 3
                Funding
                Funded by: European Molecular Biology Organization Fellowship;
                Award ID: ALTF 269–2016
                Funded by: National Health and Medical Research Council, DOI 10.13039/501100000925;
                Award ID: GNT1200271
                Categories
                Applications Notes
                Gene Expression
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
