
      Beware the Jaccard: the choice of similarity measure is important and non-trivial in genomic colocalisation analysis


          Abstract

          The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of similarity measures have been proposed for this problem in other fields like ecology. However, while several of these measures have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. We show that the choice of similarity measure may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly altered by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less influenced by dataset size, but one should be aware of increased variance for small datasets.

          All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure.
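          To make the quantities named in the abstract concrete, the following is a minimal sketch of the Jaccard index and the Forbes coefficient for two binary vectors. It uses the standard textbook definitions (shared positions over the union, and observed over expected overlap under independence); whether the article uses exactly these forms is an assumption. The function names and the toy vectors are hypothetical, this is not the authors' implementation, and tetrachoric correlation is left out because it needs a numerical fit.

```python
import math


def contingency(a, b):
    """Count the four co-occurrence cells of two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n11 = sum(1 for x, y in zip(a, b) if x and y)        # present in both
    n10 = sum(1 for x, y in zip(a, b) if x and not y)    # present in a only
    n01 = sum(1 for x, y in zip(a, b) if not x and y)    # present in b only
    n00 = len(a) - n11 - n10 - n01                       # absent from both
    return n11, n10, n01, n00


def jaccard(a, b):
    """Jaccard index: shared positions divided by positions covered by either vector."""
    n11, n10, n01, _ = contingency(a, b)
    union = n11 + n10 + n01
    return n11 / union if union else math.nan


def forbes(a, b):
    """Forbes coefficient: observed overlap divided by the overlap expected under independence."""
    n11, n10, n01, n00 = contingency(a, b)
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n11 + n01)
    return n * n11 / denom if denom else math.nan


# Toy vectors (hypothetical): 10 genomic positions, 1 = feature present.
a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(jaccard(a, b))  # 2 shared / 4 covered = 0.5
print(forbes(a, b))   # 10 * 2 / (3 * 3), about 2.22
```

          In this sketch the Jaccard index ignores positions where both features are absent, whereas the Forbes coefficient depends on the total vector length through the expected overlap.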

          Author and article information

          Journal
          Briefings in Bioinformatics
          Oxford University Press (OUP)
          ISSN (print): 1467-5463
          ISSN (online): 1477-4054
          October 17 2019
          Affiliations
          [1] Department of Informatics, University of Oslo, Oslo, Norway
          [2] Department of Mathematics, University of Oslo, Oslo, Norway
          [3] Science Institute, University of Iceland, Reykjavik, Iceland
          [4] Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
          [5] Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo, Norway
          [6] Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
          Article
          DOI: 10.1093/bib/bbz083
          PMID: 31624847
          © 2019

          https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
