      Open Access

      MCLEAN: Multilevel Clustering Exploration As Network

      research-article


          Abstract

          Finding useful patterns in datasets has attracted considerable interest in the field of visual analytics. One of the most common tasks is the identification and representation of clusters. However, this is non-trivial in heterogeneous datasets since the data needs to be analyzed from different perspectives. Indeed, highly variable patterns may mask underlying trends in the dataset. Dendrograms are graphical representations resulting from agglomerative hierarchical clustering and provide a framework for viewing the clustering at different levels of detail. However, dendrograms become cluttered when the dataset gets large, and the single cut of the dendrogram to demarcate different clusters can be insufficient in heterogeneous datasets. In this work, we propose a visual analytics methodology called MCLEAN that offers a general approach for guiding the user through the exploration and detection of clusters. Powered by a graph-based transformation of the relational data, it supports a scalable environment for representation of heterogeneous datasets by changing the spatialization. We thereby combine multilevel representations of the clustered dataset with community finding algorithms. Our approach entails displaying the results of the heuristics to users, providing a setting from which to start the exploration and data analysis. To evaluate our proposed approach, we conduct a qualitative user study, where participants are asked to explore a heterogeneous dataset, comparing the results obtained by MCLEAN with the dendrogram. These qualitative results reveal that MCLEAN is an effective way of aiding users in the detection of clusters in heterogeneous datasets. The proposed methodology is implemented in an R package available at https://bitbucket.org/vda-lab/mclean.


                Author and article information

                Journal: PeerJ Computer Science (PeerJ Comput Sci; peerj-cs)
                Publisher: PeerJ Inc. (San Francisco, USA)
                ISSN: 2376-5992
                Published: 29 January 2018
                Volume 4, e145
                Affiliations
                1. Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
                2. imec, KU Leuven, Leuven, Belgium
                Article: cs-145
                DOI: 10.7717/peerj-cs.145
                Other identifiers: 7924466; 39d5cac7-a969-40ff-89db-2a4c1ad78466
                ©2018 Alcaide and Aerts

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

                History: received 4 December 2017; accepted 10 January 2018
                Funding
                This research was supported by imec strategic funding 2017, IWT SBO Accumulate 150056, and KU Leuven CoE PFV/10/016 SymBioSys. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Data Science
                Visual Analytics

                Keywords: exploratory data analysis, graph and network visualization, hierarchical clustering, visual analytics
