
      PathSim : meta path-based top-K similarity search in heterogeneous information networks

      1 , 1 , 2 , 3 , 4
      Proceedings of the VLDB Endowment
      VLDB Endowment


          Abstract

          Similarity search is a primitive operation in databases and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks and do not take into account the different semantic meanings behind paths. Thus they cannot be directly applied to heterogeneous networks.

          In this paper, we study similarity search that is defined among the same type of objects in heterogeneous networks. Moreover, by considering different linkage paths in a network, one could derive various similarity semantics. Therefore, we introduce the concept of meta path-based similarity, where a meta path is a path consisting of a sequence of relations defined between different object types (i.e., structural paths at the meta level). Whether a user explicitly specifies a path combination given sufficient domain knowledge, chooses the best path by experimental trials, or simply provides training examples to learn it, the meta path forms a common base for a network-based similarity search engine. In particular, under the meta path framework we define a novel similarity measure called PathSim that is able to find peer objects in the network (e.g., find authors in a similar field and with similar reputation), which turns out to be more meaningful in many scenarios than random-walk-based similarity measures. In order to support fast online query processing for PathSim queries, we develop an efficient solution that partially materializes short meta paths and then concatenates them online to compute top-k results. Experiments on real data sets demonstrate the effectiveness and efficiency of our proposed paradigm.
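The abstract does not spell out the measure itself. As a minimal sketch, assuming the commonly cited definition of PathSim for a symmetric meta path, the score of two objects x and y is twice the number of path instances between x and y, divided by the number of path instances from x to itself plus the number from y to itself. The toy author-venue counts, the function names (`path_count`, `pathsim`, `top_k`), and the brute-force ranking below are illustrative assumptions, not the paper's implementation. In particular, the paper's partial materialization and online concatenation are only loosely mirrored here by composing the full path count from stored half-path counts.

```python
# Hypothetical toy data for the half meta path Author -> Venue:
# author -> {venue: number of papers published there}.
author_venue = {
    "Ann": {"VLDB": 2, "KDD": 1},
    "Bob": {"VLDB": 4, "KDD": 2},
    "Cat": {"KDD": 1, "ICML": 3},
}


def path_count(x, y, half):
    """Count path instances x -> venue -> y along the symmetric
    meta path Author-Venue-Author, composed from half-path counts."""
    return sum(w * half[y].get(v, 0) for v, w in half[x].items())


def pathsim(x, y, half):
    """PathSim score under the assumed definition:
    2 * |x ~> y| / (|x ~> x| + |y ~> y|)."""
    denom = path_count(x, x, half) + path_count(y, y, half)
    return 2 * path_count(x, y, half) / denom if denom else 0.0


def top_k(query, half, k):
    """Brute-force top-k: score every other object, sort descending."""
    scores = [(pathsim(query, y, half), y) for y in half if y != query]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [(y, s) for s, y in scores[:k]]


if __name__ == "__main__":
    print(top_k("Ann", author_venue, 2))
```

In this toy data Bob publishes in the same venues as Ann, in the same proportions, so Bob ranks above Cat for the query "Ann". Bob's larger overall volume still keeps his score below 1, which illustrates the "peer" intuition: the normalization favors objects that share both neighbors and visibility with the query.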

                Author and article information

                Journal
                Proceedings of the VLDB Endowment
                Proc. VLDB Endow.
                VLDB Endowment
                ISSN: 2150-8097
                August 2011
                Volume: 4
                Issue: 11
                Pages: 992-1003
                Affiliations
                [1] University of Illinois at Urbana-Champaign
                [2] University of California at Santa Barbara
                [3] University of Illinois at Chicago
                [4] Microsoft Corporation
                Article
                10.14778/3402707.3402736
                © 2011