      Is Open Access

      Sequence of purchases in credit card data reveal life styles in urban populations

      Preprint


          Abstract

          From our most basic consumption to secondary needs, our spending habits reflect our life styles. Yet, in the computational social sciences there is an open question about whether ubiquitous trends exist in the spending habits of various groups at urban scale. The limited information collected by expenditure surveys has not proven conclusive in this regard. This is because the frequency of purchases by type is highly uneven and follows a Zipf-like distribution. In this work, we apply text compression techniques to the purchase codes of credit card data to detect the significant sequences of transactions of each user. Five groups of consumers emerge when users are grouped by their similarity based on these sequences. Remarkably, individuals in each consumer group are also similar in age, total expenditure, gender, and the diversity of their social and mobility networks extracted from their mobile phone records. By properly deconstructing transaction data with Zipf-like distributions, we find that it can give us insights into collective behavior.
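
The abstract does not say which text compression technique is used, so the following is only a minimal sketch of the general idea, assuming an LZ78-style dictionary parse as a stand-in. The category labels, the function names `lz78_phrases` and `rank_frequency`, and the sample data are all hypothetical. The parse splits a user's purchase sequence into phrases, where each new phrase extends a previously seen one by a single code; the rank-frequency helper shows how uneven the raw code counts are.

```python
from collections import Counter


def lz78_phrases(codes):
    """Split a sequence of purchase codes into LZ78-style phrases.

    Each emitted phrase is the shortest run of codes, starting where the
    previous phrase ended, that has not been emitted before.
    """
    seen = set()
    phrases = []
    current = ()
    for code in codes:
        current = current + (code,)
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ()
    if current:
        # Trailing run that matches an already-seen phrase.
        phrases.append(current)
    return phrases


def rank_frequency(codes):
    """Return code counts sorted from most to least frequent."""
    return [count for _, count in Counter(codes).most_common()]


# Hypothetical purchase history for a single user.
history = ["grocery", "gas", "grocery", "gas", "grocery", "gas", "coffee"]

for phrase in lz78_phrases(history):
    print(" -> ".join(phrase))
print(rank_frequency(history))
```

In this toy history the repeated pair of grocery and gas purchases surfaces as its own phrase, which is the kind of recurring sequence a compression-based approach can pick out even when a few codes dominate the counts.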


          Most cited references (33)


          Modularity and community structure in networks

          M. Newman (2006)
          Many networks of interest in the sciences, including a variety of social and biological networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure has attracted considerable recent attention. One of the most sensitive detection methods is optimization of the quality function known as "modularity" over the possible divisions of a network, but direct application of this method using, for instance, simulated annealing is computationally costly. Here we show that the modularity can be reformulated in terms of the eigenvectors of a new characteristic matrix for the network, which we call the modularity matrix, and that this reformulation leads to a spectral algorithm for community detection that returns results of better quality than competing methods in noticeably shorter running times. We demonstrate the algorithm with applications to several network data sets.

            Power-law distributions in empirical data

            Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.

              Fast unfolding of communities in large networks

              We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.

                Author and article information

                Journal
                2017-03-01
                Article
                1703.00409
                f10bdaf1-4315-4232-b5d8-82a9804aee67

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                28 pages, 15 figures
                physics.soc-ph cs.IT cs.SI math.IT stat.AP

                Social & Information networks,General physics,Numerical methods,Applications,Information systems & theory
