      Is Open Access

      Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose

      Preprint


          Abstract

          Twitter is a social media giant famous for the exchange of short, 140-character messages called "tweets". In the scientific community, the microblogging site is known for openness in sharing its data. It provides a glance into its millions of users and billions of tweets through a "Streaming API" which provides a sample of all tweets matching some parameters preset by the API user. The API service has been used by many researchers, companies, and governmental institutions that want to extract knowledge in accordance with a diverse array of questions pertaining to social media. The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter's sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet. We compare both datasets using common statistical metrics as well as metrics that allow us to compare topics, networks, and locations of tweets. The results of our work will help researchers and practitioners understand the implications of using the Streaming API.
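As an illustration of what a "common statistical metric" for such a comparison might look like, the sketch below computes two simple quantities on toy data: the fraction of the full stream that the sample covers, and Kendall's tau rank correlation between the top hashtags of each dataset. The abstract does not name the specific metrics used, so the choice of Kendall's tau, the helper names `top_k` and `kendall_tau`, and all hashtag counts here are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def top_k(counts, k):
    """Return the k most frequent items; ties are broken alphabetically."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:k]]


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a over the items that appear in both ranked lists."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    common = [item for item in rank_a if item in pos_b]
    if len(common) < 2:
        raise ValueError("need at least two shared items to compare rankings")
    concordant = discordant = 0
    for x, y in combinations(common, 2):
        # Positive product: both lists order x and y the same way.
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical hashtag counts; not data from the study.
firehose = Counter({"#news": 50, "#sports": 40, "#music": 30, "#tech": 20})
sample = Counter({"#news": 20, "#music": 15, "#sports": 12, "#tech": 5})

coverage = sum(sample.values()) / sum(firehose.values())
tau = kendall_tau(top_k(firehose, 4), top_k(sample, 4))

print(f"coverage of full stream: {coverage:.2%}")
print(f"Kendall's tau on top hashtags: {tau:.3f}")
```

A tau close to 1 would suggest the sample preserves the ordering of the most popular hashtags, while a value near 0 would suggest the sample's ranking carries little information about the full stream's ranking.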


                Author and article information

                Journal
                1306.5204

                Social & Information networks, General physics
