      Is Open Access

      Garbage in, Garbage Out: Data Collection, Quality Assessment and Reporting Standards for Social Media Data Use in Health Research, Infodemiology and Digital Disease Detection

      research-article


          Abstract

          Background

          Social media have transformed the communications landscape. People increasingly obtain news and health information online and via social media. Social media platforms also serve as novel sources of rich observational data for health research (including infodemiology, infoveillance, and digital disease detection). While the number of studies using social data is growing rapidly, very few of these studies transparently outline their methods for collecting, filtering, and reporting those data. Keywords and search filters applied to social data form the lens through which researchers may observe what and how people communicate about a given topic. Without a properly focused lens, research conclusions may be biased or misleading. Standards of reporting data sources and quality are needed so that data scientists and consumers of social media research can evaluate and compare methods and findings across studies.

          Objective

          We aimed to develop and apply a framework of social media data collection and quality assessment and to propose a reporting standard, which researchers and reviewers may use to evaluate and compare the quality of social data across studies.

          Methods

          We propose a conceptual framework consisting of three major steps in collecting social media data: develop, apply, and validate search filters. This framework is based on two criteria: retrieval precision (how much of the retrieved data is relevant) and retrieval recall (how much of the relevant data is retrieved). We then discuss two conditions on which estimation of retrieval precision and recall relies (accurate human coding and full data collection) and how to calculate these statistics in cases that deviate from the two ideal conditions. We then apply the framework to a real-world example using approximately 4 million tobacco-related tweets collected from the Twitter firehose.
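To make the two criteria concrete, the following is a minimal sketch of how retrieval precision and recall might be estimated from human-coded samples under the two ideal conditions (no coding errors, full data collection). It is not the authors' published estimator: the `CodedSample` class, the function names, and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CodedSample:
    """A random sample of messages that human coders labeled (hypothetical structure)."""
    n_coded: int      # messages the coders reviewed
    n_relevant: int   # messages the coders judged relevant

    @property
    def relevant_rate(self) -> float:
        if self.n_coded <= 0:
            raise ValueError("sample must contain at least one coded message")
        return self.n_relevant / self.n_coded


def estimate_precision(retrieved_sample: CodedSample) -> float:
    """Share of retrieved messages that are relevant, estimated from the coded sample."""
    return retrieved_sample.relevant_rate


def estimate_recall(n_retrieved: int, n_unretrieved: int,
                    retrieved_sample: CodedSample,
                    unretrieved_sample: CodedSample) -> float:
    """Share of all relevant messages that the filter retrieved.

    Assumes both samples are random draws from their pools and that
    coders make no errors (the ideal conditions named in the text).
    """
    relevant_retrieved = n_retrieved * retrieved_sample.relevant_rate
    relevant_missed = n_unretrieved * unretrieved_sample.relevant_rate
    total_relevant = relevant_retrieved + relevant_missed
    if total_relevant == 0:
        raise ValueError("no relevant messages estimated in either pool")
    return relevant_retrieved / total_relevant


# Hypothetical counts, not taken from the study.
retrieved = CodedSample(n_coded=200, n_relevant=190)
unretrieved = CodedSample(n_coded=300, n_relevant=3)
precision = estimate_precision(retrieved)
recall = estimate_recall(1000, 9000, retrieved, unretrieved)
```

Because the unretrieved pool is usually far larger than the retrieved pool, even a small rate of relevant messages among unretrieved data can lower recall noticeably, which is why the sketch scales each sample rate by its pool size.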

          Results

          We developed and applied a search filter to retrieve e-cigarette–related tweets from the archive based on three keyword categories: devices, brands, and behavior. The search filter retrieved 82,205 e-cigarette–related tweets from the archive and was validated. Retrieval precision was calculated above 95% in all cases. Retrieval recall was 86% assuming ideal conditions (no human coding errors and full data collection), 75% when unretrieved messages could not be archived, 86% assuming no false negative errors by coders, and 93% allowing both false negative and false positive errors by human coders.
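As a rough illustration of a keyword-based search filter organized by the three categories named above, the sketch below compiles one case-insensitive pattern and applies it to a list of messages. The keyword lists are placeholders (including the made-up brand `examplebrand`), not the study's actual filter, and the function names are hypothetical.

```python
import re

# Placeholder keywords grouped by the three categories in the text.
# These are illustrative assumptions, not the terms used in the study.
KEYWORDS = {
    "devices": ["e-cig", "ecig", "e-cigarette", "vape pen"],
    "brands": ["examplebrand"],
    "behavior": ["vaping", "vape"],
}


def build_filter(keywords: dict[str, list[str]]) -> re.Pattern:
    """Compile one case-insensitive pattern matching any keyword as a whole term."""
    # Longer terms first so multi-word phrases are tried before their prefixes.
    terms = sorted({t for group in keywords.values() for t in group},
                   key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # Lookarounds keep a keyword from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def retrieve(messages: list[str], pattern: re.Pattern) -> list[str]:
    """Return the messages that contain at least one keyword."""
    return [m for m in messages if pattern.search(m)]


tweets = [
    "Trying my new E-cig today",
    "Vaping at the park",
    "Stuck in traffic again",
]
matched = retrieve(tweets, build_filter(KEYWORDS))
```

Whole-term matching trades some recall for precision: a variant such as "vapes" would be missed unless it is added to the list, which is the kind of gap the validation step is meant to surface.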

          Conclusions

          This paper sets forth a conceptual framework for the filtering and quality evaluation of social data that addresses several common challenges and moves toward establishing a standard of reporting social data. Researchers should clearly delineate data sources, how data were accessed and collected, the search filter building process, and how retrieval precision and recall were calculated. The proposed framework can be adapted to other public social media platforms.


                Author and article information

                Contributors
                Journal
                Journal of Medical Internet Research (J Med Internet Res; JMIR)
                JMIR Publications Inc. (Toronto, Canada)
                ISSN: 1439-4456, 1438-8871
                February 2016
                26 February 2016
                Volume 18, Issue 2, e41
                Affiliations
                [1] Health Media Collaboratory, Institute for Health Research and Policy, University of Illinois at Chicago, Chicago, IL, United States
                Author notes
                Corresponding Author: Yoonsang Kim ykim96@uic.edu
                Author information
                http://orcid.org/0000-0002-1685-1753
                http://orcid.org/0000-0002-1646-5422
                http://orcid.org/0000-0001-9278-9990
                Article
                Article ID: v18i2e41
                DOI: 10.2196/jmir.4738
                PMCID: PMC4788740
                PMID: 26920122
                01d0217b-9a5e-4d88-a037-6613809d9dcd
                ©Yoonsang Kim, Jidong Huang, Sherry Emery. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 26.02.2016.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

                History
                : 21 May 2015
                : 18 October 2015
                : 9 December 2015
                : 4 January 2016
                Categories
                Original Paper

                Medicine
                Keywords: social media, precision and recall, sensitivity and specificity, search filter, Twitter, standard reporting, infodemiology, infoveillance, digital disease detection
