
Mining Anonymity: Identifying Sensitive Accounts on Twitter

Preprint


Abstract

We explore the feasibility of automatically finding accounts that publish sensitive content on Twitter. A natural approach to this problem is to first compile a list of sensitive keywords and then identify Twitter accounts that use those words in their tweets. However, such an approach may overlook sensitive accounts that are not covered by the subjective choice of keywords. In this paper, we instead find sensitive accounts by examining the percentages of anonymous and identifiable followers they have. This approach is motivated by an earlier study showing that sensitive accounts typically have a large percentage of anonymous followers and a small percentage of identifiable followers. To this end, we first consider the problem of automatically determining whether a Twitter account is anonymous or identifiable. We find that simple techniques, such as checking for name-list membership, perform poorly, so we design a machine learning classifier that labels accounts as anonymous or identifiable. We then classify an account as sensitive based on the percentages of anonymous and identifiable followers it has. We apply our approach to approximately 100,000 accounts with 404 million active followers. The approach uncovers accounts that are sensitive for a diverse range of reasons. These accounts span varied themes, including themes not commonly proposed as sensitive and themes related to socially stigmatized topics. To validate our approach, we apply Latent Dirichlet Allocation (LDA) topic analysis to the tweets of the detected sensitive and non-sensitive accounts. LDA shows that the sensitive and non-sensitive accounts identified by the methodology tweet about distinctly different topics. Our results show that it is indeed possible to objectively identify sensitive accounts at the scale of Twitter.
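The final decision step described above (flagging an account from the share of anonymous and identifiable followers) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes every follower has already been labeled "anonymous" or "identifiable" by some upstream classifier (any other value, such as None, is treated as unlabeled), and the helper names `follower_profile` and `is_sensitive` as well as the thresholds `MIN_ANONYMOUS_FRACTION` and `MAX_IDENTIFIABLE_FRACTION` are hypothetical, since the abstract does not state the actual cut-offs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical thresholds: the abstract does not give the real cut-offs.
MIN_ANONYMOUS_FRACTION = 0.5
MAX_IDENTIFIABLE_FRACTION = 0.2


@dataclass
class FollowerProfile:
    anonymous_fraction: float
    identifiable_fraction: float


def follower_profile(labels: Iterable[Optional[str]]) -> FollowerProfile:
    """Fractions of followers labeled 'anonymous' and 'identifiable'.

    Assumption: any other label (e.g. None for a follower the classifier
    did not label) counts toward the total but toward neither fraction.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return FollowerProfile(0.0, 0.0)
    return FollowerProfile(
        counts["anonymous"] / total,
        counts["identifiable"] / total,
    )


def is_sensitive(
    labels: Iterable[Optional[str]],
    min_anonymous: float = MIN_ANONYMOUS_FRACTION,
    max_identifiable: float = MAX_IDENTIFIABLE_FRACTION,
) -> bool:
    """Flag an account whose followers are mostly anonymous and rarely identifiable."""
    profile = follower_profile(labels)
    return (
        profile.anonymous_fraction >= min_anonymous
        and profile.identifiable_fraction <= max_identifiable
    )


if __name__ == "__main__":
    # Toy follower labels for illustration only, not data from the paper.
    mostly_anonymous = ["anonymous"] * 7 + ["identifiable"] * 1 + [None] * 2
    mostly_identifiable = ["anonymous"] * 3 + ["identifiable"] * 6 + [None] * 1
    print(is_sensitive(mostly_anonymous))     # True: 0.7 anonymous, 0.1 identifiable
    print(is_sensitive(mostly_identifiable))  # False: 0.3 anonymous, 0.6 identifiable
```

Keeping both thresholds as parameters mirrors the abstract's use of two percentages and makes it straightforward to sweep the cut-offs when validating the flagged accounts, for example against topic analysis of their tweets.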


Author and article information

Date: 2017-02-01
arXiv ID: 1702.00164

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

A shorter 4-page version of this work was published as a poster at the International AAAI Conference on Web and Social Media (ICWSM), 2016.
Subjects: cs.SI (Social & Information networks), cs.CR (Security & Cryptology)
