      Is Open Access

      The Language of Legal and Illegal Activity on the Darknet

      Preprint

          Abstract

          The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity. Given the magnitude of these networks, scalably monitoring their activity necessarily relies on automated tools, and notably on NLP tools. However, little is known about what characteristics texts communicated through the Darknet have, and how well off-the-shelf NLP tools do on this domain. This paper tackles this gap and performs an in-depth investigation of the characteristics of legal and illegal text in the Darknet, comparing it to a clear net website with similar content as a control condition. Taking drug-related websites as a test case, we find that texts for selling legal and illegal drugs have several linguistic characteristics that distinguish them from one another, as well as from the control condition, among them the distribution of POS tags, and the coverage of their named entities in Wikipedia.
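One of the characteristics the abstract names is the distribution of POS tags. The sketch below shows one plausible way to compare such distributions between two corpora, using Jensen-Shannon divergence. It is an illustration only, not the authors' pipeline: the choice of divergence, the function names `tag_distribution` and `js_divergence`, and the toy tagged sentences are all assumptions, and the input is taken to be already POS-tagged by whatever tagger is available.

```python
from collections import Counter
from math import log2


def tag_distribution(tagged_tokens):
    """Relative frequency of each POS tag in a list of (token, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one tagged token")
    return {tag: n / total for tag, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two tag distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no tags in common.
    """
    tags = set(p) | set(q)
    # Mixture distribution: the average of p and q over all tags.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tags}

    def kl_to_mixture(a):
        # KL(a || m); tags with zero probability in a contribute nothing.
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical, hand-tagged toy data standing in for two corpora.
corpus_a = [("cheap", "ADJ"), ("pills", "NOUN"), ("ship", "VERB"), ("fast", "ADV")]
corpus_b = [("the", "DET"), ("product", "NOUN"), ("is", "VERB"), ("legal", "ADJ")]

divergence = js_divergence(tag_distribution(corpus_a), tag_distribution(corpus_b))
```

A divergence near 0 would suggest the two corpora use parts of speech in similar proportions; larger values indicate the kind of distributional difference that can degrade off-the-shelf NLP tools on a new domain.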

                Author and article information

                Date: 14 May 2019
                arXiv: 1905.05543
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                Comments: Accepted to ACL 2019
                Subject class: cs.CL

                Theoretical computer science