      Is Open Access

      Content-based data leakage detection using extended fingerprinting

      Preprint

          Abstract

          Protecting sensitive information from unauthorized disclosure is a major concern of every organization. As an organization's employees need to access such information in order to carry out their daily work, data leakage detection is both an essential and challenging task. Whether caused by malicious intent or an inadvertent mistake, data loss can result in significant damage to the organization. Fingerprinting is a content-based method used for detecting data leakage. In fingerprinting, signatures of known confidential content are extracted and matched with outgoing content in order to detect leakage of sensitive content. Existing fingerprinting methods, however, suffer from two major limitations. First, fingerprinting can be bypassed by rephrasing (or minor modification of) the confidential content. Second, the whole content of a document is usually fingerprinted (including non-confidential parts), resulting in false alarms. In this paper we propose an extension to the fingerprinting approach that is based on sorted k-skip-n-grams. The proposed method is able to produce a fingerprint of the core confidential content which ignores non-relevant (non-confidential) sections. In addition, the proposed fingerprinting method is more robust to rephrasing and can also be used to detect a previously unseen confidential document, and therefore provides better detection of intentional leakage incidents.
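The abstract does not define sorted k-skip-n-grams in detail, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a k-skip-n-gram is any n words taken in order from a window of n + k consecutive words (so at most k words are skipped), that "sorted" means the words of each gram are put in alphabetical order so that reordering does not change the gram, and that a fingerprint is the set of hashed grams. The function names, the whitespace tokenizer, and the use of truncated SHA-1 are all hypothetical choices made for illustration.

```python
import hashlib
from itertools import combinations


def sorted_skip_ngrams(tokens, n=3, k=1):
    """Return the set of sorted k-skip-n-grams of a token list.

    Assumed definition: each gram starts at some token and picks the
    remaining n - 1 tokens, in order, from the next n + k - 1 tokens,
    so at most k tokens are skipped. The gram is then sorted.
    """
    grams = set()
    for start in range(len(tokens)):
        window = tokens[start:start + n + k]
        if len(window) < n:
            break  # too few tokens left to form another gram
        first = window[0]
        for rest in combinations(window[1:], n - 1):
            grams.add(tuple(sorted((first,) + rest)))
    return grams


def fingerprint(text, n=3, k=1):
    """Hash every sorted skip-gram of the text into a set of signatures."""
    tokens = text.lower().split()  # naive tokenizer, an assumption
    return {
        hashlib.sha1(" ".join(gram).encode("utf-8")).hexdigest()[:16]
        for gram in sorted_skip_ngrams(tokens, n, k)
    }


def overlap(confidential, outgoing, n=3, k=1):
    """Fraction of the confidential fingerprint found in the outgoing text."""
    fp_conf = fingerprint(confidential, n, k)
    if not fp_conf:
        return 0.0
    fp_out = fingerprint(outgoing, n, k)
    return len(fp_conf & fp_out) / len(fp_conf)


if __name__ == "__main__":
    secret = "the secret code"
    leaked = "the very secret code"  # one word inserted
    print(overlap(secret, leaked, n=2, k=0))  # plain bigrams
    print(overlap(secret, leaked, n=2, k=1))  # 1-skip bigrams
```

In this toy case, inserting a single word leaves half of the plain bigram signatures matching (k = 0) but two thirds of the 1-skip bigram signatures (k = 1), which illustrates the kind of robustness to minor modification that the abstract describes. A real detector would also need a decision threshold and some way to exclude non-confidential sections, neither of which is sketched here.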


                Author and article information

                Date: 2013-02-08
                arXiv: 1302.2028
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                arXiv category: cs.CR
                Subject: Security & Cryptology
