
      CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

      Preprint

          Abstract

Adversarial misuse, particularly through "jailbreaking" that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks into a code completion format, enabling users to encrypt queries using personalized encryption functions. To guarantee response generation functionality, we embed a decryption function within the instructions, which allows the LLM to decrypt and execute the encrypted queries successfully. We conduct extensive experiments on 7 LLMs, achieving state-of-the-art average Attack Success Rate (ASR). Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
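To make the two-step idea in the abstract concrete, below is a minimal Python sketch of one possible encryption/decryption pair wrapped in a code-completion prompt. The word-reversal scheme, the function names, and the prompt template are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch (assumed, not the authors' released code) of the
# encrypt/decrypt pairing the abstract describes. The word-reversal
# scheme, function names, and prompt template are illustrative only.

def encrypt_reverse(query: str) -> str:
    """'Personalized encryption': reverse the word order of the query."""
    return " ".join(reversed(query.split()))

# Decryption function embedded verbatim in the prompt so the model can
# recover the original query on its own.
DECRYPT_SNIPPET = '''\
def decrypt(encrypted: str) -> str:
    """Recover the original query by reversing the word order."""
    return " ".join(reversed(encrypted.split()))
'''

def build_prompt(query: str) -> str:
    """Wrap the encrypted query in a code-completion task: the prompt
    ships the decryption function and asks the model to complete a
    program that first decrypts and then solves the hidden query."""
    encrypted = encrypt_reverse(query)
    return (
        "Complete the following Python program.\n\n"
        + DECRYPT_SNIPPET
        + f"\nquery = decrypt({encrypted!r})\n"
        + "# TODO: implement solve(query) so it carries out the task in `query`.\n"
    )

if __name__ == "__main__":
    print(build_prompt("summarize the attached document"))
```

The point of the pairing is that the surface text reaching the model's intent recognition step is scrambled, while the embedded decrypt function keeps the task recoverable at response-generation time.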

          Author and article information

Preprint posted: 26 February 2024
arXiv ID: 2402.16717
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

arXiv categories: cs.CL, cs.AI, cs.CR

Theoretical computer science, Security & Cryptology, Artificial intelligence
