Skip to main content
Menu

Volume II: Core Attack Tactics

Eight tactics covering the foundational attack surface of language models — from prompt injection to deepfake-powered deception.

T1

Prompt & Context Subversion

16 techniques · 76 procedures · Risk 200–240

Manipulate model instructions and context

2025–2026 Threat Update

  • Policy Puppetry (HiddenLayer, April 2025) bypasses every frontier model by reformulating prompts as XML/INI/JSON policy configuration files.
  • Time Bandit (CERT/CC VU#733789) exploits temporal confusion in ChatGPT-4o by anchoring conversations in historical periods.
  • Princeton research (May 2025): shallow safety alignment applies constraints only to the first few tokens. A forced opening like "Sure, let me help you" bypasses all safety training.
T1 Prompt & Context Subversion
16 techniques
ID Technique Risk Rating Procs
T1-AT-001 Dialogue Hijacking 220 🟠 HIGH 5
T1-AT-002 Time-Based Context Manipulation 210 🟠 HIGH 5
T1-AT-003 Language Model Confusion 225 🟠 HIGH 5
T1-AT-004 Instruction Prefix/Suffix 235 🟠 HIGH 6
T1-AT-005 Permission Escalation Claims 240 🟠 HIGH 5
T1-AT-006 Prompt Template Injection 230 🟠 HIGH 5
T1-AT-007 Cognitive Overload 215 🟠 HIGH 4
T1-AT-008 Boundary Testing 200 🟠 HIGH 5
T1-AT-009 Simulation Requests 225 🟠 HIGH 5
T1-AT-010 Negative Instruction Reversal 210 🟠 HIGH 5
T1-AT-011 Error Message Exploitation 220 🟠 HIGH 4
T1-AT-012 Consent Manufacturing 205 🟠 HIGH 5
T1-AT-013 Instruction Commenting 215 🟠 HIGH 4
T1-AT-014 Authority Spoofing 240 🟠 HIGH 4
T1-AT-015 Obfuscation Through Complexity 220 🟠 HIGH 4
T1-AT-016 Session State Manipulation 235 🟠 HIGH 5
T2

Semantic & Linguistic Evasion

20 techniques · 161 procedures · Risk 155–210

Bypass filters through language manipulation

2025–2026 Threat Update

  • Emoji smuggling achieved 100% evasion success against multiple systems.
  • Zero-width characters and Unicode tags (U+E0001–U+E007F) routinely fool classifiers.
  • Homoglyph substitution using visually similar characters from different scripts evades word-level filters.
T2 Semantic & Linguistic Evasion
20 techniques
ID Technique Risk Rating Procs
T2-AT-001 Euphemism and Metaphor Exploitation 180 🟡 MEDIUM 10
T2-AT-002 Multi-Language Evasion 200 🟠 HIGH 7
T2-AT-003 Encoding and Obfuscation 190 🟡 MEDIUM 10
T2-AT-004 Unicode and Bidirectional Attacks 210 🟠 HIGH 10
T2-AT-005 Semantic Drift 175 🟡 MEDIUM 10
T2-AT-006 Linguistic Camouflage 185 🟡 MEDIUM 10
T2-AT-007 Phonetic Manipulation 170 🟡 MEDIUM 2
T2-AT-008 Synonym and Paraphrase Chains 165 🟡 MEDIUM 10
T2-AT-009 Code-Switching Attacks 195 🟡 MEDIUM 1
T2-AT-010 Transliteration Exploitation 185 🟡 MEDIUM 10
T2-AT-011 Abbreviation and Acronym Abuse 160 🟡 MEDIUM 2
T2-AT-012 Cultural Reference Encoding 170 🟡 MEDIUM 10
T2-AT-013 Grammatical Manipulation 175 🟡 MEDIUM 10
T2-AT-014 Semantic Bleaching 180 🟡 MEDIUM 5
T2-AT-015 Noise Injection 165 🟡 MEDIUM 10
T2-AT-016 Dialectical Variations 155 🟡 MEDIUM 10
T2-AT-017 Compression Techniques 170 🟡 MEDIUM 10
T2-AT-018 Semantic Field Manipulation 175 🟡 MEDIUM 10
T2-AT-019 Pragmatic Implication 185 🟡 MEDIUM 4
T2-AT-020 Register Shifting 160 🟡 MEDIUM 10
T3

Reasoning & Constraint Exploitation

19 techniques · 178 procedures · Risk 175–210

Exploit logical reasoning and constraints

2025–2026 Threat Update

  • H-CoT (Duke/Accenture, February 2025) hijacks chain-of-thought safety reasoning in o1/o3 and DeepSeek-R1.
  • Autonomous LRM jailbreaking (Nature Communications, August 2025): four reasoning models achieved 97.14% ASR as multi-turn adversarial agents.
  • Mousetrap (ACL 2025): iterative reasoning chains cause o1-mini to generate detailed harmful responses with chain length of just 2.
T3 Reasoning & Constraint Exploitation
19 techniques
ID Technique Risk Rating Procs
T3-AT-001 Fictional Framing 190 🟡 MEDIUM 10
T3-AT-002 Academic Pretense 195 🟡 MEDIUM 10
T3-AT-003 Counterfactual Reasoning 200 🟠 HIGH 10
T3-AT-004 Step-by-Step Extraction 210 🟠 HIGH 9
T3-AT-005 Goal Substitution 205 🟠 HIGH 10
T3-AT-006 Constraint Negation 185 🟡 MEDIUM 10
T3-AT-007 Socratic Method Exploitation 195 🟡 MEDIUM 8
T3-AT-008 Comparative Analysis 180 🟡 MEDIUM 10
T3-AT-009 Expertise Assumption 190 🟡 MEDIUM 10
T3-AT-010 Reverse Psychology 175 🟡 MEDIUM 10
T3-AT-011 Information Completion 185 🟡 MEDIUM 10
T3-AT-012 Capability Testing 200 🟠 HIGH 10
T3-AT-013 Logical Paradox Creation 210 🟠 HIGH 10
T3-AT-014 Incremental Boundary Pushing 195 🟡 MEDIUM 5
T3-AT-015 Context Weaponization 205 🟠 HIGH 10
T3-AT-016 Rationalization Chains 190 🟡 MEDIUM 6
T3-AT-017 Scenario Anchoring 185 🟡 MEDIUM 10
T3-AT-018 Debate Positioning 180 🟡 MEDIUM 10
T3-AT-019 Misdirection Through Complexity 175 🟡 MEDIUM 10
T4

Multi-Turn & Memory Manipulation

16 techniques · 147 procedures · Risk 185–240

Leverage conversation history and memory

2025–2026 Threat Update

  • Multi-turn is now the dominant attack modality. Reasoning models as adversarial agents achieve 97% ASR where single-turn attacks fail.
  • DeepSeek R1 exhibited 100% ASR across 50 HarmBench prompts. Wallarm extracted DeepSeek's entire hidden system prompt via bias-based response logic.
  • Jailbreak attempts succeed roughly 20% of the time, averaging 42 seconds and 5 interactions.
T4 Multi-Turn & Memory Manipulation
16 techniques
ID Technique Risk Rating Procs
T4-AT-001 Conversation Context Poisoning 220 🟠 HIGH 10
T4-AT-002 Memory Instruction Injection 240 🟠 HIGH 10
T4-AT-003 Session State Manipulation 210 🟠 HIGH 10
T4-AT-004 Cross-Conversation Contamination 195 🟡 MEDIUM 10
T4-AT-005 Incremental Jailbreak Assembly 230 🟠 HIGH 10
T4-AT-006 False History Creation 200 🟠 HIGH 10
T4-AT-007 Context Window Exhaustion 205 🟠 HIGH 10
T4-AT-008 Conversation Forking 190 🟡 MEDIUM 3
T4-AT-009 Temporal Anchoring 185 🟡 MEDIUM 10
T4-AT-010 State Confusion Attack 215 🟠 HIGH 4
T4-AT-011 Memory Poisoning 235 🟠 HIGH 10
T4-AT-012 Trust Building Exploitation 210 🟠 HIGH 10
T4-AT-013 Session Hijacking 225 🟠 HIGH 10
T4-AT-014 Conversation Replay Attack 205 🟠 HIGH 10
T4-AT-015 Multi-Turn Social Engineering 220 🟠 HIGH 10
T4-AT-016 Context Fragmentation 195 🟡 MEDIUM 10
T5

Model & API Exploitation

16 techniques · 142 procedures · Risk 165–230

Attack model interfaces and APIs

2025–2026 Threat Update

  • EchoLeak (CVE-2025-32711): zero-click prompt injection in Microsoft 365 Copilot exfiltrates chat history.
  • CVE-2025-53773 (CVSS 9.6): RCE via prompt injection in GitHub Copilot/VS Code.
  • OpenClaw crisis: 42,665+ publicly accessible AI agent instances, 93.4% with critical auth bypass.
T5 Model & API Exploitation
16 techniques
ID Technique Risk Rating Procs
T5-AT-001 Parameter Manipulation 180 🟡 MEDIUM 10
T5-AT-002 Token Probability Extraction 210 🟠 HIGH 10
T5-AT-003 Cache Poisoning 200 🟠 HIGH 10
T5-AT-004 Rate Limit Evasion 170 🟡 MEDIUM 10
T5-AT-005 Model Fingerprinting 185 🟡 MEDIUM 1
T5-AT-006 API Endpoint Abuse 190 🟡 MEDIUM 10
T5-AT-007 Context Length Exploitation 195 🟡 MEDIUM 10
T5-AT-008 Response Streaming Exploitation 175 🟡 MEDIUM 10
T5-AT-009 Tokenization Exploits 180 🟡 MEDIUM 10
T5-AT-010 Batch Processing Attacks 200 🟠 HIGH 10
T5-AT-011 Error Message Mining 165 🟡 MEDIUM 10
T5-AT-012 Resource Exhaustion 205 🟠 HIGH 10
T5-AT-013 Version Downgrade Attacks 190 🟡 MEDIUM 1
T5-AT-014 Side Channel Attacks 210 🟠 HIGH 10
T5-AT-015 API Authentication Bypass 230 🟠 HIGH 10
T5-AT-016 Request Smuggling 215 🟠 HIGH 10
T6

Training & Feedback Poisoning

15 techniques · 141 procedures · Risk 210–270

Corrupt training data and feedback

2025–2026 Threat Update

  • Only 250 poisoned documents backdoor any model regardless of size (Turing Institute/Anthropic/UK AISI, October 2025).
  • Frontier models o3, Claude 3.7 Sonnet, and o1 all exhibit reward hacking (METR, June 2025).
  • PoisonBench (ICML 2025): 1–5% poisoned preference pairs effectively manipulate outputs; scaling model size does not enhance resilience.
T6 Training & Feedback Poisoning
15 techniques
ID Technique Risk Rating Procs
T6-AT-001 Reward Hacking 250 🔴 CRITICAL 10
T6-AT-002 Dataset Contamination 260 🔴 CRITICAL 10
T6-AT-003 Backdoor Insertion 270 🔴 CRITICAL 1
T6-AT-004 Fine-Tuning Attacks 240 🟠 HIGH 10
T6-AT-005 Synthetic Data Poisoning 235 🟠 HIGH 10
T6-AT-006 Annotation Manipulation 225 🟠 HIGH 10
T6-AT-007 Preference Learning Corruption 230 🟠 HIGH 10
T6-AT-008 Model Update Hijacking 245 🟠 HIGH 10
T6-AT-009 Evaluation Set Contamination 220 🟠 HIGH 10
T6-AT-010 Knowledge Distillation Attacks 215 🟠 HIGH 10
T6-AT-011 Reinforcement Signal Manipulation 240 🟠 HIGH 10
T6-AT-012 Curriculum Learning Exploitation 210 🟠 HIGH 10
T6-AT-013 Active Learning Exploitation 225 🟠 HIGH 10
T6-AT-014 Self-Supervised Poisoning 230 🟠 HIGH 10
T6-AT-015 Few-Shot Learning Attacks 220 🟠 HIGH 10
T7

Output Manipulation & Exfiltration

15 techniques · 146 procedures · Risk 165–200

Manipulate outputs and extract data

2025–2026 Threat Update

  • ChatGPT and Grok conversations appeared in Google search results via insecure share links.
  • 60% of employees accept security risks to use unsanctioned "Shadow AI" tools (BlackFog 2025).
  • C2PA v2.2 watermarking faces a fundamental trilemma: no watermark can simultaneously be robust, unforgeable, and publicly detectable.
T7 Output Manipulation & Exfiltration
15 techniques
ID Technique Risk Rating Procs
T7-AT-001 Reasoning Chain Disclosure 190 🟡 MEDIUM 10
T7-AT-002 Information Fragmentation 180 🟡 MEDIUM 6
T7-AT-003 Output Format Exploitation 175 🟡 MEDIUM 10
T7-AT-004 Side Channel Leakage 195 🟡 MEDIUM 10
T7-AT-005 Metadata Extraction 185 🟡 MEDIUM 10
T7-AT-006 Steganographic Output 170 🟡 MEDIUM 10
T7-AT-007 Iterative Refinement Extraction 175 🟡 MEDIUM 10
T7-AT-008 Translation Leakage 165 🟡 MEDIUM 10
T7-AT-009 Analogy Extraction 180 🟡 MEDIUM 10
T7-AT-010 Differential Response Analysis 190 🟡 MEDIUM 10
T7-AT-011 Schema-Based Extraction 185 🟡 MEDIUM 10
T7-AT-012 Aggregation Attacks 200 🟠 HIGH 10
T7-AT-013 Capability Probing 175 🟡 MEDIUM 10
T7-AT-014 Output Redirection 180 🟡 MEDIUM 10
T7-AT-015 Compression-Based Extraction 170 🟡 MEDIUM 10
T8

External Deception & Misinformation

15 techniques · 150 procedures · Risk 185–240

Generate deceptive content

2025–2026 Threat Update

  • Deepfake fraud reached $1.1 billion in 2025 (3x from 2024). Voice phishing surged 442%.
  • North Korean IT worker infiltrations grew 220%, generating $250M–$600M annually. Synthetic interview identity created in 70 minutes.
  • AI-generated CSAM reports: 440,419 in H1 2025 (624% increase from all of 2024).
T8 External Deception & Misinformation
15 techniques
ID Technique Risk Rating Procs
T8-AT-001 Authority Impersonation 230 🟠 HIGH 10
T8-AT-002 Synthetic Evidence Generation 220 🟠 HIGH 10
T8-AT-003 Conspiracy Theory Amplification 210 🟠 HIGH 10
T8-AT-004 Deepfake Narrative Creation 215 🟠 HIGH 10
T8-AT-005 Social Engineering Scripts 200 🟠 HIGH 10
T8-AT-006 Targeted Harassment Content 195 🟡 MEDIUM 10
T8-AT-007 Disinformation Campaign Content 225 🟠 HIGH 10
T8-AT-008 Synthetic Testimony Generation 190 🟡 MEDIUM 10
T8-AT-009 Radicalization Content 240 🟠 HIGH 10
T8-AT-010 False Flag Content 205 🟠 HIGH 10
T8-AT-011 Election Manipulation Content 235 🟠 HIGH 10
T8-AT-012 Synthetic Media Support 185 🟡 MEDIUM 10
T8-AT-013 Psychological Manipulation Content 200 🟠 HIGH 10
T8-AT-014 False Crisis Generation 210 🟠 HIGH 10
T8-AT-015 Identity Fabrication 195 🟡 MEDIUM 10