Volume II: Core Attack Tactics
Eight tactics covering the foundational attack surface of language models — from prompt injection to deepfake-powered deception.
Prompt & Context Subversion
16 techniques · 76 procedures · Risk 200–240
Manipulate model instructions and context
2025–2026 Threat Update
- Policy Puppetry (HiddenLayer, April 2025) bypasses every frontier model by reformulating prompts as XML/INI/JSON policy configuration files.
- Time Bandit (CERT/CC VU#733789) exploits temporal confusion in ChatGPT-4o by anchoring conversations in historical periods.
- Princeton research (May 2025): shallow safety alignment applies constraints only to the first few tokens, so a forced opening like "Sure, let me help you" bypasses all safety training.
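The Policy Puppetry finding above implies an input-side countermeasure: flag prompts that arrive formatted as serialized policy configuration before they reach the model. A minimal heuristic sketch in Python; the marker patterns and key names are illustrative assumptions, not a vetted production filter:

```python
import json
import re

# Heuristic markers for prompts disguised as policy files: XML-style
# policy/config tags or INI section headers. Patterns are illustrative.
POLICY_MARKERS = re.compile(
    r"(<\s*(policy|config|interaction)[^>]*>"   # XML-style policy tags
    r"|^\s*\[[\w\- ]+\]\s*$"                    # INI section headers
    r")",
    re.IGNORECASE | re.MULTILINE,
)

def looks_like_policy_config(prompt: str) -> bool:
    """Return True if the prompt resembles a serialized policy file."""
    if POLICY_MARKERS.search(prompt):
        return True
    # JSON objects whose top-level keys read like policy directives
    # (key list is an assumption for illustration).
    try:
        obj = json.loads(prompt)
    except ValueError:
        return False
    if isinstance(obj, dict):
        suspicious = {"blocked_modes", "allowed_responses", "interaction_mode", "rules"}
        return bool(suspicious & {k.lower() for k in obj})
    return False
```

A flag from a filter like this would route the prompt to stricter review rather than block it outright, since legitimate requests can also quote configuration files.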
T1 · Prompt & Context Subversion · 16 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T1-AT-001 | Dialogue Hijacking | 220 | 🟠 HIGH | 5 |
| T1-AT-002 | Time-Based Context Manipulation | 210 | 🟠 HIGH | 5 |
| T1-AT-003 | Language Model Confusion | 225 | 🟠 HIGH | 5 |
| T1-AT-004 | Instruction Prefix/Suffix | 235 | 🟠 HIGH | 6 |
| T1-AT-005 | Permission Escalation Claims | 240 | 🟠 HIGH | 5 |
| T1-AT-006 | Prompt Template Injection | 230 | 🟠 HIGH | 5 |
| T1-AT-007 | Cognitive Overload | 215 | 🟠 HIGH | 4 |
| T1-AT-008 | Boundary Testing | 200 | 🟠 HIGH | 5 |
| T1-AT-009 | Simulation Requests | 225 | 🟠 HIGH | 5 |
| T1-AT-010 | Negative Instruction Reversal | 210 | 🟠 HIGH | 5 |
| T1-AT-011 | Error Message Exploitation | 220 | 🟠 HIGH | 4 |
| T1-AT-012 | Consent Manufacturing | 205 | 🟠 HIGH | 5 |
| T1-AT-013 | Instruction Commenting | 215 | 🟠 HIGH | 4 |
| T1-AT-014 | Authority Spoofing | 240 | 🟠 HIGH | 4 |
| T1-AT-015 | Obfuscation Through Complexity | 220 | 🟠 HIGH | 4 |
| T1-AT-016 | Session State Manipulation | 235 | 🟠 HIGH | 5 |
Semantic & Linguistic Evasion
20 techniques · 161 procedures · Risk 155–210
Bypass filters through language manipulation
2025–2026 Threat Update
- Emoji smuggling achieved 100% evasion success against multiple systems.
- Zero-width characters and Unicode tags (U+E0001–U+E007F) routinely fool classifiers.
- Homoglyph substitution using visually similar characters from different scripts evades word-level filters.
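The invisible-character vectors above can be screened mechanically before text reaches a classifier. A minimal Python sketch that reports and strips zero-width characters, the Unicode tag block, and variation selectors; the character sets are assumptions drawn from the bullets, not an exhaustive inventory (homoglyph detection would additionally need script/confusables tables):

```python
# Common zero-width/invisible codepoints (illustrative, not exhaustive).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def _is_invisible(ch: str) -> bool:
    cp = ord(ch)
    return (
        ch in ZERO_WIDTH
        or 0xE0000 <= cp <= 0xE007F   # Unicode tag block
        or 0xFE00 <= cp <= 0xFE0F     # variation selectors (emoji smuggling)
    )

def invisible_codepoints(text: str) -> list[str]:
    """Return the flagged codepoints found in text, as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text if _is_invisible(ch)]

def strip_invisible(text: str) -> str:
    """Remove flagged codepoints so downstream filters see the bare text."""
    return "".join(ch for ch in text if not _is_invisible(ch))
```

Stripping before classification closes the simplest evasion path; logging the flagged codepoints preserves the evidence that an evasion attempt occurred.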
T2 · Semantic & Linguistic Evasion · 20 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T2-AT-001 | Euphemism and Metaphor Exploitation | 180 | 🟡 MEDIUM | 10 |
| T2-AT-002 | Multi-Language Evasion | 200 | 🟠 HIGH | 7 |
| T2-AT-003 | Encoding and Obfuscation | 190 | 🟡 MEDIUM | 10 |
| T2-AT-004 | Unicode and Bidirectional Attacks | 210 | 🟠 HIGH | 10 |
| T2-AT-005 | Semantic Drift | 175 | 🟡 MEDIUM | 10 |
| T2-AT-006 | Linguistic Camouflage | 185 | 🟡 MEDIUM | 10 |
| T2-AT-007 | Phonetic Manipulation | 170 | 🟡 MEDIUM | 2 |
| T2-AT-008 | Synonym and Paraphrase Chains | 165 | 🟡 MEDIUM | 10 |
| T2-AT-009 | Code-Switching Attacks | 195 | 🟡 MEDIUM | 1 |
| T2-AT-010 | Transliteration Exploitation | 185 | 🟡 MEDIUM | 10 |
| T2-AT-011 | Abbreviation and Acronym Abuse | 160 | 🟡 MEDIUM | 2 |
| T2-AT-012 | Cultural Reference Encoding | 170 | 🟡 MEDIUM | 10 |
| T2-AT-013 | Grammatical Manipulation | 175 | 🟡 MEDIUM | 10 |
| T2-AT-014 | Semantic Bleaching | 180 | 🟡 MEDIUM | 5 |
| T2-AT-015 | Noise Injection | 165 | 🟡 MEDIUM | 10 |
| T2-AT-016 | Dialectical Variations | 155 | 🟡 MEDIUM | 10 |
| T2-AT-017 | Compression Techniques | 170 | 🟡 MEDIUM | 10 |
| T2-AT-018 | Semantic Field Manipulation | 175 | 🟡 MEDIUM | 10 |
| T2-AT-019 | Pragmatic Implication | 185 | 🟡 MEDIUM | 4 |
| T2-AT-020 | Register Shifting | 160 | 🟡 MEDIUM | 10 |
Reasoning & Constraint Exploitation
19 techniques · 178 procedures · Risk 175–210
Exploit logical reasoning and constraints
2025–2026 Threat Update
- H-CoT (Duke/Accenture, February 2025) hijacks chain-of-thought safety reasoning in o1/o3 and DeepSeek-R1.
- Autonomous LRM jailbreaking (Nature Communications, August 2025): four reasoning models achieved a 97.14% attack success rate (ASR) acting as multi-turn adversarial agents.
- Mousetrap (ACL 2025): iterative reasoning chains cause o1-mini to generate detailed harmful responses with a chain length of just 2.
T3 · Reasoning & Constraint Exploitation · 19 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T3-AT-001 | Fictional Framing | 190 | 🟡 MEDIUM | 10 |
| T3-AT-002 | Academic Pretense | 195 | 🟡 MEDIUM | 10 |
| T3-AT-003 | Counterfactual Reasoning | 200 | 🟠 HIGH | 10 |
| T3-AT-004 | Step-by-Step Extraction | 210 | 🟠 HIGH | 9 |
| T3-AT-005 | Goal Substitution | 205 | 🟠 HIGH | 10 |
| T3-AT-006 | Constraint Negation | 185 | 🟡 MEDIUM | 10 |
| T3-AT-007 | Socratic Method Exploitation | 195 | 🟡 MEDIUM | 8 |
| T3-AT-008 | Comparative Analysis | 180 | 🟡 MEDIUM | 10 |
| T3-AT-009 | Expertise Assumption | 190 | 🟡 MEDIUM | 10 |
| T3-AT-010 | Reverse Psychology | 175 | 🟡 MEDIUM | 10 |
| T3-AT-011 | Information Completion | 185 | 🟡 MEDIUM | 10 |
| T3-AT-012 | Capability Testing | 200 | 🟠 HIGH | 10 |
| T3-AT-013 | Logical Paradox Creation | 210 | 🟠 HIGH | 10 |
| T3-AT-014 | Incremental Boundary Pushing | 195 | 🟡 MEDIUM | 5 |
| T3-AT-015 | Context Weaponization | 205 | 🟠 HIGH | 10 |
| T3-AT-016 | Rationalization Chains | 190 | 🟡 MEDIUM | 6 |
| T3-AT-017 | Scenario Anchoring | 185 | 🟡 MEDIUM | 10 |
| T3-AT-018 | Debate Positioning | 180 | 🟡 MEDIUM | 10 |
| T3-AT-019 | Misdirection Through Complexity | 175 | 🟡 MEDIUM | 10 |
Multi-Turn & Memory Manipulation
16 techniques · 147 procedures · Risk 185–240
Leverage conversation history and memory
2025–2026 Threat Update
- Multi-turn is now the dominant attack modality. Reasoning models acting as adversarial agents achieve 97% ASR where single-turn attacks fail.
- DeepSeek R1 exhibited 100% ASR across 50 HarmBench prompts. Wallarm extracted DeepSeek's entire hidden system prompt via bias-based response logic.
- Jailbreak attempts succeed roughly 20% of the time, averaging 42 seconds and 5 interactions.
T4 · Multi-Turn & Memory Manipulation · 16 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T4-AT-001 | Conversation Context Poisoning | 220 | 🟠 HIGH | 10 |
| T4-AT-002 | Memory Instruction Injection | 240 | 🟠 HIGH | 10 |
| T4-AT-003 | Session State Manipulation | 210 | 🟠 HIGH | 10 |
| T4-AT-004 | Cross-Conversation Contamination | 195 | 🟡 MEDIUM | 10 |
| T4-AT-005 | Incremental Jailbreak Assembly | 230 | 🟠 HIGH | 10 |
| T4-AT-006 | False History Creation | 200 | 🟠 HIGH | 10 |
| T4-AT-007 | Context Window Exhaustion | 205 | 🟠 HIGH | 10 |
| T4-AT-008 | Conversation Forking | 190 | 🟡 MEDIUM | 3 |
| T4-AT-009 | Temporal Anchoring | 185 | 🟡 MEDIUM | 10 |
| T4-AT-010 | State Confusion Attack | 215 | 🟠 HIGH | 4 |
| T4-AT-011 | Memory Poisoning | 235 | 🟠 HIGH | 10 |
| T4-AT-012 | Trust Building Exploitation | 210 | 🟠 HIGH | 10 |
| T4-AT-013 | Session Hijacking | 225 | 🟠 HIGH | 10 |
| T4-AT-014 | Conversation Replay Attack | 205 | 🟠 HIGH | 10 |
| T4-AT-015 | Multi-Turn Social Engineering | 220 | 🟠 HIGH | 10 |
| T4-AT-016 | Context Fragmentation | 195 | 🟡 MEDIUM | 10 |
Model & API Exploitation
16 techniques · 142 procedures · Risk 165–230
Attack model interfaces and APIs
2025–2026 Threat Update
- EchoLeak (CVE-2025-32711): zero-click prompt injection in Microsoft 365 Copilot that exfiltrates chat history.
- CVE-2025-53773 (CVSS 9.6): remote code execution via prompt injection in GitHub Copilot/VS Code.
- OpenClaw crisis: 42,665+ publicly accessible AI agent instances, 93.4% of which allow critical authentication bypass.
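The exposed-agent figure above reduces to a missing control: agent endpoints reachable without any credential check. A minimal bearer-token gate using only the Python standard library; the placeholder token and header handling are an illustrative sketch, not a hardened implementation:

```python
import hmac

# Illustrative placeholder; a real deployment would load a secret
# from a credential store, never hard-code it.
EXPECTED_TOKEN = "replace-with-a-real-secret"

def is_authorized(headers: dict[str, str]) -> bool:
    """Reject any request lacking the expected bearer token."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    supplied = auth[len("Bearer "):]
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(supplied, EXPECTED_TOKEN)
```

Even this minimal gate, applied before any agent logic runs, would remove an instance from the "critical auth bypass" population; the choice of `hmac.compare_digest` over `==` prevents timing side channels of the kind catalogued under T5-AT-014.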
T5 · Model & API Exploitation · 16 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T5-AT-001 | Parameter Manipulation | 180 | 🟡 MEDIUM | 10 |
| T5-AT-002 | Token Probability Extraction | 210 | 🟠 HIGH | 10 |
| T5-AT-003 | Cache Poisoning | 200 | 🟠 HIGH | 10 |
| T5-AT-004 | Rate Limit Evasion | 170 | 🟡 MEDIUM | 10 |
| T5-AT-005 | Model Fingerprinting | 185 | 🟡 MEDIUM | 1 |
| T5-AT-006 | API Endpoint Abuse | 190 | 🟡 MEDIUM | 10 |
| T5-AT-007 | Context Length Exploitation | 195 | 🟡 MEDIUM | 10 |
| T5-AT-008 | Response Streaming Exploitation | 175 | 🟡 MEDIUM | 10 |
| T5-AT-009 | Tokenization Exploits | 180 | 🟡 MEDIUM | 10 |
| T5-AT-010 | Batch Processing Attacks | 200 | 🟠 HIGH | 10 |
| T5-AT-011 | Error Message Mining | 165 | 🟡 MEDIUM | 10 |
| T5-AT-012 | Resource Exhaustion | 205 | 🟠 HIGH | 10 |
| T5-AT-013 | Version Downgrade Attacks | 190 | 🟡 MEDIUM | 1 |
| T5-AT-014 | Side Channel Attacks | 210 | 🟠 HIGH | 10 |
| T5-AT-015 | API Authentication Bypass | 230 | 🟠 HIGH | 10 |
| T5-AT-016 | Request Smuggling | 215 | 🟠 HIGH | 10 |
Training & Feedback Poisoning
15 techniques · 141 procedures · Risk 210–270
Corrupt training data and feedback
2025–2026 Threat Update
- As few as 250 poisoned documents can backdoor a model regardless of its size (Turing Institute/Anthropic/UK AISI, October 2025).
- Frontier models o3, Claude 3.7 Sonnet, and o1 all exhibit reward hacking (METR, June 2025).
- PoisonBench (ICML 2025): 1–5% poisoned preference pairs effectively manipulate outputs; scaling model size does not enhance resilience.
T6 · Training & Feedback Poisoning · 15 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T6-AT-001 | Reward Hacking | 250 | 🔴 CRITICAL | 10 |
| T6-AT-002 | Dataset Contamination | 260 | 🔴 CRITICAL | 10 |
| T6-AT-003 | Backdoor Insertion | 270 | 🔴 CRITICAL | 1 |
| T6-AT-004 | Fine-Tuning Attacks | 240 | 🟠 HIGH | 10 |
| T6-AT-005 | Synthetic Data Poisoning | 235 | 🟠 HIGH | 10 |
| T6-AT-006 | Annotation Manipulation | 225 | 🟠 HIGH | 10 |
| T6-AT-007 | Preference Learning Corruption | 230 | 🟠 HIGH | 10 |
| T6-AT-008 | Model Update Hijacking | 245 | 🟠 HIGH | 10 |
| T6-AT-009 | Evaluation Set Contamination | 220 | 🟠 HIGH | 10 |
| T6-AT-010 | Knowledge Distillation Attacks | 215 | 🟠 HIGH | 10 |
| T6-AT-011 | Reinforcement Signal Manipulation | 240 | 🟠 HIGH | 10 |
| T6-AT-012 | Curriculum Learning Exploitation | 210 | 🟠 HIGH | 10 |
| T6-AT-013 | Active Learning Exploitation | 225 | 🟠 HIGH | 10 |
| T6-AT-014 | Self-Supervised Poisoning | 230 | 🟠 HIGH | 10 |
| T6-AT-015 | Few-Shot Learning Attacks | 220 | 🟠 HIGH | 10 |
Output Manipulation & Exfiltration
15 techniques · 146 procedures · Risk 165–200
Manipulate outputs and extract data
2025–2026 Threat Update
- ChatGPT and Grok conversations appeared in Google search results via insecure share links.
- 60% of employees accept security risks to use unsanctioned "Shadow AI" tools (BlackFog 2025).
- C2PA v2.2 watermarking faces a fundamental trilemma: no watermark can simultaneously be robust, unforgeable, and publicly detectable.
T7 · Output Manipulation & Exfiltration · 15 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T7-AT-001 | Reasoning Chain Disclosure | 190 | 🟡 MEDIUM | 10 |
| T7-AT-002 | Information Fragmentation | 180 | 🟡 MEDIUM | 6 |
| T7-AT-003 | Output Format Exploitation | 175 | 🟡 MEDIUM | 10 |
| T7-AT-004 | Side Channel Leakage | 195 | 🟡 MEDIUM | 10 |
| T7-AT-005 | Metadata Extraction | 185 | 🟡 MEDIUM | 10 |
| T7-AT-006 | Steganographic Output | 170 | 🟡 MEDIUM | 10 |
| T7-AT-007 | Iterative Refinement Extraction | 175 | 🟡 MEDIUM | 10 |
| T7-AT-008 | Translation Leakage | 165 | 🟡 MEDIUM | 10 |
| T7-AT-009 | Analogy Extraction | 180 | 🟡 MEDIUM | 10 |
| T7-AT-010 | Differential Response Analysis | 190 | 🟡 MEDIUM | 10 |
| T7-AT-011 | Schema-Based Extraction | 185 | 🟡 MEDIUM | 10 |
| T7-AT-012 | Aggregation Attacks | 200 | 🟠 HIGH | 10 |
| T7-AT-013 | Capability Probing | 175 | 🟡 MEDIUM | 10 |
| T7-AT-014 | Output Redirection | 180 | 🟡 MEDIUM | 10 |
| T7-AT-015 | Compression-Based Extraction | 170 | 🟡 MEDIUM | 10 |
External Deception & Misinformation
15 techniques · 150 procedures · Risk 185–240
Generate deceptive content
2025–2026 Threat Update
- Deepfake fraud reached $1.1 billion in 2025, triple the 2024 total. Voice phishing surged 442%.
- North Korean IT worker infiltrations grew 220%, generating $250M–$600M annually; a synthetic interview identity was created in 70 minutes.
- AI-generated CSAM reports: 440,419 in H1 2025, a 624% increase over all of 2024.
T8 · External Deception & Misinformation · 15 techniques
| ID | Technique | Risk | Rating | Procs |
|---|---|---|---|---|
| T8-AT-001 | Authority Impersonation | 230 | 🟠 HIGH | 10 |
| T8-AT-002 | Synthetic Evidence Generation | 220 | 🟠 HIGH | 10 |
| T8-AT-003 | Conspiracy Theory Amplification | 210 | 🟠 HIGH | 10 |
| T8-AT-004 | Deepfake Narrative Creation | 215 | 🟠 HIGH | 10 |
| T8-AT-005 | Social Engineering Scripts | 200 | 🟠 HIGH | 10 |
| T8-AT-006 | Targeted Harassment Content | 195 | 🟡 MEDIUM | 10 |
| T8-AT-007 | Disinformation Campaign Content | 225 | 🟠 HIGH | 10 |
| T8-AT-008 | Synthetic Testimony Generation | 190 | 🟡 MEDIUM | 10 |
| T8-AT-009 | Radicalization Content | 240 | 🟠 HIGH | 10 |
| T8-AT-010 | False Flag Content | 205 | 🟠 HIGH | 10 |
| T8-AT-011 | Election Manipulation Content | 235 | 🟠 HIGH | 10 |
| T8-AT-012 | Synthetic Media Support | 185 | 🟡 MEDIUM | 10 |
| T8-AT-013 | Psychological Manipulation Content | 200 | 🟠 HIGH | 10 |
| T8-AT-014 | False Crisis Generation | 210 | 🟠 HIGH | 10 |
| T8-AT-015 | Identity Fabrication | 195 | 🟡 MEDIUM | 10 |