AATMF Core Tactics T1-T8 | Prompt Injection & Evasion

T1

Prompt & Context Subversion

16 techniques · 76 procedures · Risk 200–240

Manipulate model instructions and context

2025–2026 Threat Update

• Policy Puppetry (HiddenLayer, April 2025) bypasses every frontier model by reformulating prompts as XML/INI/JSON policy configuration files.
• Time Bandit (CERT/CC VU#733789) exploits temporal confusion in ChatGPT-4o by anchoring conversations in historical periods.
• Princeton research (May 2025): shallow safety alignment applies constraints only to the first few tokens. A forced opening like "Sure, let me help you" bypasses all safety training.

T1 Prompt & Context Subversion

16 techniques

ID	Technique	Risk	Rating	Procs
`T1-AT-001`	Dialogue Hijacking	220	🟠 HIGH	5
`T1-AT-002`	Time-Based Context Manipulation	210	🟠 HIGH	5
`T1-AT-003`	Language Model Confusion	225	🟠 HIGH	5
`T1-AT-004`	Instruction Prefix/Suffix	235	🟠 HIGH	6
`T1-AT-005`	Permission Escalation Claims	240	🟠 HIGH	5
`T1-AT-006`	Prompt Template Injection	230	🟠 HIGH	5
`T1-AT-007`	Cognitive Overload	215	🟠 HIGH	4
`T1-AT-008`	Boundary Testing	200	🟠 HIGH	5
`T1-AT-009`	Simulation Requests	225	🟠 HIGH	5
`T1-AT-010`	Negative Instruction Reversal	210	🟠 HIGH	5
`T1-AT-011`	Error Message Exploitation	220	🟠 HIGH	4
`T1-AT-012`	Consent Manufacturing	205	🟠 HIGH	5
`T1-AT-013`	Instruction Commenting	215	🟠 HIGH	4
`T1-AT-014`	Authority Spoofing	240	🟠 HIGH	4
`T1-AT-015`	Obfuscation Through Complexity	220	🟠 HIGH	4
`T1-AT-016`	Session State Manipulation	235	🟠 HIGH	5

T2

Semantic & Linguistic Evasion

20 techniques · 161 procedures · Risk 155–210

Bypass filters through language manipulation

2025–2026 Threat Update

• Emoji smuggling achieved 100% evasion success against multiple systems.
• Zero-width characters and Unicode tags (U+E0001–U+E007F) routinely fool classifiers.
• Homoglyph substitution using visually similar characters from different scripts evades word-level filters.

T2 Semantic & Linguistic Evasion

20 techniques

ID	Technique	Risk	Rating	Procs
`T2-AT-001`	Euphemism and Metaphor Exploitation	180	🟡 MEDIUM	10
`T2-AT-002`	Multi-Language Evasion	200	🟠 HIGH	7
`T2-AT-003`	Encoding and Obfuscation	190	🟡 MEDIUM	10
`T2-AT-004`	Unicode and Bidirectional Attacks	210	🟠 HIGH	10
`T2-AT-005`	Semantic Drift	175	🟡 MEDIUM	10
`T2-AT-006`	Linguistic Camouflage	185	🟡 MEDIUM	10
`T2-AT-007`	Phonetic Manipulation	170	🟡 MEDIUM	2
`T2-AT-008`	Synonym and Paraphrase Chains	165	🟡 MEDIUM	10
`T2-AT-009`	Code-Switching Attacks	195	🟡 MEDIUM	1
`T2-AT-010`	Transliteration Exploitation	185	🟡 MEDIUM	10
`T2-AT-011`	Abbreviation and Acronym Abuse	160	🟡 MEDIUM	2
`T2-AT-012`	Cultural Reference Encoding	170	🟡 MEDIUM	10
`T2-AT-013`	Grammatical Manipulation	175	🟡 MEDIUM	10
`T2-AT-014`	Semantic Bleaching	180	🟡 MEDIUM	5
`T2-AT-015`	Noise Injection	165	🟡 MEDIUM	10
`T2-AT-016`	Dialectical Variations	155	🟡 MEDIUM	10
`T2-AT-017`	Compression Techniques	170	🟡 MEDIUM	10
`T2-AT-018`	Semantic Field Manipulation	175	🟡 MEDIUM	10
`T2-AT-019`	Pragmatic Implication	185	🟡 MEDIUM	4
`T2-AT-020`	Register Shifting	160	🟡 MEDIUM	10

T3

Reasoning & Constraint Exploitation

19 techniques · 178 procedures · Risk 175–210

Exploit logical reasoning and constraints

2025–2026 Threat Update

• H-CoT (Duke/Accenture, February 2025) hijacks chain-of-thought safety reasoning in o1/o3 and DeepSeek-R1.
• Autonomous LRM jailbreaking (Nature Communications, August 2025): four reasoning models achieved 97.14% ASR as multi-turn adversarial agents.
• Mousetrap (ACL 2025): iterative reasoning chains cause o1-mini to generate detailed harmful responses with chain length of just 2.

T3 Reasoning & Constraint Exploitation

19 techniques

ID	Technique	Risk	Rating	Procs
`T3-AT-001`	Fictional Framing	190	🟡 MEDIUM	10
`T3-AT-002`	Academic Pretense	195	🟡 MEDIUM	10
`T3-AT-003`	Counterfactual Reasoning	200	🟠 HIGH	10
`T3-AT-004`	Step-by-Step Extraction	210	🟠 HIGH	9
`T3-AT-005`	Goal Substitution	205	🟠 HIGH	10
`T3-AT-006`	Constraint Negation	185	🟡 MEDIUM	10
`T3-AT-007`	Socratic Method Exploitation	195	🟡 MEDIUM	8
`T3-AT-008`	Comparative Analysis	180	🟡 MEDIUM	10
`T3-AT-009`	Expertise Assumption	190	🟡 MEDIUM	10
`T3-AT-010`	Reverse Psychology	175	🟡 MEDIUM	10
`T3-AT-011`	Information Completion	185	🟡 MEDIUM	10
`T3-AT-012`	Capability Testing	200	🟠 HIGH	10
`T3-AT-013`	Logical Paradox Creation	210	🟠 HIGH	10
`T3-AT-014`	Incremental Boundary Pushing	195	🟡 MEDIUM	5
`T3-AT-015`	Context Weaponization	205	🟠 HIGH	10
`T3-AT-016`	Rationalization Chains	190	🟡 MEDIUM	6
`T3-AT-017`	Scenario Anchoring	185	🟡 MEDIUM	10
`T3-AT-018`	Debate Positioning	180	🟡 MEDIUM	10
`T3-AT-019`	Misdirection Through Complexity	175	🟡 MEDIUM	10

T4

Multi-Turn & Memory Manipulation

16 techniques · 147 procedures · Risk 185–240

Leverage conversation history and memory

2025–2026 Threat Update

• Multi-turn is now the dominant attack modality. Reasoning models as adversarial agents achieve 97% ASR where single-turn attacks fail.
• DeepSeek R1 exhibited 100% ASR across 50 HarmBench prompts. Wallarm extracted DeepSeek's entire hidden system prompt via bias-based response logic.
• Jailbreak attempts succeed roughly 20% of the time, averaging 42 seconds and 5 interactions.

T4 Multi-Turn & Memory Manipulation

16 techniques

ID	Technique	Risk	Rating	Procs
`T4-AT-001`	Conversation Context Poisoning	220	🟠 HIGH	10
`T4-AT-002`	Memory Instruction Injection	240	🟠 HIGH	10
`T4-AT-003`	Session State Manipulation	210	🟠 HIGH	10
`T4-AT-004`	Cross-Conversation Contamination	195	🟡 MEDIUM	10
`T4-AT-005`	Incremental Jailbreak Assembly	230	🟠 HIGH	10
`T4-AT-006`	False History Creation	200	🟠 HIGH	10
`T4-AT-007`	Context Window Exhaustion	205	🟠 HIGH	10
`T4-AT-008`	Conversation Forking	190	🟡 MEDIUM	3
`T4-AT-009`	Temporal Anchoring	185	🟡 MEDIUM	10
`T4-AT-010`	State Confusion Attack	215	🟠 HIGH	4
`T4-AT-011`	Memory Poisoning	235	🟠 HIGH	10
`T4-AT-012`	Trust Building Exploitation	210	🟠 HIGH	10
`T4-AT-013`	Session Hijacking	225	🟠 HIGH	10
`T4-AT-014`	Conversation Replay Attack	205	🟠 HIGH	10
`T4-AT-015`	Multi-Turn Social Engineering	220	🟠 HIGH	10
`T4-AT-016`	Context Fragmentation	195	🟡 MEDIUM	10

T5

Model & API Exploitation

16 techniques · 142 procedures · Risk 165–230

Attack model interfaces and APIs

2025–2026 Threat Update

• EchoLeak (CVE-2025-32711): zero-click prompt injection in Microsoft 365 Copilot exfiltrates chat history.
• CVE-2025-53773 (CVSS 9.6): RCE via prompt injection in GitHub Copilot/VS Code.
• OpenClaw crisis: 42,665+ publicly accessible AI agent instances, 93.4% with critical auth bypass.

T5 Model & API Exploitation

16 techniques

ID	Technique	Risk	Rating	Procs
`T5-AT-001`	Parameter Manipulation	180	🟡 MEDIUM	10
`T5-AT-002`	Token Probability Extraction	210	🟠 HIGH	10
`T5-AT-003`	Cache Poisoning	200	🟠 HIGH	10
`T5-AT-004`	Rate Limit Evasion	170	🟡 MEDIUM	10
`T5-AT-005`	Model Fingerprinting	185	🟡 MEDIUM	1
`T5-AT-006`	API Endpoint Abuse	190	🟡 MEDIUM	10
`T5-AT-007`	Context Length Exploitation	195	🟡 MEDIUM	10
`T5-AT-008`	Response Streaming Exploitation	175	🟡 MEDIUM	10
`T5-AT-009`	Tokenization Exploits	180	🟡 MEDIUM	10
`T5-AT-010`	Batch Processing Attacks	200	🟠 HIGH	10
`T5-AT-011`	Error Message Mining	165	🟡 MEDIUM	10
`T5-AT-012`	Resource Exhaustion	205	🟠 HIGH	10
`T5-AT-013`	Version Downgrade Attacks	190	🟡 MEDIUM	1
`T5-AT-014`	Side Channel Attacks	210	🟠 HIGH	10
`T5-AT-015`	API Authentication Bypass	230	🟠 HIGH	10
`T5-AT-016`	Request Smuggling	215	🟠 HIGH	10

T6

Training & Feedback Poisoning

15 techniques · 141 procedures · Risk 210–270

Corrupt training data and feedback

2025–2026 Threat Update

• Only 250 poisoned documents backdoor any model regardless of size (Turing Institute/Anthropic/UK AISI, October 2025).
• Frontier models o3, Claude 3.7 Sonnet, and o1 all exhibit reward hacking (METR, June 2025).
• PoisonBench (ICML 2025): 1–5% poisoned preference pairs effectively manipulate outputs; scaling model size does not enhance resilience.

T6 Training & Feedback Poisoning

15 techniques

ID	Technique	Risk	Rating	Procs
`T6-AT-001`	Reward Hacking	250	🔴 CRITICAL	10
`T6-AT-002`	Dataset Contamination	260	🔴 CRITICAL	10
`T6-AT-003`	Backdoor Insertion	270	🔴 CRITICAL	1
`T6-AT-004`	Fine-Tuning Attacks	240	🟠 HIGH	10
`T6-AT-005`	Synthetic Data Poisoning	235	🟠 HIGH	10
`T6-AT-006`	Annotation Manipulation	225	🟠 HIGH	10
`T6-AT-007`	Preference Learning Corruption	230	🟠 HIGH	10
`T6-AT-008`	Model Update Hijacking	245	🟠 HIGH	10
`T6-AT-009`	Evaluation Set Contamination	220	🟠 HIGH	10
`T6-AT-010`	Knowledge Distillation Attacks	215	🟠 HIGH	10
`T6-AT-011`	Reinforcement Signal Manipulation	240	🟠 HIGH	10
`T6-AT-012`	Curriculum Learning Exploitation	210	🟠 HIGH	10
`T6-AT-013`	Active Learning Exploitation	225	🟠 HIGH	10
`T6-AT-014`	Self-Supervised Poisoning	230	🟠 HIGH	10
`T6-AT-015`	Few-Shot Learning Attacks	220	🟠 HIGH	10

T7

Output Manipulation & Exfiltration

15 techniques · 146 procedures · Risk 165–200

Manipulate outputs and extract data

2025–2026 Threat Update

• ChatGPT and Grok conversations appeared in Google search results via insecure share links.
• 60% of employees accept security risks to use unsanctioned "Shadow AI" tools (BlackFog 2025).
• C2PA v2.2 watermarking faces a fundamental trilemma: no watermark can simultaneously be robust, unforgeable, and publicly detectable.

T7 Output Manipulation & Exfiltration

15 techniques

ID	Technique	Risk	Rating	Procs
`T7-AT-001`	Reasoning Chain Disclosure	190	🟡 MEDIUM	10
`T7-AT-002`	Information Fragmentation	180	🟡 MEDIUM	6
`T7-AT-003`	Output Format Exploitation	175	🟡 MEDIUM	10
`T7-AT-004`	Side Channel Leakage	195	🟡 MEDIUM	10
`T7-AT-005`	Metadata Extraction	185	🟡 MEDIUM	10
`T7-AT-006`	Steganographic Output	170	🟡 MEDIUM	10
`T7-AT-007`	Iterative Refinement Extraction	175	🟡 MEDIUM	10
`T7-AT-008`	Translation Leakage	165	🟡 MEDIUM	10
`T7-AT-009`	Analogy Extraction	180	🟡 MEDIUM	10
`T7-AT-010`	Differential Response Analysis	190	🟡 MEDIUM	10
`T7-AT-011`	Schema-Based Extraction	185	🟡 MEDIUM	10
`T7-AT-012`	Aggregation Attacks	200	🟠 HIGH	10
`T7-AT-013`	Capability Probing	175	🟡 MEDIUM	10
`T7-AT-014`	Output Redirection	180	🟡 MEDIUM	10
`T7-AT-015`	Compression-Based Extraction	170	🟡 MEDIUM	10

T8

External Deception & Misinformation

15 techniques · 150 procedures · Risk 185–240

Generate deceptive content

2025–2026 Threat Update

• Deepfake fraud reached $1.1 billion in 2025 (3x from 2024). Voice phishing surged 442%.
• North Korean IT worker infiltrations grew 220%, generating $250M–$600M annually. Synthetic interview identity created in 70 minutes.
• AI-generated CSAM reports: 440,419 in H1 2025 (624% increase from all of 2024).

T8 External Deception & Misinformation

15 techniques

ID	Technique	Risk	Rating	Procs
`T8-AT-001`	Authority Impersonation	230	🟠 HIGH	10
`T8-AT-002`	Synthetic Evidence Generation	220	🟠 HIGH	10
`T8-AT-003`	Conspiracy Theory Amplification	210	🟠 HIGH	10
`T8-AT-004`	Deepfake Narrative Creation	215	🟠 HIGH	10
`T8-AT-005`	Social Engineering Scripts	200	🟠 HIGH	10
`T8-AT-006`	Targeted Harassment Content	195	🟡 MEDIUM	10
`T8-AT-007`	Disinformation Campaign Content	225	🟠 HIGH	10
`T8-AT-008`	Synthetic Testimony Generation	190	🟡 MEDIUM	10
`T8-AT-009`	Radicalization Content	240	🟠 HIGH	10
`T8-AT-010`	False Flag Content	205	🟠 HIGH	10
`T8-AT-011`	Election Manipulation Content	235	🟠 HIGH	10
`T8-AT-012`	Synthetic Media Support	185	🟡 MEDIUM	10
`T8-AT-013`	Psychological Manipulation Content	200	🟠 HIGH	10
`T8-AT-014`	False Crisis Generation	210	🟠 HIGH	10
`T8-AT-015`	Identity Fabrication	195	🟡 MEDIUM	10

Volume II: Core Attack Tactics

Prompt & Context Subversion

Semantic & Linguistic Evasion

Reasoning & Constraint Exploitation

Multi-Turn & Memory Manipulation

Model & API Exploitation

Training & Feedback Poisoning

Output Manipulation & Exfiltration

External Deception & Misinformation