Category Summary
| Category | Signatures | ID Prefix | Subcategories | Direction |
|---|---|---|---|---|
| Prompt Injection | 67 | INJ | D (direct), I (indirect), O (obfuscation), S (semantic) | Input |
| Content Safety | 9 | CS | T (toxicity), B (bias) | Input / output |
| Data Leakage | 14 | DL | P (PII), S (secrets) | Output |
| System Prompt | 4 | SP | — | Input / output |
| Agentic | 6 | AG | — | Input |
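The ID prefixes and subcategory letters in the table above can be decoded mechanically. The following is a minimal sketch, not the project's own parser: the function name `parse_signature_id`, the regular expression, and the assumption that numbers are always three digits are illustrative choices inferred from the IDs shown in this catalog.

```python
import re

# Prefixes and subcategory letters copied from the Category Summary table.
# Categories without subcategories (SP, AG) use two-part IDs such as SP-001.
SUBCATEGORIES = {
    "INJ": {"D": "direct", "I": "indirect", "O": "obfuscation", "S": "semantic"},
    "CS": {"T": "toxicity", "B": "bias"},
    "DL": {"P": "PII", "S": "secrets"},
    "SP": {},
    "AG": {},
}

# Assumed shape: PREFIX, optional single-letter subcategory, three-digit number.
ID_RE = re.compile(r"^([A-Z]+)(?:-([A-Z]))?-(\d{3})$")


def parse_signature_id(sig_id):
    """Split a signature ID into (prefix, subcategory name or None, number)."""
    match = ID_RE.match(sig_id)
    if match is None:
        raise ValueError(f"malformed signature ID: {sig_id!r}")
    prefix, sub, number = match.groups()
    if prefix not in SUBCATEGORIES:
        raise ValueError(f"unknown ID prefix: {prefix!r}")
    subs = SUBCATEGORIES[prefix]
    if subs and sub not in subs:
        raise ValueError(f"{prefix} requires one of {sorted(subs)}, got {sub!r}")
    if not subs and sub is not None:
        raise ValueError(f"{prefix} IDs carry no subcategory letter")
    return prefix, subs.get(sub), int(number)
```

For example, `parse_signature_id("INJ-D-002")` yields the prefix `INJ`, the subcategory `direct`, and the number 2, while `parse_signature_id("SP-001")` yields no subcategory.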
Detection Profiles
Each profile activates a different set of engines, which determines which signatures run:
| Profile | Engines | Signatures active | Latency |
|---|---|---|---|
| Lite | Heuristic (Rust) | 75 heuristic signatures | <0.3 ms p99 |
| Standard | + DeBERTa + DistilBERT + FAISS | + 11 classifier + 10 semantic | <80 ms p99 |
| Full | + LLM-Judge | + composite (ambiguous inputs only) | <2 s p99 |
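The relationship between profiles and active signatures can be sketched as a filter on each signature's engine type. This is an illustration under assumptions: the lowercase profile keys, the `PROFILE_ENGINES` mapping, and the `active_signatures` helper are hypothetical names, and the engine sets are read off the "Signatures active" column rather than taken from the engine's configuration. The sample rows come from the catalog tables on this page.

```python
# Signature engine types that run under each profile, per the table above.
# Each profile adds to the one before it.
PROFILE_ENGINES = {
    "lite": {"heuristic"},
    "standard": {"heuristic", "classifier", "semantic"},
    "full": {"heuristic", "classifier", "semantic", "composite"},
}

# A few (id, engine) rows copied from the Full Catalog tables.
SAMPLE_CATALOG = [
    ("INJ-D-002", "heuristic"),
    ("INJ-D-004", "classifier"),
    ("INJ-S-001", "semantic"),
    ("INJ-D-005", "composite"),
]


def active_signatures(profile, catalog=SAMPLE_CATALOG):
    """Return the IDs of signatures whose engine is enabled for the profile."""
    engines = PROFILE_ENGINES[profile]
    return [sig_id for sig_id, engine in catalog if engine in engines]
```

With the sample rows, the Lite profile keeps only `INJ-D-002`, Standard adds the classifier and semantic rows, and Full adds the composite row as well.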
MITRE ATLAS Technique Coverage
All 100 signatures map to one or more MITRE ATLAS techniques. The table below lists every covered technique, its name, and the number of signatures mapped to it.
| Technique ID | Name | Signatures |
|---|---|---|
| AML.T0051.000 | LLM Prompt Injection: Direct | 30 |
| AML.T0051.001 | LLM Prompt Injection: Indirect | 10 |
| AML.T0054 | LLM Jailbreak | 20 |
| AML.T0068 | LLM Prompt Obfuscation | 18 |
| AML.T0056 | LLM Meta Prompt Extraction | 6 |
| AML.T0065 | LLM Prompt Crafting | 10 |
| AML.T0057 | LLM Data Leakage | 16 |
| AML.T0055 | Unsecured Credentials | 6 |
| AML.T0048.002 | External Harms: Societal | 3 |
| AML.T0048.003 | External Harms: User Harm | 6 |
| AML.T0024 | Exfiltration via AI Inference API | 1 |
| AML.T0069 | Discover LLM System Information | 1 |
| AML.T0053 | AI Agent Tool Invocation | 3 |
| AML.T0080 | AI Agent Context Poisoning | 1 |
| AML.T0105 | Escape to Host | 2 |
| AML.T0086 | Exfiltration via AI Agent Tool | 1 |
Related MITRE ATLAS mitigations:
- AML.M0015 Adversarial Input Detection
- AML.M0020 Generative AI Guardrails
- AML.M0006 Ensemble Methods
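Per-technique counts like those in the coverage table can be tallied from each signature's ATLAS column. The sketch below does this for the six Agentic signatures only, so its numbers are per-category and will not match the catalog-wide totals wherever signatures in other categories map to the same technique. The names `AGENTIC_ATLAS` and `technique_counts` are illustrative, not part of the project.

```python
from collections import Counter

# ATLAS mappings copied from the Agentic table in the Full Catalog.
AGENTIC_ATLAS = {
    "AG-001": ["AML.T0053"],
    "AG-002": ["AML.T0080"],
    "AG-003": ["AML.T0053", "AML.T0102"],
    "AG-004": ["AML.T0105"],
    "AG-005": ["AML.T0105", "AML.T0102"],
    "AG-006": ["AML.T0086"],
}


def technique_counts(mappings):
    """Count how many signatures map to each technique ID."""
    counts = Counter()
    for techniques in mappings.values():
        # A signature counts once per technique, even if listed twice.
        counts.update(set(techniques))
    return counts
```

A signature that maps to two techniques contributes to both counts, which is why the per-technique figures sum to more than the number of signatures.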
Full Catalog
Prompt Injection (67 signatures)
Direct Injection — INJ-D (30 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| INJ-D-001 | Role-Play Persona Jailbreak | heuristic | high | low | 8 | AML.T0054, AML.T0051.000 |
| INJ-D-002 | Instruction Override | heuristic | high | high | 7 | AML.T0051.000 |
| INJ-D-003 | Delimiter Escape | heuristic | high | high | 6 | AML.T0051.000 |
| INJ-D-004 | Few-Shot Poisoning | classifier | medium | medium | 5 | AML.T0051.000 |
| INJ-D-005 | Multi-Turn Escalation | composite | high | medium | 9 | AML.T0051.000, AML.T0051.002 |
| INJ-D-006 | Hypothetical Framing | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-007 | Translation Bypass | classifier | medium | medium | 6 | AML.T0068, AML.T0051.000 |
| INJ-D-008 | System Prompt Extraction | heuristic | critical | high | 10 | AML.T0056, AML.T0051.000 |
| INJ-D-009 | Authority Impersonation | heuristic | high | high | 7 | AML.T0051.000 |
| INJ-D-010 | Emotional Manipulation | heuristic | medium | medium | 4 | AML.T0054, AML.T0051.000 |
| INJ-D-011 | Fictional Scenario Framing | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-012 | Grandma Exploit | heuristic | medium | medium | 5 | AML.T0054 |
| INJ-D-013 | Research/Academic Framing | heuristic | low | medium | 3 | AML.T0054, AML.T0051.000 |
| INJ-D-014 | Confidential Data Extraction | heuristic | high | medium | 7 | AML.T0057, AML.T0051.000 |
| INJ-D-015 | Threat & Coercion Injection | heuristic | critical | high | 8 | AML.T0051.000 |
| INJ-D-016 | Safety Protocol Bypass | heuristic | critical | high | 8 | AML.T0054, AML.T0051.000 |
| INJ-D-017 | Uncensored Unrestricted Mode | heuristic | high | high | 7 | AML.T0054 |
| INJ-D-018 | Named Jailbreak Personas | heuristic | critical | high | 9 | AML.T0054 |
| INJ-D-019 | Immersive Roleplay Jailbreak | heuristic | high | high | 7 | AML.T0054 |
| INJ-D-020 | Context Pivot / New Task | heuristic | medium | medium | 6 | AML.T0051.000 |
| INJ-D-021 | Amoral / Evil Bot Framing | heuristic | critical | high | 8 | AML.T0054 |
| INJ-D-022 | Debug / Developer Mode Activation | heuristic | critical | high | 8 | AML.T0054, AML.T0051.000 |
| INJ-D-023 | Dual Response Jailbreak | heuristic | medium | high | 6 | AML.T0054 |
| INJ-D-024 | Rule Override Declaration | heuristic | medium | high | 6 | AML.T0051.000 |
| INJ-D-025 | Creative Writing Pivot to Exploit | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-026 | Reverse Psychology / Negative Framing | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-027 | Model / Prompt Extraction | heuristic | high | high | 7 | AML.T0056, AML.T0051.000 |
| INJ-D-028 | Game / Hypothetical Framing | heuristic | medium | medium | 6 | AML.T0054, AML.T0051.000 |
| INJ-D-029 | Coercive Threat Pattern | heuristic | critical | high | 12 | AML.T0051.000 |
| INJ-D-030 | Direct System Access Demand | heuristic | high | high | 10 | AML.T0051.000 |
Indirect Injection — INJ-I (10 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| INJ-I-001 | Hidden Text Injection | heuristic | critical | high | 10 | AML.T0051.001, AML.T0068 |
| INJ-I-002 | HTML/CSS Attribute Injection | heuristic | high | high | 8 | AML.T0051.001 |
| INJ-I-003 | Markdown Injection | heuristic | high | high | 7 | AML.T0051.001 |
| INJ-I-004 | RAG Document Poisoning | composite | critical | high | 10 | AML.T0051.001, AML.T0070 |
| INJ-I-005 | Tool Response Injection | classifier | critical | high | 10 | AML.T0051.001, AML.T0053 |
| INJ-I-006 | URL/Link Injection | heuristic | high | high | 7 | AML.T0051.001 |
| INJ-I-007 | Image Alt Text Injection | heuristic | high | high | 8 | AML.T0051.001 |
| INJ-I-008 | PDF Content Injection | heuristic | high | high | 8 | AML.T0051.001 |
| INJ-I-009 | Code Comment Injection | heuristic | medium | medium | 5 | AML.T0051.001 |
| INJ-I-010 | JSON/XML Payload Injection | heuristic | high | high | 7 | AML.T0051.001 |
Obfuscation — INJ-O (17 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| INJ-O-001 | Base64 Encoding | heuristic | high | medium | 6 | AML.T0068 |
| INJ-O-002 | ROT13 / Substitution Cipher | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-003 | Token Smuggling | classifier | high | medium | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-004 | Payload Splitting | composite | high | medium | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-005 | Homoglyph Substitution | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-006 | Adversarial Suffix | heuristic | high | medium | 8 | AML.T0068, AML.T0043.001 |
| INJ-O-007 | Emoji Encoding | heuristic | medium | medium | 4 | AML.T0068 |
| INJ-O-008 | Language Switch Bypass | composite | high | medium | 7 | AML.T0068 |
| INJ-O-009 | Low-Resource Language Injection | classifier | medium | medium | 6 | AML.T0068 |
| INJ-O-010 | Leetspeak/Number Substitution | heuristic | medium | medium | 4 | AML.T0068 |
| INJ-O-011 | Morse Code Encoding | heuristic | low | medium | 3 | AML.T0068 |
| INJ-O-012 | Reversed Text | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-013 | Whitespace/Tab Encoding | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-014 | Pig Latin/Word Games | heuristic | low | medium | 3 | AML.T0068 |
| INJ-O-015 | German Injection Patterns | heuristic | high | high | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-016 | Spanish/French/Multilingual Injection | heuristic | high | high | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-017 | Typo-Obfuscated Injection Commands | heuristic | high | medium | 7 | AML.T0068, AML.T0051.000 |
Semantic — INJ-S (10 signatures)
Semantic signatures use FAISS + MiniLM embeddings to catch paraphrased attacks that bypass keyword and regex detection. They require the Standard or Full profile.
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| INJ-S-001 | Paraphrased Instruction Override | semantic | high | high | 8 | AML.T0051.000, AML.T0065 |
| INJ-S-002 | Paraphrased System Prompt Extraction | semantic | high | high | 9 | AML.T0056, AML.T0065 |
| INJ-S-003 | Paraphrased Role Hijacking | semantic | high | high | 8 | AML.T0054, AML.T0065 |
| INJ-S-004 | Social Engineering for Data | semantic | high | high | 9 | AML.T0057, AML.T0065 |
| INJ-S-005 | Hypothetical Framing Attack | semantic | high | high | 7 | AML.T0054, AML.T0065 |
| INJ-S-006 | Emotional Manipulation | semantic | high | high | 7 | AML.T0051.000, AML.T0065 |
| INJ-S-007 | Authority Urgency Pressure | semantic | high | high | 8 | AML.T0051.000, AML.T0065 |
| INJ-S-008 | Multi-Step Escalation | semantic | high | high | 8 | AML.T0051.002, AML.T0065 |
| INJ-S-009 | Output Format Manipulation | semantic | high | high | 7 | AML.T0051.000, AML.T0065 |
| INJ-S-010 | Indirect Injection via Tool Output | semantic | high | high | 9 | AML.T0051.001, AML.T0065 |
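The idea behind embedding-based matching can be shown with a toy example. This sketch is a stand-in, not the engine's implementation: it uses hand-written three-dimensional vectors in place of MiniLM embeddings, a brute-force scan in place of a FAISS index, and an arbitrary similarity threshold of 0.8. The names `REFERENCE_VECTORS` and `best_match` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy reference vectors, one per signature. Real embeddings would come from
# an embedding model and have hundreds of dimensions.
REFERENCE_VECTORS = {
    "INJ-S-001": [0.9, 0.1, 0.0],
    "INJ-S-002": [0.1, 0.9, 0.1],
}


def best_match(query, reference=REFERENCE_VECTORS, threshold=0.8):
    """Return (signature_id, score) for the nearest reference, or None."""
    scored = ((sig_id, cosine(query, vec)) for sig_id, vec in reference.items())
    sig_id, score = max(scored, key=lambda pair: pair[1])
    # The threshold is an assumed value chosen for this illustration.
    return (sig_id, score) if score >= threshold else None
```

A query vector close to a reference vector matches that signature even though no keyword is compared, which is the property that lets this kind of detection catch paraphrases.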
Content Safety (9 signatures)
Toxicity — CS-T (7 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| CS-T-001 | Hate Speech | classifier | critical | high | 10 | AML.T0048.002 |
| CS-T-002 | Threats and Violence | classifier | critical | high | 10 | AML.T0048.003 |
| CS-T-003 | Sexual Content | classifier | high | high | 8 | AML.T0048.003 |
| CS-T-004 | Self-Harm Content | classifier | critical | high | 10 | AML.T0048.003 |
| CS-T-005 | CSAM/Child Exploitation Keywords | heuristic | critical | high | 15 | AML.T0048.003 |
| CS-T-006 | Weapons/Explosives Instructions | heuristic | critical | high | 10 | AML.T0048.003 |
| CS-T-007 | Drug Manufacturing Instructions | heuristic | high | high | 8 | AML.T0048.003 |
Bias — CS-B (2 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| CS-B-001 | Demographic Bias | classifier | medium | medium | 5 | AML.T0048.002 |
| CS-B-002 | Stereotypical Outputs | classifier | medium | medium | 5 | AML.T0048.002 |
Data Leakage (14 signatures)
All data leakage signatures run on output (LLM responses).
PII — DL-P (8 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| DL-P-001 | Email Addresses | heuristic | medium | high | 4 | AML.T0057 |
| DL-P-002 | Phone Numbers | heuristic | medium | high | 4 | AML.T0057 |
| DL-P-003 | SSN Patterns | heuristic | critical | high | 12 | AML.T0057 |
| DL-P-004 | Credit Card Numbers | heuristic | critical | high | 12 | AML.T0057 |
| DL-P-005 | Physical Addresses | heuristic | medium | medium | 4 | AML.T0057 |
| DL-P-006 | Date of Birth Patterns | heuristic | medium | medium | 4 | AML.T0057 |
| DL-P-007 | IP Addresses | heuristic | low | high | 3 | AML.T0057 |
| DL-P-008 | Medical Record Numbers | heuristic | high | high | 8 | AML.T0057 |
Secrets — DL-S (6 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| DL-S-001 | API Keys | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
| DL-S-002 | Connection Strings | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
| DL-S-003 | JWT Tokens | heuristic | high | high | 8 | AML.T0057, AML.T0055 |
| DL-S-004 | Private Keys | heuristic | critical | high | 15 | AML.T0057, AML.T0055 |
| DL-S-005 | AWS Credentials | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
| DL-S-006 | Database Credentials | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
System Prompt (4 signatures)
| ID | Name | Engine | Direction | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|---|
| SP-001 | System Prompt Leak in Output | heuristic | output | critical | high | 12 | AML.T0056 |
| SP-002 | Confidentiality Marker Leak | heuristic | output | high | high | 8 | AML.T0056 |
| SP-003 | Training Data Extraction | heuristic | input | high | high | 7 | AML.T0024 |
| SP-004 | Model Architecture Probing | heuristic | input | low | medium | 3 | AML.T0069 |
Agentic (6 signatures)
| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
|---|---|---|---|---|---|---|
| AG-001 | Tool Abuse | heuristic | high | high | 8 | AML.T0053 |
| AG-002 | Recursive Agent Injection | heuristic | critical | high | 10 | AML.T0080 |
| AG-003 | Privilege Escalation | heuristic | high | high | 8 | AML.T0053, AML.T0102 |
| AG-004 | Unauthorized File System Access | heuristic | critical | high | 12 | AML.T0105 |
| AG-005 | Shell Command Injection | heuristic | critical | high | 12 | AML.T0105, AML.T0102 |
| AG-006 | Exfiltration via Agent | heuristic | high | high | 8 | AML.T0086 |
Signature Fields Reference
Every signature YAML file declares the following fields:
| Field | Description | Example values |
|---|---|---|
| signature.id | Unique identifier in {CATEGORY}-{SUBCATEGORY}-{NUMBER} format | INJ-D-002, DL-S-001 |
| signature.name | Human-readable name | Instruction Override |
| signature.version | Semver version string | "1.0.0" |
| meta.category | Threat category | prompt-injection, content-safety, data-leakage, agentic, system-prompt |
| meta.subcategory | Category-specific subcategory | direct, indirect, obfuscation, semantic |
| meta.technique | Specific technique name | role-play-persona, keyword-match |
| meta.severity | Severity level | critical, high, medium, low, info |
| meta.confidence | Detection confidence | high, medium, low |
| meta.performance_cost | Compute cost per scan | low (<1 ms), medium (1–50 ms), high (>50 ms) |
| meta.atlas | MITRE ATLAS technique IDs | ["AML.T0051.000", "AML.T0054"] |
| detection.engine | Detection engine | heuristic, classifier, semantic, llm-judge, composite |
| detection.direction | Which traffic to scan | input, output, bidirectional |
| detection.patterns | List of patterns to match | regex, substring, semantic, perplexity, encoding, unicode |
| detection.condition | How patterns combine | any (OR), all (AND), weighted |
| scoring.anomaly_points | Points added to the anomaly score on match | 1–15 |
| tuning.default_action | Default enforcement mode | enforce, monitor |
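The enumerated fields above lend themselves to a simple validity check. The sketch below models a parsed signature file as a nested dict whose keys follow the dotted field paths; that layout, the `validate` helper, and the `ALLOWED` table are assumptions for illustration, not the project's loader. In the example, the values for INJ-D-002 come from the catalog tables, while `condition` and `default_action` are placeholders because the catalog does not list them per signature.

```python
# Allowed values copied from the "Example values" column above.
ALLOWED = {
    ("meta", "severity"): {"critical", "high", "medium", "low", "info"},
    ("meta", "confidence"): {"high", "medium", "low"},
    ("detection", "engine"): {"heuristic", "classifier", "semantic",
                              "llm-judge", "composite"},
    ("detection", "direction"): {"input", "output", "bidirectional"},
    ("detection", "condition"): {"any", "all", "weighted"},
    ("tuning", "default_action"): {"enforce", "monitor"},
}

EXAMPLE = {
    "signature": {"id": "INJ-D-002", "name": "Instruction Override",
                  "version": "1.0.0"},
    "meta": {"category": "prompt-injection", "subcategory": "direct",
             "severity": "high", "confidence": "high",
             "atlas": ["AML.T0051.000"]},
    # "condition" is a placeholder value, not taken from the catalog.
    "detection": {"engine": "heuristic", "direction": "input",
                  "condition": "any"},
    "scoring": {"anomaly_points": 7},
    # "default_action" is a placeholder value, not taken from the catalog.
    "tuning": {"default_action": "enforce"},
}


def validate(sig):
    """Return a list of error strings; an empty list means no problems found."""
    errors = []
    for (section, field), allowed in ALLOWED.items():
        value = sig.get(section, {}).get(field)
        if value not in allowed:
            errors.append(f"{section}.{field}: {value!r} not in {sorted(allowed)}")
    points = sig.get("scoring", {}).get("anomaly_points")
    if not isinstance(points, int) or not 1 <= points <= 15:
        errors.append(f"scoring.anomaly_points: {points!r} outside 1-15")
    return errors
```

Returning a list of errors rather than raising on the first one lets a caller report every problem in a signature file at once.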
Severity and Confidence Levels
Severity reflects the risk if the threat is real:
| Level | Anomaly points (typical) | Examples |
|---|---|---|
| critical | 8–15 | Private key in output, CSAM keywords, system prompt leak |
| high | 6–9 | Instruction override, API key in output, shell injection |
| medium | 4–6 | Hypothetical framing, base64 encoding, demographic bias |
| low | 2–3 | IP address in output, model architecture probing |
| info | 0–1 | Informational signals, no direct action |
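Because each matching signature adds its points to the anomaly score, the score for a scan can be sketched as a sum over the matched IDs. The helpers `anomaly_score` and `in_typical_range` are hypothetical names, and simple summation is an assumption based on the field description; the point values and ranges are copied from the tables on this page.

```python
# Points copied from the Full Catalog tables for a few signatures.
POINTS = {"INJ-D-002": 7, "INJ-O-001": 6, "DL-S-004": 15}

# Typical point ranges per severity level, from the table above.
TYPICAL_RANGE = {
    "critical": (8, 15),
    "high": (6, 9),
    "medium": (4, 6),
    "low": (2, 3),
    "info": (0, 1),
}


def anomaly_score(matched_ids, points=POINTS):
    """Sum the anomaly points of every matched signature."""
    return sum(points[sig_id] for sig_id in matched_ids)


def in_typical_range(severity, pts):
    """Check whether a point value falls in the typical range for a severity."""
    low, high = TYPICAL_RANGE[severity]
    return low <= pts <= high
```

The ranges are typical rather than strict: INJ-D-030 is rated high but carries 10 points, so `in_typical_range("high", 10)` is false even though the signature is valid.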
Licenses
- Engine code (Rust, Python, CLI, API): Apache-2.0
- Community signatures (catalog/): CC BY-SA 4.0; modifications must be shared back under the same license
To add your own signatures or override built-in ones, see the Custom Signatures guide.