InferenceWall ships with 100 built-in detection signatures spanning 5 threat categories: prompt injection, content safety, data leakage, system prompt, and agentic. Every signature is mapped to the MITRE ATLAS adversarial AI threat framework (v5.5, March 2026).

Category Summary

| Category | Signatures | ID Prefix | Subcategories | Direction |
| --- | --- | --- | --- | --- |
| Prompt Injection | 67 | INJ | D (direct), I (indirect), O (obfuscation), S (semantic) | Input |
| Content Safety | 9 | CS | T (toxicity), B (bias) | Input / output |
| Data Leakage | 14 | DL | P (PII), S (secrets) | Output |
| System Prompt | 4 | SP | | Input / output |
| Agentic | 6 | AG | | Input |
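The prefix and subcategory letters in the summary above are enough to decode a signature ID. The following is a minimal sketch, not InferenceWall's own parser: the regular expression and the `parse_signature_id` helper are assumptions made for illustration, and only the prefix and subcategory tables come from this page.

```python
import re

# Prefixes and subcategory letters copied from the Category Summary table.
CATEGORIES = {
    "INJ": ("Prompt Injection", {"D": "direct", "I": "indirect",
                                 "O": "obfuscation", "S": "semantic"}),
    "CS": ("Content Safety", {"T": "toxicity", "B": "bias"}),
    "DL": ("Data Leakage", {"P": "PII", "S": "secrets"}),
    "SP": ("System Prompt", {}),
    "AG": ("Agentic", {}),
}

# Assumed layout: PREFIX, an optional single subcategory letter, a 3-digit number.
ID_PATTERN = re.compile(r"^([A-Z]+)(?:-([A-Z]))?-(\d{3})$")


def parse_signature_id(sig_id):
    """Return (category, subcategory or None, number) for a signature ID."""
    match = ID_PATTERN.match(sig_id)
    if match is None:
        raise ValueError(f"malformed signature ID: {sig_id!r}")
    prefix, letter, number = match.groups()
    if prefix not in CATEGORIES:
        raise ValueError(f"unknown ID prefix: {prefix!r}")
    category, subcategories = CATEGORIES[prefix]
    if letter is None:
        # Only categories without subcategories may omit the letter.
        if subcategories:
            raise ValueError(f"{prefix} IDs need a subcategory letter")
        return category, None, int(number)
    if letter not in subcategories:
        raise ValueError(f"unknown subcategory {letter!r} for {prefix}")
    return category, subcategories[letter], int(number)


print(parse_signature_id("INJ-D-002"))
print(parse_signature_id("SP-001"))
```

System Prompt and Agentic IDs carry no subcategory letter, so the sketch returns `None` in that position for them.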

Detection Profiles

Each profile activates a different set of engines, which determines which signatures run:

| Profile | Engines | Signatures active | Latency |
| --- | --- | --- | --- |
| Lite | Heuristic (Rust) | 75 heuristic signatures | <0.3 ms p99 |
| Standard | + DeBERTa + DistilBERT + FAISS | + 11 classifier + 10 semantic | <80 ms p99 |
| Full | + LLM-Judge | + composite (ambiguous inputs only) | <2 s p99 |
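Because the profiles are cumulative, the set of active signatures can be derived from each signature's engine. The sketch below illustrates that idea under stated assumptions: the `PROFILE_ENGINES` mapping and the `active_signatures` helper are hypothetical names, and the grouping of DeBERTa and DistilBERT under "classifier" and FAISS under "semantic" is inferred from the table rather than taken from InferenceWall's code.

```python
# Engine names per profile, inferred from the Detection Profiles table.
# Each profile includes everything the previous one enables.
PROFILE_ENGINES = {
    "lite": {"heuristic"},
    "standard": {"heuristic", "classifier", "semantic"},
    "full": {"heuristic", "classifier", "semantic", "composite", "llm-judge"},
}

# A few (id, engine) pairs taken from the catalog on this page.
SAMPLE_SIGNATURES = [
    ("INJ-D-002", "heuristic"),
    ("INJ-D-004", "classifier"),
    ("INJ-D-005", "composite"),
    ("INJ-S-001", "semantic"),
]


def active_signatures(profile, signatures):
    """Return the IDs whose engine is enabled under the given profile."""
    engines = PROFILE_ENGINES[profile]
    return [sig_id for sig_id, engine in signatures if engine in engines]


for profile in ("lite", "standard", "full"):
    print(profile, active_signatures(profile, SAMPLE_SIGNATURES))
```

With this sample, Lite keeps only the heuristic signature, Standard adds the classifier and semantic ones, and Full adds the composite one.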

MITRE ATLAS Technique Coverage

All 100 signatures map to one or more MITRE ATLAS techniques. The table below lists every covered technique, its name, and the number of signatures mapped to it.

| Technique ID | Name | Signatures |
| --- | --- | --- |
| AML.T0051.000 | LLM Prompt Injection: Direct | 30 |
| AML.T0051.001 | LLM Prompt Injection: Indirect | 10 |
| AML.T0054 | LLM Jailbreak | 20 |
| AML.T0068 | LLM Prompt Obfuscation | 18 |
| AML.T0056 | LLM Meta Prompt Extraction | 6 |
| AML.T0065 | LLM Prompt Crafting | 10 |
| AML.T0057 | LLM Data Leakage | 16 |
| AML.T0055 | Unsecured Credentials | 6 |
| AML.T0048.002 | External Harms: Societal | 3 |
| AML.T0048.003 | External Harms: User Harm | 6 |
| AML.T0024 | Exfiltration via AI Inference API | 1 |
| AML.T0069 | Discover LLM System Information | 1 |
| AML.T0053 | AI Agent Tool Invocation | 3 |
| AML.T0080 | AI Agent Context Poisoning | 1 |
| AML.T0105 | Escape to Host | 2 |
| AML.T0086 | Exfiltration via AI Agent Tool | 1 |

InferenceWall implements these ATLAS mitigations across all signatures:
  • AML.M0015 Adversarial Input Detection
  • AML.M0020 Generative AI Guardrails
  • AML.M0006 Ensemble Methods
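A per-technique signature count like the one in the coverage table is an inversion of each signature's ATLAS list. The sketch below shows one way to compute it, assuming the mapping is available as a plain dictionary; the `technique_counts` helper is a hypothetical name, and the data is limited to the six Agentic rows from the catalog, so the resulting numbers cover that subset only.

```python
from collections import Counter

# ATLAS technique IDs for the six Agentic signatures, as listed in the catalog.
AGENTIC_ATLAS = {
    "AG-001": ["AML.T0053"],
    "AG-002": ["AML.T0080"],
    "AG-003": ["AML.T0053", "AML.T0102"],
    "AG-004": ["AML.T0105"],
    "AG-005": ["AML.T0105", "AML.T0102"],
    "AG-006": ["AML.T0086"],
}


def technique_counts(atlas_by_signature):
    """Count how many signatures reference each technique ID."""
    counts = Counter()
    for techniques in atlas_by_signature.values():
        # set() guards against a technique listed twice on one signature.
        counts.update(set(techniques))
    return counts


for technique, count in sorted(technique_counts(AGENTIC_ATLAS).items()):
    print(technique, count)
```

Running the same function over the whole catalog, rather than one category, would give catalog-wide totals.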

Full Catalog

Prompt Injection (67 signatures)

Direct Injection — INJ-D (30 signatures)

| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| INJ-D-001 | Role-Play Persona Jailbreak | heuristic | high | low | 8 | AML.T0054, AML.T0051.000 |
| INJ-D-002 | Instruction Override | heuristic | high | high | 7 | AML.T0051.000 |
| INJ-D-003 | Delimiter Escape | heuristic | high | high | 6 | AML.T0051.000 |
| INJ-D-004 | Few-Shot Poisoning | classifier | medium | medium | 5 | AML.T0051.000 |
| INJ-D-005 | Multi-Turn Escalation | composite | high | medium | 9 | AML.T0051.000, AML.T0051.002 |
| INJ-D-006 | Hypothetical Framing | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-007 | Translation Bypass | classifier | medium | medium | 6 | AML.T0068, AML.T0051.000 |
| INJ-D-008 | System Prompt Extraction | heuristic | critical | high | 10 | AML.T0056, AML.T0051.000 |
| INJ-D-009 | Authority Impersonation | heuristic | high | high | 7 | AML.T0051.000 |
| INJ-D-010 | Emotional Manipulation | heuristic | medium | medium | 4 | AML.T0054, AML.T0051.000 |
| INJ-D-011 | Fictional Scenario Framing | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-012 | Grandma Exploit | heuristic | medium | medium | 5 | AML.T0054 |
| INJ-D-013 | Research/Academic Framing | heuristic | low | medium | 3 | AML.T0054, AML.T0051.000 |
| INJ-D-014 | Confidential Data Extraction | heuristic | high | medium | 7 | AML.T0057, AML.T0051.000 |
| INJ-D-015 | Threat & Coercion Injection | heuristic | critical | high | 8 | AML.T0051.000 |
| INJ-D-016 | Safety Protocol Bypass | heuristic | critical | high | 8 | AML.T0054, AML.T0051.000 |
| INJ-D-017 | Uncensored Unrestricted Mode | heuristic | high | high | 7 | AML.T0054 |
| INJ-D-018 | Named Jailbreak Personas | heuristic | critical | high | 9 | AML.T0054 |
| INJ-D-019 | Immersive Roleplay Jailbreak | heuristic | high | high | 7 | AML.T0054 |
| INJ-D-020 | Context Pivot / New Task | heuristic | medium | medium | 6 | AML.T0051.000 |
| INJ-D-021 | Amoral / Evil Bot Framing | heuristic | critical | high | 8 | AML.T0054 |
| INJ-D-022 | Debug / Developer Mode Activation | heuristic | critical | high | 8 | AML.T0054, AML.T0051.000 |
| INJ-D-023 | Dual Response Jailbreak | heuristic | medium | high | 6 | AML.T0054 |
| INJ-D-024 | Rule Override Declaration | heuristic | medium | high | 6 | AML.T0051.000 |
| INJ-D-025 | Creative Writing Pivot to Exploit | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-026 | Reverse Psychology / Negative Framing | heuristic | medium | medium | 5 | AML.T0054, AML.T0051.000 |
| INJ-D-027 | Model / Prompt Extraction | heuristic | high | high | 7 | AML.T0056, AML.T0051.000 |
| INJ-D-028 | Game / Hypothetical Framing | heuristic | medium | medium | 6 | AML.T0054, AML.T0051.000 |
| INJ-D-029 | Coercive Threat Pattern | heuristic | critical | high | 12 | AML.T0051.000 |
| INJ-D-030 | Direct System Access Demand | heuristic | high | high | 10 | AML.T0051.000 |

Indirect Injection — INJ-I (10 signatures)

| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| INJ-I-001 | Hidden Text Injection | heuristic | critical | high | 10 | AML.T0051.001, AML.T0068 |
| INJ-I-002 | HTML/CSS Attribute Injection | heuristic | high | high | 8 | AML.T0051.001 |
| INJ-I-003 | Markdown Injection | heuristic | high | high | 7 | AML.T0051.001 |
| INJ-I-004 | RAG Document Poisoning | composite | critical | high | 10 | AML.T0051.001, AML.T0070 |
| INJ-I-005 | Tool Response Injection | classifier | critical | high | 10 | AML.T0051.001, AML.T0053 |
| INJ-I-006 | URL/Link Injection | heuristic | high | high | 7 | AML.T0051.001 |
| INJ-I-007 | Image Alt Text Injection | heuristic | high | high | 8 | AML.T0051.001 |
| INJ-I-008 | PDF Content Injection | heuristic | high | high | 8 | AML.T0051.001 |
| INJ-I-009 | Code Comment Injection | heuristic | medium | medium | 5 | AML.T0051.001 |
| INJ-I-010 | JSON/XML Payload Injection | heuristic | high | high | 7 | AML.T0051.001 |

Obfuscation — INJ-O (17 signatures)

| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| INJ-O-001 | Base64 Encoding | heuristic | high | medium | 6 | AML.T0068 |
| INJ-O-002 | ROT13 / Substitution Cipher | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-003 | Token Smuggling | classifier | high | medium | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-004 | Payload Splitting | composite | high | medium | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-005 | Homoglyph Substitution | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-006 | Adversarial Suffix | heuristic | high | medium | 8 | AML.T0068, AML.T0043.001 |
| INJ-O-007 | Emoji Encoding | heuristic | medium | medium | 4 | AML.T0068 |
| INJ-O-008 | Language Switch Bypass | composite | high | medium | 7 | AML.T0068 |
| INJ-O-009 | Low-Resource Language Injection | classifier | medium | medium | 6 | AML.T0068 |
| INJ-O-010 | Leetspeak/Number Substitution | heuristic | medium | medium | 4 | AML.T0068 |
| INJ-O-011 | Morse Code Encoding | heuristic | low | medium | 3 | AML.T0068 |
| INJ-O-012 | Reversed Text | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-013 | Whitespace/Tab Encoding | heuristic | medium | medium | 5 | AML.T0068 |
| INJ-O-014 | Pig Latin/Word Games | heuristic | low | medium | 3 | AML.T0068 |
| INJ-O-015 | German Injection Patterns | heuristic | high | high | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-016 | Spanish/French/Multilingual Injection | heuristic | high | high | 7 | AML.T0068, AML.T0051.000 |
| INJ-O-017 | Typo-Obfuscated Injection Commands | heuristic | high | medium | 7 | AML.T0068, AML.T0051.000 |

Semantic — INJ-S (10 signatures)

Semantic signatures use FAISS + MiniLM embeddings to catch paraphrased attacks that bypass keyword and regex detection. They require the Standard or Full profile.

| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| INJ-S-001 | Paraphrased Instruction Override | semantic | high | high | 8 | AML.T0051.000, AML.T0065 |
| INJ-S-002 | Paraphrased System Prompt Extraction | semantic | high | high | 9 | AML.T0056, AML.T0065 |
| INJ-S-003 | Paraphrased Role Hijacking | semantic | high | high | 8 | AML.T0054, AML.T0065 |
| INJ-S-004 | Social Engineering for Data | semantic | high | high | 9 | AML.T0057, AML.T0065 |
| INJ-S-005 | Hypothetical Framing Attack | semantic | high | high | 7 | AML.T0054, AML.T0065 |
| INJ-S-006 | Emotional Manipulation | semantic | high | high | 7 | AML.T0051.000, AML.T0065 |
| INJ-S-007 | Authority Urgency Pressure | semantic | high | high | 8 | AML.T0051.000, AML.T0065 |
| INJ-S-008 | Multi-Step Escalation | semantic | high | high | 8 | AML.T0051.002, AML.T0065 |
| INJ-S-009 | Output Format Manipulation | semantic | high | high | 7 | AML.T0051.000, AML.T0065 |
| INJ-S-010 | Indirect Injection via Tool Output | semantic | high | high | 9 | AML.T0051.001, AML.T0065 |

Content Safety (9 signatures)

Toxicity — CS-T (7 signatures)

| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| CS-T-001 | Hate Speech | classifier | critical | high | 10 | AML.T0048.002 |
| CS-T-002 | Threats and Violence | classifier | critical | high | 10 | AML.T0048.003 |
| CS-T-003 | Sexual Content | classifier | high | high | 8 | AML.T0048.003 |
| CS-T-004 | Self-Harm Content | classifier | critical | high | 10 | AML.T0048.003 |
| CS-T-005 | CSAM/Child Exploitation Keywords | heuristic | critical | high | 15 | AML.T0048.003 |
| CS-T-006 | Weapons/Explosives Instructions | heuristic | critical | high | 10 | AML.T0048.003 |
| CS-T-007 | Drug Manufacturing Instructions | heuristic | high | high | 8 | AML.T0048.003 |

Bias — CS-B (2 signatures)

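| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| CS-B-001 | Demographic Bias | classifier | medium | medium | 5 | AML.T0048.002 |
| CS-B-002 | Stereotypical Outputs | classifier | medium | medium | 5 | AML.T0048.002 |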

Data Leakage (14 signatures)

All data leakage signatures run on output (LLM responses).

PII — DL-P (8 signatures)

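| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| DL-P-001 | Email Addresses | heuristic | medium | high | 4 | AML.T0057 |
| DL-P-002 | Phone Numbers | heuristic | medium | high | 4 | AML.T0057 |
| DL-P-003 | SSN Patterns | heuristic | critical | high | 12 | AML.T0057 |
| DL-P-004 | Credit Card Numbers | heuristic | critical | high | 12 | AML.T0057 |
| DL-P-005 | Physical Addresses | heuristic | medium | medium | 4 | AML.T0057 |
| DL-P-006 | Date of Birth Patterns | heuristic | medium | medium | 4 | AML.T0057 |
| DL-P-007 | IP Addresses | heuristic | low | high | 3 | AML.T0057 |
| DL-P-008 | Medical Record Numbers | heuristic | high | high | 8 | AML.T0057 |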

Secrets — DL-S (6 signatures)

| ID | Name | Engine | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- |
| DL-S-001 | API Keys | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
| DL-S-002 | Connection Strings | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
| DL-S-003 | JWT Tokens | heuristic | high | high | 8 | AML.T0057, AML.T0055 |
| DL-S-004 | Private Keys | heuristic | critical | high | 15 | AML.T0057, AML.T0055 |
| DL-S-005 | AWS Credentials | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |
| DL-S-006 | Database Credentials | heuristic | critical | high | 12 | AML.T0057, AML.T0055 |

System Prompt (4 signatures)

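| ID | Name | Engine | Direction | Severity | Confidence | Points | ATLAS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SP-001 | System Prompt Leak in Output | heuristic | output | critical | high | 12 | AML.T0056 |
| SP-002 | Confidentiality Marker Leak | heuristic | output | high | high | 8 | AML.T0056 |
| SP-003 | Training Data Extraction | heuristic | input | high | high | 7 | AML.T0024 |
| SP-004 | Model Architecture Probing | heuristic | input | low | medium | 3 | AML.T0069 |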
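The Direction column decides whether a signature is evaluated against prompts or against responses. As a rough sketch of that selection, assuming a signature's direction is one of `input`, `output`, or `bidirectional` (the values listed in the fields reference), the hypothetical `signatures_for` helper below picks the System Prompt signatures that apply to one side of the traffic. It is an illustration only, not InferenceWall's dispatch logic.

```python
# (id, direction) pairs copied from the System Prompt table.
SYSTEM_PROMPT_SIGNATURES = [
    ("SP-001", "output"),
    ("SP-002", "output"),
    ("SP-003", "input"),
    ("SP-004", "input"),
]


def applies_to(direction, traffic):
    """A signature applies if it targets this traffic side or both sides."""
    if traffic not in ("input", "output"):
        raise ValueError(f"traffic must be 'input' or 'output', got {traffic!r}")
    return direction == "bidirectional" or direction == traffic


def signatures_for(traffic, signatures):
    """Return the IDs of signatures to evaluate for the given traffic side."""
    return [sig_id for sig_id, direction in signatures
            if applies_to(direction, traffic)]


print(signatures_for("input", SYSTEM_PROMPT_SIGNATURES))
print(signatures_for("output", SYSTEM_PROMPT_SIGNATURES))
```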

Agentic (6 signatures)

IDNameEngineSeverityConfidencePointsATLAS
AG-001Tool Abuseheuristichighhigh8AML.T0053
AG-002Recursive Agent Injectionheuristiccriticalhigh10AML.T0080
AG-003Privilege Escalationheuristichighhigh8AML.T0053, AML.T0102
AG-004Unauthorized File System Accessheuristiccriticalhigh12AML.T0105
AG-005Shell Command Injectionheuristiccriticalhigh12AML.T0105, AML.T0102
AG-006Exfiltration via Agentheuristichighhigh8AML.T0086

Signature Fields Reference

Every signature YAML file declares the following fields:

| Field | Description | Example values |
| --- | --- | --- |
| signature.id | Unique identifier in {CATEGORY}-{SUBCATEGORY}-{NUMBER} format | INJ-D-002, DL-S-001 |
| signature.name | Human-readable name | Instruction Override |
| signature.version | Semver version string | "1.0.0" |
| meta.category | Threat category | prompt-injection, content-safety, data-leakage, agentic, system-prompt |
| meta.subcategory | Category-specific subcategory | direct, indirect, obfuscation, semantic |
| meta.technique | Specific technique name | role-play-persona, keyword-match |
| meta.severity | Severity level | critical, high, medium, low, info |
| meta.confidence | Detection confidence | high, medium, low |
| meta.performance_cost | Compute cost per scan | low (<1 ms), medium (1–50 ms), high (>50 ms) |
| meta.atlas | MITRE ATLAS technique IDs | ["AML.T0051.000", "AML.T0054"] |
| detection.engine | Detection engine | heuristic, classifier, semantic, llm-judge, composite |
| detection.direction | Which traffic to scan | input, output, bidirectional |
| detection.patterns | List of patterns to match | regex, substring, semantic, perplexity, encoding, unicode |
| detection.condition | How patterns combine | any (OR), all (AND), weighted |
| scoring.anomaly_points | Points added to the anomaly score on match | 1–15 |
| tuning.default_action | Default enforcement mode | enforce, monitor |
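To make the field layout concrete, the sketch below writes one signature as the nested dictionary a YAML loader would typically produce and checks its enumerated fields against the allowed values in the reference. The ID, name, engine, severity, confidence, points, and ATLAS ID come from the INJ-D-002 catalog row. The technique name, performance cost, direction, condition, default action, and the shape and content of the `patterns` entries are assumptions made for illustration, as is the `validate` helper.

```python
# One signature as a nested dict, mirroring the dotted field names above.
EXAMPLE_SIGNATURE = {
    "signature": {"id": "INJ-D-002", "name": "Instruction Override",
                  "version": "1.0.0"},
    "meta": {
        "category": "prompt-injection",
        "subcategory": "direct",
        "technique": "instruction-override",   # assumed value
        "severity": "high",
        "confidence": "high",
        "performance_cost": "low",             # assumed value
        "atlas": ["AML.T0051.000"],
    },
    "detection": {
        "engine": "heuristic",
        "direction": "input",                  # assumed value
        # Hypothetical pattern entry: key names and regex are illustrative.
        "patterns": [{"type": "regex",
                      "value": r"ignore (all )?previous instructions"}],
        "condition": "any",                    # assumed value
    },
    "scoring": {"anomaly_points": 7},
    "tuning": {"default_action": "enforce"},   # assumed value
}

# Allowed values copied from the "Example values" column.
ALLOWED = {
    ("meta", "severity"): {"critical", "high", "medium", "low", "info"},
    ("meta", "confidence"): {"high", "medium", "low"},
    ("detection", "engine"): {"heuristic", "classifier", "semantic",
                              "llm-judge", "composite"},
    ("detection", "direction"): {"input", "output", "bidirectional"},
    ("detection", "condition"): {"any", "all", "weighted"},
    ("tuning", "default_action"): {"enforce", "monitor"},
}


def validate(sig):
    """Return a list of problems found in the enumerated fields."""
    errors = []
    for (section, field), allowed in ALLOWED.items():
        value = sig.get(section, {}).get(field)
        if value not in allowed:
            errors.append(f"{section}.{field}: unexpected value {value!r}")
    return errors


print(validate(EXAMPLE_SIGNATURE))  # [] when every enumerated field is valid
```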

Severity and Confidence Levels

Severity reflects the risk if the threat is real:

| Level | Anomaly points (typical) | Examples |
| --- | --- | --- |
| critical | 8–15 | Private key in output, CSAM keywords, system prompt leak |
| high | 6–9 | Instruction override, API key in output, shell injection |
| medium | 4–6 | Hypothetical framing, base64 encoding, demographic bias |
| low | 2–3 | IP address in output, model architecture probing |
| info | 0–1 | Informational signals, no direct action |

Confidence reflects how certain the engine is that a true positive was detected. Low-confidence matches contribute fewer effective points due to confidence-weighted scoring.
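This page does not give the exact weighting, so the following is only a sketch of how confidence-weighted scoring could work. The multipliers in `CONFIDENCE_WEIGHT` are hypothetical, as are the function names. The three matches use the points and confidence levels listed in the catalog for INJ-D-001, INJ-D-002, and INJ-D-006.

```python
# Hypothetical multipliers: the page says low-confidence matches count for
# less, but does not publish the actual weights.
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.75, "low": 0.5}


def effective_points(anomaly_points, confidence):
    """Scale a signature's anomaly points by its confidence weight."""
    return anomaly_points * CONFIDENCE_WEIGHT[confidence]


def anomaly_score(matches):
    """Sum the effective points of all matched signatures."""
    return sum(effective_points(points, confidence)
               for points, confidence in matches)


# (points, confidence) for INJ-D-001, INJ-D-002 and INJ-D-006 from the catalog.
matches = [(8, "low"), (7, "high"), (5, "medium")]
print(anomaly_score(matches))  # 14.75 under the assumed weights
```

Under these assumed weights, the low-confidence INJ-D-001 match contributes 4 effective points rather than its nominal 8, so it counts for less than the high-confidence INJ-D-002 match even though its nominal score is higher.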

Licenses

  • Engine code (Rust, Python, CLI, API): Apache-2.0
  • Community signatures (catalog/): CC BY-SA 4.0 — modifications must be shared back under the same license
To add your own signatures or override built-in ones, see the Custom Signatures guide.