InferenceWall ships with 100 detection signatures across five categories: prompt injection, content safety, data leakage, system prompt, and agentic threats. Every signature is mapped to one or more MITRE ATLAS technique IDs, so you can assess coverage against the adversarial AI threat taxonomy directly.
Signature categories
| Category | ID prefix | Count | What it detects |
|---|---|---|---|
| Prompt Injection | INJ | 67 | Direct injection (30, including 20 jailbreaks), indirect injection (10), obfuscation (18), semantic paraphrasing (10) |
| Data Leakage | DL | 14 | PII in output — DL-P-* (8 sigs); secrets and credentials — DL-S-* (6 sigs) |
| Content Safety | CS | 9 | Toxicity — CS-T-* (7 sigs); bias — CS-B-* (2 sigs) |
| System Prompt | SP | 4 | Prompt leak in output (2 sigs), training data extraction (1 sig), model probing (1 sig) |
| Agentic | AG | 6 | Tool abuse (2 sigs), privilege escalation (1 sig), host escape (2 sigs), exfiltration via agent (1 sig) |
Counts in the Prompt Injection category overlap: jailbreak signatures (INJ-D-001, INJ-D-006, INJ-D-010 through INJ-D-029) are a subset of the 30 direct injection signatures. The 20 jailbreak signatures cover role-play personas, DAN variants, named jailbreak personas, debug/developer mode activation, and amoral bot framing.
MITRE ATLAS technique coverage
| ATLAS technique | Name | Signatures |
|---|---|---|
| AML.T0051.000 | LLM Prompt Injection: Direct | 30 |
| AML.T0051.001 | LLM Prompt Injection: Indirect | 10 |
| AML.T0054 | LLM Jailbreak | 20 |
| AML.T0068 | LLM Prompt Obfuscation | 18 |
| AML.T0056 | LLM Meta Prompt Extraction | 6 |
| AML.T0065 | LLM Prompt Crafting | 10 |
| AML.T0057 | LLM Data Leakage | 16 |
| AML.T0055 | Unsecured Credentials | 6 |
| AML.T0048.002 | External Harms: Societal | 3 |
| AML.T0048.003 | External Harms: User Harm | 6 |
| AML.T0024 | Exfiltration via AI Inference API | 1 |
| AML.T0069 | Discover LLM System Information | 1 |
| AML.T0053 | AI Agent Tool Invocation | 3 |
| AML.T0080 | AI Agent Context Poisoning | 1 |
| AML.T0105 | Escape to Host | 2 |
| AML.T0086 | Exfiltration via AI Agent Tool | 1 |
Many signatures map to multiple techniques. The counts above reflect primary technique mappings. Coverage is based on MITRE ATLAS v5.5 (March 2026).
Every signature ID follows the pattern {CATEGORY}-{SUBCATEGORY}-{NUMBER}:
| Category | Prefix | Subcategories |
|---|---|---|
| Prompt Injection | INJ | D (direct), I (indirect), O (obfuscation), S (semantic) |
| Content Safety | CS | T (toxicity), B (bias) |
| Data Leakage | DL | P (PII), S (secrets/credentials) |
| System Prompt | SP | — (no subcategory) |
| Agentic | AG | — (no subcategory) |
For example, INJ-D-002 is the second direct prompt injection signature, and DL-S-001 is the first secrets/credentials data leakage signature.
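The ID scheme above can be parsed mechanically. The following is a minimal sketch, not part of InferenceWall's API; it assumes SP and AG signatures simply omit the subcategory field (e.g. SP-001), which the tables above imply but do not show:

```python
import re

# Hypothetical helper: parse a signature ID of the form
# {CATEGORY}-{SUBCATEGORY}-{NUMBER}. The subcategory letter is optional,
# since SP and AG signatures have no subcategory.
SIG_ID = re.compile(r"^(INJ|CS|DL|SP|AG)(?:-([A-Z]))?-(\d{3})$")

def parse_signature_id(sig_id: str) -> dict:
    m = SIG_ID.match(sig_id)
    if m is None:
        raise ValueError(f"not a valid signature ID: {sig_id!r}")
    category, subcategory, number = m.groups()
    return {"category": category, "subcategory": subcategory, "number": int(number)}

parse_signature_id("INJ-D-002")  # {'category': 'INJ', 'subcategory': 'D', 'number': 2}
parse_signature_id("SP-001")     # {'category': 'SP', 'subcategory': None, 'number': 1}
```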
Match object
When a signature fires, InferenceWall returns a match object in the matches list of the ScanResponse:
```json
{
  "signature_id": "INJ-D-002",
  "matched_text": "ignore all previous instructions",
  "score": 6.3,
  "confidence": 0.9,
  "severity": 7.0
}
```
| Field | Description |
|---|---|
| signature_id | The ID of the matched signature |
| matched_text | The portion of the input that triggered the match |
| score | Effective score for this match (confidence × severity) |
| confidence | Engine confidence (0.0–1.0) |
| severity | Signature severity weight (1–15) |
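The score relationship in the table above can be illustrated directly. This is a hedged sketch of the arithmetic only, not InferenceWall's scoring code; the function name is hypothetical:

```python
# Hypothetical illustration: per the field table, score = confidence x severity.
# Ranges mirror the documented bounds (confidence 0.0-1.0, severity 1-15).
def effective_score(confidence: float, severity: float) -> float:
    assert 0.0 <= confidence <= 1.0
    assert 1.0 <= severity <= 15.0
    # Round to one decimal place, matching the example match object.
    return round(confidence * severity, 1)

effective_score(0.9, 7.0)  # 6.3, as in the INJ-D-002 example above
```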
Detection engines by category
Signatures run on the engine that matches their detection technique:
| Engine | Profiles | Signature types |
|---|---|---|
| Heuristic (Rust) | Lite, Standard, Full | regex, substring, encoding, unicode patterns |
| ML classifier (ONNX) | Standard, Full | classifier — DeBERTa/DistilBERT models |
| Semantic (FAISS + MiniLM) | Standard, Full | semantic — embedding similarity |
| LLM-judge (Phi-4 Mini Q4) | Full | composite — borderline and multi-step cases |
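The routing rule the table describes can be sketched as a lookup: each signature type maps to one engine, and a scan profile enables a subset of engines. The dictionaries below follow the table's names but are an illustrative sketch, not InferenceWall's actual implementation:

```python
# Hypothetical routing sketch: signature type -> engine, per the table above.
ENGINE_FOR_TYPE = {
    "regex": "heuristic",
    "substring": "heuristic",
    "encoding": "heuristic",
    "unicode": "heuristic",
    "classifier": "ml",
    "semantic": "semantic",
    "composite": "llm_judge",
}

# Profile -> set of enabled engines, per the Profiles column.
ENGINES_FOR_PROFILE = {
    "lite": {"heuristic"},
    "standard": {"heuristic", "ml", "semantic"},
    "full": {"heuristic", "ml", "semantic", "llm_judge"},
}

def runs_in_profile(signature_type: str, profile: str) -> bool:
    """True if a signature of this type executes under the given profile."""
    return ENGINE_FOR_TYPE[signature_type] in ENGINES_FOR_PROFILE[profile]

runs_in_profile("semantic", "lite")   # False: semantic engine needs Standard+
runs_in_profile("composite", "full")  # True: LLM-judge runs only in Full
```

One consequence of this layout is that the Lite profile runs heuristic signatures only, so semantic and classifier signatures are silently skipped there.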
To add your own signatures or override shipped ones, see Custom Signatures.