InferenceWall ships with 100 detection signatures across five categories: prompt injection, content safety, data leakage, system prompt, and agentic threats. Every signature is mapped to one or more MITRE ATLAS technique IDs, so you can assess coverage against the adversarial AI threat taxonomy directly.

Signature categories

| Category | ID prefix | Count | What it detects |
| --- | --- | --- | --- |
| Prompt Injection | INJ | 67 | Direct injection (30), indirect injection (10), obfuscation (18), jailbreaks (20 within INJ), semantic paraphrasing (10) |
| Data Leakage | DL | 14 | PII in output (DL-P-*, 8 sigs); secrets and credentials (DL-S-*, 6 sigs) |
| Content Safety | CS | 9 | Toxicity (CS-T-*, 7 sigs); bias (CS-B-*, 2 sigs) |
| System Prompt | SP | 4 | Prompt leak in output (2 sigs), training data extraction (1 sig), model probing (1 sig) |
| Agentic | AG | 6 | Tool abuse (2 sigs), privilege escalation (1 sig), host escape (2 sigs), exfiltration via agent (1 sig) |

Counts in the Prompt Injection category overlap: jailbreak signatures (INJ-D-001, INJ-D-006, INJ-D-010 through INJ-D-029) are a subset of the 30 direct injection signatures. The 20 jailbreak signatures cover role-play personas, DAN variants, named jailbreak personas, debug/developer mode activation, and amoral bot framing.

MITRE ATLAS technique coverage

| ATLAS technique | Name | Signatures |
| --- | --- | --- |
| AML.T0051.000 | LLM Prompt Injection: Direct | 30 |
| AML.T0051.001 | LLM Prompt Injection: Indirect | 10 |
| AML.T0054 | LLM Jailbreak | 20 |
| AML.T0068 | LLM Prompt Obfuscation | 18 |
| AML.T0056 | LLM Meta Prompt Extraction | 6 |
| AML.T0065 | LLM Prompt Crafting | 10 |
| AML.T0057 | LLM Data Leakage | 16 |
| AML.T0055 | Unsecured Credentials | 6 |
| AML.T0048.002 | External Harms: Societal | 3 |
| AML.T0048.003 | External Harms: User Harm | 6 |
| AML.T0024 | Exfiltration via AI Inference API | 1 |
| AML.T0069 | Discover LLM System Information | 1 |
| AML.T0053 | AI Agent Tool Invocation | 3 |
| AML.T0080 | AI Agent Context Poisoning | 1 |
| AML.T0105 | Escape to Host | 2 |
| AML.T0086 | Exfiltration via AI Agent Tool | 1 |

Many signatures map to multiple techniques. The counts above reflect primary technique mappings. Coverage is based on MITRE ATLAS v5.5 (March 2026).
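
One way to use the table when assessing coverage is to compare it against the list of techniques your own threat model cares about. The sketch below is illustrative only: the `ATLAS_COVERAGE` dictionary is transcribed by hand from the table above, and `coverage_gaps` is a hypothetical helper, not part of InferenceWall.

```python
# Signature counts per ATLAS technique, copied from the table above.
ATLAS_COVERAGE = {
    "AML.T0051.000": 30,
    "AML.T0051.001": 10,
    "AML.T0054": 20,
    "AML.T0068": 18,
    "AML.T0056": 6,
    "AML.T0065": 10,
    "AML.T0057": 16,
    "AML.T0055": 6,
    "AML.T0048.002": 3,
    "AML.T0048.003": 6,
    "AML.T0024": 1,
    "AML.T0069": 1,
    "AML.T0053": 3,
    "AML.T0080": 1,
    "AML.T0105": 2,
    "AML.T0086": 1,
}


def coverage_gaps(required):
    """Return the required technique IDs that have no shipped signature."""
    return sorted(t for t in required if ATLAS_COVERAGE.get(t, 0) == 0)


# Hypothetical threat-model list; the last ID is simply one not in the table.
required = ["AML.T0054", "AML.T0105", "AML.T0020"]
gaps = coverage_gaps(required)
print(gaps)
```

Any technique reported as a gap would need a custom signature or a control outside InferenceWall.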

Signature ID format

Every signature ID follows the pattern {CATEGORY}-{SUBCATEGORY}-{NUMBER}:

| Category | Prefix | Subcategories |
| --- | --- | --- |
| Prompt Injection | INJ | D (direct), I (indirect), O (obfuscation), S (semantic) |
| Content Safety | CS | T (toxicity), B (bias) |
| Data Leakage | DL | P (PII), S (secrets/credentials) |
| System Prompt | SP | None |
| Agentic | AG | None |

For example, INJ-D-002 is the second direct prompt injection signature, and DL-S-001 is the first secrets/credentials data leakage signature.
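
If you need to group or filter matches by category, the ID can be split back into its parts. The following is a minimal sketch rather than an InferenceWall API: `parse_signature_id` and `SignatureId` are hypothetical names, and it assumes that categories without a subcategory (SP, AG) simply omit that segment, as in `SP-001`. That ID shape is not confirmed here.

```python
import re
from typing import NamedTuple, Optional

# Prefixes and subcategory letters from the table above.
SUBCATEGORIES = {
    "INJ": {"D": "direct", "I": "indirect", "O": "obfuscation", "S": "semantic"},
    "CS": {"T": "toxicity", "B": "bias"},
    "DL": {"P": "PII", "S": "secrets/credentials"},
    "SP": {},  # no subcategory
    "AG": {},  # no subcategory
}

# Assumption: the subcategory segment is absent for SP and AG (e.g. "SP-001").
_ID_RE = re.compile(r"^([A-Z]+)(?:-([A-Z]))?-(\d+)$")


class SignatureId(NamedTuple):
    category: str
    subcategory: Optional[str]
    number: int


def parse_signature_id(sig_id: str) -> SignatureId:
    """Split an ID such as 'INJ-D-002' into category, subcategory, number."""
    m = _ID_RE.match(sig_id)
    if m is None:
        raise ValueError(f"not a signature ID: {sig_id!r}")
    category, sub, number = m.group(1), m.group(2), int(m.group(3))
    if category not in SUBCATEGORIES:
        raise ValueError(f"unknown category prefix: {category!r}")
    allowed = SUBCATEGORIES[category]
    if allowed and sub not in allowed:
        raise ValueError(f"invalid subcategory for {category}: {sub!r}")
    if not allowed and sub is not None:
        raise ValueError(f"{category} takes no subcategory, got {sub!r}")
    return SignatureId(category, sub, number)


print(parse_signature_id("INJ-D-002"))
print(parse_signature_id("DL-S-001"))
```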

Match object

When a signature fires, InferenceWall returns a match object in the matches list of the ScanResponse:
{
  "signature_id": "INJ-D-002",
  "matched_text": "ignore all previous instructions",
  "score": 6.3,
  "confidence": 0.9,
  "severity": 7.0
}

| Field | Description |
| --- | --- |
| signature_id | The ID of the matched signature |
| matched_text | The portion of the input that triggered the match |
| score | Effective score for this match (confidence × severity) |
| confidence | Engine confidence (0.0–1.0) |
| severity | Signature severity weight (1–15) |
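
The relationship between the three numeric fields can be checked directly. This sketch parses the sample match object shown above and recomputes the score from confidence and severity. `effective_score` is a hypothetical helper written for illustration; the range checks mirror the ranges in the field table.

```python
import json
import math

# The sample match object from this page.
raw = """
{
  "signature_id": "INJ-D-002",
  "matched_text": "ignore all previous instructions",
  "score": 6.3,
  "confidence": 0.9,
  "severity": 7.0
}
"""
match = json.loads(raw)


def effective_score(confidence: float, severity: float) -> float:
    """Recompute the score as confidence x severity, validating both ranges."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    if not 1 <= severity <= 15:
        raise ValueError("severity must be within 1-15")
    return confidence * severity


computed = effective_score(match["confidence"], match["severity"])
# Compare with a tolerance, since the values are floating point.
consistent = math.isclose(computed, match["score"], rel_tol=1e-9)
print(consistent)
```

With these ranges, the highest possible score for a single match is 15.0 (confidence 1.0 on a severity-15 signature).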

Detection engines by category

Signatures run on the engine that matches their detection technique:

| Engine | Profiles | Signature types |
| --- | --- | --- |
| Heuristic (Rust) | Lite, Standard, Full | regex, substring, encoding, unicode patterns |
| ML classifier (ONNX) | Standard, Full | classifier (DeBERTa/DistilBERT models) |
| Semantic (FAISS + MiniLM) | Standard, Full | semantic (embedding similarity) |
| LLM-judge (Phi-4 Mini Q4) | Full | composite (borderline and multi-step cases) |
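
To see at a glance which engines, and therefore which signature types, are active under a given profile, the table can be expressed as a lookup. This is an illustrative sketch only: the `ENGINES` structure and the `engines_for_profile` and `signature_types_for_profile` helpers are hypothetical, and the profile names are used as plain strings rather than as InferenceWall configuration values.

```python
# (engine, profiles that include it, signature types it runs), from the table above.
ENGINES = [
    ("Heuristic (Rust)", {"Lite", "Standard", "Full"},
     ["regex", "substring", "encoding", "unicode"]),
    ("ML classifier (ONNX)", {"Standard", "Full"}, ["classifier"]),
    ("Semantic (FAISS + MiniLM)", {"Standard", "Full"}, ["semantic"]),
    ("LLM-judge (Phi-4 Mini Q4)", {"Full"}, ["composite"]),
]


def engines_for_profile(profile: str) -> list:
    """Return the engine names that run under the given profile."""
    return [name for name, profiles, _ in ENGINES if profile in profiles]


def signature_types_for_profile(profile: str) -> list:
    """Return the signature types that can fire under the given profile."""
    types = []
    for _, profiles, sig_types in ENGINES:
        if profile in profiles:
            types.extend(sig_types)
    return types


for profile in ("Lite", "Standard", "Full"):
    print(profile, engines_for_profile(profile))
```

Under this reading of the table, classifier, semantic, and composite signatures cannot fire on the Lite profile, since only the heuristic engine runs there.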

To add your own signatures or override shipped ones, see Custom Signatures.