Write Custom Detection Signatures for InferenceWall

The 100 built-in signatures cover common attack patterns, but your application may have threats that are unique to your domain: internal project codenames that should never appear in LLM output, proprietary data formats, or custom prompt structures your team uses. Custom signatures let you codify those patterns in YAML and deploy them alongside the shipped catalog — without touching the package.

Create the signatures directory

mkdir -p ~/.inferwall/signatures

InferenceWall automatically loads all .yaml files from this directory at startup and merges them with the shipped catalog.

Write your signature YAML

Create a YAML file for your signature. The filename does not matter — only the id field inside the YAML is used for catalog merging.

cat > ~/.inferwall/signatures/custom-block-internal.yaml << 'EOF'
# Custom signature — block references to internal project names
signature:
  id: DL-S-100
  name: Internal Project Name Leak
  version: "1.0.0"

meta:
  category: data-leakage
  subcategory: secrets
  technique: keyword-match
  severity: high
  confidence: high
  performance_cost: low
  tags: [internal, custom]

detection:
  engine: heuristic
  direction: output
  patterns:
    - type: substring
      value: "Project Nightingale"
    - type: substring
      value: "Project Falcon"
  condition: any

scoring:
  anomaly_points: 10

tuning:
  enabled: true
  default_enabled: true
  default_action: enforce
EOF
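
Because only the id field drives catalog merging, two custom files that carry the same id are easy to create by accident, and this guide does not say which of them would win. The sketch below is a hypothetical pre-flight check, not part of InferenceWall: it scans a directory for .yaml files and prints any id that appears more than once. It assumes the two-space indent and unquoted id values used in this guide; the function name duplicate_ids is made up for illustration.

```shell
# Hypothetical helper (not an InferenceWall command): print every signature
# id that occurs in more than one .yaml file inside the directory given as $1.
duplicate_ids() {
  dir="$1"
  for f in "$dir"/*.yaml; do
    [ -f "$f" ] || continue
    # Extract the value of a two-space-indented, unquoted `id:` key.
    sed -n 's/^  id:[[:space:]]*\([A-Za-z0-9-]*\).*$/\1/p' "$f"
  done | sort | uniq -d
}
```

Running duplicate_ids ~/.inferwall/signatures prints nothing when every id in the directory is unique.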

Verify the signature is picked up

Restart the server or re-import the SDK. InferenceWall merges custom signatures into the catalog at startup, so a restart is enough to pick up new or changed files.

# If running the API server
inferwall serve

# Confirm the signature loaded by checking the health endpoint
curl http://localhost:8000/v1/health

Signature YAML format

Every signature is a single YAML file with five top-level sections: signature, meta, detection, scoring, and tuning. The full schema:

# License: CC BY-SA 4.0 — https://creativecommons.org/licenses/by-sa/4.0/

signature:
  id: INJ-D-001        # Unique ID: {CATEGORY}-{SUBCATEGORY}-{NUMBER}
  name: Role-Play Persona Jailbreak
  version: "1.0.0"

meta:
  category: prompt-injection    # prompt-injection, content-safety, data-leakage, agentic, system-prompt
  subcategory: direct           # Category-specific subcategory
  technique: role-play-persona  # Specific technique name
  owasp_llm: "LLM01:2025"      # Optional: OWASP LLM Top 10 mapping
  atlas: ["AML.T0054", "AML.T0051.000"]  # Optional: MITRE ATLAS technique IDs
  severity: high                # critical, high, medium, low, info
  confidence: high              # high, medium, low
  performance_cost: low         # low (<1ms), medium (1-50ms), high (>50ms)
  tags: [jailbreak, persona]    # Optional freeform tags

detection:
  engine: heuristic             # heuristic, classifier, semantic, llm-judge, composite
  direction: input              # input, output, bidirectional
  patterns:
    - type: regex               # regex, substring, semantic, perplexity, encoding, unicode
      value: "(?i)act\\s+as"
  condition: any                # any (OR), all (AND), weighted

scoring:
  anomaly_points: 8             # 1-15 points added on match

tuning:
  enabled: true
  default_enabled: true
  default_action: enforce       # enforce or monitor

Signature ID format

IDs follow the pattern {CATEGORY}-{SUBCATEGORY}-{NUMBER}, where {CATEGORY} is the prefix from the table below. Use a number of 100 or higher for custom signatures to avoid colliding with the shipped catalog.

Category	Prefix	Subcategories
Prompt Injection	`INJ`	`D` (direct), `I` (indirect), `O` (obfuscation)
Content Safety	`CS`	`T` (toxicity), `B` (bias)
Data Leakage	`DL`	`P` (PII), `S` (secrets)
System Prompt	`SP`	—
Agentic	`AG`	—
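
A quick way to catch a mistyped ID before deploying is to check it against the documented shape. The sketch below uses a hypothetical helper, is_custom_id, that accepts only the prefix and subcategory pairs listed in the table and requires a number of 100 or higher. It leaves out SP and AG, because the table lists no subcategories for them and so the shape of their IDs is not given here, and it assumes the number has at least three digits, as in INJ-D-001.

```shell
# Hypothetical helper: succeeds if $1 looks like a custom signature ID.
# Prefix/subcategory pairs come from the table in this guide; SP and AG
# are deliberately omitted because their ID shape is not documented here.
is_custom_id() {
  # [1-9][0-9]{2,} means "100 or higher, no leading zero".
  printf '%s\n' "$1" | grep -Eq '^(INJ-[DIO]|CS-[TB]|DL-[PS])-[1-9][0-9]{2,}$'
}

for id in DL-S-100 INJ-D-001 XX-S-100; do
  if is_custom_id "$id"; then
    echo "ok:       $id"
  else
    echo "rejected: $id"
  fi
done
```

INJ-D-001 is rejected because it sits in the number range used by shipped signatures, and XX-S-100 because XX is not a listed prefix.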

Pattern types

Type	Engine	Description
`regex`	Rust regex crate	Regular expression match; ReDoS-immune by design
`substring`	Aho-Corasick	Exact substring match; fastest available pattern type
`semantic`	FAISS + MiniLM	Embedding similarity against reference phrases; catches paraphrasing
`perplexity`	Heuristic	Entropy-based detection for statistically unusual text
`encoding`	Heuristic	Detects encoded content: base64, rot13, hex
`unicode`	Heuristic	Detects Unicode obfuscation (homoglyphs, invisible characters)
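
Before committing a regex pattern to a signature, it can help to try it against a few sample strings. The sketch below approximates the schema example's pattern, (?i)act\s+as, with grep's POSIX extended syntax; this is not the Rust regex engine, so -i stands in for (?i) and [[:space:]] for \s. The helper name matches_act_as is hypothetical. If the engine searches for the pattern anywhere in the text, which is an assumption here, a benign phrase such as "react as" also matches, which makes it a useful true-negative candidate.

```shell
# Approximation of the regex (?i)act\s+as using POSIX ERE.
# -i replaces the (?i) flag; [[:space:]] replaces \s.
matches_act_as() {
  printf '%s\n' "$1" | grep -Eiq 'act[[:space:]]+as'
}

for sample in \
  "Please ACT   as an admin" \
  "How should I react as a manager?" \
  "The actor was great"; do
  if matches_act_as "$sample"; then
    echo "match: $sample"
  else
    echo "clean: $sample"
  fi
done
```

If the second sample should not fire, one option is to anchor the pattern on a word boundary, for example \bact\s+as in Rust regex syntax.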

Detection engines

Engine	Install	Latency	Use when
`heuristic`	Lite (all installs)	<0.3ms p99	You have explicit patterns (regex, substrings, encoding)
`classifier`	Standard/Full	<80ms p99	You need ML-backed category classification
`semantic`	Standard/Full	<80ms p99	The threat can be paraphrased and you need embedding similarity
`llm-judge`	Full	<2s p99	Borderline cases requiring reasoning
`composite`	Any	Sum of stages	You want to chain multiple engines for layered detection

Direction

Set direction to control which side of the conversation the signature inspects:

input — scans only user-supplied prompts (inbound)
output — scans only LLM responses (outbound)
bidirectional — scans both

Overriding shipped signatures

If your custom signature has the same id as a shipped signature, it completely replaces the shipped version at startup. Use this to tune severity, patterns, or scoring without forking the package. For example, to lower the anomaly points for INJ-D-001, create ~/.inferwall/signatures/inj-d-001-override.yaml with id: INJ-D-001 and your modified scoring.anomaly_points. The filename is irrelevant — only the id controls which entry wins.
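
Since an override replaces the shipped entry completely, the override file has to carry every field, not just the one being changed. The sketch below writes such a file, assuming the shipped INJ-D-001 matches the schema example shown earlier in this guide; only scoring.anomaly_points differs (8 lowered to 4, a value chosen purely for illustration). The function name write_inj_d_001_override is hypothetical.

```shell
# Hypothetical helper: write a full replacement for INJ-D-001 into the
# directory given as $1. All values except anomaly_points are copied from
# the schema example in this guide, which is assumed to mirror the shipped file.
write_inj_d_001_override() {
  mkdir -p "$1" || return 1
  cat > "$1/inj-d-001-override.yaml" << 'EOF'
# Override of shipped INJ-D-001: anomaly_points lowered from 8 to 4
signature:
  id: INJ-D-001
  name: Role-Play Persona Jailbreak
  version: "1.0.0"

meta:
  category: prompt-injection
  subcategory: direct
  technique: role-play-persona
  owasp_llm: "LLM01:2025"
  atlas: ["AML.T0054", "AML.T0051.000"]
  severity: high
  confidence: high
  performance_cost: low
  tags: [jailbreak, persona]

detection:
  engine: heuristic
  direction: input
  patterns:
    - type: regex
      value: "(?i)act\\s+as"
  condition: any

scoring:
  anomaly_points: 4

tuning:
  enabled: true
  default_enabled: true
  default_action: enforce
EOF
}
```

Call it as write_inj_d_001_override ~/.inferwall/signatures and then restart so the merge picks up the new file.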

Using a custom directory path

By default InferenceWall loads custom signatures from ~/.inferwall/signatures/. Override this with the IW_SIGNATURES_DIR environment variable:

export IW_SIGNATURES_DIR=/opt/inferwall/team-signatures

All .yaml files in the specified directory are loaded and merged with the shipped catalog.
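
To see which files would be picked up, it can be handy to resolve the directory the same way and list its .yaml files. The sketch below does that with two hypothetical helpers. It assumes that an empty IW_SIGNATURES_DIR behaves like an unset one and that only files directly inside the directory are loaded; neither detail is stated in this guide.

```shell
# Resolve the signatures directory: the environment variable wins,
# otherwise fall back to the default path named in this guide.
# Assumption: an empty IW_SIGNATURES_DIR is treated like an unset one.
effective_signatures_dir() {
  printf '%s\n' "${IW_SIGNATURES_DIR:-$HOME/.inferwall/signatures}"
}

# Print every .yaml file directly inside the directory given as $1.
# Assumption: subdirectories are not scanned.
list_signature_files() {
  for f in "$1"/*.yaml; do
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

list_signature_files "$(effective_signatures_dir)"
```

Files with other extensions, such as notes.txt, are not listed, matching the statement that only .yaml files are loaded.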

Testing your signatures

Every signature should include at least 3 true positive and 3 true negative test cases. True positives confirm the signature fires on real attack payloads; true negatives confirm it doesn’t fire on benign inputs that share vocabulary with the attack. Run a quick manual test via the CLI:

# Should match your custom signature
inferwall test --input "Project Nightingale is our internal LLM pipeline"

# Should not match
inferwall test --input "The weather forecast looks good today"
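
The three-plus-three guideline can also be drafted offline before involving the CLI. The sketch below is only an approximation of the example signature, not InferenceWall's engine: grep -F does exact, case-sensitive substring matching, standing in for the two substring patterns combined with condition: any. Whether InferenceWall's substring matching is case-sensitive is an assumption. The case list and the helper names matches_signature and run_cases are hypothetical.

```shell
# Stand-in for the example signature: match if either substring is present.
matches_signature() {
  printf '%s\n' "$1" | grep -Fq -e "Project Nightingale" -e "Project Falcon"
}

# Run three true positives and three true negatives. Each case line is
# "<expected>|<text>", where expected is "match" or "clean".
run_cases() {
  failures=0
  while IFS='|' read -r expected text; do
    if matches_signature "$text"; then actual=match; else actual=clean; fi
    if [ "$actual" != "$expected" ]; then
      echo "FAIL: expected $expected, got $actual: $text"
      failures=$((failures + 1))
    fi
  done << 'CASES'
match|Project Nightingale is our internal LLM pipeline
match|Status update: Project Falcon ships next week
match|See the Project Nightingale design doc
clean|The weather forecast looks good today
clean|The nightingale is a small songbird
clean|Our project uses a falcon as its logo
CASES
  echo "$failures failure(s)"
  [ "$failures" -eq 0 ]
}

run_cases
```

The last two negatives share vocabulary with the patterns without containing either exact phrase, which is the kind of benign input the true-negative cases are meant to cover.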

MITRE ATLAS mapping

Use the optional meta.atlas field to map your signature to the MITRE ATLAS adversarial AI threat taxonomy. This helps security teams assess detection coverage and identify gaps.

meta:
  atlas: ["AML.T0051.000"]             # Single technique
  # atlas: ["AML.T0054", "AML.T0068"]  # Or list several techniques instead

When a technique has sub-techniques, use the full ID including the sub-technique number (e.g., AML.T0051.000, not AML.T0051). See Concepts: Signatures for the full ATLAS mapping reference.
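
A small shape check can catch typos in ATLAS IDs before they reach the catalog. The sketch below uses a hypothetical helper, is_atlas_id, and assumes the ID shape seen in this guide's examples: AML.T followed by four digits, with an optional three-digit sub-technique suffix. It checks the shape only and cannot tell whether an ID exists in ATLAS or whether a sub-technique is missing.

```shell
# Hypothetical helper: succeeds if $1 has the shape AML.Tdddd or AML.Tdddd.ddd.
# This validates format only, not whether the technique exists in ATLAS.
is_atlas_id() {
  printf '%s\n' "$1" | grep -Eq '^AML\.T[0-9]{4}(\.[0-9]{3})?$'
}

for id in AML.T0051.000 AML.T0054 T0051 AML.T0051.0; do
  if is_atlas_id "$id"; then
    echo "ok:       $id"
  else
    echo "rejected: $id"
  fi
done
```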

Community signatures are licensed under CC BY-SA 4.0. If you modify a community signature and redistribute it, you must share your modifications under the same license. Add the license header at the top of every community signature file:

# License: CC BY-SA 4.0 — https://creativecommons.org/licenses/by-sa/4.0/

Further reading

Signature catalog

Browse all 100 built-in signatures with their IDs, categories, and ATLAS mappings.

Signature concepts

Deep dive into the three-layer catalog merge, scoring model, and ATLAS taxonomy.
