The 100 built-in signatures cover common attack patterns, but your application may have threats that are unique to your domain: internal project codenames that should never appear in LLM output, proprietary data formats, or custom prompt structures your team uses. Custom signatures let you codify those patterns in YAML and deploy them alongside the shipped catalog — without touching the package.

1. Create the signatures directory

mkdir -p ~/.inferwall/signatures
InferenceWall automatically loads all .yaml files from this directory at startup and merges them with the shipped catalog.

2. Write your signature YAML

Create a YAML file for your signature. The filename does not matter — only the id field inside the YAML is used for catalog merging.
cat > ~/.inferwall/signatures/custom-block-internal.yaml << 'EOF'
# Custom signature — block references to internal project names
signature:
  id: DL-S-100
  name: Internal Project Name Leak
  version: "1.0.0"

meta:
  category: data-leakage
  subcategory: S
  technique: keyword-match
  severity: high
  confidence: high
  performance_cost: low
  tags: [internal, custom]

detection:
  engine: heuristic
  direction: output
  patterns:
    - type: substring
      value: "Project Nightingale"
    - type: substring
      value: "Project Falcon"
  condition: any

scoring:
  anomaly_points: 10

tuning:
  enabled: true
  default_enabled: true
  default_action: enforce
EOF

3. Verify the signature is picked up

Restart the server or re-import the SDK. InferenceWall merges the catalog at startup, so a restart is enough to pick up new or changed signature files; no separate reload step is needed.
# If running the API server
inferwall serve

# Confirm the signature loaded by checking the health endpoint
curl http://localhost:8000/v1/health

Signature YAML format

Every signature is a single YAML file with five top-level sections: signature, meta, detection, scoring, and tuning. The full schema:
# License: CC BY-SA 4.0 — https://creativecommons.org/licenses/by-sa/4.0/

signature:
  id: INJ-D-001        # Unique ID: {CATEGORY}-{SUBCATEGORY}-{NUMBER}
  name: Role-Play Persona Jailbreak
  version: "1.0.0"

meta:
  category: prompt-injection    # prompt-injection, content-safety, data-leakage, agentic, system-prompt
  subcategory: direct           # Category-specific subcategory
  technique: role-play-persona  # Specific technique name
  owasp_llm: "LLM01:2025"      # Optional: OWASP LLM Top 10 mapping
  atlas: ["AML.T0054", "AML.T0051.000"]  # Optional: MITRE ATLAS technique IDs
  severity: high                # critical, high, medium, low, info
  confidence: high              # high, medium, low
  performance_cost: low         # low (<1ms), medium (1-50ms), high (>50ms)
  tags: [jailbreak, persona]    # Optional freeform tags

detection:
  engine: heuristic             # heuristic, classifier, semantic, llm-judge, composite
  direction: input              # input, output, bidirectional
  patterns:
    - type: regex               # regex, substring, semantic, perplexity, encoding, unicode
      value: "(?i)act\\s+as"
  condition: any                # any (OR), all (AND), weighted

scoring:
  anomaly_points: 8             # 1-15 points added on match

tuning:
  enabled: true
  default_enabled: true
  default_action: enforce       # enforce or monitor
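
The inline comments in the schema above list the allowed values for several fields. A minimal sketch of a pre-deployment check built from those comments might look like the following. It assumes the signature has already been parsed into a Python dict (for example with a YAML library), and it treats every listed field as required, which the schema does not state. The function name `validate_signature` is hypothetical and is not part of InferenceWall.

```python
# Allowed values, copied from the comments in the schema above.
ALLOWED = {
    ("meta", "severity"): {"critical", "high", "medium", "low", "info"},
    ("meta", "confidence"): {"high", "medium", "low"},
    ("meta", "performance_cost"): {"low", "medium", "high"},
    ("detection", "engine"): {"heuristic", "classifier", "semantic", "llm-judge", "composite"},
    ("detection", "direction"): {"input", "output", "bidirectional"},
    ("detection", "condition"): {"any", "all", "weighted"},
    ("tuning", "default_action"): {"enforce", "monitor"},
}


def validate_signature(sig: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means OK)."""
    problems = []
    for (section, field), allowed in ALLOWED.items():
        value = sig.get(section, {}).get(field)
        if value not in allowed:
            problems.append(f"{section}.{field}: {value!r} not in {sorted(allowed)}")
    points = sig.get("scoring", {}).get("anomaly_points")
    if type(points) is not int or not 1 <= points <= 15:
        problems.append(f"scoring.anomaly_points: {points!r} is not an integer in 1-15")
    if not sig.get("signature", {}).get("id"):
        problems.append("signature.id: missing")
    return problems


# The schema example above, reduced to the fields this sketch checks.
example = {
    "signature": {"id": "INJ-D-001", "name": "Role-Play Persona Jailbreak", "version": "1.0.0"},
    "meta": {"severity": "high", "confidence": "high", "performance_cost": "low"},
    "detection": {"engine": "heuristic", "direction": "input", "condition": "any"},
    "scoring": {"anomaly_points": 8},
    "tuning": {"default_action": "enforce"},
}
print(validate_signature(example))  # prints []
```

Running a check like this before restarting the server can surface typos such as `severity: hgih` early, whatever InferenceWall itself does with an invalid file.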

Signature ID format

IDs follow the pattern {CATEGORY}-{SUBCATEGORY}-{NUMBER}. Use a number above 099 for custom signatures to avoid colliding with the shipped catalog.
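| Category | Prefix | Subcategories |
| --- | --- | --- |
| Prompt Injection | INJ | D (direct), I (indirect), O (obfuscation) |
| Content Safety | CS | T (toxicity), B (bias) |
| Data Leakage | DL | P (PII), S (secrets) |
| System Prompt | SP | none listed |
| Agentic | AG | none listed |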
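
As a rough illustration of the ID rule, the sketch below checks that a new (non-override) custom ID uses a known prefix and subcategory letter and is numbered above 099. It assumes the number is always three digits, which the examples suggest but the text does not state. The table lists no subcategories for SP and AG, so the sketch does not cover those prefixes and returns False for them. The helper name `check_custom_id` is hypothetical.

```python
import re

# Prefixes and subcategory letters from the table above.
# SP and AG are omitted because the table lists no subcategories for them.
SUBCATEGORIES = {
    "INJ": {"D", "I", "O"},
    "CS": {"T", "B"},
    "DL": {"P", "S"},
}

# Assumption: {CATEGORY}-{SUBCATEGORY}-{NUMBER} with a three-digit number.
ID_RE = re.compile(r"([A-Z]+)-([A-Z]+)-(\d{3})")


def check_custom_id(sig_id: str) -> bool:
    """True if sig_id is well formed and numbered outside the shipped range."""
    m = ID_RE.fullmatch(sig_id)
    if m is None:
        return False
    prefix, sub, number = m.group(1), m.group(2), int(m.group(3))
    return sub in SUBCATEGORIES.get(prefix, set()) and number > 99


print(check_custom_id("DL-S-100"))   # True
print(check_custom_id("INJ-D-001"))  # False: inside the shipped range
print(check_custom_id("INJ-X-100"))  # False: X is not a listed subcategory
```

An ID that deliberately reuses a shipped number in order to override it would fail this check by design.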

Pattern types

| Type | Engine | Description |
| --- | --- | --- |
| regex | Rust regex crate | Regular expression match; ReDoS-immune by design |
| substring | Aho-Corasick | Exact substring match; fastest available pattern type |
| semantic | FAISS + MiniLM | Embedding similarity against reference phrases; catches paraphrasing |
| perplexity | Heuristic | Entropy-based detection for statistically unusual text |
| encoding | Heuristic | Detects encoded content: base64, rot13, hex |
| unicode | Heuristic | Detects Unicode obfuscation (homoglyphs, invisible characters) |
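
To make the interaction between pattern types and `condition` concrete, here is a small model of how `substring` and `regex` patterns could be combined under `any` and `all`. It is a sketch, not InferenceWall's matcher. It uses Python's `re` module, which is a backtracking engine without the linear-time guarantee of the Rust regex crate. It assumes substring matching is case-sensitive, and it leaves out `weighted` and the other pattern types because their semantics are not described here. The function name `evaluate` is hypothetical.

```python
import re


def evaluate(detection: dict, text: str) -> bool:
    """Hypothetical model: combine substring/regex results under any/all."""
    results = []
    for pattern in detection["patterns"]:
        if pattern["type"] == "substring":
            # Assumption: "exact substring match" means case-sensitive.
            results.append(pattern["value"] in text)
        elif pattern["type"] == "regex":
            results.append(re.search(pattern["value"], text) is not None)
        else:
            raise ValueError(f"pattern type {pattern['type']!r} is not covered by this sketch")
    condition = detection["condition"]
    if condition == "any":   # OR
        return any(results)
    if condition == "all":   # AND
        return all(results)
    raise ValueError(f"condition {condition!r} is not covered by this sketch")


detection = {
    "patterns": [
        {"type": "regex", "value": r"(?i)act\s+as"},
        {"type": "substring", "value": "Project Nightingale"},
    ],
    "condition": "any",
}
text = "Please ACT   as a pirate"
print(evaluate(detection, text))                          # True: the regex matches
print(evaluate({**detection, "condition": "all"}, text))  # False: the substring is absent
```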

Detection engines

| Engine | Install | Latency | Use when |
| --- | --- | --- | --- |
| heuristic | Lite (all installs) | <0.3ms p99 | You have explicit patterns (regex, substrings, encoding) |
| classifier | Standard/Full | <80ms p99 | You need ML-backed category classification |
| semantic | Standard/Full | <80ms p99 | The threat can be paraphrased and you need embedding similarity |
| llm-judge | Full | <2s p99 | Borderline cases requiring reasoning |
| composite | Any | Sum of stages | You want to chain multiple engines for layered detection |
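
When planning a composite signature, it can help to estimate its latency budget from the table. The sketch below reads "Sum of stages" literally and adds up the per-engine p99 bounds, refusing engines that the chosen install does not include. The tier spellings (`lite`, `standard`, `full`), the function name, and the idea of describing a composite as a list of engine names are assumptions made for illustration. The actual composite configuration format is not shown in this guide.

```python
# Install tiers and p99 latency bounds (in ms), copied from the table above.
# "Lite (all installs)" is read as: available on every tier.
ENGINES = {
    "heuristic": ({"lite", "standard", "full"}, 0.3),
    "classifier": ({"standard", "full"}, 80.0),
    "semantic": ({"standard", "full"}, 80.0),
    "llm-judge": ({"full"}, 2000.0),
}


def composite_budget_ms(stages: list[str], tier: str) -> float:
    """Hypothetical helper: sum the p99 bounds of the chained stages."""
    total = 0.0
    for engine in stages:
        tiers, p99_ms = ENGINES[engine]
        if tier not in tiers:
            raise ValueError(f"{engine} is not available on the {tier} install")
        total += p99_ms
    return total


# A cheap heuristic pre-filter followed by a semantic check.
print(composite_budget_ms(["heuristic", "semantic"], "standard"))
```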

Direction

Set direction to control which side of the conversation the signature inspects:
  • input — scans only user-supplied prompts (inbound)
  • output — scans only LLM responses (outbound)
  • bidirectional — scans both
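
The three values can be read as a simple predicate over which side of the conversation is being scanned. A minimal sketch of that reading follows; the function name `inspects` is hypothetical.

```python
def inspects(direction: str, side: str) -> bool:
    """Return True if a signature with this direction scans the given side.

    side is "input" (user prompt) or "output" (LLM response).
    """
    if side not in ("input", "output"):
        raise ValueError(f"unknown side: {side!r}")
    if direction == "bidirectional":
        return True
    if direction in ("input", "output"):
        return direction == side
    raise ValueError(f"unknown direction: {direction!r}")


print(inspects("output", "input"))          # False
print(inspects("output", "output"))         # True
print(inspects("bidirectional", "input"))   # True
```

Under this reading, the signature from step 2 (`direction: output`) is applied to LLM responses only, so a user prompt that mentions a project name would not trigger it.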

Overriding shipped signatures

If your custom signature has the same id as a shipped signature, it completely replaces the shipped version at startup. Use this to tune severity, patterns, or scoring without forking the package. For example, to lower the anomaly points for INJ-D-001, create ~/.inferwall/signatures/inj-d-001-override.yaml with id: INJ-D-001 and your modified scoring.anomaly_points. The filename is irrelevant — only the id controls which entry wins.
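
The override rule can be modelled as a dictionary keyed by `id`, where a custom entry replaces the shipped entry whole rather than being merged field by field. The sketch below is an illustration of that rule as described here, not InferenceWall's loader. It models only the shipped and custom layers, and the name `merge_catalogs` is hypothetical.

```python
def merge_catalogs(shipped: list[dict], custom: list[dict]) -> dict[str, dict]:
    """Key both catalogs by signature.id; custom entries replace shipped ones whole."""
    merged = {sig["signature"]["id"]: sig for sig in shipped}
    for sig in custom:
        merged[sig["signature"]["id"]] = sig  # full replacement, no field-level merge
    return merged


shipped = [
    {"signature": {"id": "INJ-D-001"}, "meta": {"severity": "high"}, "scoring": {"anomaly_points": 8}},
]
override = [
    {"signature": {"id": "INJ-D-001"}, "scoring": {"anomaly_points": 4}},
]

merged = merge_catalogs(shipped, override)
print(merged["INJ-D-001"]["scoring"]["anomaly_points"])  # 4
print("meta" in merged["INJ-D-001"])                     # False: nothing is inherited
```

If replacement really is whole-entry, as "completely replaces" suggests, an override file should be a complete signature: copy the shipped YAML and change only the fields you want to tune.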

Using a custom directory path

By default InferenceWall loads custom signatures from ~/.inferwall/signatures/. Override this with the IW_SIGNATURES_DIR environment variable:
export IW_SIGNATURES_DIR=/opt/inferwall/team-signatures
All .yaml files in the specified directory are loaded and merged with the shipped catalog.
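
To preview which files would be picked up, you can mirror the lookup in a few lines. This sketch assumes that an unset or empty `IW_SIGNATURES_DIR` falls back to the default, that only the top level of the directory is scanned, and that only the `.yaml` extension counts (so `.yml` files are skipped). None of these details are confirmed here, and both function names are hypothetical.

```python
import os
from pathlib import Path

DEFAULT_DIR = "~/.inferwall/signatures"


def signatures_dir(env=None) -> Path:
    """Resolve the custom signatures directory from the environment."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty variable falls back to the default.
    return Path(env.get("IW_SIGNATURES_DIR") or DEFAULT_DIR).expanduser()


def signature_files(directory: Path) -> list[Path]:
    """List the .yaml files directly inside the directory (non-recursive)."""
    return sorted(directory.glob("*.yaml"))


for path in signature_files(signatures_dir()):
    print(path)
```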

Testing your signatures

Every signature should include at least 3 true positive and 3 true negative test cases. True positives confirm the signature fires on real attack payloads; true negatives confirm it doesn’t fire on benign inputs that share vocabulary with the attack. Run a quick manual test via the CLI:
# Should match your custom signature
inferwall test --input "Project Nightingale is our internal LLM pipeline"

# Should not match
inferwall test --input "The weather forecast looks good today"
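
For a repeatable check, the three-and-three rule can be captured in a small harness. The sketch below uses a plain Python function as a stand-in for the step 2 signature (two substrings, `condition: any`) and does not call InferenceWall. The names `run_cases` and `matcher` are hypothetical, and the test sentences are made up for illustration.

```python
def run_cases(matcher, true_positives, true_negatives, minimum=3):
    """Return (missed, false_alarms); both empty means every case passed."""
    if len(true_positives) < minimum or len(true_negatives) < minimum:
        raise ValueError(f"need at least {minimum} cases of each kind")
    missed = [t for t in true_positives if not matcher(t)]
    false_alarms = [t for t in true_negatives if matcher(t)]
    return missed, false_alarms


def matcher(text):
    # Stand-in for the step 2 signature: two substrings, condition any.
    return "Project Nightingale" in text or "Project Falcon" in text


true_positives = [
    "Project Nightingale is our internal LLM pipeline",
    "The roadmap for Project Falcon slipped a quarter",
    "Ask the Project Nightingale team for access",
]
# Benign texts that share vocabulary with the pattern.
true_negatives = [
    "The weather forecast looks good today",
    "A nightingale sang outside the project office",
    "The falcon is the fastest bird in a dive",
]
print(run_cases(matcher, true_positives, true_negatives))  # ([], [])
```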

MITRE ATLAS mapping

Use the optional meta.atlas field to map your signature to the MITRE ATLAS adversarial AI threat taxonomy. This helps security teams assess detection coverage and identify gaps.
meta:
  atlas: ["AML.T0051.000"]             # Single technique
  # Or, for multiple techniques (use one atlas key per file, not both):
  # atlas: ["AML.T0054", "AML.T0068"]
Use the full technique ID including sub-technique numbers (e.g., AML.T0051.000 not AML.T0051). See Concepts: Signatures for the full ATLAS mapping reference.
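
A quick shape check can catch mistyped IDs before deployment. The sketch below only verifies that each entry looks like an ATLAS technique ID (`AML.T` plus four digits, with an optional three-digit sub-technique). It cannot tell whether a given technique has sub-techniques or whether the ID exists in ATLAS. The helper name is hypothetical.

```python
import re

# Shape of an ATLAS technique ID: AML.T plus four digits, with an optional
# three-digit sub-technique suffix.
ATLAS_ID = re.compile(r"AML\.T\d{4}(\.\d{3})?")


def malformed_atlas_ids(ids: list[str]) -> list[str]:
    """Return the entries that do not have the shape of an ATLAS technique ID."""
    return [i for i in ids if ATLAS_ID.fullmatch(i) is None]


print(malformed_atlas_ids(["AML.T0054", "AML.T0051.000", "T0051", "AML.T51"]))
# prints ['T0051', 'AML.T51']
```
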
Community signatures are licensed under CC BY-SA 4.0. If you modify a community signature and redistribute it, you must share your modifications under the same license. Add the license header at the top of every community signature file:
# License: CC BY-SA 4.0 — https://creativecommons.org/licenses/by-sa/4.0/

Further reading

Signature catalog

Browse all 100 built-in signatures with their IDs, categories, and ATLAS mappings.

Signature concepts

Deep dive into the three-layer catalog merge, scoring model, and ATLAS taxonomy.