InferenceWall is an AI application firewall that sits between your users and your LLM. It scans every input and output for prompt injection, jailbreaks, content safety violations, and data leakage using a multi-layer detection pipeline — Rust-powered heuristic rules, ONNX ML classifiers, FAISS semantic similarity, and an optional LLM-judge — combined into a single anomaly score.
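The layered scoring idea can be sketched in a few lines. This is a conceptual illustration only: the weights, the combination rule, and the function name are invented here, not InferenceWall's actual scoring code.

```python
from typing import Optional

# Illustrative sketch of multi-layer anomaly scoring: each engine emits
# a score and the layers are folded into one number. The 0.5/0.3/0.2
# weights and the max() rule are made-up placeholders.
def combine_scores(heuristic: float, classifier: float,
                   semantic: float, judge: Optional[float] = None) -> float:
    score = 0.5 * heuristic + 0.3 * classifier + 0.2 * semantic
    if judge is not None:
        # A confident LLM-judge verdict dominates the cheaper layers.
        score = max(score, judge)
    return score

print(combine_scores(10.0, 8.0, 6.0))       # heuristic-led input
print(combine_scores(1.0, 1.0, 1.0, 12.0))  # judge overrides low layers
```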

Quick start

Install InferenceWall and scan your first input in under five minutes.

Deployment profiles

Compare Lite, Standard, and Full profiles to match your latency and accuracy requirements.

How it works

Understand the detection pipeline, anomaly scoring, and policy evaluation.

Signature catalog

Browse all 100 built-in signatures and their MITRE ATLAS mappings.

Deployment modes

InferenceWall supports two primary deployment modes. Both use the same detection pipeline and policy system.
Mode        How you use it                                                            Best for
SDK         Import inferwall and call scan_input() / scan_output() directly in Python  In-process scanning inside existing Python services
API server  Run inferwall serve and call the HTTP REST API from any language           Polyglot stacks, sidecar deployments, shared scanning service

SDK mode

import inferwall

result = inferwall.scan_input("Ignore all previous instructions")
# decision='block', score=12.0, matches=[{signature_id: 'INJ-D-002', ...}]

result = inferwall.scan_output("Your API key is sk-1234...")
# decision='block', score=12.0, matches=[{signature_id: 'DL-S-001', ...}]
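In practice the two calls bracket your model call. The wrapper below is a hypothetical integration sketch: it assumes the result object exposes the decision field shown in the comments above, and call_llm stands in for your own model client.

```python
# Hypothetical wrapper showing scan results gating an LLM call.
# `decision == 'block'` follows the result comments above; attribute
# names on the real result object may differ.
def guarded_completion(user_text, call_llm):
    import inferwall  # requires the InferenceWall SDK to be installed

    if inferwall.scan_input(user_text).decision == "block":
        return "Request blocked by input policy."

    reply = call_llm(user_text)  # your existing model call

    if inferwall.scan_output(reply).decision == "block":
        return "Response withheld by output policy."
    return reply
```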

API server mode

inferwall serve

curl -X POST http://localhost:8000/v1/scan/input \
  -H "Content-Type: application/json" \
  -d '{"text": "What is the weather today?"}'

Deployment profiles

Choose a profile based on your latency budget and accuracy requirements. You can upgrade later without changing any application code.
Profile   Install command                    Engines                                                           Latency p99
Lite      pip install inferwall              Heuristic (Rust)                                                  <0.3 ms
Standard  pip install "inferwall[standard]"  + ONNX classifier (DeBERTa/DistilBERT) + FAISS semantic (MiniLM)  <80 ms
Full      pip install "inferwall[full]"      + LLM-judge (Phi-4 Mini Q4)                                       <2 s
See Deployment profiles for a detailed breakdown of engines, dependencies, and model download instructions.

MITRE ATLAS coverage

All 100 built-in signatures are mapped to the MITRE ATLAS framework — the AI/ML counterpart to MITRE ATT&CK. InferenceWall implements three ATLAS mitigations: AML.M0015 (Adversarial Input Detection), AML.M0020 (Generative AI Guardrails), and AML.M0006 (Ensemble Methods). Coverage spans prompt injection, jailbreaks, data leakage, content safety, and agentic threats. See the signature catalog for the full mapping.

License

  • Engine (Rust core, Python SDK, CLI, API server): Apache-2.0
  • Community signatures (catalog/): CC BY-SA 4.0 — modifications must be shared back

InferenceWall reduces risk but does not eliminate it. False negatives and false positives are expected. Use InferenceWall as one layer in a defense-in-depth strategy, and evaluate detection accuracy for your specific use case.