InferenceWall: AI Firewall for LLM-Powered Apps

InferenceWall is an AI application firewall that sits between your users and your LLM, scanning every input and output for threats. With 100 built-in detection signatures, a Rust-powered heuristic engine, and optional ML classifiers, InferenceWall gives you defense-in-depth against prompt injection, jailbreaks, content safety violations, and data leakage — all with a single pip install.

Quick Start

Install InferenceWall and scan your first input in under 5 minutes.

Deployment Profiles

Choose between Lite, Standard, and Full profiles based on your latency and accuracy needs.

API Reference

Explore the REST API for scanning inputs and outputs via HTTP.

Integrations

Wrap OpenAI, Anthropic, LangChain, and FastAPI with InferenceWall in minutes.

How it works

InferenceWall runs a multi-layer detection pipeline on every scan request. Each layer uses a different technique — pattern matching, ML classification, semantic similarity, or LLM judgment — and contributes to a single anomaly score. When the score crosses a threshold, InferenceWall flags or blocks the content.

Install

Install InferenceWall from PyPI. The lite profile has zero ML dependencies and runs with sub-millisecond latency.

pip install inferwall

Scan inputs

Call inferwall.scan_input() before forwarding user prompts to your LLM. Check the decision field to allow, flag, or block.

import inferwall

result = inferwall.scan_input("Ignore all previous instructions")
print(result.decision)  # "block"
print(result.score)     # 12.0

Scan outputs

Call inferwall.scan_output() before returning LLM responses to users. InferenceWall catches PII, API keys, and other sensitive data leakage.

result = inferwall.scan_output("Your API key is sk-1234...")
print(result.decision)  # "block"

Tune and deploy

Configure policy profiles to adjust thresholds, enable monitor mode, and add custom signatures — no code changes required.

Detection capabilities

InferenceWall ships with 100 signatures covering five threat categories, all mapped to the MITRE ATLAS adversarial AI framework.

Prompt Injection

Direct and indirect injection, jailbreaks, persona hijacking, obfuscated payloads (base64, ROT13, homoglyphs).

Data Leakage

PII detection, API key and credential exposure, training data exfiltration.

Content Safety

Toxicity, hate speech, violence, self-harm, and other harmful content categories.

Agentic Threats

Tool abuse, context poisoning, host escape attempts, and exfiltration via agent actions.