InferenceWall ships in three profiles. Each profile adds more detection engines on top of the previous one, trading higher latency for higher accuracy. Start with Lite to get up and running quickly, then upgrade when you need the ML classifier or LLM-judge layers.
## Profile comparison
| Profile | Install | Engines | Latency p99 | Dependencies |
|---|---|---|---|---|
| Lite | `pip install inferwall` | Heuristic (Rust) | <0.3 ms | None |
| Standard | `pip install inferwall[standard]` | + ONNX classifier (DeBERTa/DistilBERT) + FAISS semantic (MiniLM) | <80 ms | onnxruntime, faiss-cpu |
| Full | `pip install inferwall[full]` | + LLM-judge (Phi-4 Mini Q4) | <2 s | + llama-cpp-python |
- Lite runs purely in Rust with zero Python ML dependencies. It covers pattern-based attacks with sub-millisecond latency and is suitable for latency-sensitive paths or environments where installing ML packages is not practical.
- Standard adds an ONNX-based classifier (DeBERTa v3 for injection, DistilBERT for toxicity) and FAISS semantic similarity (MiniLM embeddings) for catching paraphrased and obfuscated attacks that bypass pattern rules.
- Full adds an LLM-judge using Phi-4 Mini (Q4 quantized) to resolve borderline cases that the classifier and semantic engines score with low confidence. Use Full only when your latency budget allows up to 2 seconds p99.
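The escalation logic across the three profiles can be sketched as follows. This is an illustrative outline only, not the real inferwall API: the engine callables, the `Verdict` type, and the confidence threshold are all assumptions made for the example.

```python
# Hypothetical sketch of the tiered flow: cheap engines run first, and each
# later (slower) engine is consulted only when the previous one is not
# confident. Names and thresholds are illustrative, not the inferwall API.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    label: str         # e.g. "allow", "block", or "uncertain"
    confidence: float  # 0.0 .. 1.0

def scan(text: str,
         heuristic: Callable[[str], Verdict],
         classifier: Optional[Callable[[str], Verdict]] = None,
         judge: Optional[Callable[[str], Verdict]] = None,
         threshold: float = 0.8) -> Verdict:
    """Escalate through the profile engines until one is confident."""
    v = heuristic(text)                # Lite: Rust pattern rules, <0.3 ms
    if v.confidence >= threshold or classifier is None:
        return v
    v = classifier(text)               # Standard: ONNX + FAISS, <80 ms
    if v.confidence >= threshold or judge is None:
        return v
    return judge(text)                 # Full: Phi-4 Mini judge, <2 s
```

Under this model, upgrading a profile never changes the fast path: a prompt the heuristic engine scores confidently is returned immediately, and the latency cost of the heavier engines is paid only on borderline inputs.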
## Install commands
Start with Lite. If you see false negatives on paraphrased or obfuscated attacks, upgrade to Standard. Only add Full if you need the LLM-judge for borderline content and your latency budget permits it.
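The install command for each profile, as listed in the comparison table above:

```shell
pip install inferwall                # Lite: heuristic engine only
pip install "inferwall[standard]"    # + ONNX classifier and FAISS semantic
pip install "inferwall[full]"        # + LLM-judge (llama-cpp-python)
```

The quotes around the extras are needed in shells such as zsh, where unquoted square brackets are treated as glob patterns.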
## ML models (Standard and Full)

After installing `inferwall[standard]` or `inferwall[full]`, download the required models. The recommended one-command setup installs dependencies and downloads models together:
```shell
inferwall models install --profile standard
```
Check what is already downloaded:
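A subcommand along these lines would list the cache contents; the name is assumed for illustration and may differ, so check `inferwall models --help` for the actual command:

```shell
inferwall models list   # hypothetical subcommand: show cached models and sizes
```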
## Model inventory
| Model | Size | Engine | Profile |
|---|---|---|---|
| DeBERTa v3 (injection) | ~400 MB | ONNX classifier | Standard |
| DistilBERT (toxicity) | ~520 MB | ONNX classifier | Standard |
| MiniLM-L6 (embeddings) | ~80 MB | FAISS semantic | Standard |
| Phi-4 Mini Q4 (judge) | ~2.4 GB | LLM-judge | Full |
Models are cached in `~/.cache/inferwall/models/` and downloaded from Hugging Face. The Standard profile downloads approximately 1 GB in total (~920 MB for the two classifiers plus ~80 MB for MiniLM). The Full profile adds another ~2.4 GB for Phi-4 Mini.
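If you want to verify the cache from your own code, a check along these lines works. The directory names under the cache are assumptions for illustration; the real layout under `~/.cache/inferwall/models/` may differ.

```python
# Sketch: report which expected model artifacts are missing from the cache.
# The per-profile names below are illustrative, not inferwall's real layout.
from pathlib import Path

EXPECTED = {
    "standard": ["deberta-v3-injection", "distilbert-toxicity", "minilm-l6"],
    "full": ["deberta-v3-injection", "distilbert-toxicity", "minilm-l6",
             "phi-4-mini-q4"],
}

def missing_models(profile: str, cache_dir: Path) -> list[str]:
    """Return the expected model entries not yet present in cache_dir."""
    return [name for name in EXPECTED[profile]
            if not (cache_dir / name).exists()]
```

For example, `missing_models("full", Path.home() / ".cache/inferwall/models")` returns an empty list once all four Full-profile models are in place.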
## Next steps
- Follow the quickstart to scan your first input and output.
- See the deployment guide for Docker, Kubernetes, and production configuration.