InferenceWall ships in three profiles. Each profile adds more detection engines on top of the previous one, trading higher latency for higher accuracy. Start with Lite to get up and running quickly, then upgrade when you need the ML classifier or LLM-judge layers.
## Profile comparison
| Profile | Install | Engines | Latency p99 | Dependencies |
|---|---|---|---|---|
| Lite | `pip install inferwall` | Heuristic (Rust) | <0.3 ms | None |
| Standard | `pip install inferwall[standard]` | + ONNX classifier (DeBERTa/DistilBERT) + FAISS semantic (MiniLM) | <80 ms | onnxruntime, faiss-cpu |
| Full | `pip install inferwall[full]` | + LLM-judge (Phi-4 Mini Q4) | <2 s | + llama-cpp-python |
- Lite runs purely in Rust with zero Python ML dependencies. It covers pattern-based attacks with sub-millisecond latency and is suitable for latency-sensitive paths or environments where installing ML packages is not practical.
- Standard adds an ONNX-based classifier (DeBERTa v3 for injection, DistilBERT for toxicity) and FAISS semantic similarity (MiniLM embeddings) for catching paraphrased and obfuscated attacks that bypass pattern rules.
- Full adds an LLM-judge using Phi-4 Mini (Q4 quantized) to resolve borderline cases that the classifier and semantic engines score with low confidence. Use Full only when your latency budget allows up to 2 seconds p99.
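The escalation logic across the three profiles can be sketched as follows. This is an illustrative outline only, not the real inferwall API: the engine callables, the `Verdict` type, and the confidence threshold are all assumptions made for the example.

```python
# Hypothetical sketch of the tiered flow: cheap engines run first, and each
# later (slower) engine is consulted only when the previous one is not
# confident. Names and thresholds are illustrative, not the inferwall API.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    label: str         # e.g. "allow", "block", or "uncertain"
    confidence: float  # 0.0 .. 1.0

def scan(text: str,
         heuristic: Callable[[str], Verdict],
         classifier: Optional[Callable[[str], Verdict]] = None,
         judge: Optional[Callable[[str], Verdict]] = None,
         threshold: float = 0.8) -> Verdict:
    """Escalate through the profile engines until one is confident."""
    v = heuristic(text)                # Lite: Rust pattern rules, <0.3 ms
    if v.confidence >= threshold or classifier is None:
        return v
    v = classifier(text)               # Standard: ONNX + FAISS, <80 ms
    if v.confidence >= threshold or judge is None:
        return v
    return judge(text)                 # Full: Phi-4 Mini judge, <2 s
```

Under this model, upgrading a profile never changes the fast path: a prompt the heuristic engine scores confidently is returned immediately, and the latency cost of the heavier engines is paid only on borderline inputs.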
## Install commands
Start with Lite. If you see false negatives on paraphrased or obfuscated attacks, upgrade to Standard. Only add Full if you need the LLM-judge for borderline content and your latency budget permits it.
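The install command for each profile, as listed in the comparison table above:

```shell
pip install inferwall                # Lite: heuristic engine only
pip install "inferwall[standard]"    # + ONNX classifier and FAISS semantic
pip install "inferwall[full]"        # + LLM-judge (llama-cpp-python)
```

The quotes around the extras are needed in shells such as zsh, where unquoted square brackets are treated as glob patterns.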
## ML models (Standard and Full)

After installing `inferwall[standard]` or `inferwall[full]`, download the required models. The recommended one-command setup installs dependencies and downloads models together:
```shell
inferwall models install --profile standard
```
Check what is already downloaded:
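A subcommand along these lines would list the cache contents; the name is assumed for illustration and may differ, so check `inferwall models --help` for the actual command:

```shell
inferwall models list   # hypothetical subcommand: show cached models and sizes
```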
## Model inventory
| Model | Size | Engine | Profile |
|---|---|---|---|
| DeBERTa v3 (injection) | ~400 MB | ONNX classifier | Standard |
| DistilBERT (toxicity) | ~520 MB | ONNX classifier | Standard |
| MiniLM-L6 (embeddings) | ~80 MB | FAISS semantic | Standard |
| Phi-4 Mini Q4 (judge) | ~2.4 GB | LLM-judge | Full |
Models are cached in `~/.cache/inferwall/models/` and downloaded from Hugging Face. The Standard profile downloads approximately 1 GB in total (~920 MB for the two classifiers plus ~80 MB for MiniLM). The Full profile adds another ~2.4 GB for Phi-4 Mini.
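If you want to verify the cache from your own code, a check along these lines works. The directory names under the cache are assumptions for illustration; the real layout under `~/.cache/inferwall/models/` may differ.

```python
# Sketch: report which expected model artifacts are missing from the cache.
# The per-profile names below are illustrative, not inferwall's real layout.
from pathlib import Path

EXPECTED = {
    "standard": ["deberta-v3-injection", "distilbert-toxicity", "minilm-l6"],
    "full": ["deberta-v3-injection", "distilbert-toxicity", "minilm-l6",
             "phi-4-mini-q4"],
}

def missing_models(profile: str, cache_dir: Path) -> list[str]:
    """Return the expected model entries not yet present in cache_dir."""
    return [name for name in EXPECTED[profile]
            if not (cache_dir / name).exists()]
```

For example, `missing_models("full", Path.home() / ".cache/inferwall/models")` returns an empty list once all four Full-profile models are in place.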
## Next steps
- Follow the quickstart to scan your first input and output.
- See the deployment guide for Docker, Kubernetes, and production configuration.