InferenceWall ships in three profiles. Each profile adds more detection engines on top of the previous one, trading higher latency for higher accuracy. Start with Lite to get up and running quickly, then upgrade when you need the ML classifier or LLM-judge layers.

Profile comparison

| Profile  | Install                         | Engines                                                    | Latency p99 | Dependencies             |
|----------|---------------------------------|------------------------------------------------------------|-------------|--------------------------|
| Lite     | pip install inferwall           | Heuristic (Rust)                                           | <0.3 ms     | None                     |
| Standard | pip install inferwall[standard] | + ONNX classifier (DeBERTa/DistilBERT) + FAISS semantic (MiniLM) | <80 ms      | onnxruntime, faiss-cpu   |
| Full     | pip install inferwall[full]     | + LLM-judge (Phi-4 Mini Q4)                                | <2 s        | + llama-cpp-python       |
  • Lite runs purely in Rust with zero Python ML dependencies. It covers pattern-based attacks with sub-millisecond latency and is suitable for latency-sensitive paths or environments where installing ML packages is not practical.
  • Standard adds an ONNX-based classifier (DeBERTa v3 for injection, DistilBERT for toxicity) and FAISS semantic similarity (MiniLM embeddings) for catching paraphrased and obfuscated attacks that bypass pattern rules.
  • Full adds an LLM-judge using Phi-4 Mini (Q4 quantized) to resolve borderline cases that the classifier and semantic engines score with low confidence. Use Full only when your latency budget allows up to 2 seconds p99.
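The layering described above can be sketched in plain Python. This is an illustrative model of the escalation logic only, not the inferwall API: all three scoring functions are hypothetical stand-ins for the real engines (the Rust pattern matcher, the ONNX classifier, and the Phi-4 Mini judge), and the thresholds are invented for the example.

```python
def heuristic_score(text: str) -> float:
    """Stand-in for the Rust pattern engine: 1.0 on a known signature."""
    patterns = ("ignore previous instructions", "disregard the system prompt")
    return 1.0 if any(p in text.lower() for p in patterns) else 0.0

def classifier_score(text: str) -> float:
    """Stand-in for the ONNX classifier: probability that input is an attack."""
    return 0.5 if "hypothetically" in text.lower() else 0.1

def llm_judge_blocks(text: str) -> bool:
    """Stand-in for the LLM-judge, consulted only on borderline scores."""
    return "secret" in text.lower()

def scan(text: str, profile: str = "standard") -> bool:
    """Return True when the input should be blocked."""
    if heuristic_score(text) >= 1.0:
        return True                      # Lite: pattern hit, block immediately
    if profile == "lite":
        return False                     # Lite stops at pattern rules
    p = classifier_score(text)
    if p >= 0.8:
        return True                      # Standard: confident classifier hit
    if profile == "full" and 0.3 <= p < 0.8:
        return llm_judge_blocks(text)    # Full: judge resolves the borderline band
    return False
```

The key property is that each profile only pays for the engines it enables: a pattern hit never reaches the classifier, and the judge only sees inputs the classifier scored with low confidence.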

Install commands

pip install inferwall              # Lite
pip install inferwall[standard]    # Standard
pip install inferwall[full]        # Full
Start with Lite. If you see false negatives on paraphrased or obfuscated attacks, upgrade to Standard. Only add Full if you need the LLM-judge for borderline content and your latency budget permits it.

ML models (Standard and Full)

After installing inferwall[standard] or inferwall[full], download the required models. The recommended one-command setup installs dependencies and downloads models together:
inferwall models install --profile standard
Check what is already downloaded:
inferwall models status
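If you prefer to verify the cache programmatically, a minimal check can be written against the cache directory documented below. The file names here are hypothetical placeholders; the real layout reported by `inferwall models status` may differ.

```python
from pathlib import Path

# Hypothetical model file names, for illustration only.
REQUIRED = {
    "standard": [
        "deberta-v3-injection.onnx",
        "distilbert-toxicity.onnx",
        "minilm-l6-embeddings.onnx",
    ],
}
REQUIRED["full"] = REQUIRED["standard"] + ["phi-4-mini-q4.gguf"]

def missing_models(profile: str,
                   cache_dir: Path = Path.home() / ".cache/inferwall/models") -> list[str]:
    """Return the model files a profile still needs downloaded."""
    return [name for name in REQUIRED[profile]
            if not (cache_dir / name).exists()]
```

A check like this is useful at service startup: failing fast on missing models beats a first-request timeout while a multi-gigabyte download runs.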

Model inventory

| Model                  | Size    | Engine          | Profile  |
|------------------------|---------|-----------------|----------|
| DeBERTa v3 (injection) | ~400 MB | ONNX classifier | Standard |
| DistilBERT (toxicity)  | ~520 MB | ONNX classifier | Standard |
| MiniLM-L6 (embeddings) | ~80 MB  | FAISS semantic  | Standard |
| Phi-4 Mini Q4 (judge)  | ~2.4 GB | LLM-judge       | Full     |
Models are cached in ~/.cache/inferwall/models/ and downloaded from Hugging Face. The Standard profile downloads approximately 1 GB in total (~920 MB for the two classifiers plus ~80 MB for MiniLM). The Full profile adds another ~2.4 GB for Phi-4 Mini.
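As a quick sanity check on disk budgeting, the per-profile totals follow directly from the inventory sizes (approximate values, in MB):

```python
# Approximate download sizes in MB, taken from the model inventory above.
MODEL_MB = {
    "deberta-v3-injection": 400,
    "distilbert-toxicity": 520,
    "minilm-l6-embeddings": 80,
    "phi-4-mini-q4": 2400,
}

PROFILE_MODELS = {
    "lite": [],  # Lite needs no model downloads
    "standard": ["deberta-v3-injection", "distilbert-toxicity",
                 "minilm-l6-embeddings"],
}
PROFILE_MODELS["full"] = PROFILE_MODELS["standard"] + ["phi-4-mini-q4"]

def download_mb(profile: str) -> int:
    """Total model download for a profile, in MB."""
    return sum(MODEL_MB[m] for m in PROFILE_MODELS[profile])
```

Standard works out to 400 + 520 + 80 = 1000 MB (~1 GB), and Full adds 2400 MB on top, matching the totals above.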

Next steps

  • Follow the quickstart to scan your first input and output.
  • See the deployment guide for Docker, Kubernetes, and production configuration.