InferenceWall supports three deployment paths: a Python package installed directly on the host, a Docker container, or a Kubernetes deployment via Helm. All three paths expose the same API server and SDK. Choose based on your existing infrastructure.

Installation

# Lite — heuristic engine only, zero ML deps
pip install inferwall

# Standard — adds ONNX classifier + FAISS semantic engine
pip install "inferwall[standard]"

# Full — adds LLM-judge for borderline cases
pip install "inferwall[full]"
Pre-built wheels are available for Linux x86_64, Linux aarch64, macOS arm64, and Windows x86_64. Requires Python >= 3.10.

Deployment profiles

| Profile  | Install                            | Engines                                | Latency    |
|----------|------------------------------------|----------------------------------------|------------|
| Lite     | pip install inferwall              | Heuristic (Rust)                       | <0.3ms p99 |
| Standard | pip install "inferwall[standard]"  | + Classifier (ONNX) + Semantic (FAISS) | <80ms p99  |
| Full     | pip install "inferwall[full]"      | + LLM-Judge                            | <2s p99    |
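The profiles are cumulative: each one adds engines on top of the previous tier. A minimal sketch of that relationship, using illustrative engine names rather than the SDK's actual identifiers:

```python
# Engine sets per deployment profile, per the table above.
# Names are illustrative, not InferenceWall's internal identifiers.
PROFILE_ENGINES = {
    "lite": ["heuristic"],
    "standard": ["heuristic", "classifier", "semantic"],
    "full": ["heuristic", "classifier", "semantic", "llm-judge"],
}

def engines_for(profile: str) -> list[str]:
    """Return the engines active for a deployment profile."""
    try:
        return PROFILE_ENGINES[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
```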

Post-install setup

1. Generate API keys

inferwall admin setup
This generates a scan key (iwk_scan_…) and an admin key (iwk_admin_…) and writes them to .env.local.
2. Set environment variables

Export the generated keys before starting the server:
export IW_API_KEY=iwk_scan_yourkey
export IW_ADMIN_KEY=iwk_admin_yourkey
Or source the generated file directly:
source .env.local
3. Start the server

inferwall serve

# Or source the generated env file and serve in one command
source .env.local && inferwall serve
The server listens on 0.0.0.0:8000 by default.
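A client needs a reachable URL, and 0.0.0.0 is a bind address, not a destination. A small sketch of deriving the base URL from the IW_HOST and IW_PORT variables (documented in the table below), substituting localhost for the bind-all address:

```python
import os

def server_base_url(env=os.environ) -> str:
    """Build a client-facing base URL from IW_HOST/IW_PORT with the documented defaults."""
    host = env.get("IW_HOST", "0.0.0.0")
    port = env.get("IW_PORT", "8000")
    # 0.0.0.0 means "bind all interfaces"; clients should connect via localhost.
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"
```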
4. Install ML models (Standard and Full only)

If you installed the standard or full profile, download the ML models:
inferwall models install --profile standard
Models are cached in ~/.cache/inferwall/models/ and downloaded from HuggingFace (~730 MB for Standard).
5. Run a health check

Confirm the server is up and signatures are loaded:
curl http://localhost:8000/v1/health
In development, you can skip API key setup entirely. Run inferwall serve without setting IW_API_KEY or IW_ADMIN_KEY and scan without any Authorization header. Dev mode is not suitable for production.
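In production, scan requests carry the key in an Authorization header. This page does not show the exact header scheme, so the sketch below assumes a Bearer scheme; note how omitting IW_API_KEY falls back to the header-less dev-mode behavior described above:

```python
import os

def scan_headers(env=os.environ) -> dict[str, str]:
    """Build request headers for a scan call. Assumes a Bearer scheme (not
    confirmed by this page); with no key set, dev mode sends no auth header."""
    headers = {"Content-Type": "application/json"}
    key = env.get("IW_API_KEY")
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```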

Environment variables

| Variable     | Description                                | Default         |
|--------------|--------------------------------------------|-----------------|
| IW_API_KEY   | Scan API key                               | None (dev mode) |
| IW_ADMIN_KEY | Admin API key                              | None (dev mode) |
| IW_HOST      | Server bind host                           | 0.0.0.0         |
| IW_PORT      | Server port                                | 8000            |
| IW_TLS       | TLS mode: auto, off, or acme               | off             |
| IW_PROFILE   | Deployment profile: lite, standard, full   | lite            |
| IW_LOG_LEVEL | Log verbosity: debug, info, warning, error | info            |
| IW_REDIS_URL | Redis URL for distributed sessions         | None            |

TLS modes

| Mode | Behavior                                                  |
|------|-----------------------------------------------------------|
| off  | Plain HTTP (default)                                      |
| auto | TLS using a certificate at the path provided in IW_TLS    |
| acme | Automatic certificate provisioning via ACME/Let's Encrypt |
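Since an invalid mode is easy to typo in an environment variable, validating it early is worthwhile. A hypothetical validation helper (not part of the CLI) against the three documented modes:

```python
VALID_TLS_MODES = {"off", "auto", "acme"}

def validate_tls_mode(value: str) -> str:
    """Normalize and check an IW_TLS mode value against the documented set."""
    mode = value.strip().lower()
    if mode not in VALID_TLS_MODES:
        raise ValueError(
            f"IW_TLS must be one of {sorted(VALID_TLS_MODES)}, got {value!r}"
        )
    return mode
```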

Redis for distributed sessions

Set IW_REDIS_URL to enable distributed rate limiting and session state across multiple InferenceWall instances:
export IW_REDIS_URL=redis://redis:6379
When unset, InferenceWall uses in-process state, which is scoped to a single instance.

Health check endpoints

| Endpoint             | Purpose                                           | Use in                    |
|----------------------|---------------------------------------------------|---------------------------|
| GET /v1/health/live  | Liveness — is the process alive?                  | Kubernetes livenessProbe  |
| GET /v1/health/ready | Readiness — can it handle requests?               | Kubernetes readinessProbe |
| GET /v1/health       | Full health with signature count and engine status| Monitoring dashboards     |
# Liveness
curl http://localhost:8000/v1/health/live

# Readiness
curl http://localhost:8000/v1/health/ready

# Full health
curl http://localhost:8000/v1/health
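In deploy scripts it is common to block until the readiness endpoint succeeds. A generic polling sketch with an injectable probe, so the same loop works with any HTTP client pointed at /v1/health/ready (the function name and parameters are illustrative):

```python
import time

def wait_until_ready(probe, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `probe` (a zero-arg callable returning True when ready) until it
    succeeds or the timeout elapses. In practice the probe would issue a GET
    to /v1/health/ready and return True on a 200 response."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except Exception:
            pass  # server not accepting connections yet; keep retrying
        time.sleep(interval)
    return False
```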

Further reading

- Environment variables reference: complete list of all environment variables with types, defaults, and valid values.
- Health API: response schemas for the liveness, readiness, and full health endpoints.