InferenceWall exposes three health endpoints for monitoring and orchestration. Use them to integrate with load balancers, container orchestrators, and uptime monitors.

GET /v1/health

Full health status including the number of loaded signatures, engine status, and server uptime. Use this for detailed monitoring dashboards.

Example

curl http://localhost:8000/v1/health

{
  "status": "ok",
  "signature_count": 100,
  "engines": {
    "heuristic": "ready",
    "classifier": "ready",
    "semantic": "ready"
  },
  "uptime_secs": 3842
}

Response fields

status (string)
Overall health status. ok when the server is running and all engines are ready.

signature_count (number)
Total number of signatures currently loaded, including any custom signatures.

engines (object)
Status of each detection engine. Values are ready or unavailable.

uptime_secs (number)
Server uptime in seconds since the last start.
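Monitoring scripts can key off these fields directly. A minimal sketch in Python, assuming the response shape documented above (the helper name all_engines_ready is ours, not part of InferenceWall):

```python
import json

def all_engines_ready(payload: dict) -> bool:
    """Return True when status is ok and every detection engine reports ready."""
    return (
        payload.get("status") == "ok"
        and all(state == "ready" for state in payload.get("engines", {}).values())
    )

# Example payload matching the /v1/health response documented above.
raw = """
{
  "status": "ok",
  "signature_count": 100,
  "engines": {"heuristic": "ready", "classifier": "ready", "semantic": "ready"},
  "uptime_secs": 3842
}
"""
health = json.loads(raw)
print(all_engines_ready(health))        # True
print(health["signature_count"])        # 100
```

A dashboard might alert when any engine reports unavailable even though status is still ok, since that signals degraded detection coverage.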

GET /v1/health/live

Liveness probe. Returns 200 OK if the server process is running. Use this to detect crashes and trigger restarts. This endpoint does not check whether the server is ready to handle scan requests — it only confirms the process is alive.

Example

curl http://localhost:8000/v1/health/live
Returns 200 OK when the process is running.

GET /v1/health/ready

Readiness probe. Returns 200 OK when the server has finished loading signatures and is ready to handle scan requests. Returns a non-200 status during startup or when the server is not ready.

Example

curl http://localhost:8000/v1/health/ready
Returns 200 OK when the server is ready to accept requests.
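Deployment scripts often poll the readiness probe before routing traffic. The Python sketch below (wait_until_ready is a hypothetical helper of ours; the in-process stub server only stands in for InferenceWall so the example is self-contained) retries until /v1/health/ready returns 200 or a deadline passes:

```python
import http.server
import threading
import time
import urllib.error
import urllib.request

def wait_until_ready(url: str, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Poll a readiness URL until it returns 200 OK or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError):
            pass  # server not up yet; keep polling
        time.sleep(interval)
    return False

# Stub that answers 200 on /v1/health/ready, standing in for InferenceWall.
class _Stub(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200 if self.path == "/v1/health/ready" else 404)
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), _Stub)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

ready = wait_until_ready(f"http://127.0.0.1:{port}/v1/health/ready", timeout=5)
print(ready)  # True
server.shutdown()
```

In production you would point the helper at the real server (e.g. http://localhost:8000/v1/health/ready) and pick a timeout that covers signature and model loading.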

Kubernetes probe configuration

Use the liveness and readiness probes in your Kubernetes deployment to enable automatic restarts and traffic routing:
livenessProbe:
  httpGet:
    path: /v1/health/live
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /v1/health/ready
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
Set a longer initialDelaySeconds on the readiness probe when running the Standard or Full profile, as ML model loading adds startup time.
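Outside Kubernetes, the same readiness endpoint can drive other orchestrators. A sketch of a Docker Compose healthcheck (the service name, image tag, and timing values here are illustrative assumptions, not part of InferenceWall's shipped configuration):

```yaml
services:
  inferencewall:
    image: inferencewall:latest   # placeholder image tag
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8000/v1/health/ready"]
      interval: 5s
      timeout: 2s
      retries: 5
      start_period: 15s           # extra startup time for ML model loading
```

As with the Kubernetes readiness probe, lengthen start_period when running the Standard or Full profile.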