The OpenAI integration wraps client.chat.completions.create() with two InferenceWall scanning checkpoints: one before the prompt reaches OpenAI to catch prompt injection and jailbreaks, and one before the response reaches your user to catch PII, API keys, and other sensitive data leakage.

Install

pip install inferwall openai

Steps

1. Scan the input before calling OpenAI

Call inferwall.scan_input() with the user’s prompt. Check the decision field and return early if the request is blocked.
import inferwall

input_scan = inferwall.scan_input(prompt)

# GuardedResponse is the dataclass defined in the complete example below.
if input_scan.decision == "block":
    return GuardedResponse(
        content="[BLOCKED] Request blocked by security policy.",
        decision="block",
        input_score=input_scan.score,
        output_score=0.0,
        matched_signatures=[
            m["signature_id"] for m in input_scan.matches
        ],
    )
The ScanResponse object exposes three fields you’ll use most:

Field      Type        Description
decision   str         "allow", "flag", or "block"
score      float       Aggregate anomaly score across all detection layers
matches    list[dict]  Matched signatures, each with a signature_id key
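If you want to experiment with the response-handling logic before wiring up the SDK, the shape above is easy to stub. The sketch below is an illustrative stand-in only, not the real ScanResponse class, and the signature ID shown is made up for the example:

```python
from dataclasses import dataclass, field


# Illustrative stand-in for ScanResponse — shape only, not the SDK class.
@dataclass
class ScanResponse:
    decision: str    # "allow", "flag", or "block"
    score: float     # aggregate anomaly score
    matches: list[dict] = field(default_factory=list)  # each has "signature_id"


# "pi.ignore_previous" is a hypothetical signature ID for illustration.
scan = ScanResponse(
    decision="flag",
    score=0.62,
    matches=[{"signature_id": "pi.ignore_previous"}],
)
matched_ids = [m["signature_id"] for m in scan.matches]
```

This lets you unit-test your blocking and logging paths without calling the scanning service.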
2. Call OpenAI for allowed requests

If the input passes, forward the prompt to OpenAI as normal.
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
)
output_text = completion.choices[0].message.content or ""
3. Scan the output before returning it

Call inferwall.scan_output() on the LLM’s response. Block the reply if it contains sensitive data.
output_scan = inferwall.scan_output(output_text)

if output_scan.decision == "block":
    return GuardedResponse(
        content="[BLOCKED] Response contained sensitive data.",
        decision="block",
        input_score=input_scan.score,
        output_score=output_scan.score,
        matched_signatures=[
            m["signature_id"] for m in output_scan.matches
        ],
    )
4. Handle flag decisions

A "flag" decision means InferenceWall detected suspicious content but did not meet the block threshold. Use flags for logging, alerting, or human review — or promote them to blocks by treating "flag" the same as "block" in your conditional.
# Combine decisions — flag if either scan flagged
decision = "allow"
if input_scan.decision == "flag" or output_scan.decision == "flag":
    decision = "flag"
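The promote-to-block behavior can be captured in a small helper. This is a sketch, not part of the inferwall API; the effective_decision name and the strict flag are assumptions for illustration:

```python
def effective_decision(input_decision: str, output_decision: str,
                       strict: bool = False) -> str:
    """Combine two scan decisions; strict=True treats flags as blocks."""
    decisions = {input_decision, output_decision}
    if "block" in decisions:
        return "block"
    if "flag" in decisions:
        return "block" if strict else "flag"
    return "allow"
```

With strict=False this matches the combining logic in the complete example below; flipping strict to True promotes any flagged scan to a hard block.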

Complete example

from __future__ import annotations

from dataclasses import dataclass, field

import inferwall


@dataclass
class GuardedResponse:
    """Response from a guarded LLM call."""

    content: str
    decision: str  # "allow", "flag", "block"
    input_score: float
    output_score: float
    matched_signatures: list[str] = field(default_factory=list)


def guarded_openai_chat(
    prompt: str,
    model: str = "gpt-4o-mini",
    system_prompt: str = "You are a helpful assistant.",
) -> GuardedResponse:
    """Call OpenAI with InferenceWall scanning on input and output."""

    # Step 1: Scan input
    input_scan = inferwall.scan_input(prompt)

    if input_scan.decision == "block":
        return GuardedResponse(
            content="[BLOCKED] Request blocked by security policy.",
            decision="block",
            input_score=input_scan.score,
            output_score=0.0,
            matched_signatures=[
                m["signature_id"] for m in input_scan.matches
            ],
        )

    # Step 2: Call OpenAI
    from openai import OpenAI

    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    output_text = completion.choices[0].message.content or ""

    # Step 3: Scan output
    output_scan = inferwall.scan_output(output_text)

    if output_scan.decision == "block":
        return GuardedResponse(
            content="[BLOCKED] Response contained sensitive data.",
            decision="block",
            input_score=input_scan.score,
            output_score=output_scan.score,
            matched_signatures=[
                m["signature_id"] for m in output_scan.matches
            ],
        )

    # Combine decisions
    decision = "allow"
    if input_scan.decision == "flag" or output_scan.decision == "flag":
        decision = "flag"

    return GuardedResponse(
        content=output_text,
        decision=decision,
        input_score=input_scan.score,
        output_score=output_scan.score,
        matched_signatures=[
            m["signature_id"]
            for m in input_scan.matches + output_scan.matches
        ],
    )

What gets blocked

InferenceWall applies different signature sets to inputs and outputs.
On input, InferenceWall checks for:
  • Prompt injection (Ignore all previous instructions…)
  • Jailbreak attempts (DAN, persona hijacking, role-play bypasses)
  • Obfuscated payloads (base64-encoded instructions, homoglyphs, ROT13)
On output, InferenceWall checks for:
  • PII (email addresses, phone numbers, national ID numbers)
  • Credentials and API keys (sk-…, AKIA…, private keys)
  • Training data exfiltration patterns
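To make the output-side checks concrete, here is a toy pattern matcher in the spirit of the credential and PII checks above. These regexes are illustrative only — they are not InferenceWall’s signatures and are far too naive for production use:

```python
import re

# Toy patterns only; InferenceWall's real signature sets are richer.
TOY_PATTERNS = {
    "openai_api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[A-Za-z]{2,}\b"),
}


def toy_output_matches(text: str) -> list[str]:
    """Return the names of toy patterns found in the text."""
    return [name for name, pat in TOY_PATTERNS.items() if pat.search(text)]
```

Running it over model output before returning it mirrors step 3, but the real scan_output() call should always be the source of truth.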