The OpenAI integration wraps client.chat.completions.create() with two InferenceWall scanning checkpoints: one before the prompt reaches OpenAI to catch prompt injection and jailbreaks, and one before the response reaches your user to catch PII, API keys, and other sensitive data leakage.
## Install

```shell
pip install inferwall openai
```
## Steps
### Scan the input before calling OpenAI

Call `inferwall.scan_input()` with the user's prompt. Check `decision` and return early if the request is blocked.

```python
import inferwall

input_scan = inferwall.scan_input(prompt)
if input_scan.decision == "block":
    return GuardedResponse(
        content="[BLOCKED] Request blocked by security policy.",
        decision="block",
        input_score=input_scan.score,
        output_score=0.0,
        matched_signatures=[
            m["signature_id"] for m in input_scan.matches
        ],
    )
```
The `ScanResponse` object exposes three fields you'll use most:

| Field | Type | Description |
|---|---|---|
| `decision` | `str` | `"allow"`, `"flag"`, or `"block"` |
| `score` | `float` | Aggregate anomaly score across all detection layers |
| `matches` | `list[dict]` | Matched signatures, each with a `signature_id` key |
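Each entry in `matches` is a plain dict, so pulling out the signature IDs is a one-line comprehension. A quick sketch with hypothetical match data (the IDs and the `confidence` key are illustrative; only `signature_id` is documented above):

```python
# Hypothetical matches payload, shaped like ScanResponse.matches.
matches = [
    {"signature_id": "PI-001", "confidence": 0.97},  # illustrative values
    {"signature_id": "JB-014", "confidence": 0.82},
]

# Only signature_id is guaranteed by the table above.
signature_ids = [m["signature_id"] for m in matches]
print(signature_ids)  # ['PI-001', 'JB-014']
```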
### Call OpenAI for allowed requests

If the input passes, forward the prompt to OpenAI as normal.

```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
)
output_text = completion.choices[0].message.content or ""
```
### Scan the output before returning it

Call `inferwall.scan_output()` on the LLM's response. Block the reply if it contains sensitive data.

```python
output_scan = inferwall.scan_output(output_text)
if output_scan.decision == "block":
    return GuardedResponse(
        content="[BLOCKED] Response contained sensitive data.",
        decision="block",
        input_score=input_scan.score,
        output_score=output_scan.score,
        matched_signatures=[
            m["signature_id"] for m in output_scan.matches
        ],
    )
```
### Handle flag decisions

A `"flag"` decision means InferenceWall detected suspicious content whose score did not meet the block threshold. Use flags for logging, alerting, or human review, or promote them to blocks by treating `"flag"` the same as `"block"` in your conditional.

```python
# Combine decisions: flag if either scan flagged
decision = "allow"
if input_scan.decision == "flag" or output_scan.decision == "flag":
    decision = "flag"
```
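If you later add more scan points, pairwise `if` checks stop scaling. One way to generalize (a sketch of our own, not part of the inferwall API) is to rank the three decision strings by severity and take the maximum:

```python
# Severity ranking for the three decision strings. This helper is an
# illustrative pattern, not something the inferwall SDK provides.
_SEVERITY = {"allow": 0, "flag": 1, "block": 2}

def combine_decisions(*decisions: str) -> str:
    """Return the most severe decision among any number of scans."""
    return max(decisions, key=_SEVERITY.__getitem__)

print(combine_decisions("allow", "flag"))   # flag
print(combine_decisions("flag", "block"))   # block
```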
## Complete example

```python
from __future__ import annotations

from dataclasses import dataclass, field

import inferwall
from openai import OpenAI


@dataclass
class GuardedResponse:
    """Response from a guarded LLM call."""

    content: str
    decision: str  # "allow", "flag", "block"
    input_score: float
    output_score: float
    matched_signatures: list[str] = field(default_factory=list)


def guarded_openai_chat(
    prompt: str,
    model: str = "gpt-4o-mini",
    system_prompt: str = "You are a helpful assistant.",
) -> GuardedResponse:
    """Call OpenAI with InferenceWall scanning on input and output."""
    # Step 1: Scan input
    input_scan = inferwall.scan_input(prompt)
    if input_scan.decision == "block":
        return GuardedResponse(
            content="[BLOCKED] Request blocked by security policy.",
            decision="block",
            input_score=input_scan.score,
            output_score=0.0,
            matched_signatures=[
                m["signature_id"] for m in input_scan.matches
            ],
        )

    # Step 2: Call OpenAI
    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    output_text = completion.choices[0].message.content or ""

    # Step 3: Scan output
    output_scan = inferwall.scan_output(output_text)
    if output_scan.decision == "block":
        return GuardedResponse(
            content="[BLOCKED] Response contained sensitive data.",
            decision="block",
            input_score=input_scan.score,
            output_score=output_scan.score,
            matched_signatures=[
                m["signature_id"] for m in output_scan.matches
            ],
        )

    # Combine decisions: flag if either scan flagged
    decision = "allow"
    if input_scan.decision == "flag" or output_scan.decision == "flag":
        decision = "flag"

    return GuardedResponse(
        content=output_text,
        decision=decision,
        input_score=input_scan.score,
        output_score=output_scan.score,
        matched_signatures=[
            m["signature_id"]
            for m in input_scan.matches + output_scan.matches
        ],
    )
```
## What gets blocked

InferenceWall applies different signature sets to inputs and outputs.

On input, InferenceWall checks for:

- Prompt injection (`Ignore all previous instructions…`)
- Jailbreak attempts (DAN, persona hijacking, role-play bypasses)
- Obfuscated payloads (base64-encoded instructions, homoglyphs, ROT13)

On output, InferenceWall checks for:

- PII (email addresses, phone numbers, national ID numbers)
- Credentials and API keys (`sk-…`, `AKIA…`, private keys)
- Training data exfiltration patterns
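If hard-blocking an entire response is too disruptive for your product, a common softer pattern is to redact the sensitive spans and return the rest. A minimal sketch for two of the credential shapes listed above; the regexes here are our own illustrations, not InferenceWall's actual signatures:

```python
import re

# Illustrative patterns for two credential shapes; real detection
# signatures are broader than these regexes.
_CREDENTIAL_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # OpenAI-style secret keys
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key IDs
]

def redact_credentials(text: str) -> str:
    """Replace credential-shaped substrings with a placeholder."""
    for pattern in _CREDENTIAL_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

masked = redact_credentials("token sk-abcdefghijklmnopqrstuvwx leaked")
print(masked)  # token [REDACTED] leaked
```

Redaction trades safety for availability: the user still gets an answer, but you should log the original alongside the match so a reviewer can audit what was stripped.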