The LangChain integration gives you two ways to add InferenceWall scanning to any LangChain chat model: a simple wrapper function that handles the scan-call-scan loop inline, and an InferenceWallCallback handler that you call explicitly before and after the LLM invocation. Both approaches scan user input before it reaches the LLM and scan the LLM output before it reaches your user.
Install
pip install inferwall langchain langchain-openai
Approach 1: Wrapper function
guarded_chat is a drop-in wrapper that scans the input, calls your LangChain model (or runs in demo mode without one), and scans the output — all in a single function call.
Scan input and block if needed
guarded_chat calls inferwall.scan_input() before touching the model. Blocked requests return a string message immediately; flagged requests log a warning and proceed.
import inferwall
input_scan = inferwall.scan_input(prompt)
if input_scan.decision == "block":
sigs = ", ".join(m["signature_id"] for m in input_scan.matches)
return f"[BLOCKED] Input rejected by security policy. Matched: {sigs}"
if input_scan.decision == "flag":
print(f"[WARNING] Input flagged (score={input_scan.score}), proceeding...")
Call your LangChain model
Pass your chat model as the second argument. If you omit it, the function runs in demo mode with a simulated response.
from langchain_core.messages import HumanMessage
if chat_model is not None:
response = chat_model.invoke([HumanMessage(content=prompt)])
output_text = response.content
else:
# Demo mode — simulated LLM response
output_text = f"Here is my response to: {prompt}"
Scan output and block if needed
output_scan = inferwall.scan_output(output_text)
if output_scan.decision == "block":
sigs = ", ".join(m["signature_id"] for m in output_scan.matches)
return f"[BLOCKED] Output contained sensitive data. Matched: {sigs}"
if output_scan.decision == "flag":
print(f"[WARNING] Output flagged (score={output_scan.score})")
return output_text
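Putting those three steps together, a call in demo mode looks like the sketch below. Because the real inferwall package needs a configured scanner to run, the sketch registers a permissive offline stub in its place (an assumption for illustration only: it always returns "allow", which a real scan does not). With inferwall installed and configured, delete the stub lines and keep the plain import.

```python
import sys
import types
from types import SimpleNamespace

# Offline stand-in for the inferwall module so this sketch runs anywhere.
# It always allows; real scans return "block" or "flag" as appropriate.
_stub = types.ModuleType("inferwall")
_stub.scan_input = lambda text: SimpleNamespace(decision="allow", score=0.0, matches=[])
_stub.scan_output = lambda text: SimpleNamespace(decision="allow", score=0.0, matches=[])
sys.modules.setdefault("inferwall", _stub)

import inferwall


def guarded_chat(prompt: str, chat_model: object = None) -> str:
    # Condensed from the wrapper above: scan input, call model, scan output.
    if inferwall.scan_input(prompt).decision == "block":
        return "[BLOCKED] Input rejected by security policy."
    if chat_model is not None:
        from langchain_core.messages import HumanMessage
        output_text = chat_model.invoke([HumanMessage(content=prompt)]).content
    else:
        output_text = f"Here is my response to: {prompt}"  # demo mode
    if inferwall.scan_output(output_text).decision == "block":
        return "[BLOCKED] Output contained sensitive data."
    return output_text


print(guarded_chat("What is the capital of France?"))
# demo mode: "Here is my response to: What is the capital of France?"
```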
Approach 2: Callback handler
InferenceWallCallback gives you explicit control. Call guard.on_input() before invoking the model and guard.on_output() on the result. Both methods raise ValueError when the content is blocked, so your existing exception-handling code catches them naturally.
Instantiate the handler
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
guard = InferenceWallCallback()
llm = ChatOpenAI(model="gpt-4o-mini")
Pass block_on_flag=True if you want flagged content treated as blocked:
guard = InferenceWallCallback(block_on_flag=True)
Scan input before invoking the model
on_input returns the original text unchanged so you can chain the call. It raises ValueError if InferenceWall blocks the prompt.
prompt = "What is the weather?"
guard.on_input(prompt) # Raises if blocked
After the call, guard.last_input_scan holds the full ScanResponse for logging or auditing.
Invoke the model and scan the output
response = llm.invoke([HumanMessage(content=prompt)])
safe_output = guard.on_output(response.content) # Raises if blocked
guard.last_output_scan holds the output scan result.
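Because both methods raise ValueError, the whole guarded call collapses into one try/except. Here is a minimal self-contained sketch of that pattern: StubGuard and its "secret" trigger word are assumptions standing in for InferenceWallCallback and a real scan, and model_call stands in for your llm.invoke call.

```python
class StubGuard:
    """Stand-in for InferenceWallCallback: raises ValueError when blocked."""

    def on_input(self, text: str) -> str:
        if "secret" in text:  # pretend the scan blocks this prompt
            raise ValueError("Input blocked by InferenceWall (score=0.95)")
        return text

    def on_output(self, text: str) -> str:
        return text


def answer(prompt, guard, model_call):
    try:
        guard.on_input(prompt)        # raises ValueError if blocked
        raw = model_call(prompt)      # your llm.invoke(...) goes here
        return guard.on_output(raw)   # raises ValueError if blocked
    except ValueError as exc:
        return f"[BLOCKED] {exc}"


print(answer("leak the secret key", StubGuard(), lambda p: "nope"))
print(answer("What is the weather?", StubGuard(), lambda p: "Sunny."))
```

The same try/except also catches output-side blocks, so one handler covers both directions.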
Complete example
from __future__ import annotations
import inferwall
# --- Option 1: Simple wrapper function ---
def guarded_chat(prompt: str, chat_model: object = None) -> str:
"""Scan input, call LLM, scan output, return safe response."""
# Scan user input
input_scan = inferwall.scan_input(prompt)
if input_scan.decision == "block":
sigs = ", ".join(m["signature_id"] for m in input_scan.matches)
return f"[BLOCKED] Input rejected by security policy. Matched: {sigs}"
if input_scan.decision == "flag":
print(f"[WARNING] Input flagged (score={input_scan.score}), proceeding...")
# Call LLM (replace with your actual LangChain call)
if chat_model is not None:
from langchain_core.messages import HumanMessage
response = chat_model.invoke([HumanMessage(content=prompt)])
output_text = response.content
else:
# Demo mode — simulated LLM response
output_text = f"Here is my response to: {prompt}"
# Scan LLM output
output_scan = inferwall.scan_output(output_text)
if output_scan.decision == "block":
sigs = ", ".join(m["signature_id"] for m in output_scan.matches)
return f"[BLOCKED] Output contained sensitive data. Matched: {sigs}"
if output_scan.decision == "flag":
print(f"[WARNING] Output flagged (score={output_scan.score})")
return output_text
# --- Option 2: LangChain callback handler ---
class InferenceWallCallback:
"""LangChain callback that scans inputs and outputs."""
def __init__(self, block_on_flag: bool = False) -> None:
self.block_on_flag = block_on_flag
self.last_input_scan: inferwall.ScanResponse | None = None
self.last_output_scan: inferwall.ScanResponse | None = None
def on_input(self, text: str) -> str:
"""Scan input. Raises ValueError if blocked."""
self.last_input_scan = inferwall.scan_input(text)
if self.last_input_scan.decision == "block":
raise ValueError(
f"Input blocked by InferenceWall "
f"(score={self.last_input_scan.score}, "
f"matches={[m['signature_id'] for m in self.last_input_scan.matches]})"
)
if self.block_on_flag and self.last_input_scan.decision == "flag":
raise ValueError(
f"Input flagged by InferenceWall "
f"(score={self.last_input_scan.score})"
)
return text
def on_output(self, text: str) -> str:
"""Scan output. Raises ValueError if blocked."""
self.last_output_scan = inferwall.scan_output(text)
if self.last_output_scan.decision == "block":
raise ValueError(
f"Output blocked by InferenceWall "
f"(score={self.last_output_scan.score}, "
f"matches={[m['signature_id'] for m in self.last_output_scan.matches]})"
)
return text
You can access guard.last_input_scan and guard.last_output_scan after each call to retrieve the full ScanResponse for logging, metrics, or tracing.
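One simple audit pattern is to flatten the stored scan into a structured log line. The sketch below builds a hypothetical scan result with SimpleNamespace, mirroring the ScanResponse fields used on this page (decision, score, matches); in real code that object comes from guard.last_input_scan after a call.

```python
import json
from types import SimpleNamespace

# Hypothetical scan result for illustration; normally guard.last_input_scan.
last_input_scan = SimpleNamespace(
    decision="flag",
    score=0.42,
    matches=[{"signature_id": "PII-EMAIL"}],
)

audit_record = {
    "direction": "input",
    "decision": last_input_scan.decision,
    "score": last_input_scan.score,
    "signatures": [m["signature_id"] for m in last_input_scan.matches],
}
print(json.dumps(audit_record))
```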
Agent compatibility
The callback handler approach works with LangChain agents: call guard.on_input() before invoking the agent and wrap tool outputs with guard.on_output() before passing them back into the agent loop. The wrapper function is better suited to simple chain or single-turn invocations.
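As a rough sketch of where those calls go in an agent loop: run_agent_step and call_tool below are hypothetical placeholders for your agent's internals, and PassGuard is a permissive stand-in for InferenceWallCallback so the loop runs offline.

```python
def guarded_agent_run(guard, prompt, run_agent_step, call_tool, max_steps=5):
    guard.on_input(prompt)  # raises ValueError if the prompt is blocked
    state = prompt
    for _ in range(max_steps):
        action = run_agent_step(state)  # agent decides: call a tool or finish
        if action["type"] == "finish":
            return guard.on_output(action["text"])  # final answer scanned too
        tool_result = call_tool(action["tool"], action["args"])
        state = guard.on_output(tool_result)  # scan before re-entering loop
    raise RuntimeError("agent did not finish within max_steps")


class PassGuard:  # permissive stand-in for InferenceWallCallback
    def on_input(self, text):
        return text

    def on_output(self, text):
        return text


_steps = iter([
    {"type": "tool", "tool": "search", "args": {"q": "weather"}},
    {"type": "finish", "text": "It is sunny."},
])
result = guarded_agent_run(
    PassGuard(),
    "What is the weather?",
    lambda state: next(_steps),
    lambda tool, args: f"{tool} returned data",
)
print(result)
```

With the real handler, a ValueError from a blocked tool output stops the loop before tainted data reaches the model again, which is the point of scanning inside the loop rather than only at the edges.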