Introduce Guardrails

# Guardrails Concept

---

# What is a Guardrail?

- A mechanism that checks inputs and outputs against specific criteria
- When anomalies are detected, the call can be stopped, reported, and/or modified
- Guardrails can be deployed in various components of an agentic platform:
    - AI Gateway (Model Router): Check LLM calls
    - Agent Gateway: Check incoming requests to agents or outgoing final responses from agents
    - Tool Gateway: Check data passed from LLMs/agents to tools
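- Examples of guardrails:
    - PII Protection: Detects when personal data is present
    - Secrets: Detects credentials such as API keys or passwords
    - Toxic Language: Detects insulting, hateful, or otherwise harmful language
    - More guardrails: https://guardrailsai.com/hub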
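To make the three possible reactions (stop, report, modify) concrete, here is a minimal Python sketch of a single guard. The e-mail regex is a deliberately simplistic stand-in for real PII detection, and the names `Action`, `GuardResult`, and `pii_guard` are hypothetical, not part of any library.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """What a guard does once it detects an anomaly."""
    BLOCK = "block"    # stop the call
    REPORT = "report"  # let the call through, but report the finding
    REDACT = "redact"  # modify the text, then let the call through


@dataclass
class GuardResult:
    text: str                # the (possibly modified) text
    blocked: bool = False    # True if the call must be stopped
    findings: list[str] = field(default_factory=list)  # what was detected (for reporting)


# Simplistic stand-in for real PII detection: e-mail addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pii_guard(text: str, action: Action) -> GuardResult:
    """Check text for e-mail addresses and apply the configured action."""
    matches = EMAIL.findall(text)
    if not matches:
        return GuardResult(text=text)
    # Record only the type of finding, never the value, so reports do not leak PII.
    findings = ["email"] * len(matches)
    if action is Action.BLOCK:
        return GuardResult(text=text, blocked=True, findings=findings)
    if action is Action.REDACT:
        return GuardResult(text=EMAIL.sub("<EMAIL>", text), findings=findings)
    return GuardResult(text=text, findings=findings)  # Action.REPORT
```

A production guard would delegate detection to a dedicated library or service; only the stop/report/modify decision is the point here.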

```mermaid
sequenceDiagram
    autonumber
    participant A as Agent
    box DarkGreen Guardrails
      participant IV as InputGuard
      participant OF as OutputGuard
    end
    participant MR as AI Gateway (Model Router)

    A->>IV: Raw input
    activate IV
    IV->>MR: Validated and filtered input
    deactivate IV

    activate MR
    MR-->>OF: Raw output
    deactivate MR

    activate OF
    OF-->>A: Validated and filtered output
    deactivate OF

```

# How is a Guardrail defined?

- A Guardrail consists of multiple Guards
- Guards are either InputGuards (checking the request) or OutputGuards (checking the response)
- For a request, there is exactly one Guardrail, consisting of a list of Guards. The list should be processed in order.
- A Guard can return a modified (sanitized) input
- A Guard can throw an error that aborts the LLM call
    - TODO: How do we cleanly return the error?
- Input and output are always text for now, possibly streamed
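These rules can be sketched in Python as follows. This is a minimal, non-streaming illustration that assumes a guard is a plain function from text to text that raises an exception to abort the call; `Guardrail`, `GuardrailViolation`, and the example guards are hypothetical names. Raising an exception is only one possible answer to the open TODO on returning errors.

```python
from dataclasses import dataclass, field
from typing import Callable

# A guard takes text and returns the (possibly sanitized) text,
# or raises GuardrailViolation to abort the LLM call.
Guard = Callable[[str], str]


class GuardrailViolation(Exception):
    """Raised by a guard to abort the LLM call."""

    def __init__(self, guard: str, reason: str):
        super().__init__(f"{guard}: {reason}")
        self.guard = guard
        self.reason = reason


@dataclass
class Guardrail:
    """The single guardrail of a request: ordered lists of input and output guards."""

    input_guards: list[Guard] = field(default_factory=list)
    output_guards: list[Guard] = field(default_factory=list)

    @staticmethod
    def _run(guards: list[Guard], text: str) -> str:
        # Guards run strictly in list order; each one sees the previous one's output.
        for guard in guards:
            text = guard(text)
        return text

    def call(self, llm: Callable[[str], str], prompt: str) -> str:
        """InputGuards -> model (e.g. via the AI Gateway) -> OutputGuards."""
        checked_prompt = self._run(self.input_guards, prompt)
        raw_output = llm(checked_prompt)
        return self._run(self.output_guards, raw_output)


# Two hypothetical example guards.
def strip_whitespace(text: str) -> str:
    return text.strip()


def no_api_keys(text: str) -> str:
    if "sk-" in text:  # naive stand-in for a real secrets detector
        raise GuardrailViolation("no-api-keys", "possible API key in text")
    return text
```

The caller (for example, a gateway) would still have to translate a `GuardrailViolation` into an error response for the agent.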

# Solution Approaches

## Proxy

```mermaid
graph LR

    Agent <--> Guardrails
    Guardrails <--> MR

    Guardrails["AI Gateway
    (Guardrail Proxy)"]
    MR["AI Gateway
    (Model Router)"]

    %% Styling
    style Agent fill:#fff4e1
    style MR fill:#ffe1f5
    style Guardrails fill:#D2FCD6

```

- Separate deployment instances for separate responsibilities (Single Responsibility Principle)
- Independent of the gateway implementation used
- Dependent on the protocol spoken between agent and gateway (the proxy has to parse requests and responses)
- Additional resource usage
- More complex infrastructure
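As a rough illustration of the proxy approach, the following sketch builds a WSGI application (standard library only) that applies input guards to an OpenAI-style chat request before forwarding it. It also shows why the proxy depends on the protocol: it has to know that the text lives in `messages[].content`. The `forward` callable stands in for the HTTP call to the model router and, like `make_guardrail_proxy`, is a hypothetical name; output guards and streaming are left out for brevity.

```python
import json
from typing import Callable, Iterable

Guard = Callable[[str], str]      # returns sanitized text, raises ValueError to block
Forward = Callable[[dict], dict]  # hypothetical: sends the request body to the model router


def make_guardrail_proxy(input_guards: list[Guard], forward: Forward):
    """Return a WSGI app that sits between the agent and the model router."""

    def respond(start_response, status: str, payload: dict) -> Iterable[bytes]:
        body = json.dumps(payload).encode()
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    def app(environ: dict, start_response) -> Iterable[bytes]:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request = json.loads(environ["wsgi.input"].read(length) or b"{}")
        try:
            # The proxy must understand the protocol: here, OpenAI-style chat messages.
            for message in request.get("messages", []):
                if isinstance(message.get("content"), str):
                    for guard in input_guards:  # guards run in order
                        message["content"] = guard(message["content"])
        except ValueError as exc:
            return respond(start_response, "400 Bad Request",
                           {"error": {"message": str(exc), "type": "guardrail_violation"}})
        return respond(start_response, "200 OK", forward(request))

    return app
```

For local experiments, the app could be served with `wsgiref.simple_server.make_server("", 8080, app).serve_forever()`. In a real deployment, `forward` would perform the HTTP request to the model router.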

### Possible Implementations in Kubernetes

- Separate reverse proxy in front of the gateway (own deployment)
    - The Agent Gateway (KrakenD-based) is already implemented this way
- Sidecar container
    - This is how Envoy-based solutions typically do it
    - A reverse proxy is still needed, but it runs alongside each container (open question: next to each agent or next to a gateway?)
    - Resource usage grows accordingly if a sidecar runs next to every agent
    - Challenge: If a service mesh that also injects sidecar containers is installed at the same time, the setup becomes complicated
    - Consideration: If such a solution is targeted, one could also build on Envoy, as kgateway does
    - One could potentially also build on kgateway, which agentgateway integrates with (though the two are not fully independent of each other)
- eBPF filter -> not suitable, as processing the entire request/response body cannot easily run in the kernel, where resources are limited

## Gateway-Integrated

Using the AI Gateway as an example: the guardrails are configured and executed within the AI Gateway itself.

```mermaid
graph LR

    Agent <--> MR

    MR[AI Gateway]

    %% Styling
    style Agent fill:#fff4e1
    style MR fill:#ffe1f5

```

- There is **one** AI Gateway (KrakenD/LiteLLM/...) with **one** config for model routing and guardrails
- Simpler debugging and maintenance, lower complexity for users, no additional network hop
- High dependency on the specific implementation. Requires that the AI Gateway supports guardrails and that they can be configured as generically as possible.

# Existing Solutions

## Guardrails

- [Guardrails AI](https://github.com/guardrails-ai/guardrails)
- [LLM Guard (Protect AI)](https://protectai.github.io/llm-guard/get_started/quickstart/)

## Gateways with Guardrail Support

- [LiteLLM Docs](https://docs.litellm.ai/docs/proxy/guardrails/quick_start) Guardrails (configured via config file)
- [agentgateway](https://github.com/agentgateway/agentgateway/issues/378) (guardrail support discussed in issue #378)
- [Traefik AI Gateway Docs](https://doc.traefik.io/traefik-hub/ai-gateway/middlewares/content-guard) Guardrails (configured via CRD)
- [Example Kong AI Gateway Guardrail Config](https://developer.konghq.com/plugins/ai-azure-content-safety/examples/block-predefined-categories/) (configured via HTTP request)
- [PortKey Guardrail Config Docs](https://portkey.ai/docs/api-reference/admin-api/control-plane/guardrails/create-guardrail) (configured via HTTP request)

# Implementation

## Requirements

- Guardrails from various vendors / libraries can be used
    - We explicitly do not want to (only) offer our own guardrails, but rather build on existing solutions that can be integrated
- Guardrails / Guards can be defined via CRDs independently of the implementation
    - Here, "implementation" means the gateway/proxy that executes the guards, not the guardrail provider

## Brainstorming

- From agentgateway: they currently support regex, built-in filters, a custom webhook protocol, and the OpenAI moderation API
- The first protocols seem to be emerging. Perhaps we can build on them?
- How can we integrate libraries? This probably requires a network protocol. One option is to look at the agentgateway webhook protocol and deploy guards as containers

## Definition of Guards

- Available protocols:
    - [OpenAI moderation API](https://platform.openai.com/docs/guides/moderation)
    - [Amazon Bedrock ApplyGuardrail API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ApplyGuardrail.html)
    - agentgateway / Envoy: [Envoy external processing filter](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter), [agentgateway issue #378](https://github.com/agentgateway/agentgateway/issues/378)
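A guard that speaks the OpenAI moderation protocol essentially sends the text to a `/v1/moderations` endpoint and inspects the `flagged` and `categories` fields of each result. The sketch below assumes that a custom backend (such as a Kubernetes Service referenced by a GuardrailProvider) exposes the same path and request/response shape as the OpenAI API; authentication headers are omitted, and the function names are hypothetical.

```python
import json
import urllib.request


def build_moderation_request(base_url: str, text: str,
                             model: str = "omni-moderation-latest") -> urllib.request.Request:
    """Build a POST request in the shape of the OpenAI moderation API."""
    body = json.dumps({"model": model, "input": text}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/moderations",  # assumption: backend serves the OpenAI path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_flagged(response_body: dict) -> tuple[bool, list[str]]:
    """Return whether any result is flagged, plus the sorted names of flagged categories."""
    flagged = False
    categories: set[str] = set()
    for result in response_body.get("results", []):
        if result.get("flagged"):
            flagged = True
            categories.update(name for name, hit in result.get("categories", {}).items() if hit)
    return flagged, sorted(categories)


def moderate(base_url: str, text: str, timeout: float = 5.0) -> tuple[bool, list[str]]:
    """Call the moderation backend over HTTP and interpret the answer."""
    with urllib.request.urlopen(build_moderation_request(base_url, text), timeout=timeout) as resp:
        return is_flagged(json.load(resp))
```

A guard built on this would raise an error (or redact) when `moderate(...)` reports `flagged=True`.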

## Instantiating Guardrails

As outlined above, various approaches are conceivable. They are not mutually exclusive and could be used depending on the gateway.

- Configure the gateway accordingly
    - With LiteLLM, a corresponding config could be generated and the guardrails could run within LiteLLM
        - Custom guardrails (and providers) that are not natively supported could be added via [LiteLLM custom guardrails](https://docs.litellm.ai/docs/proxy/guardrails/custom_guardrail) as a Python file mounted into the gateway. Such a file could also be a small wrapper that forwards the check via HTTP to an externally running pod.
    - This would also be conceivable with agentgateway: https://github.com/agentgateway/agentgateway/issues/378
- Create a reverse proxy that calls the guardrails

## Custom Resource Definitions

Rough draft of the CRDs, based on the following ideas:

- Separation of provider (remote service or locally running deployment) and guard
- Multiple guards can point to the same provider
- Assumption: Guards are identifiable by a name and optionally a version across all providers
    - Open: Parameterization of guards -> probably happens in the provider
- Guards are attached to the AI Gateway (later possibly also to other gateways)
    - Open: Also individually per agent? -> this would probably be an optional extension and would work analogously to sub-agents and tools (URL, etc. passed via env var and implemented in the framework)
- A provider speaks a specific protocol. This can be the provider's protocol or a more general one. Which standard APIs emerge remains to be seen. The goal is not to support all providers right from the start, but only those that have a public API and are already widely adopted.
- Unlike with ToolServers, the backend deployment is created separately and only referenced via a Kubernetes Service, instead of passing many parameters from the CRD through to a generated deployment. This design could also be interesting for ToolServers.

```yaml
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: AiGateway
metadata:
  name: ai-gateway
  namespace: default
spec:
  aiModels:
    - provider: openai
      name: gpt-3.5-turbo
    - provider: gemini
      name: gemini-2.5-flash
  guardrails:
    # Guard from this namespace
    - name: pii-guard
      namespace: default
    # All guards from the guardrails namespace (?) -> how to ensure order of guards?
    - kind: Guard
      namespace: guardrails
---
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: Guard
metadata:
  name: pii-guard
  namespace: default
spec:
  # Name of the guard at the provider
  name: pii-detection
  # Version of the guard at the provider (if supported)
  version: "1.0.0"
  # When to run the guard. Options: pre_call, post_call, during_call
  mode: pre_call
  # Description of the guard's purpose (for documentation purposes only)
  description: Guardrail to detect and handle Personally Identifiable Information (PII) in user input
  # Reference to the GuardrailProvider that hosts this guard
  providerRef:
    name: custom-openai
    namespace: default
---
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: GuardrailProvider
metadata:
  name: custom-openai
  namespace: default
spec:
  # Which guardrail protocol to use
  protocol: openai-moderation # https://platform.openai.com/docs/guides/moderation
  transportType: http # http, grpc, envoy-exec
  # Custom backend service (kubernetes service) to route requests to
  backendRef:
    name: custom-openai-guardrail-backend
    namespace: default
    port: 80
---
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: GuardrailProvider
metadata:
  name: bedrock
spec:
  protocol: bedrock # https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ApplyGuardrail.html
```
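One way the operator could resolve the `guardrails` list of an `AiGateway` into an ordered list of Guards is sketched below. It assumes the CRDs have already been read into the hypothetical dataclasses shown, and it uses sorting by name for the "all guards in a namespace" case only as one deterministic option for the open ordering question, not as a decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guard:
    """Hypothetical in-memory view of a Guard resource."""
    namespace: str
    name: str
    mode: str                      # spec.mode: pre_call | post_call | during_call
    provider_ref: tuple[str, str]  # spec.providerRef as (namespace, name)


@dataclass(frozen=True)
class GuardrailRef:
    """One entry of AiGateway.spec.guardrails."""
    namespace: str
    name: Optional[str] = None  # None means "all Guards in the namespace" (kind: Guard)


def resolve_guards(refs: list[GuardrailRef], guards: list[Guard],
                   providers: set[tuple[str, str]]) -> list[Guard]:
    """Resolve the AiGateway's guardrail references into an ordered, de-duplicated list."""
    by_key = {(g.namespace, g.name): g for g in guards}
    resolved: list[Guard] = []
    for ref in refs:
        if ref.name is not None:
            if (ref.namespace, ref.name) not in by_key:
                raise LookupError(f"Guard {ref.namespace}/{ref.name} not found")
            matched = [by_key[(ref.namespace, ref.name)]]
        else:
            # Open question in the draft: order within a namespace.
            # Sorting by name is just one deterministic option.
            matched = sorted((g for g in guards if g.namespace == ref.namespace),
                             key=lambda g: g.name)
        for guard in matched:
            if guard.provider_ref not in providers:
                ns, name = guard.provider_ref
                raise LookupError(f"GuardrailProvider {ns}/{name} not found "
                                  f"(referenced by Guard {guard.namespace}/{guard.name})")
            if guard not in resolved:  # a Guard referenced twice runs only once
                resolved.append(guard)
    return resolved
```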

## Implementation Details for LiteLLM Gateway

### Operator

New reconciliation steps in the AI Gateway Operator:

1. Check whether all referenced guardrails exist
2. Optionally, run a quick syntax check of the guardrail spec (e.g., to verify that all referenced guardrail plugins exist in LiteLLM)
3. Merge the preliminary LiteLLM configuration with the referenced guardrail configs
4. Merge the preliminary LiteLLM env vars with the referenced guardrail env vars

Important note: The guardrails must always be added explicitly with `default_on: true` in the LiteLLM config; otherwise, users would have to explicitly enable every guardrail on each LLM call.
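Steps 3 and 4, together with the `default_on` note, could look roughly like the sketch below. It produces entries for LiteLLM's `guardrails` config list (`guardrail_name`, plus `litellm_params` with `guardrail`, `mode`, and `default_on`). The mapping from a Guard to a LiteLLM guardrail integration (`litellm_guardrail`) and the per-guard `env` dict are hypothetical inputs the operator would have to derive from the CRDs, and rejecting conflicting env vars is an assumed design choice.

```python
import copy


def build_guardrail_entry(guard: dict) -> dict:
    """Translate one resolved Guard into an entry of LiteLLM's `guardrails` list."""
    return {
        "guardrail_name": guard["name"],
        "litellm_params": {
            "guardrail": guard["litellm_guardrail"],  # hypothetical Guard -> LiteLLM integration mapping
            "mode": guard["mode"],                    # pre_call | post_call | during_call
            # Without default_on, clients would have to enable each guardrail per request.
            "default_on": True,
        },
    }


def merge_litellm_config(base_config: dict, base_env: dict,
                         guards: list[dict]) -> tuple[dict, dict]:
    """Steps 3 and 4: merge guardrail config and env vars into the preliminary LiteLLM setup."""
    config = copy.deepcopy(base_config)  # never mutate the preliminary config
    config.setdefault("guardrails", []).extend(build_guardrail_entry(g) for g in guards)

    env = dict(base_env)
    for guard in guards:
        for key, value in guard.get("env", {}).items():
            if env.get(key, value) != value:
                raise ValueError(f"conflicting values for env var {key}")
            env[key] = value
    return config, env
```

The resulting dict could then be serialized to YAML (e.g., with PyYAML's `yaml.safe_dump`) and mounted into the LiteLLM deployment.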

The AIGateway-LiteLLM operator must be explicitly configured to watch for changes to guardrails.

### Prompt Injection Guardrail (Special Case)

LiteLLM has a config setting to block prompt injections, but it is not implemented as a LiteLLM guardrail.

-> Consider an alternative solution

### Guardrail Metrics

Publish guardrail metrics about approved/denied requests to Prometheus

- A log-based metric for DeniedRequests should be created based on the LiteLLM logs.
- Background: LiteLLM can send per-request logs to an OTEL Collector. These logs contain standard information about applied guardrails and whether the request went through. This information would need to be parsed and aggregated.
- See
    - [LiteLLM logging docs](https://docs.litellm.ai/docs/proxy/logging)
    - [LiteLLM logging spec: StandardLoggingGuardrailInformation](https://docs.litellm.ai/docs/proxy/logging_spec#standardloggingguardrailinformation)
- This metric should be visible in Prometheus
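The parsing and aggregation step could be sketched as follows. It assumes one JSON-encoded LiteLLM standard logging payload per line, with a `guardrail_information` field holding a single entry or a list of entries that carry `guardrail_name` and `guardrail_status`; these field names and the possible status values should be checked against the logging spec linked above. The metric name `guardrail_requests_total` is hypothetical, and in a real setup this aggregation would more likely happen in the OTEL Collector pipeline than in a custom script.

```python
import json
from collections import Counter
from typing import Iterable


def count_guardrail_outcomes(log_lines: Iterable[str]) -> Counter:
    """Count (guardrail_name, guardrail_status) pairs in JSON log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict) or not record.get("guardrail_information"):
            continue  # request without guardrail information
        info = record["guardrail_information"]
        # Accept a single entry or a list of entries.
        for entry in info if isinstance(info, list) else [info]:
            if not isinstance(entry, dict):
                continue
            name = str(entry.get("guardrail_name") or "unknown")
            status = str(entry.get("guardrail_status") or "unknown")
            counts[(name, status)] += 1
    return counts


def _escape(value: str) -> str:
    # Label-value escaping required by the Prometheus text exposition format.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(counts: Counter, metric: str = "guardrail_requests_total") -> str:
    """Render the counts in the Prometheus text exposition format."""
    lines = [f"# HELP {metric} Requests checked by a guardrail, by outcome.",
             f"# TYPE {metric} counter"]
    for (name, status), value in sorted(counts.items()):
        lines.append(f'{metric}{{guardrail="{_escape(name)}",status="{_escape(status)}"}} {value}')
    return "\n".join(lines) + "\n"
```

Denied requests could then be selected in PromQL by filtering on the `status` label, once it is confirmed which status values mean "denied".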

