sequenceDiagram
autonumber
participant A as Agent
box DarkGreen Guardrails
participant IV as InputGuard
participant OF as OutputGuard
end
participant MR as AI Gateway (Model Router)
A->>IV: Raw input
activate IV
IV->>MR: Validated and filtered input
deactivate IV
activate MR
MR-->>OF: Raw output
deactivate MR
activate OF
OF-->>A: Validated and filtered output
deactivate OF
How is a Guardrail defined?
Guardrails consist of multiple Guards
There are InputGuards and OutputGuards
For a request, there is exactly one Guardrail, consisting of a list of Guards. The list should be processed in order.
A Guard can return a modified (sanitized) input
A Guard can throw an error that aborts the LLM call
TODO: How do we cleanly return the error?
Input and output are always text for now, possibly streamed
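The rules above can be sketched in a few lines of Python. This is only an illustration of the intended semantics (ordered guards, optional sanitization, abort via an error). The names `Guard`, `GuardError`, `run_guardrail`, and both example guards are hypothetical and not part of any gateway API. Raising a typed exception is one possible answer to the open TODO, not a decision.

```python
from dataclasses import dataclass
from typing import Callable, List


class GuardError(Exception):
    """Raised by a guard to abort the LLM call (hypothetical error type)."""

    def __init__(self, guard_name: str, reason: str) -> None:
        super().__init__(f"{guard_name}: {reason}")
        self.guard_name = guard_name
        self.reason = reason


@dataclass
class Guard:
    name: str
    # Takes text and returns the (possibly sanitized) text, or raises GuardError.
    check: Callable[[str], str]


def run_guardrail(guards: List[Guard], text: str) -> str:
    """Apply the guards in list order; each guard sees the previous guard's output."""
    for guard in guards:
        text = guard.check(text)
    return text


def mask_digits(text: str) -> str:
    # Example of a sanitizing guard: replaces every digit with '#'.
    return "".join("#" if ch.isdigit() else ch for ch in text)


def block_secrets(text: str) -> str:
    # Example of a blocking guard: aborts when the text mentions a password.
    if "password" in text.lower():
        raise GuardError("block-secrets", "input mentions a password")
    return text


input_guardrail = [
    Guard("mask-digits", mask_digits),
    Guard("block-secrets", block_secrets),
]
```

A caller would wrap `run_guardrail(input_guardrail, raw_input)` in a `try`/`except GuardError` and skip the model call when a guard aborts.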
There is one AI Gateway (KrakenD/LiteLLM/...) with one config for model routing and guardrails
Advantages: simpler debugging and maintenance; lower complexity for users; no additional network hop
Disadvantages: high dependency on the specific implementation. The AI Gateway must support guardrails, and they must be configurable as generically as possible.
Guardrails from various vendors / libraries can be used
We explicitly do not want to (only) offer our own guardrails, but rather build on existing solutions that can be integrated
Guardrails / Guards can be defined via CRDs independently of the implementation
Implementation == Gateway/Proxy that executes the guards; not the guardrail provider
Brainstorming
From agentgateway: We currently support regex, builtin filters, a custom webhook protocol, and the OpenAI moderation API
The first protocols seem to be emerging. Perhaps we can build on them?
How do we manage to integrate libraries? This probably requires a network protocol. One could look at the webhook protocol from agentgateway and deploy guards as containers
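To make the "guard as a container" idea more concrete, here is a minimal sketch of a guard exposed over HTTP using only the Python standard library. The request and response shape (`text`, `action`, `reason`) is an assumption made up for illustration. It is not the agentgateway webhook protocol, which would have to be checked in its documentation.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Deliberately simple e-mail pattern, good enough for a sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def evaluate(payload: dict) -> dict:
    """Hypothetical guard logic: reject empty input, mask e-mail addresses."""
    text = payload.get("text", "")
    if not text.strip():
        return {"action": "reject", "reason": "empty input"}
    masked = EMAIL.sub("<email>", text)
    return {"action": "mask" if masked != text else "pass", "text": masked}


class GuardHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the JSON body, run the guard, and answer with a JSON verdict.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(evaluate(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Blocking entry point for the container; the port is an assumption."""
    HTTPServer(("", port), GuardHandler).serve_forever()
```

Keeping `evaluate` separate from the HTTP handler means the same guard logic could be wrapped in a different protocol later without changes.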
As outlined above, various approaches are conceivable. They are not mutually exclusive and could be used depending on the gateway.
Configure the gateway accordingly
With LiteLLM, a corresponding config could be generated and the guardrails could run within LiteLLM
Custom guardrails (and providers) that are not natively supported could be added via https://docs.litellm.ai/docs/proxy/guardrails/custom_guardrail as a Python file mounted into the gateway (a small wrapper with an HTTP call to an externally running pod could also be built there)
Separation of provider (remote service or locally running deployment) and guard
Multiple guards can point to the same provider
Assumption: Guards are identifiable by a name and optionally a version across all providers
Open: Parameterization of guards -> probably happens in the provider
Guards are attached to the AI Gateway (later possibly also to other gateways)
Open: Also individually per agent? -> this would probably be an optional extension and would work analogously to sub-agents and tools (URL, etc. passed via env var and implemented in the framework)
A provider speaks a specific protocol. This can be the provider's protocol or a more general one. Which standard APIs emerge remains to be seen. The goal is not to support all providers right from the start, but only those that have a public API and are already widely adopted.
The design here differs from that of ToolServers and could be interesting for them as well: instead of passing many parameters from the CRD through to a deployment, the deployment is created separately and only referenced via a Kubernetes Service.
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: AiGateway
metadata:
  name: ai-gateway
  namespace: default
spec:
  aiModels:
    - provider: openai
      name: gpt-3.5-turbo
    - provider: gemini
      name: gemini-2.5-flash
  guardrails:
    # Guard from this namespace
    - name: pii-guard
      namespace: default
    # All guards from the guardrails namespace (?) -> how to ensure order of guards?
    - kind: Guard
      namespace: guardrails
---
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: Guard
metadata:
  name: pii-guard
  namespace: default
spec:
  # Name of the guard at the provider
  name: pii-detection
  # Version of the guard at the provider (if supported)
  version: "1.0.0"
  # When to run the guard. Options: pre_call, post_call, during_call
  mode: pre_call
  # Description of the guard's purpose (for documentation purposes only)
  description: Guardrail to detect and handle Personally Identifiable Information (PII) in user
  # Reference to the GuardrailProvider that hosts this guard
  providerRef:
    name: custom-openai
    namespace: default
---
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: GuardrailProvider
metadata:
  name: custom-openai
spec:
  # Which guardrail protocol to use
  protocol: openai-moderation # https://platform.openai.com/docs/guides/moderation
  transportType: http # http, grpc, envoy-exec
  # Custom backend service (kubernetes service) to route requests to
  backendRef:
    name: custom-openai-guardrail-backend
    namespace: default
    port: 80
---
apiVersion: runtime.agentic-layer.ai/v1alpha1
kind: GuardrailProvider
metadata:
  name: bedrock
spec:
  protocol: bedrock # https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ApplyGuardrail.html
---
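The reference chain in the draft (AiGateway -> Guard -> GuardrailProvider) could be resolved roughly as follows. This is a sketch under assumptions: the dataclasses only mirror the fields used for resolution, the `(namespace, name)` key and the function `resolve_guards` are hypothetical, and the provider is assumed to live in the `default` namespace as the `providerRef` suggests. The "all guards from a namespace" selector is left out because its ordering is still an open question.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (namespace, name)


@dataclass(frozen=True)
class Provider:
    namespace: str
    name: str
    protocol: str


@dataclass(frozen=True)
class Guard:
    namespace: str
    name: str
    mode: str  # pre_call, post_call or during_call
    provider_ref: Key


def resolve_guards(
    refs: List[Key],
    guards: Dict[Key, Guard],
    providers: Dict[Key, Provider],
) -> List[Tuple[Guard, Provider]]:
    """Resolve gateway guard references in list order.

    Raises LookupError if a referenced Guard or its GuardrailProvider is missing.
    """
    resolved = []
    for ref in refs:
        guard = guards.get(ref)
        if guard is None:
            raise LookupError(f"Guard {ref[0]}/{ref[1]} not found")
        provider = providers.get(guard.provider_ref)
        if provider is None:
            ns, name = guard.provider_ref
            raise LookupError(f"GuardrailProvider {ns}/{name} not found")
        resolved.append((guard, provider))
    return resolved


# Objects corresponding to the draft resources above.
providers = {
    ("default", "custom-openai"): Provider("default", "custom-openai", "openai-moderation"),
}
guards = {
    ("default", "pii-guard"): Guard("default", "pii-guard", "pre_call", ("default", "custom-openai")),
}
```

Because the result preserves the order of `refs`, the gateway can process the guards in the order in which they are listed in the AiGateway spec.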
Implementation Details for LiteLLM Gateway
Operator
New reconciliation steps in the AI Gateway Operator:
Quick check whether all referenced guardrails exist
Optionally a quick syntax check of the guardrail spec (e.g., to verify that all guardrail plugins for LiteLLM exist)
Merging of the preliminary LiteLLM configuration with the referenced guardrail configs
Merging of the preliminary LiteLLM env vars with the referenced guardrail env vars
Important note: The guardrails must always be explicitly added with default_on: true in the LiteLLM config, otherwise users would have to explicitly enable all guardrails with every LLM call.
The AIGateway-LiteLLM operator must be explicitly configured to watch for changes to guardrails.
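The config merge step could look roughly like the following sketch. The function `merge_guardrails` and the example values are hypothetical. The key names (`guardrails`, `guardrail_name`, `litellm_params`, `mode`, `default_on`) are assumed to match the LiteLLM proxy config and should be verified against the LiteLLM guardrails documentation.

```python
import copy
from typing import Any, Dict, List


def merge_guardrails(
    base_config: Dict[str, Any],
    guardrails: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the preliminary config with the guardrails appended.

    Every guardrail is forced to default_on: true so that callers do not
    have to enable guardrails per request.
    """
    merged = copy.deepcopy(base_config)
    entries = merged.setdefault("guardrails", [])
    for guardrail in guardrails:
        params = dict(guardrail["litellm_params"])
        params["default_on"] = True
        entries.append(
            {"guardrail_name": guardrail["guardrail_name"], "litellm_params": params}
        )
    return merged


# Hypothetical inputs: a preliminary config and one guardrail derived from a Guard.
preliminary = {"model_list": []}
pii_guard = {
    "guardrail_name": "pii-guard",
    "litellm_params": {"guardrail": "custom_guardrail.PiiGuard", "mode": "pre_call"},
}
final_config = merge_guardrails(preliminary, [pii_guard])
```

The input config is copied instead of mutated, so the operator can compare the preliminary and the final config during reconciliation.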
Prompt Injection Guardrail (Special Case)
LiteLLM has a config setting to block prompt injections, but this is not a LiteLLM guardrail.
-> Consider an alternative solution
Guardrail Metrics
Publish guardrail metrics about approved/denied requests to Prometheus
A log-based metric for DeniedRequests should be created based on the LiteLLM logs.
Background: LiteLLM can send per-request logs to an OTEL Collector. These logs contain standard information about applied guardrails and whether the request went through. This information would need to be parsed and aggregated.
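The parsing and aggregation step might be sketched as below. The record shape is an assumption made up for illustration (a `guardrails` list with `name` and `blocked` fields per request). The actual structure of the LiteLLM logs arriving at the OTEL Collector would need to be inspected first.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_guardrail_outcomes(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count approved/denied outcomes per guardrail.

    Keys of the returned Counter are (guardrail_name, outcome) tuples.
    """
    counts: Counter = Counter()
    for record in records:
        for guardrail in record.get("guardrails", []):
            outcome = "denied" if guardrail.get("blocked") else "approved"
            counts[(guardrail.get("name", "unknown"), outcome)] += 1
    return counts


# Hypothetical per-request log records.
sample_logs = [
    {"guardrails": [{"name": "pii-guard", "blocked": False}]},
    {"guardrails": [{"name": "pii-guard", "blocked": True}]},
    {"guardrails": [{"name": "pii-guard", "blocked": False}]},
    {"model": "gpt-3.5-turbo"},  # request without guardrail information
]
outcomes = count_guardrail_outcomes(sample_logs)
```

The resulting counts map naturally onto a Prometheus counter with `guardrail` and `outcome` labels.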
Guardrails Concept
What is a Guardrail?
Solution Approaches
Proxy
graph LR
Agent <--> Guardrails
Guardrails <--> MR
Guardrails["AI Gateway (Guardrail Proxy)"]
MR["AI Gateway (Model Router)"]
%% Styling
style Agent fill:#fff4e1
style MR fill:#ffe1f5
style Guardrails fill:#D2FCD6
Possible Implementations in Kubernetes
Gateway-Integrated
Using the AI Gateway as an example: Guardrails are configured within the AI Gateway.
graph LR
Agent <--> MR
MR[AI Gateway]
%% Styling
style Agent fill:#fff4e1
style MR fill:#ffe1f5
Existing Solutions
Guardrails
Gateways with Guardrail Support
Implementation
Requirements
Brainstorming
Definition of Guards
Instantiating Guardrails
Custom Resource Definitions
Rough draft of the CRDs. The following ideas: