1 change: 1 addition & 0 deletions src/components/docs/Card.astro
@@ -16,6 +16,7 @@ const iconPaths: Record<string, string> = {
lightning: 'M13 10V3L4 14h7v7l9-11h-7z',
database: 'M4 7v10c0 2.21 3.582 4 8 4s8-1.79 8-4V7M4 7c0 2.21 3.582 4 8 4s8-1.79 8-4M4 7c0-2.21 3.582-4 8-4s8 1.79 8 4m0 5c0 2.21-3.582 4-8 4s-8-1.79-8-4',
shield: 'M9 12l2 2 4-4m5.618-4.016A11.955 11.955 0 0112 2.944a11.955 11.955 0 01-8.618 3.04A12.02 12.02 0 003 9c0 5.591 3.824 10.29 9 11.622 5.176-1.332 9-6.03 9-11.622 0-1.042-.133-2.052-.382-3.016z',
server: 'M5 12h14M5 12a2 2 0 01-2-2V6a2 2 0 012-2h14a2 2 0 012 2v4a2 2 0 01-2 2M5 12a2 2 0 00-2 2v4a2 2 0 002 2h14a2 2 0 002-2v-4a2 2 0 00-2-2m-2-4h.01M17 16h.01',
// Charts & Analytics (Mintlify: chart-mixed, chart-line)
'chart-mixed': 'M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z',
'chart-line': 'M7 12l3-3 3 3 4-4M8 21l4-4 4 4M3 4h18M4 4v16',
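The hunk above adds one entry to the `iconPaths` map, which matters because the docs page passes icon names such as `server` and `chart-mixed` to `<Card>`. As a minimal sketch of why a missing key is a problem, the lookup might work roughly as follows. Only `iconPaths` and its `Record<string, string>` type come from the diff; `resolveIconPath`, `FALLBACK_PATH`, and the fallback behaviour are assumptions for illustration, not the component's actual code.

```typescript
// Two entries copied from the diff; the real map in Card.astro is larger.
const iconPaths: Record<string, string> = {
  lightning: 'M13 10V3L4 14h7v7l9-11h-7z',
  'chart-line': 'M7 12l3-3 3 3 4-4M8 21l4-4 4 4M3 4h18M4 4v16',
};

// Hypothetical placeholder glyph (a plain circle); not taken from the source.
const FALLBACK_PATH = 'M12 3a9 9 0 1 0 0 18a9 9 0 1 0 0-18z';

// Hypothetical helper: returns the SVG path for an icon name,
// or the placeholder when the name has no entry in the map.
function resolveIconPath(icon?: string): string {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (icon !== undefined && Object.prototype.hasOwnProperty.call(iconPaths, icon)) {
    return iconPaths[icon] ?? FALLBACK_PATH;
  }
  return FALLBACK_PATH;
}
```

Under this assumption, a card whose `icon` has no entry would silently render the placeholder, so each icon name used in the MDX needs a matching key in the map.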
146 changes: 43 additions & 103 deletions src/pages/docs/index.mdx
@@ -1,118 +1,58 @@
---
title: "Future AGI Docs: AI Testing, Guardrails, and Monitoring"
description: "The complete platform to test, guard, and monitor AI agents. Build self-improving agents that ship smarter with every version."
title: "What is Future AGI?"
description: "Future AGI is the end-to-end platform for building, evaluating, observing, and making AI agents reliable"
---

![Future AGI platform](/images/agi2.webp)

The platform covers the full agent lifecycle across three stages: simulate and iterate on your agent before deployment, evaluate outputs and catch issues with 70+ metrics, then optimize and observe performance in production. All stages feed into each other: production traces inform evaluations, evaluations surface issues, and issues drive the next iteration.

Future AGI integrates with OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Vercel AI SDK, and 30+ more frameworks. You can start with a single line of code.

---

## Simulate & Iterate

Go from idea to production-ready agent faster. Simulate thousands of scenarios, iterate with the Agent IDE, and run structured experiments.

- **Simulation**: Run thousands of multi-turn conversations with synthetic users, personas, and branching scenarios. Test voice agents and chat agents before they reach real users.
- **Prototype**: Build AI application variants in the Agent IDE. Compare models, prompts, and configurations side by side with built-in evaluation.
- **Dataset**: Create golden datasets manually, import from files, or generate synthetic data. Use them across evaluations, simulations, and experiments.
- **Prompt**: Version prompts, deploy to environments via labels, and track how changes affect quality across traces.
- **Knowledge Base**: Upload documents that ground evaluations, power RAG testing, and provide context for synthetic data generation.

<CardGroup cols={3}>
<Card title="Simulation" icon="robot" href="/docs/simulation">
Scenarios, personas, synthetic users
</Card>
<Card title="Prototype" icon="flask" href="/docs/prototype">
Agent IDE, experiments, comparison
</Card>
<Card title="Dataset" icon="database" href="/docs/dataset">
Golden datasets, import, synthetic generation
</Card>
<Card title="Prompt" icon="wand-magic-sparkles" href="/docs/prompt">
Versioning, labels, environments
</Card>
<Card title="Knowledge Base" icon="brain" href="/docs/knowledge-base">
Document upload, RAG grounding
</Card>
</CardGroup>

---
Future AGI is an end-to-end platform for building **reliable AI agents**. It brings simulation, evaluation, guardrails, tracing, optimization, and an LLM gateway into one place, so the work of shipping a trustworthy agent, and keeping it trustworthy, happens in a single connected loop instead of across disconnected tools.

## Evaluate
It's built for the whole team shipping AI (engineers, product managers, and domain experts working from one source of truth), and it works with the stack you already use. If you use it, we probably support it. You can start with a single line of code.

Catch issues early. Run comprehensive evaluations across datasets, detect hallucinations, and protect your agents with real-time guardrails.
## The Learning Loop

- **Error Feed**: Sentry-style error tracking for AI agents. Errors are automatically surfaced, grouped, and triaged. See exactly where and why your agent failed, which traces are affected, and how many users were impacted.
- **Evaluation**: 70+ built-in metrics covering quality, safety, hallucination, faithfulness, toxicity, PII detection, and more. Create custom evals for domain-specific checks. Run on datasets in development or on production traces continuously.
- **Protect**: Real-time guardrails that intercept requests and responses before they reach users. Block hallucinations, PII leaks, and policy violations in production.
Every part of Future AGI feeds the next. You [**prototype**](/docs/prototype) and [**simulate**](/docs/simulation) an agent before launch, [**evaluate**](/docs/evaluation) its outputs against built-in and custom metrics, [**observe**](/docs/observe) real traffic once it's live, and [**optimize**](/docs/optimization) from what you learn. Then the cycle repeats.

<CardGroup cols={3}>
<Card title="Error Feed" icon="compass" href="/docs/error-feed">
Error tracking, triage, root cause
</Card>
<Card title="Evaluation" icon="chart-mixed" href="/docs/evaluation">
70+ metrics, custom evals, CI/CD
</Card>
<Card title="Protect" icon="shield" href="/docs/protect">
Real-time guardrails, PII blocking
</Card>
</CardGroup>
![Future AGI platform](/images/agi2.webp)

---
Because every product shares the same **traces, datasets, and scores**, the work compounds: a trace you capture becomes evaluation data, an evaluation result becomes an optimization signal, and a dataset feeds simulations and experiments alike. That shared spine is the mental model for everything below.

## Optimize & Observe
{/* TODO: embed the Future AGI overview video here once the URL is ready. */}

Use production data to continuously improve your agents. Track performance in real-time, trace requests end-to-end, and get alerted before users complain.
## Explore the platform

- **Optimization**: Apply reinforcement learning from human feedback to automatically improve agent responses. The optimizer uses evaluation scores as reward signals to refine prompts without manual tuning.
- **Observability**: End-to-end tracing for every LLM call, retrieval, and tool invocation. Track costs by model, monitor latency percentiles, replay user sessions, and set up alerts for anomalies. Based on OpenTelemetry.
- **Agent Command Center**: Unified API gateway across 25+ LLM providers. Route requests with fallbacks, cache responses, enforce rate limits and budgets, and run shadow experiments. Drop-in replacement for the OpenAI SDK.
- **Falcon AI**: AI copilot with 300+ tools built into the dashboard. Analyze evaluation results, debug traces, create datasets, and chain multi-step workflows through natural language.
- **Annotations**: Human-in-the-loop labeling with annotation queues, custom scoring labels, and inter-annotator agreement tracking. Feed human judgments back into evaluations and optimization.
Future AGI is organized into six broad areas:

<CardGroup cols={3}>
<Card title="Optimization" icon="arrows-rotate" href="/docs/optimization">
RL from human feedback, auto-tuning
</Card>
<Card title="Observability" icon="chart-line" href="/docs/observe">
Tracing, costs, latency, alerts
</Card>
<Card title="Agent Command Center" icon="server" href="/docs/command-center">
Routing, caching, rate limits, 25+ providers
</Card>
<Card title="Falcon AI" icon="message-circle" href="/docs/falcon-ai">
AI copilot, 300+ tools, natural language
</Card>
<Card title="Annotations" icon="pen" href="/docs/annotations">
Labeling queues, scoring, agreement
</Card>
<Card title="Prototype your agent" icon="flask" href="/docs/prototype">Build and refine: Prototype, Agent Playground, Prompt, and Dataset</Card>
<Card title="Agent Command Center" icon="server" href="/docs/command-center">One gateway for routing, caching, guardrails, and cost control across 100+ providers</Card>
<Card title="Simulate" icon="robot" href="/docs/simulation">Test agents against synthetic users and scenarios before launch</Card>
<Card title="Evaluate" icon="chart-mixed" href="/docs/evaluation">Score quality with built-in and custom metrics, guardrails, knowledge bases, and human review</Card>
<Card title="Observe" icon="chart-line" href="/docs/observe">Trace production calls and surface failures in the Error Feed</Card>
<Card title="Optimize" icon="arrows-rotate" href="/docs/optimization">Improve prompts and agents from real production data</Card>
</CardGroup>

---

## Where to start

Setting up tracing, evaluation, and simulation can be done independently. Pick the path that matches where you are.

| Starting point | You want to... | Start here |
|---|---|---|
| **New to Future AGI** | Get a quick overview and make your first call | [Quickstart](/docs/quickstart/prompts) |
| **Building an agent** | Test with simulated users before deploying | [Simulation](/docs/simulation) |
| **Already in production** | See what's happening with your LLM calls | [Observability](/docs/observe) |
| **Evaluating quality** | Run evals on outputs and catch regressions | [Evaluation](/docs/evaluation) |
| **Managing multiple LLM providers** | Unify routing, caching, and cost controls behind one API | [Agent Command Center](/docs/command-center) |

<CardGroup cols={3}>
<Card title="Quickstart" icon="rocket" href="/docs/quickstart/prompts">
Make your first API call in under 5 minutes.
</Card>
<Card title="Integrations" icon="plug" href="/docs/integrations">
Connect OpenAI, LangChain, LlamaIndex, and 30+ more.
</Card>
<Card title="API Reference" icon="code" href="/docs/api">
Full API documentation for programmatic access.
</Card>
</CardGroup>
<div style={{ display: 'flex', alignItems: 'center', flexWrap: 'wrap', gap: '1.25rem', margin: '2rem 0', padding: '1.5rem 1.75rem', borderRadius: '16px', color: 'var(--color-text-primary)', border: '1px solid color-mix(in srgb, var(--color-accent-primary) 28%, transparent)', background: 'linear-gradient(135deg, color-mix(in srgb, var(--color-accent-primary) 8%, var(--color-bg-secondary)), var(--color-bg-secondary))', boxShadow: '0 8px 22px -16px color-mix(in srgb, var(--color-accent-primary) 20%, transparent)' }}>
<div style={{ flexShrink: 0, width: '52px', height: '52px', borderRadius: '14px', display: 'flex', alignItems: 'center', justifyContent: 'center', background: 'var(--color-accent-primary)' }}>
<svg width="28" height="28" viewBox="0 0 24 24" style={{ fill: 'none', stroke: '#0a0a0a', strokeWidth: 1.5, strokeLinecap: 'round', strokeLinejoin: 'round' }}><path d="M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z" /></svg>
</div>
<div style={{ flex: 1, minWidth: '200px' }}>
<div style={{ fontSize: '1.3rem', fontWeight: 700, marginBottom: '0.4rem', color: 'var(--color-text-primary)' }}>Build with Falcon</div>
<div style={{ fontSize: '0.9rem', color: 'var(--color-text-secondary)', lineHeight: 1.55 }}>Use Future AGI's Falcon across the whole platform: analyze evals, debug traces, build datasets, and run multi-step workflows in natural language.</div>
</div>
<a href="/docs/falcon-ai" style={{ display: 'flex', flexDirection: 'row', justifyContent:'center', gap:'0.5rem', alignItems: 'center', padding: '0.7rem 1.35rem', borderRadius: '50px', fontWeight: 700, textDecoration: 'none', color: '#0a0a0a', background: 'var(--color-accent-primary)' }}>
<p style={{color:'black',marginBottom:0, fontWeight:'600'}}>Explore Falcon</p>
<svg width="16" height="16" viewBox="0 0 24 24" style={{ fill: 'none', stroke: '#0a0a0a', strokeWidth: 1.5, strokeLinecap: 'round', strokeLinejoin: 'round' }}><path d="M5 12h14M13 6l6 6-6 6" /></svg>
</a>
</div>

## Bring your data in

The fastest way to see Future AGI is to get your data flowing:

- [Send your first trace](/docs/get-started/send-your-first-trace)
- [Route your first LLM request](/docs/get-started/route-your-first-llm-request)
- [Add your first agent definition](/docs/get-started/add-your-first-agent-definition)
- [Create your first prompt](/docs/get-started/create-your-first-prompt)

<Note>
**Using Cursor or Claude Code?** Install the Future AGI MCP server to bring the platform and docs straight into your editor. See [Set up the MCP server](/docs/quickstart/setup-mcp-server).
</Note>