diff --git a/src/components/docs/Card.astro b/src/components/docs/Card.astro
index ee482b87..07096e01 100644
--- a/src/components/docs/Card.astro
+++ b/src/components/docs/Card.astro
@@ -16,6 +16,7 @@ const iconPaths: Record = {
 lightning: 'M13 10V3L4 14h7v7l9-11h-7z',
 database: 'M4 7v10c0 2.21 3.582 4 8 4s8-1.79 8-4V7M4 7c0 2.21 3.582 4 8 4s8-1.79 8-4M4 7c0-2.21 3.582-4 8-4s8 1.79 8 4m0 5c0 2.21-3.582 4-8 4s-8-1.79-8-4',
 shield: 'M9 12l2 2 4-4m5.618-4.016A11.955 11.955 0 0112 2.944a11.955 11.955 0 01-8.618 3.04A12.02 12.02 0 003 9c0 5.591 3.824 10.29 9 11.622 5.176-1.332 9-6.03 9-11.622 0-1.042-.133-2.052-.382-3.016z',
+ server: 'M5 12h14M5 12a2 2 0 01-2-2V6a2 2 0 012-2h14a2 2 0 012 2v4a2 2 0 01-2 2M5 12a2 2 0 00-2 2v4a2 2 0 002 2h14a2 2 0 002-2v-4a2 2 0 00-2-2m-2-4h.01M17 16h.01',
 // Charts & Analytics (Mintlify: chart-mixed, chart-line)
 'chart-mixed': 'M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z',
 'chart-line': 'M7 12l3-3 3 3 4-4M8 21l4-4 4 4M3 4h18M4 4v16',
diff --git a/src/pages/docs/index.mdx b/src/pages/docs/index.mdx
index 0ea10c94..c02a4138 100644
--- a/src/pages/docs/index.mdx
+++ b/src/pages/docs/index.mdx
@@ -1,118 +1,58 @@
 ---
-title: "Future AGI Docs: AI Testing, Guardrails, and Monitoring"
-description: "The complete platform to test, guard, and monitor AI agents. Build self-improving agents that ship smarter with every version."
+title: "What is Future AGI?"
+description: "Future AGI is the end-to-end platform for building, evaluating, observing, and making AI agents reliable"
 ---
-![Future AGI platform](/images/agi2.webp)
-
-The platform covers the full agent lifecycle across three stages: simulate and iterate on your agent before deployment, evaluate outputs and catch issues with 70+ metrics, then optimize and observe performance in production. All stages feed into each other: production traces inform evaluations, evaluations surface issues, and issues drive the next iteration.
-
-Future AGI integrates with OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Vercel AI SDK, and 30+ more frameworks. You can start with a single line of code.
-
----
-
-## Simulate & Iterate
-
-Go from idea to production-ready agent faster. Simulate thousands of scenarios, iterate with the Agent IDE, and run structured experiments.
-
-- **Simulation**: Run thousands of multi-turn conversations with synthetic users, personas, and branching scenarios. Test voice agents and chat agents before they reach real users.
-- **Prototype**: Build AI application variants in the Agent IDE. Compare models, prompts, and configurations side by side with built-in evaluation.
-- **Dataset**: Create golden datasets manually, import from files, or generate synthetic data. Use them across evaluations, simulations, and experiments.
-- **Prompt**: Version prompts, deploy to environments via labels, and track how changes affect quality across traces.
-- **Knowledge Base**: Upload documents that ground evaluations, power RAG testing, and provide context for synthetic data generation.
-
-
- Scenarios, personas, synthetic users
-
-
- Agent IDE, experiments, comparison
-
-
- Golden datasets, import, synthetic generation
-
-
- Versioning, labels, environments
-
-
- Document upload, RAG grounding
-
-
-
----
+Future AGI is an end-to-end platform for building **reliable AI agents**. It brings simulation, evaluation, guardrails, tracing, optimization, and an LLM gateway into one place, so the work of shipping a trustworthy agent, and keeping it trustworthy, happens in a single connected loop instead of across disconnected tools
-## Evaluate
+It's built for the whole team shipping AI (engineers, product managers, and domain experts working from one source of truth), and it works with the stack you already use. If you use it, we probably support it. You can start with a single line of code
-Catch issues early. Run comprehensive evaluations across datasets, detect hallucinations, and protect your agents with real-time guardrails.
+## The Learning Loop
-- **Error Feed**: Sentry-style error tracking for AI agents. Errors are automatically surfaced, grouped, and triaged. See exactly where and why your agent failed, which traces are affected, and how many users were impacted.
-- **Evaluation**: 70+ built-in metrics covering quality, safety, hallucination, faithfulness, toxicity, PII detection, and more. Create custom evals for domain-specific checks. Run on datasets in development or on production traces continuously.
-- **Protect**: Real-time guardrails that intercept requests and responses before they reach users. Block hallucinations, PII leaks, and policy violations in production.
+Every part of Future AGI feeds the next. You [**prototype**](/docs/prototype) and [**simulate**](/docs/simulation) an agent before launch, [**evaluate**](/docs/evaluation) its outputs against built-in and custom metrics, [**observe**](/docs/observe) real traffic once it's live, and [**optimize**](/docs/optimization) from what you learn, then the cycle repeats
-
-
- Error tracking, triage, root cause
-
-
- 70+ metrics, custom evals, CI/CD
-
-
- Real-time guardrails, PII blocking
-
-
+![Future AGI platform](/images/agi2.webp)
----
+Because every product shares the same **traces, datasets, and scores**, the work compounds: a trace you capture becomes evaluation data, an evaluation result becomes an optimization signal, and a dataset feeds simulations and experiments alike. That shared spine is the mental model for everything below
-## Optimize & Observe
+{/* TODO: embed the Future AGI overview video here once the URL is ready. */}
-Use production data to continuously improve your agents. Track performance in real-time, trace requests end-to-end, and get alerted before users complain.
+## Explore the platform
-- **Optimization**: Apply reinforcement learning from human feedback to automatically improve agent responses. The optimizer uses evaluation scores as reward signals to refine prompts without manual tuning.
-- **Observability**: End-to-end tracing for every LLM call, retrieval, and tool invocation. Track costs by model, monitor latency percentiles, replay user sessions, and set up alerts for anomalies. Based on OpenTelemetry.
-- **Agent Command Center**: Unified API gateway across 25+ LLM providers. Route requests with fallbacks, cache responses, enforce rate limits and budgets, and run shadow experiments. Drop-in replacement for the OpenAI SDK.
-- **Falcon AI**: AI copilot with 300+ tools built into the dashboard. Analyze evaluation results, debug traces, create datasets, and chain multi-step workflows through natural language.
-- **Annotations**: Human-in-the-loop labeling with annotation queues, custom scoring labels, and inter-annotator agreement tracking. Feed human judgments back into evaluations and optimization.
+Future AGI is organized into six broad areas:
-
- RL from human feedback, auto-tuning
-
-
- Tracing, costs, latency, alerts
-
-
- Routing, caching, rate limits, 25+ providers
-
-
- AI copilot, 300+ tools, natural language
-
-
- Labeling queues, scoring, agreement
-
+ Build and refine: Prototype, Agent Playground, Prompt, and Dataset
+ One gateway for routing, caching, guardrails, and cost control across 100+ providers
+ Test agents against synthetic users and scenarios before launch
+ Score quality with built-in and custom metrics, guardrails, knowledge bases, and human review
+ Trace production calls and surface failures in the Error Feed
+ Improve prompts and agents from real production data
----
-
-## Where to start
-
-Setting up tracing, evaluation, and simulation can be done independently. Pick the path that matches where you are.
-
-| Starting point | You want to... | Start here |
-|---|---|---|
-| **New to Future AGI** | Get a quick overview and make your first call | [Quickstart](/docs/quickstart/prompts) |
-| **Building an agent** | Test with simulated users before deploying | [Simulation](/docs/simulation) |
-| **Already in production** | See what's happening with your LLM calls | [Observability](/docs/observe) |
-| **Evaluating quality** | Run evals on outputs and catch regressions | [Evaluation](/docs/evaluation) |
-| **Managing multiple LLM providers** | Unify routing, caching, and cost controls behind one API | [Agent Command Center](/docs/command-center) |
-
-
- Make your first API call in under 5 minutes.
-
-
- Connect OpenAI, LangChain, LlamaIndex, and 30+ more.
-
-
- Full API documentation for programmatic access.
-
-
+
+
+ +
+
+Build with Falcon
+Future AGI's Falcon across the whole platform: analyze evals, debug traces, build datasets, and run multi-step workflows in natural language
+
+ Explore Falcon
+ +
+
+ ## Bring your data in
+
+The fastest way to see Future AGI is to get your data flowing:
+
+- [Send your first trace](/docs/get-started/send-your-first-trace)
+- [Route your first LLM request](/docs/get-started/route-your-first-llm-request)
+- [Add your first agent definition](/docs/get-started/add-your-first-agent-definition)
+- [Create your first prompt](/docs/get-started/create-your-first-prompt)
+
+
+**Using Cursor or Claude Code?** Install the Future AGI MCP server to bring the platform and docs straight into your editor. See [Set up the MCP server](/docs/quickstart/setup-mcp-server)
+