diff --git a/public/images/docs/get-started/create-your-first-prompt/create-prompt.png b/public/images/docs/get-started/create-your-first-prompt/create-prompt.png
new file mode 100644
index 00000000..aece7d41
Binary files /dev/null and b/public/images/docs/get-started/create-your-first-prompt/create-prompt.png differ
diff --git a/public/images/docs/get-started/create-your-first-prompt/parameters.png b/public/images/docs/get-started/create-your-first-prompt/parameters.png
new file mode 100644
index 00000000..5adaa292
Binary files /dev/null and b/public/images/docs/get-started/create-your-first-prompt/parameters.png differ
diff --git a/public/images/docs/get-started/create-your-first-prompt/run-prompt.png b/public/images/docs/get-started/create-your-first-prompt/run-prompt.png
new file mode 100644
index 00000000..e85996b5
Binary files /dev/null and b/public/images/docs/get-started/create-your-first-prompt/run-prompt.png differ
diff --git a/public/images/docs/get-started/create-your-first-prompt/select-model.png b/public/images/docs/get-started/create-your-first-prompt/select-model.png
new file mode 100644
index 00000000..26937cd7
Binary files /dev/null and b/public/images/docs/get-started/create-your-first-prompt/select-model.png differ
diff --git a/public/images/docs/get-started/create-your-first-prompt/write-prompt.png b/public/images/docs/get-started/create-your-first-prompt/write-prompt.png
new file mode 100644
index 00000000..9ba397eb
Binary files /dev/null and b/public/images/docs/get-started/create-your-first-prompt/write-prompt.png differ
diff --git a/src/components/docs/Card.astro b/src/components/docs/Card.astro
index ee482b87..07096e01 100644
--- a/src/components/docs/Card.astro
+++ b/src/components/docs/Card.astro
@@ -16,6 +16,7 @@ const iconPaths: Record = {
 lightning: 'M13 10V3L4 14h7v7l9-11h-7z',
 database: 'M4 7v10c0 2.21 3.582 4 8 4s8-1.79 8-4V7M4 7c0 2.21 3.582 4 8 4s8-1.79 8-4M4 7c0-2.21 3.582-4 8-4s8 1.79 8 4m0 5c0 2.21-3.582 4-8 4s-8-1.79-8-4',
 shield: 'M9 12l2 2 4-4m5.618-4.016A11.955 11.955 0 0112 2.944a11.955 11.955 0 01-8.618 3.04A12.02 12.02 0 003 9c0 5.591 3.824 10.29 9 11.622 5.176-1.332 9-6.03 9-11.622 0-1.042-.133-2.052-.382-3.016z',
+ server: 'M5 12h14M5 12a2 2 0 01-2-2V6a2 2 0 012-2h14a2 2 0 012 2v4a2 2 0 01-2 2M5 12a2 2 0 00-2 2v4a2 2 0 002 2h14a2 2 0 002-2v-4a2 2 0 00-2-2m-2-4h.01M17 16h.01',
 // Charts & Analytics (Mintlify: chart-mixed, chart-line)
 'chart-mixed': 'M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z',
 'chart-line': 'M7 12l3-3 3 3 4-4M8 21l4-4 4 4M3 4h18M4 4v16',
diff --git a/src/pages/docs/get-started/create-your-first-prompt.mdx b/src/pages/docs/get-started/create-your-first-prompt.mdx
new file mode 100644
index 00000000..8bed4a2d
--- /dev/null
+++ b/src/pages/docs/get-started/create-your-first-prompt.mdx
@@ -0,0 +1,72 @@
+---
+title: "Create your first prompt"
+description: "Create and run your first prompt in the Future AGI Prompt Workbench, with versioning and variables built in"
+---
+
+A prompt is the instruction you give a model, and getting it right is one of the highest-leverage things you can do for an AI product. The **Prompt Workbench** gives every prompt a versioned home, so you can edit it, compare versions, and reuse it across datasets, simulations, experiments, and the SDK. This page walks you through creating and running your first prompt from scratch
+
+## Prerequisites
+
+- A Future AGI account
+- At least one model provider configured (Dashboard → **Settings → AI Providers**) so you can run the prompt
+
+## 1. Open Prompts and create one
+
+In the dashboard, under **Build** in the left nav, click **Prompts**, then **Create prompt** and choose **Start from scratch** (the other options are *Generate with AI* and *Start with a template*):
+
+![Creating a new prompt from the Prompts section](/images/docs/get-started/create-your-first-prompt/create-prompt.png)
+
+## 2. Name it and write the prompt
+
+The prompt opens as **Untitled-1**: click the title to rename it (e.g. **Acme Support Assistant**). The editor then has two fields: **System** (optional) shapes the model's overall behavior, and **User** is the message that drives the response. Here's a ready-to-use example. Copy each block into the matching field:
+
+**System**
+
+```text
+You are a customer support assistant for Acme, a company that makes project-management software.
+Answer the customer's question clearly and accurately, in a friendly and professional tone.
+Keep your reply under 120 words. If you are unsure of the answer, say so honestly and point the
+customer to help@acme.com instead of guessing.
+```
+
+**User**
+
+```text
+How do I reset my password?
+```
+
+![Renaming the prompt and writing the System and User messages](/images/docs/get-started/create-your-first-prompt/write-prompt.png)
+
+## 3. Pick a model and tune parameters
+
+With the prompt open, click **Select Model** and choose the model it runs on:
+
+![Choosing a model from the Select Model dropdown](/images/docs/get-started/create-your-first-prompt/select-model.png)
+
+Optionally, open **Params** to tune **temperature**, **max tokens**, **top P**, and more:
+
+![Tuning model parameters in the Params panel](/images/docs/get-started/create-your-first-prompt/parameters.png)
+
+## 4. Run it and save a version
+
+Click **Run Prompt** in the top-right corner, and the model's response appears in the **Output** panel. Saving the prompt creates a new **version** every time, so you can compare versions, roll back, deploy a specific one via labels, and reuse the prompt across the rest of the platform:
+
+![Running the prompt and viewing the model's response](/images/docs/get-started/create-your-first-prompt/run-prompt.png)
+
+## Verify
+
+Your prompt now appears in the **Prompts** list with its first version, and the **Output** panel shows the model's latest response
+
+Congratulations! You've created your first prompt!🎉
+
+## Troubleshooting
+
+Not getting a response? Try checking these:
+
+- **"API key not configured"**: add a model provider key under **Settings → AI Providers**, then run again
+- **Empty output**: check that the **User** field isn't blank before running
+
+## Next steps
+
+- [Run the prompt over a dataset](/docs/dataset/features/run-prompt) to test it at scale
+- [Fetch the prompt from your app via the SDK](/docs/prompt/features/sdk)
diff --git a/src/pages/docs/index.mdx b/src/pages/docs/index.mdx
index 0ea10c94..c02a4138 100644
--- a/src/pages/docs/index.mdx
+++ b/src/pages/docs/index.mdx
@@ -1,118 +1,58 @@
 ---
-title: "Future AGI Docs: AI Testing, Guardrails, and Monitoring"
-description: "The complete platform to test, guard, and monitor AI agents. Build self-improving agents that ship smarter with every version."
+title: "What is Future AGI?"
+description: "Future AGI is the end-to-end platform for building, evaluating, observing, and making AI agents reliable"
 ---
-![Future AGI platform](/images/agi2.webp)
-
-The platform covers the full agent lifecycle across three stages: simulate and iterate on your agent before deployment, evaluate outputs and catch issues with 70+ metrics, then optimize and observe performance in production. All stages feed into each other: production traces inform evaluations, evaluations surface issues, and issues drive the next iteration.
-
-Future AGI integrates with OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Vercel AI SDK, and 30+ more frameworks. You can start with a single line of code.
-
----
-
-## Simulate & Iterate
-
-Go from idea to production-ready agent faster. Simulate thousands of scenarios, iterate with the Agent IDE, and run structured experiments.
-
-- **Simulation**: Run thousands of multi-turn conversations with synthetic users, personas, and branching scenarios. Test voice agents and chat agents before they reach real users.
-- **Prototype**: Build AI application variants in the Agent IDE. Compare models, prompts, and configurations side by side with built-in evaluation.
-- **Dataset**: Create golden datasets manually, import from files, or generate synthetic data. Use them across evaluations, simulations, and experiments.
-- **Prompt**: Version prompts, deploy to environments via labels, and track how changes affect quality across traces.
-- **Knowledge Base**: Upload documents that ground evaluations, power RAG testing, and provide context for synthetic data generation.
-
-
- Scenarios, personas, synthetic users
-
-
- Agent IDE, experiments, comparison
-
-
- Golden datasets, import, synthetic generation
-
-
- Versioning, labels, environments
-
-
- Document upload, RAG grounding
-
-
-
----
+Future AGI is an end-to-end platform for building **reliable AI agents**. It brings simulation, evaluation, guardrails, tracing, optimization, and an LLM gateway into one place, so the work of shipping a trustworthy agent, and keeping it trustworthy, happens in a single connected loop instead of across disconnected tools
-## Evaluate
+It's built for the whole team shipping AI (engineers, product managers, and domain experts working from one source of truth), and it works with the stack you already use. If you use it, we probably support it. You can start with a single line of code
-Catch issues early. Run comprehensive evaluations across datasets, detect hallucinations, and protect your agents with real-time guardrails.
+## The Learning Loop
-- **Error Feed**: Sentry-style error tracking for AI agents. Errors are automatically surfaced, grouped, and triaged. See exactly where and why your agent failed, which traces are affected, and how many users were impacted.
-- **Evaluation**: 70+ built-in metrics covering quality, safety, hallucination, faithfulness, toxicity, PII detection, and more. Create custom evals for domain-specific checks. Run on datasets in development or on production traces continuously.
-- **Protect**: Real-time guardrails that intercept requests and responses before they reach users. Block hallucinations, PII leaks, and policy violations in production.
+Every part of Future AGI feeds the next. You [**prototype**](/docs/prototype) and [**simulate**](/docs/simulation) an agent before launch, [**evaluate**](/docs/evaluation) its outputs against built-in and custom metrics, [**observe**](/docs/observe) real traffic once it's live, and [**optimize**](/docs/optimization) from what you learn, then the cycle repeats
-
-
- Error tracking, triage, root cause
-
-
- 70+ metrics, custom evals, CI/CD
-
-
- Real-time guardrails, PII blocking
-
-
+![Future AGI platform](/images/agi2.webp)
----
+Because every product shares the same **traces, datasets, and scores**, the work compounds: a trace you capture becomes evaluation data, an evaluation result becomes an optimization signal, and a dataset feeds simulations and experiments alike. That shared spine is the mental model for everything below
-## Optimize & Observe
+{/* TODO: embed the Future AGI overview video here once the URL is ready. */}
-Use production data to continuously improve your agents. Track performance in real-time, trace requests end-to-end, and get alerted before users complain.
+## Explore the platform
-- **Optimization**: Apply reinforcement learning from human feedback to automatically improve agent responses. The optimizer uses evaluation scores as reward signals to refine prompts without manual tuning.
-- **Observability**: End-to-end tracing for every LLM call, retrieval, and tool invocation. Track costs by model, monitor latency percentiles, replay user sessions, and set up alerts for anomalies. Based on OpenTelemetry.
-- **Agent Command Center**: Unified API gateway across 25+ LLM providers. Route requests with fallbacks, cache responses, enforce rate limits and budgets, and run shadow experiments. Drop-in replacement for the OpenAI SDK.
-- **Falcon AI**: AI copilot with 300+ tools built into the dashboard. Analyze evaluation results, debug traces, create datasets, and chain multi-step workflows through natural language.
-- **Annotations**: Human-in-the-loop labeling with annotation queues, custom scoring labels, and inter-annotator agreement tracking. Feed human judgments back into evaluations and optimization.
+Future AGI is organized into six broad areas:
- RL from human feedback, auto-tuning
-
-
- Tracing, costs, latency, alerts
-
-
- Routing, caching, rate limits, 25+ providers
-
-
- AI copilot, 300+ tools, natural language
-
-
- Labeling queues, scoring, agreement
-
+ Build and refine: Prototype, Agent Playground, Prompt, and Dataset
+ One gateway for routing, caching, guardrails, and cost control across 100+ providers
+ Test agents against synthetic users and scenarios before launch
+ Score quality with built-in and custom metrics, guardrails, knowledge bases, and human review
+ Trace production calls and surface failures in the Error Feed
+ Improve prompts and agents from real production data
----
-
-## Where to start
-
-Setting up tracing, evaluation, and simulation can be done independently. Pick the path that matches where you are.
-
-| Starting point | You want to... | Start here |
-|---|---|---|
-| **New to Future AGI** | Get a quick overview and make your first call | [Quickstart](/docs/quickstart/prompts) |
-| **Building an agent** | Test with simulated users before deploying | [Simulation](/docs/simulation) |
-| **Already in production** | See what's happening with your LLM calls | [Observability](/docs/observe) |
-| **Evaluating quality** | Run evals on outputs and catch regressions | [Evaluation](/docs/evaluation) |
-| **Managing multiple LLM providers** | Unify routing, caching, and cost controls behind one API | [Agent Command Center](/docs/command-center) |
-
-
-
- Make your first API call in under 5 minutes.
-
-
- Connect OpenAI, LangChain, LlamaIndex, and 30+ more.
-
-
- Full API documentation for programmatic access.
-
-
+
+
+ +
+
+
Build with Falcon
+
Future AGI's Falcon across the whole platform: analyze evals, debug traces, build datasets, and run multi-step workflows in natural language
+
+ +

Explore Falcon

+ +
+
+
+## Bring your data in
+
+The fastest way to see Future AGI is to get your data flowing:
+
+- [Send your first trace](/docs/get-started/send-your-first-trace)
+- [Route your first LLM request](/docs/get-started/route-your-first-llm-request)
+- [Add your first agent definition](/docs/get-started/add-your-first-agent-definition)
+- [Create your first prompt](/docs/get-started/create-your-first-prompt)
+
+
+**Using Cursor or Claude Code?** Install the Future AGI MCP server to bring the platform and docs straight into your editor. See [Set up the MCP server](/docs/quickstart/setup-mcp-server)
+