
chore: Add initial project specs #249

Open
asamal4 wants to merge 1 commit into lightspeed-core:main from asamal4:initial-proj-specs

Conversation

@asamal4
Collaborator

@asamal4 asamal4 commented Jun 3, 2026

Description

Add project specifications covering the WHAT (behavioral rules) and the HOW (implementation guides) to give AI assistants better context.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive specification README and a suite of design docs for the LightSpeed Evaluation Framework covering architecture, configuration/models, agent drivers, metrics subsystem, evaluation pipeline behavior, LLM/judge model, output/reporting and storage, and overall project structure
    • Documented behavioral rules, config surfaces, data flows, lifecycle semantics, override precedence, error handling, metric scoring/thresholding, and planned future work

@coderabbitai
Contributor

coderabbitai Bot commented Jun 3, 2026


Warning

Review limit reached

@asamal4, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 38 minutes and 58 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than the trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily delay when the next review becomes available.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ad7f6066-85f3-4d90-a6c7-5f1f478798f3

📥 Commits

Reviewing files that changed from the base of the PR and between c62012b and 7acb089.

📒 Files selected for processing (12)
  • .ai/spec/README.md
  • .ai/spec/how/agent-drivers.md
  • .ai/spec/how/configuration-and-models.md
  • .ai/spec/how/metrics-implementation.md
  • .ai/spec/how/output-and-storage.md
  • .ai/spec/how/project-structure.md
  • .ai/spec/what/agent-drivers.md
  • .ai/spec/what/evaluation-pipeline.md
  • .ai/spec/what/llm-and-judges.md
  • .ai/spec/what/metrics.md
  • .ai/spec/what/output-and-reporting.md
  • .ai/spec/what/system-overview.md

Walkthrough

This PR adds comprehensive specification documentation for the LightSpeed Evaluation Framework, establishing both behavioral rules (what/ directory) and implementation guides (how/ directory). The change includes twelve markdown files totaling 626 new lines that describe system architecture, configuration semantics, subsystem interactions, and codebase organization.

Changes

LightSpeed Evaluation Framework Specification Documentation

  • Specification structure and system overview
    Files: .ai/spec/README.md, .ai/spec/what/system-overview.md
    Summary: README defines the specification framework, distinguishing behavioral rules (what/) from implementation guides (how/), and establishes writing conventions. system-overview provides the high-level system description, covering evaluation roles, metric resolution modes, execution model, configuration surface, and planned changes.
  • User-facing behavioral specifications
    Files: .ai/spec/what/agent-drivers.md, .ai/spec/what/evaluation-pipeline.md, .ai/spec/what/llm-and-judges.md, .ai/spec/what/metrics.md, .ai/spec/what/output-and-reporting.md
    Summary: Five specs defining runtime behavior: agent drivers (turn enrichment, latency/token tracking, config semantics), evaluation pipeline (concurrency, per-conversation/turn sequencing, error cascade rules), LLM & judges (pool/panel modes, caching, token tracking), metrics (backend categories, resolution/thresholding, metadata precedence), and output/reporting (report formats, stats, storage lifecycle).
  • Implementation architecture and subsystem guides
    Files: .ai/spec/how/project-structure.md, .ai/spec/how/configuration-and-models.md, .ai/spec/how/agent-drivers.md, .ai/spec/how/metrics-implementation.md, .ai/spec/how/output-and-storage.md
    Summary: Five implementation guides: project structure and entry points; configuration and Pydantic model flows (loader, validations, auto-migrations); agent-driver architecture and API client/streaming details; metrics backend dispatch, metadata merge semantics, and evaluator/judge orchestration; output and storage lifecycle, backend composition, and persistence differences.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • lightspeed-core/lightspeed-evaluation#238: Defines agentic evaluation abstractions (AgentDriver interface, driver registry, pipeline integration) that align with this PR's agent-driver behavioral and implementation specifications.

Suggested reviewers

  • rioloc
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately reflects the main change: adding initial project specification documents across what/ and how/ directories.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/how/metrics-implementation.md:
- Line 42: Correct the method names in the integration contract text. For the
ConversationProcessor-to-MetricManager interaction, keep "Calls
resolve_metrics()", but replace the incorrect "MetricsEvaluator calls
get_metadata() for thresholds" with a precise description: MetricsEvaluator
uses MetricManager.get_effective_threshold(...) to obtain threshold values and
MetricManager.get_metric_metadata(...) to retrieve metric metadata. Reference
ConversationProcessor, MetricManager, resolve_metrics(),
MetricManager.get_effective_threshold(...), and get_metric_metadata(...) so
contributors can locate the implementations.
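To make the corrected contract concrete, here is a minimal Python sketch of the two lookups MetricsEvaluator is described as using. Only the method names get_effective_threshold and get_metric_metadata come from the comment; the constructor, the argument lists, the metadata dict shape, the "example:metric" ID, and the rule that per-evaluation overrides win over system defaults are assumptions for illustration, not the project's actual signatures.

```python
from typing import Any, Optional

Metadata = dict[str, Any]


class MetricManager:
    """Hypothetical stand-in for the project's MetricManager (signatures assumed)."""

    def __init__(self, system_defaults: dict[str, Metadata]) -> None:
        # Assumed shape: {"<metric_id>": {"threshold": 0.8, ...}}
        self._defaults = system_defaults

    def get_metric_metadata(
        self, metric_id: str, overrides: Optional[dict[str, Metadata]] = None
    ) -> Metadata:
        # Assumption: override keys replace default keys; other defaults are kept.
        # Copying the defaults means overrides never mutate the shared config.
        merged = dict(self._defaults.get(metric_id, {}))
        merged.update((overrides or {}).get(metric_id, {}))
        return merged

    def get_effective_threshold(
        self, metric_id: str, overrides: Optional[dict[str, Metadata]] = None
    ) -> Optional[float]:
        # The threshold is read from the merged metadata; None means "not configured".
        return self.get_metric_metadata(metric_id, overrides).get("threshold")


manager = MetricManager({"example:metric": {"threshold": 0.8, "description": "demo"}})
print(manager.get_effective_threshold("example:metric"))  # 0.8
override = {"example:metric": {"threshold": 0.9}}
print(manager.get_effective_threshold("example:metric", override))  # 0.9
```

Routing the threshold through the metadata merge keeps a single precedence rule for every metadata field, rather than one rule for thresholds and another for everything else.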

In @.ai/spec/what/agent-drivers.md:
- Line 12: Reword the sentence to clarify that AgentDriverRegistry (the
class-based registry in the source code) is keyed by driver "type" (e.g.,
"http_api"), while conversations select an agent by its configured agent ID.
Split these two concerns explicitly and reference the registry API symbols
AGENT_DRIVERS and create_driver(), so readers know that drivers are looked up
by type via AGENT_DRIVERS/create_driver() and that conversations pick an agent
by agent ID, falling back to the default agent.
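The two-level lookup described above (driver by type, agent by ID) can be sketched as follows. AGENT_DRIVERS, create_driver(), and the "http_api" type are named in the comment; AgentConfig, HttpApiDriver, select_agent(), and every signature here are hypothetical placeholders, and the real registry may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentConfig:
    """Hypothetical agent entry: an ID used by conversations plus a driver type."""

    agent_id: str
    type: str
    endpoint: str = ""


class HttpApiDriver:
    """Placeholder for the http_api driver; the real one performs the API calls."""

    def __init__(self, config: AgentConfig) -> None:
        self.config = config


# Registry keyed by driver *type*, never by agent ID.
AGENT_DRIVERS: dict[str, Callable[[AgentConfig], object]] = {"http_api": HttpApiDriver}


def create_driver(config: AgentConfig) -> object:
    """Look up the driver class by the agent's type and instantiate it."""
    try:
        driver_cls = AGENT_DRIVERS[config.type]
    except KeyError:
        raise ValueError(f"unknown agent driver type: {config.type!r}") from None
    return driver_cls(config)


def select_agent(
    agents: dict[str, AgentConfig], default_agent_id: str, requested_id: Optional[str]
) -> AgentConfig:
    """Conversations pick an agent by ID, falling back to the default agent."""
    return agents[requested_id or default_agent_id]
```

Several agents can share one driver type (for example, two http_api agents pointing at different endpoints), which is why the registry key and the conversation-facing key are kept separate.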

In @.ai/spec/what/evaluation-pipeline.md:
- Around lines 31-32: Update the text to accurately reflect that API/Agent errors
do more than mark a single turn: they mark the current turn as ERROR and then
cascade ERROR status to all remaining turns and to conversation-level metrics
via ConversationProcessor._handle_api_error; change the sentence that currently
reads “mark all metrics for that turn as ERROR” to explicitly state this cascade
to remaining turns and conversation-level metrics and reference
ConversationProcessor._handle_api_error as the implementing behavior.
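As a rough illustration of the cascade described above, the Python sketch below marks the failed turn, every later turn, and all conversation-level metrics as ERROR while leaving earlier turns untouched. It does not reproduce ConversationProcessor._handle_api_error; MetricResult, handle_api_error(), the status strings, and the reason wording are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricResult:
    """Hypothetical result record; turn_index is None for conversation-level metrics."""

    metric_id: str
    turn_index: Optional[int]
    status: str = "PENDING"
    reason: str = ""


def handle_api_error(
    turn_results: list[list[MetricResult]],
    conversation_results: list[MetricResult],
    failed_turn: int,
    error: str,
) -> None:
    """Cascade ERROR from the failed turn to all later turns and conversation metrics."""
    for index in range(failed_turn, len(turn_results)):
        for result in turn_results[index]:
            result.status = "ERROR"
            # Assumed wording: the failed turn carries the error itself,
            # while later turns record that they were never reached.
            if index == failed_turn:
                result.reason = error
            else:
                result.reason = f"skipped after turn {failed_turn} failed: {error}"
    for result in conversation_results:
        result.status = "ERROR"
        result.reason = f"conversation aborted at turn {failed_turn}: {error}"
```

Turns before the failure keep whatever status they already had, so partial results from a conversation are still reported.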

In @.ai/spec/what/metrics.md:
- Line 35: Update the inaccurate sentence to state that required-data gating
occurs during metric evaluation in MetricsEvaluator (after agent enrichment) and
that DataValidator is intentionally scoped to only validate explicitly provided
user evaluation data rather than resolved/default metric requirements; reference
MetricsEvaluator and DataValidator by name in the revised wording so readers
know where each responsibility lives.
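The split of responsibilities above can be pictured with a small Python sketch of evaluation-time gating. It assumes a per-metric table of required turn fields and treats a missing or empty field as absent. REQUIRED_FIELDS, the example metric IDs, and gate_metric() are hypothetical. DataValidator is deliberately not modeled here, because the comment scopes it only to user-supplied data.

```python
from typing import Any

# Hypothetical per-metric requirements; real metric IDs and fields live in the project.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "example:answer_quality": {"query", "response"},
    "example:context_use": {"query", "response", "contexts"},
}


def missing_fields(metric_id: str, turn: dict[str, Any]) -> set[str]:
    """Return the fields the metric needs that are absent or empty on the turn."""
    required = REQUIRED_FIELDS.get(metric_id, set())
    return {name for name in required if not turn.get(name)}


def gate_metric(metric_id: str, enriched_turn: dict[str, Any]) -> tuple[bool, str]:
    """Evaluation-time check, run after the agent has filled in fields like "response"."""
    missing = missing_fields(metric_id, enriched_turn)
    if missing:
        return False, f"missing required data: {', '.join(sorted(missing))}"
    return True, ""
```

Running this check after enrichment means agent-produced fields such as the response count toward a metric's requirements, which a validator run only on the raw user input could not see.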

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 132d5920-c21b-40df-a05d-b929c68693a9

📥 Commits

Reviewing files that changed from the base of the PR and between fc5f537 and 2ec918c.

📒 Files selected for processing (12): the same twelve files listed above.

Comment thread .ai/spec/how/metrics-implementation.md Outdated
Comment thread .ai/spec/what/agent-drivers.md
Comment thread .ai/spec/what/evaluation-pipeline.md Outdated
Comment thread .ai/spec/what/metrics.md
@asamal4 asamal4 force-pushed the initial-proj-specs branch from 2ec918c to c62012b on June 3, 2026 10:27
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/what/system-overview.md:
- Around lines 31-32: There is a contradiction between the system overview
statement that "setup script failures similarly mark all metrics as ERROR" and
the evaluation-pipeline description that setup failures only "mark all metrics
for that turn as ERROR"; decide which behavior is correct and make the two specs
consistent: either remove the word "similarly" and clarify in the system
overview that setup script failures only mark metrics for the current turn as
ERROR, or update the evaluation-pipeline text to state that setup failures
cascade to remaining turns and conversation-level metrics; update the relevant
sentences in both documents so they match exactly and add a short clarifying
phrase indicating whether cascade beyond the current turn occurs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc5158bb-54f8-4637-ac0a-e1ea832fc7f9

📥 Commits

Reviewing files that changed from the base of the PR and between 2ec918c and c62012b.

📒 Files selected for processing (12): the same twelve files listed above.
✅ Files skipped from review due to trivial changes (7)
  • .ai/spec/README.md
  • .ai/spec/how/project-structure.md
  • .ai/spec/how/agent-drivers.md
  • .ai/spec/what/metrics.md
  • .ai/spec/how/configuration-and-models.md
  • .ai/spec/how/metrics-implementation.md
  • .ai/spec/what/agent-drivers.md

Comment thread .ai/spec/what/system-overview.md Outdated
@asamal4 asamal4 force-pushed the initial-proj-specs branch from c62012b to 7acb089 on June 3, 2026 10:48

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant