chore: Add initial project specs by asamal4 · Pull Request #249 · lightspeed-core/lightspeed-evaluation

asamal4 · 2026-06-03T08:59:17Z

Description

Add project specification covering WHAT and HOW for better AI context.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Documentation
- Added a comprehensive specification README and a suite of design docs for the LightSpeed Evaluation Framework covering architecture, configuration/models, agent drivers, metrics subsystem, evaluation pipeline behavior, LLM/judge model, output/reporting and storage, and overall project structure
- Documented behavioral rules, config surfaces, data flows, lifecycle semantics, override precedence, error handling, metric scoring/thresholding, and planned future work

coderabbitai · 2026-06-03T08:59:28Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ad7f6066-85f3-4d90-a6c7-5f1f478798f3

📥 Commits

Reviewing files that changed from the base of the PR and between c62012b and 7acb089.

📒 Files selected for processing (12)

.ai/spec/README.md
.ai/spec/how/agent-drivers.md
.ai/spec/how/configuration-and-models.md
.ai/spec/how/metrics-implementation.md
.ai/spec/how/output-and-storage.md
.ai/spec/how/project-structure.md
.ai/spec/what/agent-drivers.md
.ai/spec/what/evaluation-pipeline.md
.ai/spec/what/llm-and-judges.md
.ai/spec/what/metrics.md
.ai/spec/what/output-and-reporting.md
.ai/spec/what/system-overview.md

Walkthrough

This PR adds comprehensive specification documentation for the LightSpeed Evaluation Framework, establishing both behavioral rules (what/ directory) and implementation guides (how/ directory). The change includes twelve markdown files totaling 626 new lines describing system architecture, configuration semantics, subsystem interactions, and codebase organization.

Changes

LightSpeed Evaluation Framework Specification Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Specification structure and system overview `.ai/spec/README.md`, `.ai/spec/what/system-overview.md` | README defines the specification framework, distinguishing behavioral rules (`what/`) from implementation guides (`how/`) and establishes writing conventions. `system-overview` provides the high-level system description, covering evaluation roles, metric resolution modes, execution model, configuration surface, and planned changes. |
| User-facing behavioral specifications `.ai/spec/what/agent-drivers.md`, `.ai/spec/what/evaluation-pipeline.md`, `.ai/spec/what/llm-and-judges.md`, `.ai/spec/what/metrics.md`, `.ai/spec/what/output-and-reporting.md` | Five specs defining runtime behavior: agent drivers (turn enrichment, latency/token tracking, config semantics), evaluation pipeline (concurrency, per-conversation/turn sequencing, error cascade rules), LLM & judges (pool/panel modes, caching, token tracking), metrics (backend categories, resolution/thresholding, metadata precedence), and output/reporting (report formats, stats, storage lifecycle). |
| Implementation architecture and subsystem guides `.ai/spec/how/project-structure.md`, `.ai/spec/how/configuration-and-models.md`, `.ai/spec/how/agent-drivers.md`, `.ai/spec/how/metrics-implementation.md`, `.ai/spec/how/output-and-storage.md` | Five implementation guides: project structure and entry points; configuration and Pydantic model flows (loader, validations, auto-migrations); agent-driver architecture and API client/streaming details; metrics backend dispatch, metadata merge semantics, and evaluator/judge orchestration; output and storage lifecycle, backend composition, and persistence differences. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

lightspeed-core/lightspeed-evaluation#238: Defines agentic evaluation abstractions (AgentDriver interface, driver registry, pipeline integration) that align with this PR's agent-driver behavioral and implementation specifications.

Suggested reviewers

rioloc

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: adding initial project specification documents across what/ and how/ directories. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/how/metrics-implementation.md:
- Line 42: Update the integration contract text to correct the method names: for
the interaction between ConversationProcessor and MetricManager keep "Calls
resolve_metrics()" but replace the incorrect "MetricsEvaluator calls
get_metadata() for thresholds" with a precise description stating that
MetricsEvaluator uses MetricManager.get_effective_threshold(...) to obtain
threshold values and MetricManager.get_metric_metadata(...) for metric metadata
retrieval; reference ConversationProcessor, MetricManager, resolve_metrics(),
MetricManager.get_effective_threshold(...), and get_metric_metadata(...) so
contributors can locate the implementations.

In @.ai/spec/what/agent-drivers.md:
- Line 12: Reword the sentence so that it separates two concerns:
AgentDriverRegistry (the class-based registry in the source code) is keyed by
driver "type" (e.g., "http_api"), while conversations select an agent by a
configured agent ID. Reference the registry API symbols AGENT_DRIVERS and
create_driver() so readers know that drivers are looked up by type via
AGENT_DRIVERS/create_driver() and that conversations pick an agent by agent ID,
with a fallback to the default agent.
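
The split this comment asks for can be illustrated with a minimal sketch. It reuses the names AGENT_DRIVERS and create_driver() from the comment, but their signatures here are assumptions, and `AgentConfig`, `HttpApiDriver`, and `select_agent` are hypothetical stand-ins rather than the project's actual classes. The sketch also assumes the fallback applies when a conversation does not name an agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AgentConfig:
    """Hypothetical per-agent settings; only `type` drives the registry lookup."""
    type: str
    endpoint: str


class HttpApiDriver:
    """Stand-in driver class (assumption; the real driver is not shown in this PR)."""

    def __init__(self, config: AgentConfig) -> None:
        self.config = config


# The registry is keyed by driver *type*, never by agent ID.
AGENT_DRIVERS: Dict[str, Callable[[AgentConfig], object]] = {
    "http_api": HttpApiDriver,
}


def create_driver(config: AgentConfig) -> object:
    """Look up the driver class by its type and instantiate it."""
    try:
        driver_cls = AGENT_DRIVERS[config.type]
    except KeyError:
        raise ValueError(f"Unknown driver type: {config.type!r}") from None
    return driver_cls(config)


def select_agent(
    agents: Dict[str, AgentConfig],
    default_agent: str,
    agent_id: Optional[str] = None,
) -> AgentConfig:
    """Pick an agent by configured ID, falling back to the default agent."""
    return agents[agent_id if agent_id is not None else default_agent]
```

In this sketch a conversation would first call `select_agent(...)` with its agent ID (or none), and only then pass the resulting config to `create_driver(...)`, which keeps the ID-based selection and the type-based lookup as two separate steps.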

In @.ai/spec/what/evaluation-pipeline.md:
- Around line 31-32: Update the text to accurately reflect that API/Agent errors
do more than mark a single turn: they mark the current turn as ERROR and then
cascade ERROR status to all remaining turns and to conversation-level metrics
via ConversationProcessor._handle_api_error; change the sentence that currently
reads “mark all metrics for that turn as ERROR” to explicitly state this cascade
to remaining turns and conversation-level metrics and reference
ConversationProcessor._handle_api_error as the implementing behavior.
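
The cascade described in this comment can be sketched as follows. This is not the code of ConversationProcessor._handle_api_error; the `Turn` and `Conversation` types, the `cascade_api_error` function, and the result-key format are all hypothetical, chosen only to show which metrics end up as ERROR.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    """Hypothetical turn: an ID plus the turn-level metrics to evaluate."""
    turn_id: str
    metrics: List[str]


@dataclass
class Conversation:
    """Hypothetical conversation with turn-level and conversation-level metrics."""
    turns: List[Turn]
    conversation_metrics: List[str] = field(default_factory=list)


def cascade_api_error(conv: Conversation, failed_index: int) -> Dict[str, str]:
    """Mark the failed turn, every remaining turn, and all
    conversation-level metrics as ERROR. Earlier turns are left untouched."""
    statuses: Dict[str, str] = {}
    for turn in conv.turns[failed_index:]:
        for metric in turn.metrics:
            statuses[f"{turn.turn_id}:{metric}"] = "ERROR"
    for metric in conv.conversation_metrics:
        statuses[f"conversation:{metric}"] = "ERROR"
    return statuses
```

With three turns and a failure on the second one, the returned map contains entries for the second and third turns and for the conversation-level metrics, but none for the first turn, whose results were already produced.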

In @.ai/spec/what/metrics.md:
- Line 35: Update the inaccurate sentence to state that required-data gating
occurs during metric evaluation in MetricsEvaluator (post-agent enrichment) and
that DataValidator is intentionally scoped to only validate explicitly provided
user evaluation data rather than resolved/default metric requirements; reference
MetricsEvaluator and DataValidator by name in the revised wording so readers
know where each responsibility lives.
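
A rough sketch of the division of responsibility this comment describes, under stated assumptions: `REQUIRED_FIELDS`, `validate_provided`, and `gate_metric` are hypothetical names, the metric and field names are invented for illustration, and the real DataValidator and MetricsEvaluator may check different things.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from metric name to the turn fields it needs.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "example_relevancy": ["query", "response"],
}


def validate_provided(turn: Dict[str, Optional[str]]) -> List[str]:
    """Load-time check in the spirit of DataValidator: inspect only the
    fields the user explicitly supplied, not what any metric requires."""
    errors = []
    for name, value in turn.items():
        if value is not None and not value.strip():
            errors.append(f"field {name!r} is provided but empty")
    return errors


def gate_metric(metric: str, turn: Dict[str, Optional[str]]) -> Optional[str]:
    """Evaluation-time check in the spirit of MetricsEvaluator: runs after
    agent enrichment and reports any data the metric still lacks."""
    missing = [f for f in REQUIRED_FIELDS.get(metric, []) if not turn.get(f)]
    if missing:
        return "missing required data: " + ", ".join(missing)
    return None
```

The point of deferring the gate is that a turn with only a `query` passes `validate_provided`, fails `gate_metric` before enrichment, and passes it once the agent has filled in `response`.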

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 132d5920-c21b-40df-a05d-b929c68693a9

📥 Commits

Reviewing files that changed from the base of the PR and between fc5f537 and 2ec918c.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/what/system-overview.md:
- Around line 31-32: There is a contradiction between the system overview
statement that "setup script failures similarly mark all metrics as ERROR" and
the evaluation-pipeline description that setup failures only "mark all metrics
for that turn as ERROR"; decide which behavior is correct and make the two specs
consistent: either remove the word "similarly" and clarify in the system
overview that setup script failures only mark metrics for the current turn as
ERROR, or update the evaluation-pipeline text to state that setup failures
cascade to remaining turns and conversation-level metrics; update the relevant
sentences in both documents so they match exactly and add a short clarifying
phrase indicating whether cascade beyond the current turn occurs.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc5158bb-54f8-4637-ac0a-e1ea832fc7f9

📥 Commits

Reviewing files that changed from the base of the PR and between 2ec918c and c62012b.

✅ Files skipped from review due to trivial changes (7)

.ai/spec/README.md
.ai/spec/how/project-structure.md
.ai/spec/how/agent-drivers.md
.ai/spec/what/metrics.md
.ai/spec/how/configuration-and-models.md
.ai/spec/how/metrics-implementation.md
.ai/spec/what/agent-drivers.md

coderabbitai Bot reviewed Jun 3, 2026

Comment thread .ai/spec/how/metrics-implementation.md Outdated

Comment thread .ai/spec/what/agent-drivers.md

Comment thread .ai/spec/what/evaluation-pipeline.md Outdated

Comment thread .ai/spec/what/metrics.md

asamal4 force-pushed the initial-proj-specs branch from 2ec918c to c62012b Compare June 3, 2026 10:27

coderabbitai Bot reviewed Jun 3, 2026

Comment thread .ai/spec/what/system-overview.md Outdated

chore: Add initial project specs

7acb089

asamal4 force-pushed the initial-proj-specs branch from c62012b to 7acb089 Compare June 3, 2026 10:48

asamal4 wants to merge 1 commit into lightspeed-core:main from asamal4:initial-proj-specs
