test(skills): add plugin skill eval datasets by ngoncharenko · Pull Request #116 · NVIDIA-NeMo/nemo-platform

ngoncharenko · 2026-05-29T20:40:21Z

Summary

Add eval datasets for the data-designer plugin skill.
Add eval datasets for the evaluator plugin skill, including positive routing cases and one negative Kubernetes/Flask case.

Validation

- `jq empty skills/nemo-data-designer-plugin/evals/evals.json`
- `jq empty skills/nemo-evaluator-plugin/evals/evals.json`
- `git diff --cached --check` (run before commit)
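
For environments without `jq`, the JSON syntax check above can be approximated in Python. This is a minimal sketch rather than part of the PR: the helper name `find_invalid_json` is hypothetical, and it is slightly stricter than `jq empty` (for example, it rejects an empty file).

```python
import json
from pathlib import Path


def find_invalid_json(paths):
    """Return {path: error message} for each file that is unreadable or not valid JSON.

    Hypothetical helper for illustration; not taken from the repository.
    """
    errors = {}
    for path in paths:
        try:
            json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # OSError covers missing/unreadable files; ValueError covers
            # json.JSONDecodeError and UnicodeDecodeError.
            errors[str(path)] = str(exc)
    return errors
```

Calling `find_invalid_json` with the two `evals.json` paths from the Validation list and checking for an empty result would mirror the two `jq empty` commands.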

Summary by CodeRabbit

Tests
- Added/expanded evaluation suites: data-designer (now 4 cases), evaluator-plugin (4 cases), and nemo-setup (4 cases) covering expected guidance, CLI usage, and safety constraints.
Documentation
- Renamed several skills to consistent nemo-* plugin identifiers, updated frontmatter (owner/license/maturity) and reference links, and adjusted usage/troubleshooting guidance.
Chores
- Standardized the console message after creating the remote API key secret.

ngoncharenko · 2026-05-29T20:41:39Z

/nvskills-ci

coderabbitai · 2026-05-29T20:44:06Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds evaluation JSON suites for three skills, updates SKILL front-matter and references to use renamed nemo-* plugin identifiers, and tweaks a single SDK example console message.

Changes

Skill Evaluation Cases and SKILL metadata

| Layer / File(s) | Summary |
| --- | --- |
| Data Designer SKILL and evals: `skills/nemo-data-designer-plugin/SKILL.md`, `skills/nemo-data-designer-plugin/evals/evals.json` | Rename SKILL `name` to `nemo-data-designer-plugin`, adjust usage/troubleshooting bullets and Output Template guidance, and add eval cases `nemo-data-designer-plugin-001`..`-004` covering interactive/autopilot dataset generation, required script shape (`load_config_builder()`), references, and safety constraints. |
| Evaluator plugin SKILL and evals: `skills/nemo-evaluator-plugin/SKILL.md`, `skills/nemo-evaluator-plugin/evals/evals.json` | Rename SKILL to `nemo-evaluator-plugin`, update metadata and reference links, and add eval cases `nemo-evaluator-plugin-001`..`-004` specifying CLI usage (`nemo evaluator evaluate run --spec` / `submit --spec-file`), JSON spec examples, venv mention, SKILL.md reading, and safety constraints (one control expects unrelated answer). |

Setup docs update

| Layer / File(s) | Summary |
| --- | --- |
| What’s next? / Available skills mapping: `skills/nemo-setup/SKILL.md`, `skills/nemo-setup/evals/evals.json` | Add frontmatter metadata to `nemo-setup` SKILL.md, update follow-up and Available skills entries to reference `nemo-evaluator-plugin` and `nemo-data-designer-plugin`, and add four `nemo-setup-*` eval cases focused on port 8080 handling and safe, non-destructive guidance. |

SDK example tweak

| Layer / File(s) | Summary |
| --- | --- |
| Console message change in plugin_examples: `packages/nemo_evaluator_sdk/examples/plugin_examples.py` | Replaced formatted workspace/secret print after creating remote API key secret with a fixed message: "API key secret created for workspace". |

Suggested reviewers

gabwow
tylersbray

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title clearly and concisely summarizes the main change: adding evaluation datasets to plugin skills (data-designer, evaluator, setup). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gabwow · 2026-05-29T20:45:21Z

/nvskills-ci

github-actions · 2026-05-29T20:47:32Z

| Suite | Lines Covered | Line Rate | Branch Rate |
| --- | --- | --- | --- |
| Unit Tests | 18412/24386 | 75.5% | 61.9% |
| Integration Tests | 11765/23163 | 50.8% | 25.9% |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_evaluator_sdk/examples/plugin_examples.py`:
- Line 232: Update the generic print("API key secret created for workspace") to
include the workspace and secret_name variables so the message mirrors other
logs in this file; locate the print in plugin_examples.py (near the API key
secret creation logic) and change it to output a descriptive message containing
workspace and secret_name (e.g., "API key secret '{secret_name}' created for
workspace '{workspace}'") to restore useful context for debugging.
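
A minimal sketch of the message the comment above asks for. The variable names `workspace` and `secret_name` come from the review comment; the wrapper function `secret_created_message` and the sample values are hypothetical, and the surrounding code in `plugin_examples.py` is not shown here. The sketch assumes `secret_name` is an identifier rather than the secret value itself.

```python
def secret_created_message(workspace: str, secret_name: str) -> str:
    """Build the descriptive log line suggested in the review comment.

    Hypothetical helper; assumes secret_name is a name, not the API key value.
    """
    return f"API key secret '{secret_name}' created for workspace '{workspace}'"


# Illustrative values only; the real example derives these elsewhere.
print(secret_created_message(workspace="default", secret_name="remote-api-key"))
```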

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 252b19f4-2a84-4405-a5fc-f0fe97fc5d3e

📥 Commits

Reviewing files that changed from the base of the PR and between ce349ae and 91cd031.

📒 Files selected for processing (1)

packages/nemo_evaluator_sdk/examples/plugin_examples.py

gabwow · 2026-05-29T21:00:28Z

/nvskills-ci

coderabbitai

🧹 Nitpick comments (1)

skills/nemo-setup/SKILL.md (1)

188-188: 💤 Low value

Inconsistent separator for dual skill names.

Line 188 uses a slash (nemo-evaluator / nemo-evaluator-plugin), but line 162 uses a comma (nemo-evaluator, nemo-evaluator-plugin). Pick one format and use it consistently.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/nemo-setup/SKILL.md` at line 188, the dual skill name separator is
inconsistent: change the entry "**`nemo-evaluator`** /
**`nemo-evaluator-plugin`** — metrics, sync/async evaluations, llm-judge,
benchmark jobs. From `plugins/nemo-evaluator`." to use the same separator as the
earlier mention (comma), i.e., "**`nemo-evaluator`**, **`nemo-evaluator-plugin`**
— metrics, sync/async evaluations, llm-judge, benchmark jobs. From
`plugins/nemo-evaluator`.", so both occurrences of the pair use the comma
separator and maintain identical styling for `nemo-evaluator` and
`nemo-evaluator-plugin`.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: abbbe96e-4450-4db5-a836-1f34cc8ebf73

📥 Commits

Reviewing files that changed from the base of the PR and between 91cd031 and b461cda.

📒 Files selected for processing (5)

skills/nemo-data-designer-plugin/SKILL.md
skills/nemo-data-designer-plugin/evals/evals.json
skills/nemo-evaluator-plugin/SKILL.md
skills/nemo-evaluator-plugin/evals/evals.json
skills/nemo-setup/SKILL.md

✅ Files skipped from review due to trivial changes (2)

skills/nemo-evaluator-plugin/evals/evals.json
skills/nemo-data-designer-plugin/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (1)

skills/nemo-data-designer-plugin/evals/evals.json

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/nemo-data-designer-plugin/SKILL.md`:
- Line 42: The troubleshooting bullets have mismatched markdown delimiters
around the key tokens (e.g., the line referencing `SamplerColumnConfig`),
causing rendering errors; update both bullets (the one mentioning
SamplerColumnConfig and the other at the same issue location) to use consistent
backticks and bold markers so they read like: bold label then inline code for
names — e.g., reference the symbol SamplerColumnConfig and its parameter name
`params` (not `sampler_params`) using matching delimiters to fix the markdown
rendering.
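
Mismatched delimiters of the kind described above can be caught with a rough per-line check. The sketch below is a heuristic, not a CommonMark parser, and the function name `unbalanced_delimiters` is hypothetical; the actual SKILL.md lines are not shown in this thread, so the sample strings in the usage note are illustrative only.

```python
def unbalanced_delimiters(line):
    """Report which inline markdown delimiters look unbalanced on one line.

    Heuristic only: counts backticks, then counts '**' outside code spans.
    """
    problems = []
    segments = line.split("`")
    # N backticks produce N + 1 segments, so an even segment count
    # means an odd (unbalanced) number of backticks.
    if len(segments) % 2 == 0:
        problems.append("backtick")
    # Even-indexed segments lie outside inline code spans.
    outside_code = "".join(segments[0::2])
    if outside_code.count("**") % 2 == 1:
        problems.append("bold")
    return problems
```

For example, a well-formed bullet such as "- **SamplerColumnConfig**: use `params`, not `sampler_params`" yields an empty list, while a line with an unclosed `**` yields `["bold"]`.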

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 939ee745-8031-4bf4-89cb-13fbe8eda371

📥 Commits

Reviewing files that changed from the base of the PR and between b461cda and b4c360d.

📒 Files selected for processing (8)

skills/nemo-data-designer-plugin/SKILL.md
skills/nemo-data-designer-plugin/evals/evals.json
skills/nemo-evaluator-plugin/SKILL.md
skills/nemo-evaluator-plugin/references/api-auth.md
skills/nemo-evaluator-plugin/references/llm-judge.md
skills/nemo-evaluator-plugin/references/troubleshooting.md
skills/nemo-setup/SKILL.md
skills/nemo-setup/evals/evals.json

✅ Files skipped from review due to trivial changes (1)

skills/nemo-setup/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (1)

skills/nemo-data-designer-plugin/evals/evals.json

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>

ngoncharenko · 2026-05-29T22:00:24Z

/nvskills-ci

gabwow · 2026-05-29T22:16:50Z

/nvskills-ci

svvarom · 2026-05-29T23:32:25Z

/nvskills-ci

gabwow · 2026-05-30T00:54:43Z

/nvskills-ci

gabwow · 2026-05-30T00:58:15Z

/nvskills-ci

ngoncharenko marked this pull request as ready for review May 29, 2026 20:41

ngoncharenko requested review from a team as code owners May 29, 2026 20:41

ngoncharenko changed the title ~~test: add plugin skill eval datasets~~ test(skills): add plugin skill eval datasets May 29, 2026

ngoncharenko force-pushed the ngoncharenko/aalgo-231-add-evals-datasets-to-skills branch from ce349ae to b51a05b on May 29, 2026 20:50

coderabbitai Bot reviewed May 29, 2026

Comment thread packages/nemo_evaluator_sdk/examples/plugin_examples.py

coderabbitai Bot reviewed May 29, 2026

Comment thread skills/nemo-data-designer-plugin/SKILL.md

Squash ngoncharenko/aalgo-231-add-evals-datasets-to-skills onto main

33b3c5f

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>

ngoncharenko force-pushed the ngoncharenko/aalgo-231-add-evals-datasets-to-skills branch from b4c360d to 33b3c5f on May 29, 2026 22:00

chore: copy evals from PR108

5949adc

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>

fix: rm unwanted comment

0e02301

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>

ngoncharenko force-pushed the ngoncharenko/aalgo-231-add-evals-datasets-to-skills branch from a3c803d to 0e02301 on May 29, 2026 22:20

chore: workaround for timeout

e6fc3c6

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>

ngoncharenko requested a review from gabwow May 29, 2026 23:32

gabwow approved these changes May 29, 2026

calliente mentioned this pull request May 30, 2026

feat(nemo-platform): onboard NeMo Platform skills NVIDIA/skills#120

Open

12 tasks

chore: fix nemo-setup

47c4778

Signed-off-by: Nick Goncharenko <ngoncharenko@nvidia.com>

ngoncharenko force-pushed the ngoncharenko/aalgo-231-add-evals-datasets-to-skills branch from 0125065 to 47c4778 on May 30, 2026 00:54
