PDX-490: feat(mcp) — error_category + retryable on test-run failures (Thread D)#186
Conversation
…run failures RCA: Consumers of provar_automation_testrun.steps[] and provar_testrun_rca.failures[] currently have no machine-friendly retry hint and must re-parse human-readable strings to decide whether a failure is transient or deterministic, so agents either retry locator failures forever or give up on a single ECONNRESET blip. Fix: Add optional error_category (INFRASTRUCTURE|ASSERTION|LOCATOR|TIMEOUT|OTHER) and retryable (boolean) fields to both FailureReport (rcaTools.ts) and JUnitStepResult (antTools.ts), populated by an additive pattern-match classifier; retryable=true only for INFRASTRUCTURE and TIMEOUT, undefined when no pattern matches — non-breaking — with tool descriptions and docs/mcp.md updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Quality Orchestrator🟢 LOW · 🧪 Tests to Run · Running 3 of 51 tests
▶ Run commandnpx vitest run \
unit/mcp/antTools.test.ts \
unit/mcp/automationTools.test.ts \
unit/mcp/rcaTools.test.ts⚡ quality-orchestrator · |
There was a problem hiding this comment.
Pull request overview
This PR adds two additive, optional fields (error_category, retryable) to MCP tool outputs for test-run failures so downstream consumers can make machine-friendly retry decisions without re-parsing human-readable error text.
Changes:
- Add coarse-grained error classification + retryability flags to
provar_testrun_rcafailure reports and toprovar_automation_testrunstep results. - Update MCP tool descriptions and docs to describe the new fields.
- Add unit test coverage for all categories + “no match” behavior in both RCA and JUnit parsing.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/mcp/tools/rcaTools.ts |
Adds error_category/retryable to FailureReport and populates them from failure text; updates tool description. |
src/mcp/tools/antTools.ts |
Adds error_category/retryable to JUnitStepResult and populates them during JUnit step extraction. |
src/mcp/tools/automationTools.ts |
Updates provar_automation_testrun tool description to mention the new step fields. |
docs/mcp.md |
Documents the new fields for both tools and updates example output. |
test/unit/mcp/rcaTools.test.ts |
Adds tests validating RCA failure error_category/retryable behavior. |
test/unit/mcp/antTools.test.ts |
Adds tests validating JUnit step error_category/retryable behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 'Each failure includes optional error_category (INFRASTRUCTURE|ASSERTION|LOCATOR|TIMEOUT|OTHER)', | ||
| 'and retryable (boolean) fields when the failure text matches a known pattern — INFRASTRUCTURE/TIMEOUT', | ||
| 'are flagged retryable, others are not.', |
| | `test_case` | Test case filename from JUnit `<testcase name>` | | ||
| | `error_class` | Extracted exception class name | | ||
| | `error_message` | First 500 chars of failure/error text | | ||
| | `root_cause_category` | One of 12 categories (see table below) | |
…ix root_cause_category count RCA: Copilot review on PR #186 flagged two doc/description inconsistencies — (1) the provar_testrun_rca tool description implied error_category/retryable appear in mode="failures" output, but mode="failures" only emits { testItemId, title, errorMessage } and the new fields live on FailureReport entries in mode="rca" only; (2) docs/mcp.md FailureReport table said root_cause_category is "One of 12 categories" while the enumerated list contains 17 buckets (and the new paragraph added in this PR explicitly says "17 buckets"). These are doc/description mismatches, not runtime behaviour bugs. Fix: Reword the rcaTools.ts tool description so error_category/retryable are explicitly scoped to failures[] entries in mode="rca", with a "NOT included in mode=failures output" clarifier so agents reading the description cold pick the right mode. Update docs/mcp.md root_cause_category row from "One of 12 categories (see table below)" to "One of 17 categories (see list below)" — count now matches the enumerated list (DRIVER_VERSION_MISMATCH, LOCATOR_STALE, TIMEOUT, ASSERTION_FAILED, CREDENTIAL_FAILURE, MISSING_CALLABLE, METADATA_CACHE, PAGE_OBJECT_COMPILE, CONNECTION_REFUSED, DATA_SETUP, LICENSE_INVALID, SALESFORCE_VALIDATION, SALESFORCE_PICKLIST, SALESFORCE_REFERENCE, SALESFORCE_ACCESS, SALESFORCE_TRIGGER, UNKNOWN = 17). Also corrected "see table below" → "see list below" since the categories are an inline comma-separated list, not a table. Verified with yarn compile (clean), yarn lint (clean), and full mocha suite (1169 passing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mrdailey99
left a comment
There was a problem hiding this comment.
Addressed in 669b517: rca tool description now correctly scopes error_category/retryable to mode='rca'; root_cause_category count corrected to 17.
Summary
Adds two optional, additive fields to both
provar_automation_testrun.steps[](JUnitStepResult) andprovar_testrun_rca.failures[](FailureReport):error_category— one ofINFRASTRUCTURE|ASSERTION|LOCATOR|TIMEOUT|OTHER. Absent when no known pattern matches.retryable—trueonly forINFRASTRUCTURE(Connection reset / ECONNRESET / socket hang up / Failed to read client socket message) andTIMEOUT(TimeoutException).falseforLOCATOR(NoSuchElementException),ASSERTION(AssertionException), andOTHER(SessionNotCreatedException / WebDriverException / ClassNotFoundException / LicenseException / InvalidPasswordException). Absent whenerror_categoryis absent.Identical classifier logic lives in both files (
classifyErrorCategoryinrcaTools.ts,classifyStepErrorCategoryinantTools.ts) so a consumer sees the same label regardless of whether they consume the testrun output or the RCA report.The existing
root_cause_category(17 fine-grained buckets, drivesrecommendationtext) is unchanged —error_categoryis an additional coarse-grained classifier that exists to drive automated retry policy.Why
Today an agent calling
provar_automation_testrunorprovar_testrun_rcahas no machine-friendly retry hint — to decide whether a failure is transient or deterministic, it has to re-parse human-readable strings. That leads to two failure modes:ECONNRESETblip that would succeed on the next attempt.error_category+retryablegive the agent one machine-checkable boolean to gate retries on.Scope
src/mcp/tools/rcaTools.ts— addsErrorCategorytype,classifyErrorCategory(),isRetryable(), fields onFailureReport, population insidebuildFailureReports, tool description update.src/mcp/tools/antTools.ts— addsJUnitErrorCategorytype,classifyStepErrorCategory(),isStepRetryable(), fields onJUnitStepResult, population insideextractStepsFromJUnit.src/mcp/tools/automationTools.ts— tool-description update forprovar_automation_testrun.docs/mcp.md— documents the new fields on both tools.test/unit/mcp/rcaTools.test.ts— 8 new tests covering all five categories, the "no match" case, the OTHER case, and the additive-no-breaking-change case.test/unit/mcp/antTools.test.ts— 6 new tests onparseJUnitResultscovering all categories plus the no-match and pass-step cases.Cross-thread coordination
[touches antTools.ts]— Thread B (PDX-491 H1) will also touchantTools.tsin a sibling PR — specifically the body ofparseJUnitResults/extractStepsFromJUnit(the parser body). The edits in this PR are isolated to:JUnitStepResultinterface declaration (one new union type + two optional fields), andextractStepsFromJUnit(an additiveif (errorMessage) { ... step.error_category = ...; step.retryable = ...; }).Both edits are additive and confined to small contiguous spans, so a clean rebase with Thread B's PR should be straightforward.
Test plan
node ../../../node_modules/typescript/bin/tsc -p . --pretty --noEmit— clean (no errors)node ../../../node_modules/eslint/bin/eslint.json every changed file — clean (no warnings)node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"— 1169 passing, 1 pendingnode scripts/lint-script-names.cjs— OKdocs/provar-mcp-public-docs.md,docs/university-of-provar-mcp-course.md) — the additive field rollout will be flagged for the Provar team to mirror in customer docs.🤖 Generated with Claude Code