Refactor ParseWorkflowFile orchestration into focused helper phases#36012
Conversation
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
ParseWorkflowFile orchestration into focused helper phases
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
🧪 Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR. The only changed file is pkg/workflow/compiler_orchestrator_workflow.go (a production file). Test Quality Sentinel skipped. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🏗️ Design Decision Gate — ADR RequiredThis PR makes significant changes to core business logic (276 new lines in 📄 Draft ADR committed:
📋 What to do next
Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision. ❓ Why ADRs MatterADRs create a searchable, permanent record of why the codebase looks the way it does. This decomposition establishes a 📋 Michael Nygard ADR Format ReferenceAn ADR must contain these four sections to be considered complete:
All ADRs are stored in
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /zoom-out and /improve-codebase-architecture — commenting with suggestions; no blocking issues.
📋 Key Themes & Highlights
Key Themes
- Opaque return tuple (
mergeRawOTLPEndpoints): the twointreturn positions carry implicit meaning that's invisible at the call site — named returns or consolidating the log counts intoapplyMergedRawObservabilitywould improve readability. - Silent error discard (
mergeImportedObservability): the lenientjson.Unmarshalpattern is preserved from the original, but in a named helper it's easier to miss. A log line would keep future debuggers informed. - Fragile ordering contract (
validateWorkflowEngineSettings): the comment asking callers to keep thechecksslice in legacy order is an implicit contract with no enforcement; a brief test or expanded comment would lock it down. - Untested pure functions:
buildMergedEnvSources,setMainWorkflowEnvSources, andmergeRawOTLPEndpointsare now independently unit-testable but have no coverage. The refactor makes these easy to test — worth taking the opportunity. - Weak field name (
workflowBuildContext.result): renaming tofrontmatterwould align with the project's existing vocabulary and avoid shadowing commonresultlocal names.
Positive Highlights
- ✅ Clean four-phase decomposition (
setup → validate → populate) makes the compilation pipeline easy to reason about at a glance. - ✅
validateWorkflowEngineSettingsusing a[]func(*WorkflowData) errorslice is an elegant way to unify the seven engine checks without repetition. - ✅ Error formatting helpers (
formatEngineSetupError,formatToolsProcessingError) correctly preserve the double-formatting guard — good attention to existing semantics. - ✅
mergeWorkflowEnvinverted the condition to handle the no-imports fast path first — more readable control flow than the originalif/else if.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.7M
| envSources := make(map[string]string, len(topEnv)) | ||
| for key := range topEnv { | ||
| envSources[key] = "(main workflow)" | ||
| func mergeRawOTLPEndpoints(mainObs map[string]any, importedObs map[string]any) ([]any, int, int) { |
There was a problem hiding this comment.
[/improve-codebase-architecture] mergeRawOTLPEndpoints returns an opaque ([]any, int, int) tuple — the two int positions are meaningful (mainCount, importAdded) but invisible at the call site.
💡 Suggestion: named returns or a small struct
The two integers are only used for a log line in applyMergedRawObservability. Named return values would clarify the contract without adding a struct:
func mergeRawOTLPEndpoints(mainObs, importedObs map[string]any) (merged []any, mainCount, importAdded int) {
...
return
}Alternatively, since mainCount and importAdded are used exclusively for logging, they could be dropped from the return and computed inside applyMergedRawObservability directly. That would reduce the coupling between the two functions.
| return | ||
| } | ||
| var importedObs map[string]any | ||
| if err := json.Unmarshal([]byte(mergedObservability), &importedObs); err != nil { |
There was a problem hiding this comment.
[/zoom-out] mergeImportedObservability silently swallows the JSON unmarshal error on line 244. This was the original behaviour, but now that the logic lives in a named helper it's less obvious — a future caller won't know the function is a no-op when the input is malformed.
💡 Suggestion: log the error
if err := json.Unmarshal([]byte(mergedObservability), &importedObs); err != nil {
orchestratorWorkflowLog.Printf("skipping imported observability merge: invalid JSON: %v", err)
return
}This preserves the existing lenient behaviour while making unexpected failures visible in debug output.
| if err := c.validateEngineMCPToolTimeout(workflowData); err != nil { | ||
| return nil, fmt.Errorf("%s: %w", cleanPath, err) | ||
| func (c *Compiler) validateWorkflowEngineSettings(cleanPath string, workflowData *WorkflowData) error { | ||
| // Keep this order aligned with legacy ParseWorkflowFile behavior. |
There was a problem hiding this comment.
[/improve-codebase-architecture] The comment // Keep this order aligned with legacy ParseWorkflowFile behavior. is a fragile implicit contract. If execution order matters (e.g., a later validator assumes an earlier one already returned an error), there's nothing to enforce it.
💡 Suggestion: make the contract testable
Add an integration test that exercises the specific ordering that matters (e.g., validateRunInstallScripts fires before validateEngineVersion). Even a simple sequence-of-errors test would prevent silent reordering.
Alternatively, if order genuinely does not matter and the comment is just defensive, replace it with a note explaining why order is irrelevant, which is equally useful.
|
|
||
| // workflowBuildContext captures the shared state across parse setup, validation, | ||
| // and workflow population phases. | ||
| type workflowBuildContext struct { |
There was a problem hiding this comment.
[/zoom-out] The result field on workflowBuildContext carries *parser.FrontmatterResult — the name result doesn't communicate what it holds, and it shadows the common local variable name result used throughout the old code.
💡 Suggestion: rename to `frontmatter`
type workflowBuildContext struct {
cleanPath string
content []byte
frontmatter *parser.FrontmatterResult // was: result
markdownDir string
...
}This makes ctx.frontmatter.Frontmatter read naturally and avoids the ambiguity with generic result locals in other functions that receive ctx.
|
|
||
| orchestratorWorkflowLog.Printf("Workflow file parsing completed successfully: %s", markdownPath) | ||
| return workflowData, nil | ||
| func buildMergedEnvSources(mergedEnv map[string]any, topEnv map[string]any, importedSources map[string]string) map[string]string { |
There was a problem hiding this comment.
[/improve-codebase-architecture] buildMergedEnvSources and setMainWorkflowEnvSources are now independently testable pure functions but have no unit tests. The env source attribution logic (main workflow vs import path) is subtle and worth covering directly.
💡 Suggested tests
func TestBuildMergedEnvSources(t *testing.T) {
merged := map[string]any{"A": "1", "B": "2"}
topEnv := map[string]any{"A": "1"} // A is from main
importedSources := map[string]string{"B": "imports/shared.md"}
got := buildMergedEnvSources(merged, topEnv, importedSources)
assert.Equal(t, "(main workflow)", got["A"])
assert.Equal(t, "imports/shared.md", got["B"])
}Similarly for mergeRawOTLPEndpoints (dedup logic, precedence, mainCount vs importAdded counters).
There was a problem hiding this comment.
The refactor is behavior-preserving and the decomposition is clean — phase sequencing is correct, error-formatting semantics are intact, merge precedence is unchanged, and the validation order matches the original.
### Findings summary
Two non-blocking observations on newly introduced code:
-
mergeRawOTLPEndpointsopaque return signature (line 267) — the([]any, int, int)tuple gives no indication of which int ismainCountvsimportAdded. Named returns or a small struct would make this self-documenting. -
workflowBuildContextundocumented phase ordering (line 15) —engineSetup,toolsResult, andworkflowDataare nil untilsetupWorkflowBuildContextcompletes. The struct doc should state this explicitly so test helpers and future orchestrators don't accidentally call validate/populate on an uninitialized context.
No correctness regressions found. All grumpy-coder findings (nil map write, silent JSON discard, observability overwrite, double-wrap) were present in the original code and are not introduced by this change.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 2.5M
| envSources := make(map[string]string, len(topEnv)) | ||
| for key := range topEnv { | ||
| envSources[key] = "(main workflow)" | ||
| func mergeRawOTLPEndpoints(mainObs map[string]any, importedObs map[string]any) ([]any, int, int) { |
There was a problem hiding this comment.
Opaque return signature makes call sites hard to reason about: the three returned values ([]any, int, int) have no indication of semantics at the definition or call site.
💡 Suggested fix
Use named returns to make the contract self-documenting:
func mergeRawOTLPEndpoints(mainObs, importedObs map[string]any) (endpoints []any, mainCount, importAdded int) {
seen := make(map[string]bool)
for _, ep := range extractRawOTLPEndpointMaps(mainObs) {
if url, _ := ep["url"].(string); url != "" && !seen[url] {
seen[url] = true
endpoints = append(endpoints, ep)
}
}
mainCount = len(endpoints)
for _, ep := range extractRawOTLPEndpointMaps(importedObs) {
if url, _ := ep["url"].(string); url != "" && !seen[url] {
seen[url] = true
endpoints = append(endpoints, ep)
importAdded++
}
}
return
}With unnamed returns, the caller mergedEndpoints, mainCount, importAdded := mergeRawOTLPEndpoints(...) works, but a future reader of the function signature alone cannot tell which int is which.
|
|
||
| // workflowBuildContext captures the shared state across parse setup, validation, | ||
| // and workflow population phases. | ||
| type workflowBuildContext struct { |
There was a problem hiding this comment.
workflowBuildContext has undocumented phase-ordering invariants: engineSetup, toolsResult, and workflowData are nil until setupWorkflowBuildContext completes. Calling validateWorkflowBuildContext or populateWorkflowBuildContext on a freshly constructed context panics.
💡 Suggested fix
Document the phase contract directly on the struct, so a future caller (e.g. a test helper, a second orchestrator) doesn't accidentally skip setup:
// workflowBuildContext captures shared state across the three build phases:
// 1. setupWorkflowBuildContext — must run first; populates engineSetup, toolsResult, workflowData
// 2. validateWorkflowBuildContext — requires phase 1 to have succeeded
// 3. populateWorkflowBuildContext — requires phases 1 and 2 to have succeeded
//
// Fields engineSetup, toolsResult, and workflowData are nil until phase 1 completes;
// accessing them before that point will panic.
type workflowBuildContext struct {Alternatively, a ready bool sentinel or a small validation helper (ctx.assertSetupDone()) can catch misuse at development time.
|
@copilot review all comments and reviews from copilot and github-actions actor. Respond and resolve. |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed the actionable github-actions review items in |
|
|
|
|
|
@copilot lint go, merge main and recompile see https://github.com/github/gh-aw/actions/runs/26700559611/job/78692677070?pr=36012 |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed in 4bff6f7. I fixed the remaining Go fmt/lint issues, reran |
|
|
The custom large-function lint pass flagged orchestration methods in
compiler_*.go; this change targetsParseWorkflowFileby decomposing its control flow without changing compiler behavior. The goal is to reduce function complexity while keeping existing validation, merge precedence, and error formatting semantics intact.Orchestration decomposition
ParseWorkflowFileinto explicit phases:shared/redirect-only)engine + imports, tools/markdown processing, initialWorkflowData)workflowBuildContextto carry phase state instead of passing long parameter lists.Behavior-preserving extraction of dense merge logic
Error handling parity
engine:line targeting