feat(aml): add lineage command for extracting metadata#17
Open
nvquanghuy wants to merge 7 commits into master
Adds a new `holistics aml lineage` command that extracts lineage metadata from compiled AML projects and outputs a normalized JSON structure optimized for integration with data catalogs like DataHub.

Features:
- Parses TableModel and QueryModel entities with fields (dimensions/measures)
- Extracts Dataset and Dashboard entities with chart definitions
- Builds lineage edges: model->source, dataset->model, chart->model, dashboard->chart
- Supports multiple table name formats (BigQuery, PostgreSQL, simple)
- Options: --output file, --entities filter, --compact JSON

Includes comprehensive tests with vitest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
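The table-name handling listed above could be sketched roughly as follows. This is an illustration only: the `parseTableName` function, the `TableRef` shape, and the quoting rules are assumptions, not the code added in this commit.

```typescript
// Hypothetical sketch: split a warehouse table name into its parts.
// Names and quoting rules are assumptions, not taken from the PR.
interface TableRef {
  database?: string; // BigQuery project or database, when present
  schema?: string;   // BigQuery dataset or PostgreSQL schema, when present
  table: string;
}

function parseTableName(raw: string): TableRef {
  // Drop backticks (BigQuery) and double quotes (PostgreSQL), then split on dots.
  // Limitation: quoted identifiers that themselves contain dots are not handled.
  const parts = raw
    .trim()
    .replace(/[`"]/g, "")
    .split(".")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  if (parts.length === 0) {
    throw new Error(`Empty table name: ${JSON.stringify(raw)}`);
  }
  if (parts.length === 1) {
    return { table: parts[0] };
  }
  if (parts.length === 2) {
    return { schema: parts[0], table: parts[1] };
  }
  // Three or more parts: the last two are schema and table,
  // everything before them is joined back into the database name.
  return {
    database: parts.slice(0, -2).join("."),
    schema: parts[parts.length - 2],
    table: parts[parts.length - 1],
  };
}
```

With this sketch, a backtick-quoted BigQuery name, a double-quoted PostgreSQL name, and a bare table name all normalize to the same `TableRef` shape, which keeps the model->source edges comparable across warehouses.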
Adds explicit chart→dataset relationship for proper hierarchy:
- Dashboard → Chart → Dataset → Model → DW Table

This complements chart_to_model (granular field-level lineage) with chart_to_dataset (hierarchical relationship).

Also adds sample output fixture for reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
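To illustrate how the hierarchy above can be walked, here is a minimal sketch. `chart_to_model` and `chart_to_dataset` come from the commit message; the other edge type names, the `Edge` shape, the entity id format, and the `upstreamOf` helper are assumptions made for the example.

```typescript
// Hypothetical edge model. Only chart_to_model and chart_to_dataset are
// named in the commit message; the rest is assumed for illustration.
type EdgeType =
  | "dashboard_to_chart"
  | "chart_to_dataset"
  | "chart_to_model"
  | "dataset_to_model"
  | "model_to_source";

interface Edge {
  type: EdgeType;
  from: string; // downstream entity id
  to: string;   // upstream entity id
}

// Breadth-first walk that collects every entity upstream of `start`.
function upstreamOf(start: string, edges: Edge[]): string[] {
  // Index edges by their downstream end for quick lookup.
  const byFrom = new Map<string, string[]>();
  for (const e of edges) {
    const targets = byFrom.get(e.from) ?? [];
    targets.push(e.to);
    byFrom.set(e.from, targets);
  }

  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of byFrom.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return Array.from(seen);
}
```

Because a chart reaches its model both through `chart_to_dataset` and directly through `chart_to_model`, the walk deduplicates entities with a `Set` so each upstream node is reported once.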
Add AQL parsing to lineage extraction to solve the understatement problem where chart dependencies hidden in calculations and metrics were missed.

Changes:
- Add extractAqlModelRefs() to parse model.field patterns from AQL strings
- Add extractAqlStrings() to find Heredoc content in viz blocks
- Update FieldReference type to include 'source' field (field_ref vs aql)
- Update parseDataset() to extract metrics and their AQL dependencies
- Add DatasetMetric type with models_referenced and fields_referenced

This addresses Problems #2 and #3 from LINEAGE_CHALLENGES.md:
- AQL expressions in chart calculations now traced
- Dataset-level metrics now have their model dependencies extracted

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
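A regex-based extraction of `model.field` patterns might look like the sketch below. It is not the PR's `extractAqlModelRefs()`; the function name `extractModelFieldRefs`, the `AqlRef` type, and the pattern itself are assumptions, and the expression used in the usage note is only an illustrative string.

```typescript
// Hypothetical regex-based sketch; the PR's extractAqlModelRefs() may differ.
interface AqlRef {
  model: string;
  field: string;
}

function extractModelFieldRefs(aql: string): AqlRef[] {
  // Match identifier.identifier pairs such as orders.amount.
  // Identifiers must start with a letter or underscore, so decimals like 1.5 are skipped.
  const pattern = /\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b/g;
  const seen = new Set<string>();
  const refs: AqlRef[] = [];

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(aql)) !== null) {
    const key = `${match[1]}.${match[2]}`;
    // Report each model.field pair once, in order of first appearance.
    if (!seen.has(key)) {
      seen.add(key);
      refs.push({ model: match[1], field: match[2] });
    }
  }
  return refs;
}
```

For an input such as `sum(orders.amount) / count(users.id)`, the sketch returns `orders.amount` and `users.id`. A plain regex cannot tell a real field reference from a dotted token inside a string literal, which is one reason a type-checked approach is more accurate.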
- Add CLI_CORE_PATH env var support for local development
- Add type-checked AQL extraction using cli-core utilities
- Fall back to regex-based extraction when utilities unavailable
- Pass dataset context through parsing for accurate type resolution

This enables more accurate model.field extraction from AQL expressions using the same type checker as the Holistics frontend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
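The "type-checked first, regex fallback" behavior described above can be sketched as a small wrapper. The `Extractor` type and `extractWithFallback` function are hypothetical; the real cli-core utility signatures are not shown in this PR, so both extractors are passed in as plain functions.

```typescript
// Hypothetical sketch of the fallback strategy. The typed extractor stands in
// for the cli-core utility, whose real signature is not shown in this PR.
type Extractor = (aql: string) => string[];

interface ExtractionResult {
  refs: string[];
  source: "type_checked" | "regex";
}

function extractWithFallback(
  aql: string,
  typed: Extractor | undefined,
  regexFallback: Extractor,
): ExtractionResult {
  if (typed) {
    try {
      return { refs: typed(aql), source: "type_checked" };
    } catch {
      // The type-checked path failed; fall through to the regex path.
    }
  }
  // Used when the utility is unavailable or threw an error.
  return { refs: regexFallback(aql), source: "regex" };
}
```

Returning the `source` alongside the references lets callers and tests see which path produced a result, which matters when the regex path is less precise.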
- Test regex fallback when cli-core not available
- Test type-checked extraction with mock cli-core
- Test dataset caching for multiple metrics
- Test fallback on errors from extractAqlReferences
- Test fallback on errors from createDatasetFromCompiled
- Test chart AQL extraction with dataset context
- Test charts without dataset reference (orphan charts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --compiled <file> option to `aml lineage` for testing with pre-compiled JSON instead of running the compile subprocess
- Fix [dev] debug message to use console.error so it doesn't pollute stdout JSON output when the CLI is invoked as a subprocess

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move lineage command from wrapper to @holistics/cli-core to fix
subprocess spawning bug. The wrapper is now minimal (10 lines).
## Changes
**Deleted** (1,862 lines):
- src/lineage.ts (782 lines) - All logic moved to cli-core
- src/__tests__/lineage.test.ts (560 lines) - Ported to cli-core
- src/__tests__/fixtures/* (416 lines) - Moved to cli-core
**Simplified** (100 lines removed):
- src/index.ts - Now just loads cli-core and registers commands
- package.json - Removed test scripts (wrapper doesn't need tests)
**Added**:
- .gitignore - Ignore test binaries
## New Architecture
The wrapper now delegates ALL commands to cli-core:
```typescript
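import { loadModule } from './loader';
import { Command } from 'commander';

const program = new Command();
// Load cli-core at runtime (it is not a package.json dependency of the wrapper)
const clicore = await loadModule('@holistics/cli-core');
// cli-core registers every command, including `aml lineage`
clicore.registerCommands(program);
program.parse(process.argv);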
```
cli-core (v0.6.24+) now provides:
- aml compile
- aml validate
- aml lineage ← NEW!
## Why This Fix Works
Before: wrapper spawned `holistics aml compile` → spawn fails in Bun binary
After: cli-core calls `compileAMLFiles()` directly → no spawning needed
## Testing
Run with local cli-core:
```bash
CLI_CORE_PATH=/path/to/cli-core npm run cli -- aml lineage . --compact
```
Related:
- holistics-core PR #2798 (adds lineage to cli-core)
- datahub-integration-docs/LINEAGE_MIGRATION.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
STATUS: This PR is OPTIONAL and can be closed without merging.
The spawn bug fix works without this PR because the wrapper auto-downloads the latest cli-core from npm. Once cli-core@0.6.24 is published, existing wrapper binaries automatically get the fix.
This PR simplifies the wrapper by removing lineage code, but it's a nice-to-have, not required.
What This PR Does
Removes lineage command from wrapper since it's been moved to @holistics/cli-core (see holistics/holistics-core#2798).
Deleted (1,862 lines):
Simplified:
Why This PR is Optional
The wrapper doesn't have cli-core in package.json. Instead, it downloads from npm at runtime:
This means:
cli-core@0.6.24 → wrapper auto-downloads it
Options
Option A: Close This PR ✅ Recommended
Option B: Merge This PR
Recommendation
Close this PR and just publish cli-core@0.6.24. The fix works without it!
If you want a cleaner wrapper in the future, you can revisit this later.
Related
🤖 Generated with Claude Code