feat(aml): add lineage command for extracting metadata#17

Open
nvquanghuy wants to merge 7 commits into master from feature/lineage-command

nvquanghuy commented Mar 31, 2026

Summary

STATUS: This PR is OPTIONAL and can be closed without merging.

The spawn bug fix works without this PR because the wrapper auto-downloads the latest cli-core from npm. Once cli-core@0.6.24 is published, existing wrapper binaries automatically get the fix.

This PR simplifies the wrapper by removing lineage code, but it's a nice-to-have, not required.


What This PR Does

Removes the lineage command from the wrapper, since it has been moved to @holistics/cli-core (see holistics/holistics-core#2798).

Deleted (1,862 lines):

  • src/lineage.ts (782 lines) → Moved to cli-core
  • src/__tests__/* (1,080 lines) → Ported to cli-core

Simplified:

  • src/index.ts → 10 lines (just loads cli-core)
  • package.json → Removed test scripts

Why This PR is Optional

The wrapper does not list cli-core as a dependency in package.json. Instead, it resolves the latest published version from the npm registry and downloads it at runtime:

// src/downloader.ts
async function getLatestVersion(pkg: string) {
  const res = await fetch(`https://registry.npmjs.org/${pkg}`);
  const data = await res.json();  // registry metadata for the package
  return data["dist-tags"]?.latest;  // Auto-gets latest!
}

This means:

  1. Publish cli-core@0.6.24 → wrapper auto-downloads it
  2. Users get the fix immediately
  3. No wrapper update needed!

Options

Option A: Close This PR ✅ Recommended

  • Pros: Simpler, no wrapper release needed, fix works immediately
  • Cons: Wrapper still has dead code (lineage.ts) but it's not used

Option B: Merge This PR

  • Pros: Cleaner wrapper (10 lines vs 110), removes dead code
  • Cons: Requires wrapper release, more complex deployment

Recommendation

Close this PR and just publish cli-core@0.6.24. The fix works without it!

If you want a cleaner wrapper in the future, you can revisit this later.

Related

  • cli-core PR #2798 (the actual fix)
  • Documentation: datahub-integration-docs/LINEAGE_MIGRATION.md

🤖 Generated with Claude Code

Adds a new `holistics aml lineage` command that extracts lineage metadata
from compiled AML projects and outputs a normalized JSON structure optimized
for integration with data catalogs like DataHub.

Features:
- Parses TableModel and QueryModel entities with fields (dimensions/measures)
- Extracts Dataset and Dashboard entities with chart definitions
- Builds lineage edges: model->source, dataset->model, chart->model, dashboard->chart
- Supports multiple table name formats (BigQuery, PostgreSQL, simple)
- Options: --output file, --entities filter, --compact JSON

Includes comprehensive tests with vitest.
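
To illustrate the "multiple table name formats" feature above, here is a minimal sketch of how a dotted table name might be normalized. The `TableRef` shape and `parseTableName` helper are hypothetical names chosen for this example, not the PR's actual types, and the sketch assumes identifiers do not themselves contain dots.

```typescript
// Hypothetical shape for illustration; the PR's real types are not shown here.
interface TableRef {
  database: string | null; // e.g. a BigQuery project, when present
  schema: string | null;   // e.g. a BigQuery dataset or PostgreSQL schema
  table: string;
}

// Normalize `project.dataset.table`, "schema"."table", or a bare table name.
// Assumption: backticks and double quotes are only used as identifier quoting.
function parseTableName(raw: string): TableRef {
  const parts = raw
    .replace(/[`"]/g, "")
    .split(".")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  if (parts.length === 0) {
    throw new Error(`Empty table name: ${JSON.stringify(raw)}`);
  }

  const table = parts[parts.length - 1];
  const schema = parts.length >= 2 ? parts[parts.length - 2] : null;
  // Anything left of the schema is treated as the database/project.
  const database = parts.length >= 3 ? parts.slice(0, -2).join(".") : null;
  return { database, schema, table };
}
```

A real implementation would likely need to handle quoted identifiers that contain dots, which this sketch deliberately ignores.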

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nvquanghuy requested a review from khanhhuy March 31, 2026 08:56
nvquanghuy and others added 6 commits April 1, 2026 09:53
Adds explicit chart→dataset relationship for proper hierarchy:
- Dashboard → Chart → Dataset → Model → DW Table

This complements chart_to_model (granular field-level lineage)
with chart_to_dataset (hierarchical relationship).

Also adds sample output fixture for reference.
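
As a rough illustration of the hierarchy described in this commit, the sketch below models lineage edges and walks them upstream from a starting entity. Only `chart_to_model` and `chart_to_dataset` are named in the commit message; the other edge type names, the `LineageEdge` shape, and the `upstreamOf` helper are assumptions made for this example.

```typescript
// Edge type names other than chart_to_model / chart_to_dataset are assumed.
type EdgeType =
  | "dashboard_to_chart"
  | "chart_to_dataset"
  | "chart_to_model"
  | "dataset_to_model"
  | "model_to_source";

// Hypothetical edge shape: `from` depends on `to`.
interface LineageEdge {
  type: EdgeType;
  from: string;
  to: string;
}

// Breadth-first walk that collects everything the start entity depends on.
function upstreamOf(start: string, edges: LineageEdge[]): string[] {
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of edges) {
      if (edge.from === current && !seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return Array.from(seen);
}
```

With both edge kinds present, a catalog can show the coarse Chart → Dataset relationship while still using `chart_to_model` edges for field-level detail.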

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add AQL parsing to lineage extraction to solve the understatement problem
where chart dependencies hidden in calculations and metrics were missed.

Changes:
- Add extractAqlModelRefs() to parse model.field patterns from AQL strings
- Add extractAqlStrings() to find Heredoc content in viz blocks
- Update FieldReference type to include 'source' field (field_ref vs aql)
- Update parseDataset() to extract metrics and their AQL dependencies
- Add DatasetMetric type with models_referenced and fields_referenced

This addresses Problems #2 and #3 from LINEAGE_CHALLENGES.md:
- AQL expressions in chart calculations now traced
- Dataset-level metrics now have their model dependencies extracted
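
The regex-based approach to finding `model.field` patterns can be sketched roughly as follows. This is not the PR's `extractAqlModelRefs()` implementation, whose signature is not shown here; the function name `sketchExtractModelRefs`, the pattern, and the sample expression are assumptions. The result fields reuse the `models_referenced` and `fields_referenced` names mentioned above.

```typescript
interface AqlRefs {
  models_referenced: string[];
  fields_referenced: string[];
}

// Illustrative only: collects identifier.identifier pairs from an AQL string.
// Limitation: it cannot tell a real model reference from a similar-looking
// token inside a string literal, which is what a type checker would resolve.
function sketchExtractModelRefs(aql: string): AqlRefs {
  const pattern = /\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b/g;
  const models = new Set<string>();
  const fields = new Set<string>();

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(aql)) !== null) {
    models.add(match[1]);
    fields.add(`${match[1]}.${match[2]}`);
  }

  return {
    models_referenced: Array.from(models),
    fields_referenced: Array.from(fields),
  };
}

// Example with a made-up expression:
const refs = sketchExtractModelRefs("sum(orders.amount) / count(users.id)");
console.log(refs.models_referenced, refs.fields_referenced);
```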

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CLI_CORE_PATH env var support for local development
- Add type-checked AQL extraction using cli-core utilities
- Fall back to regex-based extraction when utilities unavailable
- Pass dataset context through parsing for accurate type resolution

This enables more accurate model.field extraction from AQL expressions
using the same type checker as the Holistics frontend.
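
The "type-checked first, regex as fallback" behaviour described above could look something like the sketch below. The `Extractor` type and `withFallback` helper are hypothetical; the actual cli-core utilities and their signatures are not shown in this PR, so the primary extractor is simply passed in as an optional function.

```typescript
// Hypothetical extractor signature: AQL string in, "model.field" refs out.
type Extractor = (aql: string) => string[];

// Prefer the primary (e.g. type-checked) extractor; use the fallback when the
// primary is unavailable or throws.
function withFallback(primary: Extractor | undefined, fallback: Extractor): Extractor {
  return (aql: string): string[] => {
    if (!primary) {
      return fallback(aql);
    }
    try {
      return primary(aql);
    } catch {
      // Any failure in the primary path degrades to the fallback result.
      return fallback(aql);
    }
  };
}
```

Wrapping both paths behind one function keeps callers unaware of which extractor produced the result, at the cost of silently hiding primary-path errors unless they are logged separately.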

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test regex fallback when cli-core not available
- Test type-checked extraction with mock cli-core
- Test dataset caching for multiple metrics
- Test fallback on errors from extractAqlReferences
- Test fallback on errors from createDatasetFromCompiled
- Test chart AQL extraction with dataset context
- Test charts without dataset reference (orphan charts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --compiled <file> option to `aml lineage` for testing with
  pre-compiled JSON instead of running the compile subprocess
- Fix [dev] debug message to use console.error so it doesn't pollute
  stdout JSON output when the CLI is invoked as a subprocess
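
The stdout/stderr separation behind this fix can be sketched as below: diagnostics go to stderr so that a parent process reading stdout sees only JSON. The helper names `debugLog` and `emitResult` are made up for this example and are not taken from the PR.

```typescript
// Diagnostics go to stderr via console.error, leaving stdout untouched.
function debugLog(message: string): void {
  console.error(`[dev] ${message}`);
}

// Only the JSON payload is written to stdout; `compact` mirrors the idea of
// the --compact option by dropping indentation.
function emitResult(result: unknown, compact: boolean): string {
  const json = compact ? JSON.stringify(result) : JSON.stringify(result, null, 2);
  console.log(json);
  return json;
}

debugLog("using local cli-core");
emitResult({ entities: [], edges: [] }, true);
```

A caller that pipes stdout into `JSON.parse` would then succeed regardless of how many debug lines were logged.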

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move lineage command from wrapper to @holistics/cli-core to fix
subprocess spawning bug. The wrapper is now minimal (10 lines).

## Changes

**Deleted** (1,862 lines):
- src/lineage.ts (782 lines) - All logic moved to cli-core
- src/__tests__/lineage.test.ts (560 lines) - Ported to cli-core
- src/__tests__/fixtures/* (416 lines) - Moved to cli-core

**Simplified** (100 lines removed):
- src/index.ts - Now just loads cli-core and registers commands
- package.json - Removed test scripts (wrapper doesn't need tests)

**Added**:
- .gitignore - Ignore test binaries

## New Architecture

The wrapper now delegates ALL commands to cli-core:

```typescript
import { loadModule } from './loader';
import { Command } from 'commander';

const program = new Command();
const clicore = await loadModule('@holistics/cli-core');
clicore.registerCommands(program);
program.parse(process.argv);
```

cli-core (v0.6.24+) now provides:
- aml compile
- aml validate
- aml lineage ← NEW!

## Why This Fix Works

Before: wrapper spawned `holistics aml compile` → spawn fails in Bun binary
After: cli-core calls `compileAMLFiles()` directly → no spawning needed

## Testing

Run with local cli-core:
```bash
CLI_CORE_PATH=/path/to/cli-core npm run cli -- aml lineage . --compact
```

Related:
- holistics-core PR #2798 (adds lineage to cli-core)
- datahub-integration-docs/LINEAGE_MIGRATION.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>