refactor(package): stable lakeflow_framework import namespace with bundled config/schemas #87

Open
rederik76 wants to merge 2 commits into main from refactor/lakeflow-framework-package

rederik76 (Collaborator) commented May 23, 2026

refactor(package): restructure src/ into an importable lakeflow_framework package

Summary

Restructures the framework's flat src/*.py layout into a proper lakeflow_framework Python package with a root pyproject.toml. The primary goal is a stable, globally unique import namespace (lakeflow_framework.*) and portable bundling of default config and JSON schemas as package data.

The default deployment model is unchanged: flat DAB (bundle) deploy remains the preferred and default path for all customers. You still clone the repo and run databricks bundle deploy, with framework.sourcePath pointing at the deployed src/ on Workspace Files. This PR also makes the package pip-installable as an optional distribution path for teams that manage Python dependencies via PyPI, a UC Volume, or an internal Artifactory feed, but no one is required to switch.

Two ADRs capture the design:

  • ADR-0007: lakeflow_framework package layout, compat shims, packaging, deprecation timeline.
  • ADR-0008: Strategy B "Workspace Files-first" config & schema resolution, which guarantees flat-deploy behavior is identical to prior releases.

Deployment modes

Flat DAB deploy stays the default; the wheel is an optional add-on for specific dependency-management needs:

| Mode | How it works | When to use |
|---|---|---|
| Flat DAB deploy (default, preferred) | Repo cloned and deployed with databricks bundle deploy; framework.sourcePath points at the deployed src/; the cluster reads modules and default config directly. | Default for all customers. No pipeline changes required. |
| Wheel install (optional) | pip install lakeflow-framework; defaults and schemas are bundled in the wheel and read via importlib.resources. | Teams managing Python dependencies via PyPI, a UC Volume, or Artifactory. |
| Wheel + local overlay (optional) | Wheel installed, with framework.sourcePath still set so sparse overrides in src/local/config/ deep-merge on top. | pip-managed installs that still need per-deployment config customisation. |

What changed

Package restructure (primary change)

  • New src/lakeflow_framework/ package; all internal imports use absolute lakeflow_framework.* names (no bare imports remain inside the package). This removes the shadow-import risk where a customer bundle module could collide with a bare framework module name.
  • config/default/** and schemas/** bundled as package data so defaults/schemas travel with the package regardless of deploy mode.
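The shadow-import risk described above can be reproduced with the standard library alone. The sketch below is illustrative, not framework code: the customer_bundle and framework_src directories and their one-line constants.py modules are hypothetical stand-ins for a customer bundle and the old flat src/ layout. A bare import of constants resolves to whichever directory comes first on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_bare_name(name: str, search_dirs: list) -> object:
    """Import `name` with `search_dirs` prepended to sys.path, in that order."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)  # force a fresh lookup instead of a cached module
    try:
        sys.path[:0] = [str(d) for d in search_dirs]
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path[:] = saved_path
        sys.modules.pop(name, None)  # don't leak the demo module to later imports


with tempfile.TemporaryDirectory() as tmp:
    customer = Path(tmp, "customer_bundle")  # hypothetical customer bundle dir
    framework = Path(tmp, "framework_src")   # hypothetical old flat src/ dir
    for directory, owner in ((customer, "customer"), (framework, "framework")):
        directory.mkdir()
        (directory / "constants.py").write_text(f"OWNER = {owner!r}\n")

    # Same bare name, different sys.path order, different module wins.
    customer_first = import_bare_name("constants", [customer, framework]).OWNER
    framework_first = import_bare_name("constants", [framework, customer]).OWNER

print(customer_first)   # customer
print(framework_first)  # framework
```

A namespaced import such as lakeflow_framework.constants is looked up through the lakeflow_framework package, so a stray top-level constants.py elsewhere on sys.path cannot intercept it.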

Config/schema resolver (Strategy B, ADR-0008)

  • load_framework_default_json is the single resolver. Resolution order: (1) Workspace Files under framework.sourcePath, if present → (2) importlib.resources package data → (3) sparse src/local/config/ overlay deep-merged on top (ADR-0006 behavior preserved).
  • Workspace Files-first means flat-deploy customers see no behavior change — their files are found at step 1 and the package data is never consulted.
  • load_framework_schema returns an importlib.resources traversable for jsonschema validators; the former os.path.join(...) call sites now go through the resolver.
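The three-step resolution order can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the framework's config_resolver: the names load_default_json and deep_merge, the file name global.json, the <sourcePath>/config/default/ location of Workspace Files defaults, and the merge semantics (overlay values win, nested objects merge recursively, lists are replaced wholesale) are all hypothetical. package_root is any importlib.resources Traversable, such as importlib.resources.files("lakeflow_framework"); a pathlib.Path also works, which keeps the example runnable without an installed package.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict: overlay values win, nested dicts merge recursively."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced, not merged
    return merged


def load_default_json(file_name: str, package_root, source_path: Optional[str] = None) -> dict:
    """Hypothetical Strategy B resolver: Workspace Files, then package data, then overlay."""
    config = None

    # (1) Workspace Files under framework.sourcePath, if the file is present there.
    if source_path:
        ws_file = Path(source_path, "config", "default", file_name)
        if ws_file.is_file():
            config = json.loads(ws_file.read_text(encoding="utf-8"))

    # (2) Otherwise fall back to the defaults bundled as package data.
    if config is None:
        packaged = package_root / "config" / "default" / file_name
        if packaged.is_file():
            config = json.loads(packaged.read_text(encoding="utf-8"))
    if config is None:
        raise FileNotFoundError(f"no default config found for {file_name!r}")

    # (3) Deep-merge the sparse local overlay at <sourcePath>/local/config/ on top.
    if source_path:
        overlay = Path(source_path, "local", "config", file_name)
        if overlay.is_file():
            config = deep_merge(config, json.loads(overlay.read_text(encoding="utf-8")))
    return config


# Usage: package data supplies the default, the local overlay overrides one key.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "pkg")  # stands in for importlib.resources.files(...)
    src = Path(tmp, "src")  # stands in for framework.sourcePath
    (pkg / "config" / "default").mkdir(parents=True)
    (src / "local" / "config").mkdir(parents=True)
    (pkg / "config" / "default" / "global.json").write_text(
        json.dumps({"logging": {"level": "INFO", "format": "text"}}), encoding="utf-8")
    (src / "local" / "config" / "global.json").write_text(
        json.dumps({"logging": {"level": "DEBUG"}}), encoding="utf-8")
    resolved = load_default_json("global.json", pkg, str(src))

print(resolved)  # {'logging': {'level': 'DEBUG', 'format': 'text'}}
```

Because step 1 returns as soon as a Workspace Files copy exists, a flat deploy never reaches the package data, which is the property ADR-0008 relies on for "no behavior change".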

Optional pip packaging (secondary)

  • Root pyproject.toml (setuptools build); VERSION is the single source of truth, resolved by both importlib.metadata (wheel) and direct file read (editable/flat deploy).
  • Install variants scaffolded: lakeflow-framework (base), lakeflow-framework[contrib] (currently an empty no-op extra), and lakeflow-framework[all].
  • New contrib/ extension point (empty __init__.py + support-policy README.rst); no modules land here in this PR.
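One way to make VERSION the single source of truth for both deploy modes is to ask installed distribution metadata first and fall back to reading the file. This is a sketch, not the framework's code: resolve_version is a hypothetical helper, and the location of VERSION relative to the module is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def resolve_version(dist_name: str, version_file: Path) -> str:
    """Prefer installed distribution metadata; fall back to reading VERSION."""
    try:
        # Wheel or editable install: version() reads the installed .dist-info metadata.
        return version(dist_name)
    except PackageNotFoundError:
        # Flat DAB deploy: nothing is pip-installed, so read the file directly.
        return version_file.read_text(encoding="utf-8").strip()


# Hypothetical wiring in lakeflow_framework/__init__.py (adjust parents[...] to the real layout):
# __version__ = resolve_version("lakeflow-framework", Path(__file__).resolve().parents[2] / "VERSION")
```

One trade-off of the metadata-first order: with an editable install the metadata is typically written at install time, so editing VERSION without reinstalling can leave __version__ reporting the old value until the package is reinstalled.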

Backward compatibility

  • Old flat src/*.py locations reduced to thin re-export shims (e.g. from lakeflow_framework.logger import *), kept until v1.0.0. Existing notebooks/bundles importing bare names keep working unchanged.
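What a star re-export shim does and does not carry over can be checked in a few lines. The sketch below builds a throwaway copy of both layouts in a temp directory; the module contents (get_logger, _internal_helper) are hypothetical, and only the shim line itself mirrors the example quoted above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

DEMO_MODULES = ("logger", "lakeflow_framework.logger", "lakeflow_framework")

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "lakeflow_framework"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # Hypothetical new module: one public function, one private helper.
    (pkg / "logger.py").write_text(
        "def get_logger(name):\n"
        "    return 'logger:' + name\n"
        "_internal_helper = object()\n"
    )
    # Old flat location reduced to a thin re-export shim.
    (root / "logger.py").write_text("from lakeflow_framework.logger import *\n")

    for mod in DEMO_MODULES:
        sys.modules.pop(mod, None)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        shim = importlib.import_module("logger")
        real = importlib.import_module("lakeflow_framework.logger")
        same_object = shim.get_logger is real.get_logger
        private_reexported = hasattr(shim, "_internal_helper")
    finally:
        sys.path.remove(str(root))
        for mod in DEMO_MODULES:
            sys.modules.pop(mod, None)

print(same_object)         # True: the shim re-binds the very same function object
print(private_reexported)  # False: without __all__, import * skips underscore names
```

Bare-name imports of public names therefore keep working and return the same objects as the lakeflow_framework.* path, while any caller reaching into underscore-prefixed names through the old location would need updating. If the target module defines __all__, the shim exposes exactly the names listed there.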

Tests

  • tests/test_package.py — public import surface.
  • tests/test_strategy_b_resolver.py — resolver precedence + schema resolution across deploy modes.

Docs

  • New: ADR-0007, ADR-0008, deploy_framework_overview.rst (positions flat DAB as default, wheel as optional), deploy_wheel.rst, contributor_contrib.rst, contributor_dev_env.rst.
  • Updated existing docs to prefer lakeflow_framework imports in new code; docs/conf.py reads release from VERSION.

Deprecation timeline

| Version | Action |
|---|---|
| v0.16.0 (this PR) | lakeflow_framework package introduced; bare src/ imports still work via shims; flat DAB deploy remains default |
| v1.0.0 | Compat shims at old flat src/*.py paths removed |

Diff footprint

~128 files changed (+3,741 / −1,979), mostly file moves (src/X.py → src/lakeflow_framework/X.py) plus import rewrites; net-new code concentrated in config_resolver.py, constants.py, __init__.py, contrib/, tests, and docs.

Test plan

  • Flat DAB deploy (default path): deploy with framework.sourcePath set; defaults/schemas load from Workspace Files (step 1); behavior identical to pre-v0.16.0.
  • Bare-name compat imports (e.g. from constants import ...) still resolve via shims.
  • src/local/config/ overlay deep-merges on top in all deploy modes.
  • pytest tests/test_package.py tests/test_strategy_b_resolver.py passes; full suite green.
  • (optional path) pip install -e . and pip install -e ".[all]" succeed; lakeflow_framework.__version__ matches VERSION; the wheel built with python -m build contains config/default/** and schemas/**; a wheel install without framework.sourcePath loads defaults from package data (step 2).
  • Sphinx docs build clean; version header reflects VERSION.
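The wheel-contents item in the test plan can be automated with zipfile, since a wheel is a zip archive. This is a sketch: missing_package_data is a hypothetical helper, and the in-wheel prefixes assume config/default/ and schemas/ sit inside the lakeflow_framework package directory.

```python
import zipfile
from pathlib import Path


def missing_package_data(wheel_path: Path, required_prefixes: list) -> list:
    """Return the prefixes that match no file (only directory entries, or nothing) in the wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    return [
        prefix
        for prefix in required_prefixes
        if not any(n.startswith(prefix) and not n.endswith("/") for n in names)
    ]


# Hypothetical in-wheel locations; adjust to wherever the build places the data.
REQUIRED = [
    "lakeflow_framework/config/default/",
    "lakeflow_framework/schemas/",
]

# Usage after `python -m build --wheel`:
# for whl in Path("dist").glob("lakeflow_framework-*.whl"):
#     assert not missing_package_data(whl, REQUIRED), whl
```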

…rk package

- Introduce src/lakeflow_framework/ as a proper Python package with pyproject.toml
  (hatchling build); config/ and schemas/ bundled as package data
- Implement Strategy B (Workspace Files-first) resolver in config_resolver:
  load_framework_default_json resolves via Workspace Files → importlib.resources →
  local/config/ overlay; load_framework_schema returns an importlib.resources
  traversable for bundled JSON schemas
- Reduce src/*.py shims to thin re-exports from lakeflow_framework for backward compat
- Add contrib/ extension point with README and __init__ stub
- Add tests: test_package.py (import surface), test_strategy_b_resolver.py (resolver + schema)
- Update all internal imports across dataflow/, dataflow_spec_builder/, and support modules
- Add docs: ADR-0007 (package layout), ADR-0008 (Workspace Files-first resolver),
  deploy_wheel.rst, deploy_framework_overview.rst, contributor_contrib.rst;
  update all existing docs/ pages to reference lakeflow_framework imports
rederik76 requested a review from haillew as a code owner May 23, 2026 05:36
rederik76 self-assigned this May 23, 2026
rederik76 changed the title feat(package): restructure src/ into pip-installable lakeflow_framewo… feat(package): restructure into src/lakeflow_framework namespace and add contrib support Jul 1, 2026
rederik76 changed the title feat(package): restructure into src/lakeflow_framework namespace and add contrib support refactor(package): stable lakeflow_framework import namespace with bundled config/schemas Jul 1, 2026
Development

Successfully merging this pull request may close these issues.

[FEATURE]: lakeflow-framework Python package with contrib extras and absolute imports
