feat: Databricks Unity Catalog Materialization #6565

Draft
falloficaruss wants to merge 24 commits into feast-dev:master from falloficaruss:feat/databricks-uc-materialization

falloficaruss commented Jun 28, 2026

What this PR does / why we need it:

The L3 PR adds Unity Catalog (UC)-backed materialization: the ability to write materialized features back to UC Delta tables during feast materialize.
When feast materialize runs, the compute engine nodes (LocalOutputNode in local/nodes.py, SparkWriteNode in spark/nodes.py) now call a new write_uc_materialized_data() hook after writing to the online/offline stores. This hook:

  1. Checks whether the offline store type is databricks_uc and uc_registration.enabled is set
  2. Resolves the UC target path (catalog.schema.feature_view_name) from config or FeatureView tags
  3. Creates the UC feature table if it doesn't exist yet (via FeatureEngineeringClient.create_table)
  4. Merges the materialized data into the UC Delta table (via fe_client.write_table(mode="merge"))
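The four steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the PR's implementation: the config dataclasses, the uc_catalog/uc_schema tag names, the rule that FeatureView tags win over config, and the injected table_exists callable are all assumptions. The client is typed against a small protocol whose keyword arguments mirror FeatureEngineeringClient.create_table and write_table, so a fake client can stand in without a Databricks workspace.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Protocol


class FeatureTableClient(Protocol):
    """Subset of FeatureEngineeringClient-style methods this sketch relies on."""

    def create_table(self, *, name: str, primary_keys: List[str], df: Any) -> Any: ...

    def write_table(self, *, name: str, df: Any, mode: str) -> Any: ...


@dataclass
class UCRegistrationConfig:
    # Hypothetical config shape; field names are illustrative assumptions.
    enabled: bool = False
    catalog: Optional[str] = None
    schema: Optional[str] = None


@dataclass
class OfflineStoreConfig:
    type: str
    uc_registration: UCRegistrationConfig = field(default_factory=UCRegistrationConfig)


def resolve_uc_target(
    config: OfflineStoreConfig, fv_name: str, fv_tags: Dict[str, str]
) -> Optional[str]:
    """Step 2: build catalog.schema.feature_view_name (tags override config, assumed)."""
    catalog = fv_tags.get("uc_catalog", config.uc_registration.catalog)
    schema = fv_tags.get("uc_schema", config.uc_registration.schema)
    if not catalog or not schema:
        return None
    return f"{catalog}.{schema}.{fv_name}"


def write_uc_materialized_data(
    config: OfflineStoreConfig,
    fv_name: str,
    fv_tags: Dict[str, str],
    join_keys: List[str],
    df: Any,
    client: FeatureTableClient,
    table_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the UC table written to, or None when the hook is a no-op."""
    # Step 1: act only for a databricks_uc offline store with registration enabled.
    if config.type != "databricks_uc" or not config.uc_registration.enabled:
        return None
    target = resolve_uc_target(config, fv_name, fv_tags)
    if target is None:
        return None
    # Step 3: create the feature table on first materialization.
    if not table_exists(target):
        client.create_table(name=target, primary_keys=join_keys, df=df)
    # Step 4: upsert the materialized rows, keyed on the table's primary keys.
    client.write_table(name=target, df=df, mode="merge")
    return target
```

Injecting table_exists keeps the sketch testable; on Databricks it could be backed by a catalog lookup such as spark.catalog.tableExists.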

Which issue(s) this PR fixes:

Fixes #6499

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc

falloficaruss changed the title from Feat/databricks uc materialization to feat: Databricks uc materialization on Jun 28, 2026
falloficaruss changed the title from feat: Databricks uc materialization to feat: Databricks Unity Catalog Materialization on Jun 28, 2026
falloficaruss and others added 9 commits July 1, 2026 16:58
…cks-specific SDKs

Signed-off-by: Abhishek Shinde <norizzabhii@gmail.com>
Signed-off-by: Abhishek Shinde <norizzabhii@gmail.com>
Clicking any image in a blog post now opens a fullscreen overlay
with the image centered on a dark backdrop. Close with click or Escape.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Aniket Paluska <apaluska@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
* feat: scaffold Aerospike online store

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* feat: implement Aerospike online_write_batch

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* feat: implement Aerospike online_read

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* feat: implement Aerospike update and teardown

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* test: add Aerospike unit and integration tests

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* feat: add async online_read/write and lifecycle hooks for Aerospike

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* docs: add Aerospike online store reference and tuning guide

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* fix: use bytearray keys and zip-based batch mapping for Aerospike reads

The Aerospike Python client mishandles bytes user keys (hashes only the first byte), collapsing all entities onto the same digest. Wrap keys in bytearray on write and read. Also pair BatchRecord responses with original input keys via zip rather than trusting br.key[2], which the client returns in a different representation on reads.

Add two integration tests: cross-FV Map CDT coexistence and update(tables_to_delete=...) background scan.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
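The key-wrapping and positional pairing described in this commit can be sketched without the Aerospike client. This is an illustrative sketch, not the store's code: make_key and pair_with_inputs are hypothetical helpers, and batch records are treated as opaque objects. Positional pairing relies on batch_operate returning results in request order, which a later commit in this log states explicitly.

```python
from typing import Any, List, Sequence, Tuple

# Aerospike keys are (namespace, set, user_key) tuples.
AerospikeKey = Tuple[str, str, bytearray]


def make_key(namespace: str, set_name: str, entity_key: bytes) -> AerospikeKey:
    # Wrap the serialized entity key in bytearray instead of passing raw bytes,
    # as the commit describes; the same wrapping is applied on write and read.
    return (namespace, set_name, bytearray(entity_key))


def pair_with_inputs(
    entity_keys: Sequence[bytes], batch_records: Sequence[Any]
) -> List[Tuple[bytes, Any]]:
    # Pair each response with the caller's original key by position, instead of
    # reading the key back from the response (br.key[2] in the commit message).
    if len(entity_keys) != len(batch_records):
        raise ValueError("batch response length does not match request length")
    return list(zip(entity_keys, batch_records))
```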

* docs: clarify Aerospike auth and TLS sections are Enterprise-only

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* feat: add aerospike to feast-operator supported online stores

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* fix(aerospike): project requested_features server-side and surface per-record errors

Two review blockers rolled into one commit because they share the same code path.

1. Server-side projection. online_read now builds a map_get_by_key_list op nested into the feature-view submap via cdt_ctx_map_key when requested_features is provided, instead of fetching the whole FV slot and filtering in Python. For wide feature views this ships only the requested columns over the wire. The response shape (flat [k,v,k,v] list vs. dict) is normalized through _normalize_projected_features.

2. Per-record error surfacing. Both batch_write and batch_operate raise only when the whole request is rejected; partial failures (a single-partition timeout, a replica quorum miss) are otherwise silent and surface downstream as missing features. online_read now distinguishes RECORD_NOT_FOUND (2) and OP_NOT_APPLICABLE (26, i.e. a nested ctx miss when the FV slot is absent) from transient errors, which are raised. online_write_batch inspects every per-record result code after the batch call.

Unit tests cover all four paths: projected read, not-found, op-not-applicable (nested ctx miss), and a simulated TIMEOUT that must raise. The docker-backed cross-FV and update() integration tests still pass, so server-side projection is verified end-to-end against a real Aerospike server.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
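The per-record classification in point 2 amounts to a small decision table over result codes. The sketch below assumes plain integer codes (0 for success, plus the 2 and 26 named in the commit); AerospikeRecordError and both function names are hypothetical, not the store's actual API.

```python
from typing import List, Sequence, Tuple

AEROSPIKE_OK = 0
RECORD_NOT_FOUND = 2    # the record does not exist
OP_NOT_APPLICABLE = 26  # nested ctx miss: the FV slot is absent in the record


class AerospikeRecordError(RuntimeError):
    """Hypothetical error type for a per-record failure inside a batch."""


def read_slot_present(result_code: int) -> bool:
    """True if the FV slot was read, False if it is absent; raise otherwise."""
    if result_code == AEROSPIKE_OK:
        return True
    if result_code in (RECORD_NOT_FOUND, OP_NOT_APPLICABLE):
        return False
    # Transient failures (e.g. timeouts) must not masquerade as missing features.
    raise AerospikeRecordError(f"batch read failed with result code {result_code}")


def check_write_results(result_codes: Sequence[int]) -> None:
    """Raise if any record in a batch write did not succeed."""
    failed: List[Tuple[int, int]] = [
        (i, code) for i, code in enumerate(result_codes) if code != AEROSPIKE_OK
    ]
    if failed:
        raise AerospikeRecordError(f"{len(failed)} record(s) failed: {failed}")
```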

* feat(aerospike)!: rename total_timeout_ms -> batch_total_timeout_ms and add socket_timeout_ms

Review feedback: total_timeout_ms was ambiguous (users read it as a global/end-to-end timeout) and the timeout surface was missing socket_timeout, which is the per-attempt trigger that lets max_retries actually fire within the total budget.

* total_timeout_ms -> batch_total_timeout_ms. The name now refers explicitly to the Aerospike batch policy it maps to and matches the framing of read_timeout_ms / write_timeout_ms (each targets one policy scope).

* Add socket_timeout_ms (optional). Applies uniformly to read, write and batch policies when set. Leaves the Aerospike client default in place when unset.

BREAKING CHANGE: total_timeout_ms is renamed to batch_total_timeout_ms. Config files using the old name must be updated. No default value change.

Docs updated (reference + perf-tuning guide) with a short explainer on the per-attempt vs total deadline distinction. Two new unit tests pin the policy wiring: socket_timeout_ms propagates to all three scopes, and is omitted (not injected as None) when unset.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
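The timeout wiring described in this commit can be illustrated as a function that builds per-scope policy dicts. This is a sketch under assumptions: it uses the Aerospike Python client's total_timeout and socket_timeout policy keys, and build_policies is a hypothetical helper, not the store's code. Retry settings are left out for brevity.

```python
from typing import Dict, Optional


def build_policies(
    read_timeout_ms: int,
    write_timeout_ms: int,
    batch_total_timeout_ms: int,
    socket_timeout_ms: Optional[int] = None,
) -> Dict[str, Dict[str, int]]:
    # Each *_timeout_ms field maps to total_timeout on exactly one policy scope.
    policies: Dict[str, Dict[str, int]] = {
        "read": {"total_timeout": read_timeout_ms},
        "write": {"total_timeout": write_timeout_ms},
        "batch": {"total_timeout": batch_total_timeout_ms},
    }
    # socket_timeout (the per-attempt deadline) applies to all three scopes when
    # set and is omitted entirely otherwise, so the client default stays in effect.
    if socket_timeout_ms is not None:
        for scope in policies.values():
            scope["socket_timeout"] = socket_timeout_ms
    return policies
```

As an assumption about the wiring, the result would be passed as the "policies" entry of the client config dict. Omitting the key, rather than setting it to None, mirrors the second unit test the commit describes.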

* refactor(aerospike): use MAP_KEY_ORDERED, KEY_DIGEST, and instance-scoped client

Cheap-win cleanups flagged in review, all touching the same small patch of write-path and lifecycle code.

* Map CDTs are now created with MAP_KEY_ORDERED. map_get_by_key / map_remove_by_key on an ordered map are O(log N) in the map size instead of O(N); matters on reads of wide feature views and on the update() background scan (which walks every record in the project's set).

* Writes drop POLICY_KEY_SEND and rely on the client default (POLICY_KEY_DIGEST). The serialized entity key is no longer stored alongside each record, saving per-record storage the read path never consumes (batch_operate preserves request order; results are paired back by zip in online_read).

* _client moves from a class attribute to an instance attribute (set in __init__). Previously two AerospikeOnlineStore instances could share the cached client through class state until one wrote self._client. With the instance attribute the state is always per-instance from construction.

* Drop MongoDB references from class docstrings and comments (they referred to how the storage layout was derived rather than documenting current behavior). Also rewrite the _build_batch_writes docstring to describe the policies applied on the write path.

Unit test assertions for the write-path record are updated: bw.policy is now None (client default applies) and map ops carry map_policy={'map_order': MAP_KEY_ORDERED}. All three docker-backed integration tests still pass end-to-end (cross-FV upsert, update() background scan, full feature-store round-trip), so the read/write shape survives the ordering and policy changes against a real server.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* feat(aerospike): add per-FV namespace/set overrides and prewriting hook

Adds three configuration knobs to AerospikeOnlineStoreConfig:

- namespace_overrides: pin individual feature views to a different
  Aerospike namespace (e.g. RAM-only vs. SSD-backed) without splitting
  the project across stores.
- set_overrides: place a feature view in its own set so admin ops on
  it (truncate, scan-based deletes during `feast apply`) do not touch
  records of other views.
- prewriting_hook: import-string-resolved callable invoked once per
  online_write_batch with the rows about to be written, returning the
  rows that actually go on the wire. Resolved and cached on first use;
  returning [] short-circuits the wire call.

Read, write, update and teardown paths all honour the per-FV ns/set
resolution. update() groups dropped feature views by their resolved
(ns, set) pair and issues one background scan per group. teardown()
truncates every unique (ns, set) pair the project may have written to,
including the store-level default.

Adds 22 unit tests for the new behaviour and updates 3 existing call
sites of _build_batch_writes for the new namespace= parameter. Adds a
sample hook module under examples/online_store/aerospike_overrides_and_hooks/
and corresponding sections in docs/reference/online-stores/aerospike.md.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
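The (ns, set) resolution, the grouping used by update() and teardown(), and the cached hook lookup can be sketched as plain functions. This is a hypothetical illustration. The function names, the default namespace and set parameters, and the "module:callable" import-string format are assumptions, not the store's confirmed API.

```python
import importlib
from collections import defaultdict
from functools import lru_cache
from typing import Callable, Dict, Iterable, List, Mapping, Set, Tuple

NsSet = Tuple[str, str]


def resolve_ns_set(
    fv_name: str,
    default_ns: str,
    default_set: str,
    namespace_overrides: Mapping[str, str],
    set_overrides: Mapping[str, str],
) -> NsSet:
    # Per-FV overrides win; otherwise fall back to the store-level defaults.
    return (
        namespace_overrides.get(fv_name, default_ns),
        set_overrides.get(fv_name, default_set),
    )


def group_by_ns_set(
    fv_names: Iterable[str],
    default_ns: str,
    default_set: str,
    namespace_overrides: Mapping[str, str],
    set_overrides: Mapping[str, str],
) -> Dict[NsSet, List[str]]:
    # update(): one background scan per resolved (ns, set) group.
    groups: Dict[NsSet, List[str]] = defaultdict(list)
    for name in fv_names:
        key = resolve_ns_set(name, default_ns, default_set, namespace_overrides, set_overrides)
        groups[key].append(name)
    return dict(groups)


def teardown_targets(
    fv_names: Iterable[str],
    default_ns: str,
    default_set: str,
    namespace_overrides: Mapping[str, str],
    set_overrides: Mapping[str, str],
) -> Set[NsSet]:
    # teardown(): every unique (ns, set) pair, always including the default.
    targets = set(
        group_by_ns_set(fv_names, default_ns, default_set, namespace_overrides, set_overrides)
    )
    targets.add((default_ns, default_set))
    return targets


@lru_cache(maxsize=None)
def load_prewriting_hook(import_string: str) -> Callable:
    # Assumed "package.module:callable" format; resolved once, then cached.
    module_name, sep, attr = import_string.partition(":")
    if not sep or not attr:
        raise ValueError(f"expected 'module:callable', got {import_string!r}")
    return getattr(importlib.import_module(module_name), attr)
```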

* test: update aerospike image tag

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* chore: sync README template and secrets baseline after master merge

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* chore: fix secrets baseline line number for v1 operator types

Adding aerospike to the feast-operator enum shifted the allowlisted
SecretRef entry in api/v1/featurestore_types.go by one line.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* docs: update aerospike docs

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* fix(aerospike): wire batch max_retries and fix empty projection handling

Copilot review feedback on PR feast-dev#6532:

- Add max_retries to the batch client policy (batch_operate/batch_write path)
- Treat empty projected feature maps as present FV slots (is not None)
- Return {} from _normalize_projected_features([]) instead of None
- Fix projection unit test mock/assertions
- Correct prewriting_hook config docstring

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* style(aerospike): format online_read docs assignment for ruff

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* chore: update pixi.lock for aerospike optional extra

Regenerate the v6 lockfile with Pixi v0.63.1 after adding the aerospike extra to pyproject.toml.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

* fix(aerospike): add client init lock and batch chunking

Guard lazy client creation with a lock to avoid connection leaks under concurrent first use, and chunk batch reads/writes by batch_max_records so large materializations stay under Aerospike server batch limits.

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>

---------

Signed-off-by: Valentyn Kahamlyk <valentin.kagamlyk@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
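The last Aerospike commit above, which adds a client init lock and batch chunking, combines two general patterns: double-checked lazy initialization behind a lock, and splitting a batch into slices of at most batch_max_records. The sketch below is generic. LazyClient and chunked are hypothetical names, and the factory stands in for whatever actually connects to Aerospike.

```python
import threading
from typing import Callable, Generic, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")
C = TypeVar("C")


def chunked(items: Sequence[T], batch_max_records: int) -> Iterator[Sequence[T]]:
    # Split a large batch so that no single request exceeds the server limit.
    if batch_max_records <= 0:
        raise ValueError("batch_max_records must be positive")
    for start in range(0, len(items), batch_max_records):
        yield items[start : start + batch_max_records]


class LazyClient(Generic[C]):
    """Create the client on first use, exactly once, even under concurrent callers."""

    def __init__(self, factory: Callable[[], C]) -> None:
        self._factory = factory
        self._client: Optional[C] = None
        self._lock = threading.Lock()

    def get(self) -> C:
        if self._client is None:  # fast path once initialized
            with self._lock:
                if self._client is None:  # re-check: another thread may have won
                    self._client = self._factory()
        return self._client
```

The second check inside the lock prevents the connection leak the commit describes: without it, two threads that both saw None would each build a client.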
* fix: Unblock nightly UI build

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* ci: Add UI production build check

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Abhishek Shinde <norizzabhii@gmail.com>
falloficaruss force-pushed the feat/databricks-uc-materialization branch from 2850a47 to 386599e on July 3, 2026 07:03

Development

Successfully merging this pull request may close these issues.

feat: Extend Feast's DataSource to natively support Iceberg REST Catalog-backed tables

4 participants