feat: Bring Your Own Spark - SparkApplication #6550
Open
aniketpalu wants to merge 4 commits into
…aterialization

Adds a new batch compute engine that submits materialization jobs as SparkApplication custom resources (CRs) via the Kubeflow Spark Operator. One 'feast materialize' call creates one SparkApplication pod that processes all feature views using distributed Spark, rather than running in-process on the Feast server.

Key changes:
- Refactor materialize()/materialize_incremental() to pass all tasks to the engine in a single batch call instead of looping per feature view. Existing engines are unaffected (the base class loops over tasks internally via _materialize_one).
- Add public get_provider() method on FeatureStore.
- New spark_application engine: config, compute, job, driver script, Dockerfile.
- 12 unit tests covering config, validation, CR structure, state mapping, timeout, cleanup, and job naming.
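The batching refactor described above can be illustrated with a minimal sketch. It is not the PR's code: `Task`, `BaseEngine`, `SparkApplicationEngine`, `build_manifest`, the image name, the driver path, and all resource values are hypothetical placeholders. Only the `sparkoperator.k8s.io/v1beta2` API group, the `SparkApplication` kind, and the general shape of the spec come from the Spark Operator itself. The base class loops over tasks one by one, while the batch engine overrides `materialize()` so that all tasks end up in a single manifest.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Task:
    """Hypothetical stand-in for one materialization task."""
    feature_view: str


class BaseEngine:
    """Default behaviour: one _materialize_one call per task."""

    def materialize(self, tasks: List[Task]) -> List[str]:
        return [self._materialize_one(task) for task in tasks]

    def _materialize_one(self, task: Task) -> str:
        return f"in-process:{task.feature_view}"


class SparkApplicationEngine(BaseEngine):
    """Overrides materialize() so the whole batch becomes one custom resource."""

    def __init__(self, image: str, namespace: str = "default") -> None:
        self.image = image
        self.namespace = namespace
        self.submitted: List[Dict[str, Any]] = []

    def build_manifest(self, tasks: List[Task]) -> Dict[str, Any]:
        # All feature views travel in one argument; the encoding is an assumption.
        payload = json.dumps([t.feature_view for t in tasks])
        return {
            "apiVersion": "sparkoperator.k8s.io/v1beta2",
            "kind": "SparkApplication",
            "metadata": {
                "generateName": "feast-materialize-",
                "namespace": self.namespace,
            },
            "spec": {
                "type": "Python",
                "mode": "cluster",
                "image": self.image,
                "sparkVersion": "3.5.0",  # placeholder value
                "mainApplicationFile": "local:///app/driver.py",  # assumed path
                "arguments": [payload],
                "driver": {"cores": 1, "memory": "1g"},
                "executor": {"instances": 2, "cores": 1, "memory": "1g"},
            },
        }

    def materialize(self, tasks: List[Task]) -> List[str]:
        # A real engine would create the resource through the Kubernetes API;
        # this sketch only records the manifest it would submit.
        self.submitted.append(self.build_manifest(tasks))
        return [f"spark-application:{t.feature_view}" for t in tasks]
```

Under these assumptions, three tasks passed to `SparkApplicationEngine.materialize()` produce exactly one recorded manifest, whereas `BaseEngine.materialize()` handles the same three tasks with three separate `_materialize_one` calls.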
from tqdm import tqdm

fv_name = task_info["feature_view"]
logger.info(f"Thread started: {fv_name}")
Comment on lines +112 to +113
f"Starting materialization: {total} feature views, "
f"concurrency={concurrency}"
succeeded, failed = 0, 0
for i, task in enumerate(tasks, 1):
    fv_name = task["feature_view"]
    logger.info(f"[{i}/{total}] Materializing: {fv_name}")
    try:
        name, elapsed = _materialize_one_fv(spark, feast_config, task)
        succeeded += 1
        logger.info(f"[{i}/{total}] Completed: {name} ({elapsed:.1f}s)")

        logger.info(f"[{i}/{total}] Completed: {name} ({elapsed:.1f}s)")
    except Exception:
        failed += 1
        logger.exception(f"[{i}/{total}] Failed: {fv_name}")

        logger.info(f"Completed: {name} ({elapsed:.1f}s)")
    except Exception:
        failed += 1
        logger.exception(f"Failed: {fv_name}")
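The sequential loop in the fragment above can be exercised in isolation. The following is a minimal sketch, not the driver script from the PR: `run_sequential` and `fake_materialize_one` are hypothetical names, the Spark session and Feast config arguments are dropped, and the stub fails only when a task carries an illustrative `should_fail` flag.

```python
import logging
import time
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("materialize_sketch")

TaskDict = Dict[str, Any]


def run_sequential(
    tasks: List[TaskDict],
    materialize_one: Callable[[TaskDict], Tuple[str, float]],
) -> Tuple[int, int]:
    """Materialize each task in order; count successes and failures."""
    total = len(tasks)
    succeeded, failed = 0, 0
    for i, task in enumerate(tasks, 1):
        fv_name = task["feature_view"]
        logger.info(f"[{i}/{total}] Materializing: {fv_name}")
        try:
            name, elapsed = materialize_one(task)
            succeeded += 1
            logger.info(f"[{i}/{total}] Completed: {name} ({elapsed:.1f}s)")
        except Exception:
            # One failing feature view must not abort the remaining ones.
            failed += 1
            logger.exception(f"[{i}/{total}] Failed: {fv_name}")
    return succeeded, failed


def fake_materialize_one(task: TaskDict) -> Tuple[str, float]:
    """Hypothetical stub standing in for the real per-feature-view work."""
    start = time.monotonic()
    if task.get("should_fail"):
        raise RuntimeError(f"simulated failure for {task['feature_view']}")
    return task["feature_view"], time.monotonic() - start
```

Catching `Exception` inside the loop body is what lets the counters reflect per-feature-view outcomes instead of stopping at the first error.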
- Pod calls apply_materialization via gRPC after each successful FV, setting state to AVAILABLE_ONLINE. Server reads FV state post-completion to determine per-FV success/failure in batched SparkApplication runs.
- registry_address is now mandatory (simplified from complex path heuristic).
- Dockerfile rewritten to install feast from source (matches K8s engine pattern).
- Unit tests updated: 15/15 pass (3 new tests for _build_per_fv_jobs).
- E2E validated: 5 FVs x 9600 rows, 5 executors, all AVAILABLE_ONLINE.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
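The per-FV success check described in this commit message could look roughly like the sketch below. The commit does not show how the server reads the state, so `per_fv_results`, the shape of `state_by_fv`, and the `GENERATING` value in the usage note are assumptions made for illustration. Only the `AVAILABLE_ONLINE` state name comes from the commit message.

```python
from typing import Dict, Iterable

AVAILABLE_ONLINE = "AVAILABLE_ONLINE"


def per_fv_results(
    requested: Iterable[str],
    state_by_fv: Dict[str, str],
) -> Dict[str, bool]:
    """Map each requested feature view to True if its state is AVAILABLE_ONLINE.

    A feature view missing from state_by_fv is treated as failed.
    """
    return {fv: state_by_fv.get(fv) == AVAILABLE_ONLINE for fv in requested}
```

For example, calling `per_fv_results(["a", "b", "c"], {"a": "AVAILABLE_ONLINE", "b": "GENERATING"})` marks only `a` as successful. A check like this is only reliable if a state left over from an earlier run cannot be mistaken for a fresh success; how the PR handles that case is not visible here.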
What this PR does / why we need it:
Core changes
Upstream (feature_store.py):
New engine (spark_application/):
Design decisions
Validated on
Test plan
Which issue(s) this PR fixes:
Checks
git commit -s)
Testing Strategy
Misc