
overriding microbatch macro for capturing the compiled code of the mi… #1000

Merged

GuyEshdat merged 18 commits into master from microbatch-compiled-code on May 6, 2026

Conversation

@GuyEshdat (Collaborator) commented May 3, 2026

…crobatch models

Summary by CodeRabbit

  • New Features

    • Added support for capturing and storing compiled code from dbt microbatch incremental models, including event time and batch size configuration options.
    • Enhanced compiled code retrieval mechanism to access cached compiled code when the primary source is unavailable, improving data consistency.
  • Tests

    • Added integration tests verifying compiled code capture for microbatch models under various configurations and execution scenarios.

github-actions bot (Contributor) commented May 3, 2026

👋 @GuyEshdat
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented May 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

This PR adds support for capturing compiled code from dbt microbatch incremental models through a cache-backed macro system. New macros capture compiled code during model execution, store it in a scoped cache, and integrate with existing compiled code retrieval via a fallback mechanism. Integration tests validate capture with and without an override macro.

Changes

Microbatch Compiled Code Capture and Retrieval

| Layer / File(s) | Summary |
| --- | --- |
| Macro Infrastructure<br>`macros/edr/dbt_artifacts/microbatch/capture_microbatch_compiled_code.sql` | Two new macros: `get_incremental_microbatch_sql(arg_dict)` captures the model's compiled code during execution and delegates to the dispatched implementation; `capture_microbatch_compiled_code_for_model()` stores compiled code in an elementary cache keyed by `unique_id` to avoid duplication. |
| Utility Integration<br>`macros/utils/graph/get_compiled_code.sql` | The `get_compiled_code(node)` macro now includes a fallback: if the dispatched implementation yields no `compiled_code`, it attempts to retrieve `compiled_code` from the elementary cache using `node.get("unique_id")`. |
| Test Fixtures and Helpers<br>`integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py` | Introduces `_microbatch_model_sql()`, defining an incremental model with `incremental_strategy="microbatch"`, `event_time="order_date"`, `batch_size="year"`, and adapter-specific partition config for bigquery and athena. |
| Test Execution and Validation<br>`integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py` | Adds a `_run_microbatch_model_and_get_latest_success_result()` helper to run the model and retrieve run results, and a `_with_microbatch_override_macro()` context manager to temporarily inject a `get_incremental_microbatch_sql` override macro. |
| Integration Tests<br>`integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py` | `test_microbatch_run_results_has_compiled_code()` verifies that compiled code is captured and non-empty when the override macro is present; `test_microbatch_run_results_without_override_has_empty_compiled_code()` confirms compiled code remains empty without the override macro (skipped for vertica and when dbt fusion is enabled). |
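The capture-and-fallback flow summarized in the table above can be sketched in plain Python. This is illustrative only: the real implementation is a pair of dbt Jinja macros, and every name below (including the `_cache` structure) is a hypothetical stand-in for elementary's cache API.

```python
# Illustrative sketch of the cache-backed capture/fallback flow; the real
# implementation is dbt Jinja macros, and all names here are hypothetical.

_cache = {"microbatch_compiled_code_by_unique_id": {}}

def capture_microbatch_compiled_code_for_model(model: dict) -> None:
    # Store the executing model's compiled code keyed by unique_id,
    # skipping models that were already captured (avoids duplication).
    store = _cache["microbatch_compiled_code_by_unique_id"]
    unique_id = model.get("unique_id")
    if unique_id and unique_id not in store:
        store[unique_id] = model.get("compiled_code")

def get_compiled_code(node: dict):
    # Try the primary source first; fall back to the cache that was
    # populated while the microbatch model executed.
    compiled_code = node.get("compiled_code")
    if not compiled_code and node.get("unique_id"):
        compiled_code = _cache["microbatch_compiled_code_by_unique_id"].get(
            node.get("unique_id")
        )
    return compiled_code
```

For example, a node whose `compiled_code` is absent at artifact-upload time still resolves through the cache entry captured during execution.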

Sequence Diagram

```mermaid
sequenceDiagram
    participant TestRunner as Test Runner
    participant DbtExecution as dbt Model<br/>Execution
    participant Macro as get_incremental_<br/>microbatch_sql
    participant Cache as Elementary<br/>Cache
    participant Retrieval as get_compiled_<br/>code Utility

    TestRunner->>DbtExecution: Run microbatch model
    DbtExecution->>Macro: Invoke during incremental resolution
    Macro->>Macro: Capture current model's compiled_code
    Macro->>Cache: Store compiled_code by unique_id
    Cache-->>Macro: Cache updated
    Macro-->>DbtExecution: Return dispatched SQL
    DbtExecution-->>TestRunner: Execution complete

    TestRunner->>Retrieval: Query compiled_code for node
    Retrieval->>Retrieval: Check dispatched result
    alt Dispatched compiled_code exists
        Retrieval-->>TestRunner: Return compiled_code
    else Dispatched compiled_code missing
        Retrieval->>Cache: Lookup by unique_id
        Cache-->>Retrieval: Return cached compiled_code
        Retrieval-->>TestRunner: Return cached compiled_code
    end
```

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A microbatch model now shines so bright,
With compiled code captured in cache's light,
Elementary tracks each unique dance,
While fallbacks ensure we don't miss a chance,
Tests verify all, both with and without— 🐰✨

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective: implementing a microbatch macro override to capture compiled code for microbatch models, which is reflected in all three file changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

🧹 Nitpick comments (2)
integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py (2)

30-33: ⚡ Quick win

Avoid hardcoding the project/package segment in unique_id.

Line 30 assumes model.elementary_tests.*. If project name changes, the test can fail even when run results are correct.

Proposed fix:

```diff
-    unique_id = f"model.elementary_tests.{test_id}"
     run_results = dbt_project.read_table(
         "dbt_run_results",
-        where=f"unique_id = '{unique_id}' and status = 'success'",
+        where=f"name = '{test_id}' and resource_type = 'model' and status = 'success'",
         order_by="generated_at desc",
         limit=1,
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py`
around lines 30 - 33, The test hardcodes the package segment in unique_id
("model.elementary_tests.{test_id}") which breaks if the project/package name
changes; update the construction of unique_id in
test_microbatch_compiled_code.py to derive the package/project segment
dynamically (e.g., obtain the package/project name from the dbt_project fixture
or manifest) before calling dbt_project.read_table, so unique_id is built as
f"model.{package_name}.{test_id}" (refer to the unique_id variable and the
dbt_project.read_table call to locate where to change this).

5-5: ⚡ Quick win

Restore disable_run_results after the test.

Line 5 mutates shared runner state and does not restore it. This can create cross-test coupling when fixture scope is not function-level.

Proposed fix:

```diff
 def test_microbatch_run_results_has_compiled_code(test_id: str, dbt_project: DbtProject):
-    dbt_project.dbt_runner.vars["disable_run_results"] = False
+    previous_disable_run_results = dbt_project.dbt_runner.vars.get("disable_run_results")
+    dbt_project.dbt_runner.vars["disable_run_results"] = False

-    model_sql = """
+    try:
+        model_sql = """
 {{ config(
     materialized='incremental',
     incremental_strategy='microbatch',
@@
 from {{ ref('one') }}
 """

-    with dbt_project.create_temp_model_for_existing_table(
-        test_id, raw_code=model_sql
-    ) as model_path:
-        dbt_project.dbt_runner.run(select=str(model_path))
+        with dbt_project.create_temp_model_for_existing_table(
+            test_id, raw_code=model_sql
+        ) as model_path:
+            dbt_project.dbt_runner.run(select=str(model_path))

-    unique_id = f"model.elementary_tests.{test_id}"
-    run_results = dbt_project.read_table(
-        "dbt_run_results",
-        where=f"unique_id = '{unique_id}' and status = 'success'",
-        order_by="generated_at desc",
-        limit=1,
-    )
-    assert run_results, "Expected a successful run result row for microbatch model"
-    assert run_results[0]["compiled_code"], (
-        "Expected compiled_code to be populated for successful microbatch model run result"
-    )
+        unique_id = f"model.elementary_tests.{test_id}"
+        run_results = dbt_project.read_table(
+            "dbt_run_results",
+            where=f"unique_id = '{unique_id}' and status = 'success'",
+            order_by="generated_at desc",
+            limit=1,
+        )
+        assert run_results, "Expected a successful run result row for microbatch model"
+        assert run_results[0]["compiled_code"], (
+            "Expected compiled_code to be populated for successful microbatch model run result"
+        )
+    finally:
+        if previous_disable_run_results is None:
+            dbt_project.dbt_runner.vars.pop("disable_run_results", None)
+        else:
+            dbt_project.dbt_runner.vars["disable_run_results"] = previous_disable_run_results
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py`
at line 5, The test mutates shared state by setting
dbt_project.dbt_runner.vars["disable_run_results"] = False and does not restore
it; fix by saving the original value (orig =
dbt_project.dbt_runner.vars.get("disable_run_results")), set the key for the
test, and ensure restoration in a finally block (or use pytest
fixture/monkeypatch to set and revert) so
dbt_project.dbt_runner.vars["disable_run_results"] is reset to orig (or deleted
if it was absent) after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1446de91-45dc-4082-95f1-1e83579ba7cc

📥 Commits

Reviewing files that changed from the base of the PR and between 36e1956 and bbd163d.

📒 Files selected for processing (1)
  • integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.


Comment on lines +3 to +7:

```sql
{% if not compiled_code and node and node.get("unique_id") %}
  {% set compiled_code = elementary.get_cache(
    "microbatch_compiled_code_by_unique_id", {}
  ).get(node.get("unique_id")) %}
{% endif %}
```
🔴 Microbatch cache fallback bypasses Redshift %%% escaping

On Redshift, redshift__get_compiled_code (macros/utils/graph/get_compiled_code.sql:21-25) replaces % with %% to prevent SQL formatting errors when the compiled code is inserted as a string literal. The new microbatch cache fallback path (lines 3-7) retrieves compiled code directly from the cache without going through adapter.dispatch, so the Redshift-specific escaping is never applied. If a microbatch model's compiled SQL contains % characters, this will produce unescaped % in the INSERT into dbt_run_results, which can cause runtime SQL errors on Redshift (e.g., incomplete format or similar adapter-level failures).

Prompt for agents
In macros/utils/graph/get_compiled_code.sql, the new microbatch cache fallback (lines 3-7) returns raw compiled code, bypassing adapter-specific transformations such as the Redshift percent-escaping in redshift__get_compiled_code (lines 21-25). To fix this, the cached compiled code should be run through the same adapter-specific post-processing. One approach: after retrieving the code from the cache, apply the same adapter.dispatch or at minimum replicate the Redshift escaping logic. For example, you could introduce a new dispatchable macro like elementary.post_process_compiled_code(compiled_code) that is a no-op by default but does .replace("%", "%%") on Redshift, and call it on the cache-retrieved value. Alternatively, store the already-escaped code in the cache by calling the capture during redshift__get_compiled_code, but that would couple the cache to a specific adapter.

Collaborator (Author):
fixed

```sql
{% endif %}

{% do compiled_code_by_unique_id.update({model_unique_id: model_compiled_code}) %}
{% do elementary.set_cache(
```
Collaborator:
This set_cache could have a race if multiple dbt models run in parallel, and all of them update the initial dict.

I think instead you should create the microbatch_compiled_code_by_unique_id in the init_elementary_graph macro, and then I think you don't even need to do set_cache.

Collaborator (Author):
fixed

…_unique_id map. Instead, we will initialize it when initializing the graph, and only update the dict itself instead of updating the cache
GuyEshdat merged commit 91d04fd into master on May 6, 2026
26 of 30 checks passed
GuyEshdat deleted the microbatch-compiled-code branch on May 6, 2026 at 10:13