
overriding microbatch macro for capturing the compiled code of the mi… #1000

Merged

GuyEshdat merged 18 commits into master from microbatch-compiled-code on May 6, 2026

Conversation

@GuyEshdat (Collaborator) commented May 3, 2026

…crobatch models

Summary by CodeRabbit

  • New Features

    • Added support for capturing and storing compiled code from dbt microbatch incremental models, including event time and batch size configuration options.
    • Enhanced compiled code retrieval mechanism to access cached compiled code when the primary source is unavailable, improving data consistency.
  • Tests

    • Added integration tests verifying compiled code capture for microbatch models under various configurations and execution scenarios.

github-actions bot (Contributor) commented May 3, 2026

👋 @GuyEshdat
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented May 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

This PR adds support for capturing compiled code from dbt microbatch incremental models through a cache-backed macro system. New macros capture compiled code during model execution, store it in a scoped cache, and integrate with existing compiled code retrieval via a fallback mechanism. Integration tests validate capture with and without an override macro.

Changes

Microbatch Compiled Code Capture and Retrieval

| Layer / File(s) | Summary |
| --- | --- |
| Macro Infrastructure<br>`macros/edr/dbt_artifacts/microbatch/capture_microbatch_compiled_code.sql` | Two new macros: `get_incremental_microbatch_sql(arg_dict)` captures the model's compiled code during execution and delegates to the dispatched implementation; `capture_microbatch_compiled_code_for_model()` stores compiled code in an elementary cache keyed by `unique_id` to avoid duplication. |
| Utility Integration<br>`macros/utils/graph/get_compiled_code.sql` | The `get_compiled_code(node)` macro now includes a fallback: if the dispatched implementation yields no `compiled_code`, it attempts to retrieve `compiled_code` from the elementary cache using `node.get("unique_id")`. |
| Test Fixtures and Helpers<br>`integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py` | Introduces `_microbatch_model_sql()`, defining an incremental model with `incremental_strategy="microbatch"`, `event_time="order_date"`, `batch_size="year"`, and adapter-specific partition config for bigquery and athena. |
| Test Execution and Validation<br>`integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py` | Adds a `_run_microbatch_model_and_get_latest_success_result()` helper to run the model and retrieve run results, and a `_with_microbatch_override_macro()` context manager to temporarily inject a `get_incremental_microbatch_sql` override macro. |
| Integration Tests<br>`integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py` | `test_microbatch_run_results_has_compiled_code()` verifies that compiled code is captured and non-empty when the override macro is present; `test_microbatch_run_results_without_override_has_empty_compiled_code()` confirms compiled code remains empty without the override macro (skipped for vertica and when dbt fusion is enabled). |
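The capture-and-fallback flow summarized in the table above can be sketched in plain Python. This is illustrative only: the real implementation is a pair of dbt Jinja macros, and every name below (including the `_cache` structure) is a hypothetical stand-in for elementary's cache API.

```python
# Illustrative sketch of the cache-backed capture/fallback flow; the real
# implementation is dbt Jinja macros, and all names here are hypothetical.

_cache = {"microbatch_compiled_code_by_unique_id": {}}

def capture_microbatch_compiled_code_for_model(model: dict) -> None:
    # Store the executing model's compiled code keyed by unique_id,
    # skipping models that were already captured (avoids duplication).
    store = _cache["microbatch_compiled_code_by_unique_id"]
    unique_id = model.get("unique_id")
    if unique_id and unique_id not in store:
        store[unique_id] = model.get("compiled_code")

def get_compiled_code(node: dict):
    # Try the primary source first; fall back to the cache that was
    # populated while the microbatch model executed.
    compiled_code = node.get("compiled_code")
    if not compiled_code and node.get("unique_id"):
        compiled_code = _cache["microbatch_compiled_code_by_unique_id"].get(
            node.get("unique_id")
        )
    return compiled_code
```

For example, a node whose `compiled_code` is absent at artifact-upload time still resolves through the cache entry captured during execution.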

Sequence Diagram

```mermaid
sequenceDiagram
    participant TestRunner as Test Runner
    participant DbtExecution as dbt Model<br/>Execution
    participant Macro as get_incremental_<br/>microbatch_sql
    participant Cache as Elementary<br/>Cache
    participant Retrieval as get_compiled_<br/>code Utility

    TestRunner->>DbtExecution: Run microbatch model
    DbtExecution->>Macro: Invoke during incremental resolution
    Macro->>Macro: Capture current model's compiled_code
    Macro->>Cache: Store compiled_code by unique_id
    Cache-->>Macro: Cache updated
    Macro-->>DbtExecution: Return dispatched SQL
    DbtExecution-->>TestRunner: Execution complete

    TestRunner->>Retrieval: Query compiled_code for node
    Retrieval->>Retrieval: Check dispatched result
    alt Dispatched compiled_code exists
        Retrieval-->>TestRunner: Return compiled_code
    else Dispatched compiled_code missing
        Retrieval->>Cache: Lookup by unique_id
        Cache-->>Retrieval: Return cached compiled_code
        Retrieval-->>TestRunner: Return cached compiled_code
    end
```

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A microbatch model now shines so bright,
With compiled code captured in cache's light,
Elementary tracks each unique dance,
While fallbacks ensure we don't miss a chance,
Tests verify all, both with and without— 🐰✨

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective: implementing a microbatch macro override to capture compiled code for microbatch models, which is reflected in all three file changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

🧹 Nitpick comments (2)
integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py (2)

30-33: ⚡ Quick win

Avoid hardcoding the project/package segment in unique_id.

Line 30 assumes model.elementary_tests.*. If project name changes, the test can fail even when run results are correct.

Proposed fix:

```diff
-    unique_id = f"model.elementary_tests.{test_id}"
     run_results = dbt_project.read_table(
         "dbt_run_results",
-        where=f"unique_id = '{unique_id}' and status = 'success'",
+        where=f"name = '{test_id}' and resource_type = 'model' and status = 'success'",
         order_by="generated_at desc",
         limit=1,
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py`
around lines 30 - 33, The test hardcodes the package segment in unique_id
("model.elementary_tests.{test_id}") which breaks if the project/package name
changes; update the construction of unique_id in
test_microbatch_compiled_code.py to derive the package/project segment
dynamically (e.g., obtain the package/project name from the dbt_project fixture
or manifest) before calling dbt_project.read_table, so unique_id is built as
f"model.{package_name}.{test_id}" (refer to the unique_id variable and the
dbt_project.read_table call to locate where to change this).

5-5: ⚡ Quick win

Restore disable_run_results after the test.

Line 5 mutates shared runner state and does not restore it. This can create cross-test coupling when fixture scope is not function-level.

Proposed fix:

```diff
 def test_microbatch_run_results_has_compiled_code(test_id: str, dbt_project: DbtProject):
-    dbt_project.dbt_runner.vars["disable_run_results"] = False
+    previous_disable_run_results = dbt_project.dbt_runner.vars.get("disable_run_results")
+    dbt_project.dbt_runner.vars["disable_run_results"] = False

-    model_sql = """
+    try:
+        model_sql = """
 {{ config(
     materialized='incremental',
     incremental_strategy='microbatch',
@@
 from {{ ref('one') }}
 """

-    with dbt_project.create_temp_model_for_existing_table(
-        test_id, raw_code=model_sql
-    ) as model_path:
-        dbt_project.dbt_runner.run(select=str(model_path))
+        with dbt_project.create_temp_model_for_existing_table(
+            test_id, raw_code=model_sql
+        ) as model_path:
+            dbt_project.dbt_runner.run(select=str(model_path))

-    unique_id = f"model.elementary_tests.{test_id}"
-    run_results = dbt_project.read_table(
-        "dbt_run_results",
-        where=f"unique_id = '{unique_id}' and status = 'success'",
-        order_by="generated_at desc",
-        limit=1,
-    )
-    assert run_results, "Expected a successful run result row for microbatch model"
-    assert run_results[0]["compiled_code"], (
-        "Expected compiled_code to be populated for successful microbatch model run result"
-    )
+        unique_id = f"model.elementary_tests.{test_id}"
+        run_results = dbt_project.read_table(
+            "dbt_run_results",
+            where=f"unique_id = '{unique_id}' and status = 'success'",
+            order_by="generated_at desc",
+            limit=1,
+        )
+        assert run_results, "Expected a successful run result row for microbatch model"
+        assert run_results[0]["compiled_code"], (
+            "Expected compiled_code to be populated for successful microbatch model run result"
+        )
+    finally:
+        if previous_disable_run_results is None:
+            dbt_project.dbt_runner.vars.pop("disable_run_results", None)
+        else:
+            dbt_project.dbt_runner.vars["disable_run_results"] = previous_disable_run_results
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py`
at line 5, The test mutates shared state by setting
dbt_project.dbt_runner.vars["disable_run_results"] = False and does not restore
it; fix by saving the original value (orig =
dbt_project.dbt_runner.vars.get("disable_run_results")), set the key for the
test, and ensure restoration in a finally block (or use pytest
fixture/monkeypatch to set and revert) so
dbt_project.dbt_runner.vars["disable_run_results"] is reset to orig (or deleted
if it was absent) after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1446de91-45dc-4082-95f1-1e83579ba7cc

📥 Commits

Reviewing files that changed from the base of the PR and between 36e1956 and bbd163d.

📒 Files selected for processing (1)
  • integration_tests/tests/test_dbt_artifacts/test_microbatch_compiled_code.py

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration bot (Contributor) left a comment

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.


Comment on lines +3 to +7:

```sql
{% if not compiled_code and node and node.get("unique_id") %}
  {% set compiled_code = elementary.get_cache(
    "microbatch_compiled_code_by_unique_id", {}
  ).get(node.get("unique_id")) %}
{% endif %}
```
🔴 Microbatch cache fallback bypasses Redshift %%% escaping

On Redshift, redshift__get_compiled_code (macros/utils/graph/get_compiled_code.sql:21-25) replaces % with %% to prevent SQL formatting errors when the compiled code is inserted as a string literal. The new microbatch cache fallback path (lines 3-7) retrieves compiled code directly from the cache without going through adapter.dispatch, so the Redshift-specific escaping is never applied. If a microbatch model's compiled SQL contains % characters, this will produce unescaped % in the INSERT into dbt_run_results, which can cause runtime SQL errors on Redshift (e.g., incomplete format or similar adapter-level failures).

Prompt for agents
In macros/utils/graph/get_compiled_code.sql, the new microbatch cache fallback (lines 3-7) returns raw compiled code, bypassing adapter-specific transformations such as the Redshift percent-escaping in redshift__get_compiled_code (lines 21-25). To fix this, the cached compiled code should be run through the same adapter-specific post-processing. One approach: after retrieving the code from the cache, apply the same adapter.dispatch or at minimum replicate the Redshift escaping logic. For example, you could introduce a new dispatchable macro like elementary.post_process_compiled_code(compiled_code) that is a no-op by default but does .replace("%", "%%") on Redshift, and call it on the cache-retrieved value. Alternatively, store the already-escaped code in the cache by calling the capture during redshift__get_compiled_code, but that would couple the cache to a specific adapter.

Collaborator (Author):
fixed

```sql
{% endif %}

{% do compiled_code_by_unique_id.update({model_unique_id: model_compiled_code}) %}
{% do elementary.set_cache(
```
Collaborator:
This set_cache could have a race if multiple dbt models run in parallel, and all of them update the initial dict.

I think instead you should create the microbatch_compiled_code_by_unique_id in the init_elementary_graph macro, and then I think you don't even need to do set_cache.

Collaborator (Author):
fixed

…_unique_id map. Instead, we will initialize it when initializing the graph, and only update the dict itself instead of updating the cache
GuyEshdat merged commit 91d04fd into master on May 6, 2026
26 of 30 checks passed
GuyEshdat deleted the microbatch-compiled-code branch on May 6, 2026 at 10:13