
fix(snowflake): bound get_schema_columns cache, drop table_type kwarg #28136

Open

ulixius9 wants to merge 3 commits into main from snowflake_cust_oom

Conversation

@ulixius9 ulixius9 commented May 15, 2026

Summary

Two related fixes that together stop the OOM seen ingesting Snowflake
COM_US_IMDNA_ADL.AWB_INTERM (~13k wide tables) on a 4 GB pod (kernel
SIGKILL mid-table, no traceback in the workflow log).

  • metadata.py — stop forwarding table_type into
    inspector.get_columns(...). The kwarg ended up in SQLAlchemy's
    @reflection.cache key, so Regular vs View calls for the same schema
    got distinct keys and re-materialized the schema-wide column dict
    (~1.6 GB) at the first view. No dialect reads table_type from kw;
    the Stage/Stream branches above the call already consumed it.
  • utils.py — replace @reflection.cache on get_schema_columns
    with a bounded LRU (size 2 default, env
    OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE). The LRU is stored on
    info_cache, inheriting the per-thread isolation that
    _inspector_map already provides. LRU recency keeps an
    actively-queried schema from being evicted by other threads' churn;
    on eviction the per-table get_columns cache entries for that
    schema are also cleared so the column data is actually freed
    (otherwise per-table refs pin the column lists even after the
    schema-wide dict is evicted).
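The bounded-LRU mechanics described above can be sketched as follows. This is an illustrative re-implementation, not the PR's actual utils.py code: the `_schema_columns_lru` slot name, the `loader` callback, and the per-table key layout inside `info_cache` are assumptions for the sketch; only the env var `OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE` and the default size of 2 come from the description.

```python
import os
from collections import OrderedDict

DEFAULT_CACHE_SIZE = 2
CACHE_KEY = "_schema_columns_lru"  # hypothetical slot inside the inspector's info_cache


def _cache_size() -> int:
    # Env var name is from the PR; default 2 = current + just-finished schema.
    return int(os.environ.get("OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE", DEFAULT_CACHE_SIZE))


def get_schema_columns_cached(info_cache: dict, schema: str, loader):
    """Return the schema-wide column dict, bounded by an LRU kept on info_cache."""
    lru: OrderedDict = info_cache.setdefault(CACHE_KEY, OrderedDict())
    if schema in lru:
        lru.move_to_end(schema)  # recency: protect an actively-queried schema
        return lru[schema]
    columns = loader(schema)  # the expensive schema-wide query
    lru[schema] = columns
    while len(lru) > _cache_size():
        evicted, _ = lru.popitem(last=False)  # drop least-recently-used schema
        # Also clear per-table get_columns entries for the evicted schema;
        # otherwise their references pin the column lists in memory.
        for key in [k for k in info_cache if isinstance(k, tuple) and evicted in k]:
            info_cache.pop(key, None)
    return columns
```

Because the LRU lives inside `info_cache`, each thread's inspector gets its own instance, so no extra locking is needed beyond what `_inspector_map` already provides.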

Root cause (from the customer log)

Memory walk:

Time          Event                                                Memory
13:34–13:45   _get_schema_columns(AWB_INTERM) runs (11m 36s)       740 → 2424 MB
13:46–14:01   5263 BASE TABLEs stream; cache hits                  flat at ~2.45 GB
14:01–14:07   _get_schema_columns(AWB_INTERM) runs AGAIN (6m 19s)  jumps to 4053 MB
14:40         log cuts off mid-table, 0 errors in log              SIGKILL

Three @reflection.cache-decorated functions all cache-missed
simultaneously at 14:01 (_current_database_schema,
_get_schema_primary_keys, _get_schema_columns). The only input that
flipped across that boundary was table_type (Regular for the last
table → View for the first view), which was being forwarded as a kwarg
into inspector.get_columns(...) and ended up in the cache key.
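To see why a stray kwarg forks the cache, here is a minimal analogue of a kwarg-keyed memoization decorator. It is a sketch, not SQLAlchemy's actual `@reflection.cache` (which builds its key from the connection and kwargs in a more involved way), but the kwarg-in-the-key behavior it demonstrates is the same; the function and variable names are illustrative.

```python
def kwarg_keyed_cache(fn):
    """Memoize on positional AND keyword args, like a reflection-style cache."""
    store = {}

    def wrapper(*args, **kw):
        key = (args, tuple(sorted(kw.items())))  # kwargs become part of the key
        if key not in store:
            store[key] = fn(*args, **kw)  # miss: re-run the expensive query
        return store[key]

    wrapper.store = store
    return wrapper


calls = []


@kwarg_keyed_cache
def get_schema_columns(schema, **kw):
    # kw is swallowed here and never read -- exactly the table_type situation
    calls.append(schema)
    return {"schema_wide_column_dict": "..."}
```

Calling `get_schema_columns("S", table_type="Regular")` and then `get_schema_columns("S", table_type="View")` produces two distinct keys and two full materializations of the same schema-wide dict; dropping the kwarg collapses them to one.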

Plus, without the LRU bound, info_cache is only cleared between
databases (common_db_source.py:_release_engine), so any multi-schema
run accumulates every schema's column metadata in RAM for the full
database — also a latent OOM risk for any database with more than one
wide schema.

Test plan

  • 7 new tests in test_snowflake_schema_columns_lru.py — same-schema cache hit, eviction over size, LRU recency protecting a long-running schema (the multi-thread "one slow schema + many fast ones" case), per-table entry cleanup on eviction, no-info_cache fallthrough, 90030 None cached, env-var override
  • 4 new tests in test_snowflake_table_type_cache_pollution.py — base-table and view kwargs (no table_type), table-vs-view kwargs identical, Stage early-return still works
  • 26 existing Snowflake unit tests green
  • make py_format_check clean
  • Smoke-tested locally against a real Snowflake account by the reporter

Tuning

  • OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE=1 for the tightest bound (one schema at a time, no buffer slot) if memory is extremely constrained.
  • Default 2 (current + just-finished) covers the table→view-transition use case and lets long-running schemas stay resident while smaller schemas cycle through.
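For example, on a memory-constrained pod the bound can be pinned to a single schema before launching the ingestion workflow (the env var name is from the PR; where you set it depends on your deployment):

```shell
# Keep at most one schema's column dict resident at a time.
export OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE=1
```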

🤖 Generated with Claude Code


Summary by Gitar

  • Resilience and error handling:
    • Added robust exception handling in _get_table_names_and_types to prevent FQN build failures from interrupting ingestion.
    • Added error handling in get_schema_columns to skip records with unparsable table names instead of crashing the process.


…ype cache pollution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ulixius9 ulixius9 requested a review from a team as a code owner May 15, 2026 07:50
Copilot AI review requested due to automatic review settings May 15, 2026 07:50
github-actions bot added the Ingestion and safe to test labels May 15, 2026

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


gitar-bot Bot commented May 15, 2026

Code Review ✅ Approved

Bounds the Snowflake schema column cache with an LRU and removes the table_type kwarg to prevent memory exhaustion and cache pollution. No issues found.



@github-actions

🔴 Playwright Results — 1 failure(s), 10 flaky

✅ 4069 passed · ❌ 1 failed · 🟡 10 flaky · ⏭️ 92 skipped

Shard        Passed  Failed  Flaky  Skipped
🔴 Shard 1      297       1      1        4
🟡 Shard 2      757       0      5       14
🟡 Shard 3      780       0      1        7
🟡 Shard 4      789       0      1       18
🟡 Shard 5      708       0      1       41
🟡 Shard 6      738       0      1        8

Genuine Failures (failed on all attempts)

Pages/SearchIndexApplication.spec.ts › Search Index Application (shard 1)
Error: expect(received).toEqual(expected) // deep equality

Expected: StringMatching /success|activeError/g
Received: "failed"
🟡 10 flaky test(s) (passed on retry)
  • Features/TagsSuggestion.spec.ts › should decline suggested tags for a container column (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
  • Features/KnowledgeCenterList.spec.ts › Knowledge Center List - Test infinite scroll/pagination (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

