fix(snowflake): bound get_schema_columns cache, drop table_type kwarg#28136
Open
ulixius9 wants to merge 3 commits into
Open
fix(snowflake): bound get_schema_columns cache, drop table_type kwarg#28136ulixius9 wants to merge 3 commits into
ulixius9 wants to merge 3 commits into
Conversation
…ype cache pollution Two related fixes that together stop the OOM seen ingesting Snowflake COM_US_IMDNA_ADL.AWB_INTERM (~13k wide tables) on a 4 GB pod: - metadata.py: stop forwarding table_type into inspector.get_columns(...). The kwarg ended up in SQLAlchemy's @reflection.cache key, so Regular vs View calls for the same schema got distinct keys and re-materialized the schema-wide column dict (~1.6 GB) at the first view. No dialect reads table_type from kw; the Stage/Stream branches above already consumed it for their early-returns. - utils.py: replace @reflection.cache on get_schema_columns with a bounded LRU (size 2 default, env OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE). Stored on info_cache so it inherits per-thread isolation from _inspector_map. LRU recency keeps an actively-queried schema from being evicted by other threads' churn; on eviction the per-table get_columns cache entries for that schema are also cleared so the column data is actually freed (otherwise per-table refs pin the lists even after the schema-wide dict is evicted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review ✅ ApprovedBounds the Snowflake schema column cache with an LRU and removes the table_type kwarg to prevent memory exhaustion and cache pollution. No issues found. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
Contributor
🔴 Playwright Results — 1 failure(s), 10 flaky✅ 4069 passed · ❌ 1 failed · 🟡 10 flaky · ⏭️ 92 skipped
Genuine Failures (failed on all attempts)❌
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
Two related fixes that together stop the OOM seen ingesting Snowflake
COM_US_IMDNA_ADL.AWB_INTERM(~13k wide tables) on a 4 GB pod (kernelSIGKILL mid-table, no traceback in the workflow log).
metadata.py— stop forwardingtable_typeintoinspector.get_columns(...). The kwarg ended up in SQLAlchemy's@reflection.cachekey, so Regular vs View calls for the same schemagot distinct keys and re-materialized the schema-wide column dict
(~1.6 GB) at the first view. No dialect reads
table_typefromkw;the Stage/Stream branches above the call already consumed it.
utils.py— replace@reflection.cacheonget_schema_columnswith a bounded LRU (size 2 default, env
OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE). The LRU is stored oninfo_cache, inheriting the per-thread isolation that_inspector_mapalready provides. LRU recency keeps anactively-queried schema from being evicted by other threads' churn;
on eviction the per-table
get_columnscache entries for thatschema are also cleared so the column data is actually freed
(otherwise per-table refs pin the column lists even after the
schema-wide dict is evicted).
Root cause (from the customer log)
Memory walk:
_get_schema_columns(AWB_INTERM)runs 11m 36s_get_schema_columns(AWB_INTERM)runs AGAIN 6m 19sThree
@reflection.cache-decorated functions all cache-missedsimultaneously at 14:01 (
_current_database_schema,_get_schema_primary_keys,_get_schema_columns). The only input thatflipped across that boundary was
table_type(Regular for the lasttable → View for the first view), which was being forwarded as a kwarg
into
inspector.get_columns(...)and ended up in the cache key.Plus, without the LRU bound,
info_cacheis only cleared betweendatabases (
common_db_source.py:_release_engine), so any multi-schemarun accumulates every schema's column metadata in RAM for the full
database — also a latent OOM risk for any database with more than one
wide schema.
Test plan
test_snowflake_schema_columns_lru.py— same-schema cache hit, eviction over size, LRU recency protecting a long-running schema (the multi-thread "one slow schema + many fast ones" case), per-table entry cleanup on eviction, no-info_cache fallthrough, 90030 None cached, env-var overridetest_snowflake_table_type_cache_pollution.py— base-table and view kwargs (notable_type), table-vs-view kwargs identical, Stage early-return still worksmake py_format_checkcleanTuning
OM_SNOWFLAKE_SCHEMA_COLUMNS_CACHE_SIZE=1for the tightest bound (one schema at a time, no buffer slot) if memory is extremely constrained.🤖 Generated with Claude Code
Summary by Gitar
_get_table_names_and_typesto prevent FQN build failures from interrupting ingestion.get_schema_columnsto skip records with unparsable table names instead of crashing the process.This will update automatically on new commits.