feat: safe type expansion, NVARCHAR/NCHAR catalog fix, seed empty cell fix#702
feat: safe type expansion, NVARCHAR/NCHAR catalog fix, seed empty cell fix#702axellpadilla wants to merge 2 commits into
Conversation
…l fix - Add dbt_sqlserver_enable_safe_type_expansion flag for safe column type widening (varchar->nvarchar, integer family promotions, numeric precision/scale upgrades) - Add column_type_expansion_max_rows config (default 1,000,000 rows) - Add prefer_single_alter_column config for single ALTER COLUMN statement - Add string_type_instance() to preserve NVARCHAR/NCHAR type family - Fix catalog generation (user_type_id) so NVARCHAR/NCHAR no longer appear as SYSNAME - Fix is_numeric() to exclude money/smallmoney (now is_fixed_numeric()) - Fix seed table ingestion of empty numeric cells - Add tinyint/bit to is_integer() type list
|
Reviewing now, big change with lots of potential for issues so may take a bit longer. |
|
When safe type expansion is enabled, some column changes that don't actually fit the existing values. For example, it will widen an int to numeric(10,5) or numeric(10,2) to numeric(12,5), but in those instances we can overflow the integer portion. We can't just check the overall precision and scale, we need to check that the integer section is wide enough. |
Benjamin-Knight
left a comment
There was a problem hiding this comment.
Think there is an issue with the expansion of integers around precision and scale but otherwise just that one shadowed function I don't understand the basis of.
| self.alter_column_type(current, column_name, new_type) | ||
|
|
||
| @available.parse_none | ||
| def expand_target_column_types( |
There was a problem hiding this comment.
Is this not just a copy of the base SQLAdapter code, what is the override doing?
Summary
This PR addresses several long-standing issues with SQL Server native type handling during column expansion, catalog generation, and seed ingestion, considered similar problems, reproduced and handled on one run.
Closes: #701, #637, #425, #446
Supersedes: #606, thanks @Cogito
What Changed
1. SQL Server Native String Type Recognition (
sqlserver_column.py)is_string()now includesnvarcharandncharin addition tovarcharandcharstring_type_instance()— new instance method that preserves the original type family:nvarchar(n)emitsnvarchar(n)(notvarchar(n))nchar(n)emitsnchar(n)(notchar(n))varchar(n)/char(n)for non-Unicode typesdata_typeproperty now usesstring_type_instance()instead ofstring_type()is_number()now includesis_fixed_numeric()somoney/smallmoneyparticipate in numeric checks without being classified asis_numeric()is_fixed_numeric()— new method formoney/smallmoneyis_numeric()now excludesmoney/smallmoney(breaking change — see migration note below)is_integer()now includestinyintandbitcan_expand_to()— stricter: only allows same-family string size increases (e.g.,varchar(10)→varchar(25))can_expand_safe()— new method for flag-gated safe expansions:varchar(n)nvarchar(m)wherem >= nchar(n)nchar(m)wherem >= nbit→tinyint→smallint→int→bigintintnumeric(p,s)wherep >= 10numeric(p,s)numeric(p2,s2)wherep2 >= pands2 >= ssmallmoneymoneymoneynumeric(p,s)wherep >= 192. Safe Type Expansion Feature Flag (
sqlserver_adapter.py)New
dbt_sqlserver_enable_safe_type_expansionbehaviour flag (default:false):When enabled, the adapter's
expand_column_types()override performs:varchar(10)→varchar(25))column_type_expansion_max_rowsis not exceeded:varchar/char→nvarchar/ncharnumericwith sufficient precisionnumeric/decimalprecision/scale upgradessmallmoney→money→numeric)expand_target_column_types()— new public API that forwards themax_rowsparameter, called from incremental and snapshot materializations.alter_column_type()— new method that dispatches to thesqlserver__alter_column_typemacro, replacing the base adapter's implementation.3. Row-Count Guardrail (
column_type_expansion_max_rows)New per-model config (default:
1,000,000):{{ config(materialized='incremental', unique_key='id', column_type_expansion_max_rows=500000) }}-1to disable the check0to always skip safe expansion (only same-family string resizes proceed)4. Single ALTER COLUMN Mode (
prefer_single_alter_column)New per-model config (default:
false):{{ config(materialized='incremental', unique_key='id', prefer_single_alter_column=true) }}When
true, thesqlserver__alter_column_typemacro uses a singleALTER COLUMNstatement instead of the safer add+update+drop+rename pattern. This is faster for small/medium tables and instant for safe type expansions, but may fail for types that cannot be implicitly converted.5. Catalog Fix (
catalog.sql)Changed
sys.typesjoin fromsystem_type_idtouser_type_idin both catalog queries. This preventsNVARCHAR/NCHARcolumns from appearing asSYSNAMEindbt docs generateoutput. Fixes #637.6. Seed Empty Cell Fix (
helpers.sql)Changed seed CSV ingestion to inline
NULLliterals instead of binding empty cells as SQL parameters. Previously, an empty cell in anumeric(18,0)column would be bound as an empty string parameter, causing arithmetic overflow error 8115. Now empty cells emitnulldirectly in the VALUES clause. Fixes #425.7. Adapter Configs (
sqlserver_configs.py)Added two new optional config fields:
prefer_single_alter_column: Optional[bool] = Falsecolumn_type_expansion_max_rows: Optional[int] = None8. Unit Tests
test_sqlserver_column.py— Tests foris_string(),string_type_instance(),data_type,is_fixed_numeric(),is_numeric(),string_size()across all string/numeric type familiestest_can_expand_to.py— Parameterized tests forcan_expand_to()andcan_expand_safe()covering same-family resizes, cross-family promotions, integer family promotions, numeric precision/scale upgrades, fixed-money promotions, and prevented shrinking conversionstest_expand_column_types.py— Tests for the adapter'sexpand_column_types()method: row-count skip, max-rows=0 blocking, warning emission,max_rowsforwarding throughexpand_target_column_types()9. Functional Tests
test_column_type_expansion.py— Full integration tests for:on_schema_change='append_new_columns'on_schema_change='sync_all_columns'varchar→nvarcharblocked without flagvarchar→nvarcharworks with flagtest_catalog.py— Tests thatdbt docs generatedoes not returnSYSNAMEforNVARCHAR/NCHARcolumns (fixes Dbt Docs SYSNAME instead of NVARCHAR datatype #637)test_mssql_seed.py— Tests seeding anumeric(18,0)column with empty rows succeeds (fixes "Error 8115 Arithmetic overflow error converting nvarchar to data type numeric" when seeding a numeric column with some empty rows and specifying column_types #425)Breaking Changes / Migration Notes
moneyandsmallmoneycolumns are no longer classified asis_numeric(). If you have custom code or macros that depend onmoneybeing numeric:is_number()(covers all numeric types including money)is_fixed_numeric()for money types specificallyis_numeric()only fornumeric/decimaltypesRelated PRs & History