Correctness fixes: LABEL escaping, dead columnstore guard, init port, misc#706
Correctness fixes: LABEL escaping, dead columnstore guard, init port, misc#706joshmarkovic wants to merge 8 commits into
Conversation
c479a29 to
1b56f99
Compare
4801c2e to
8da3878
Compare
There was a problem hiding this comment.
The fixes look correct, especially SET XACT_ABORT ON for the DML refresh swap and escaping query_tag before emitting OPTION (LABEL = ...).
I would still ask for regression tests before merging this part, because these are correctness-sensitive paths: DML refresh rollback on insert failure, query tags containing single quotes, the columnstore index guard against a schema-qualified table, and float type inference (this probably not a test, maybe there is a way to include as part of an existing one so we have better correctnes for example this also improves this part #702).
One concern with SET XACT_ABORT ON is that it is a session-level setting. If the dbt adapter reuses the same SQL Server connection for subsequent statements, leaving it enabled could change the behavior of later SQL executed on that connection. For example, a later statement that previously tolerated a recoverable runtime error inside a transaction could instead cause the whole transaction to abort and roll back. That may be desirable, but it should be evaluated as an intentional behavior change rather than left as an implicit side effect.
I would either reset XACT_ABORT after this macro, document why leaving it enabled is safe for this adapter’s connection/session lifecycle, and/or split this into a separate commit so the transaction-behavior change is isolated from the other fixes.
Edit note:
- One part where I think this XACT_ABORT is undesirable is on posthooks, current behavior is data remains updated even after post_hooks fail, would this be a whole behavior change around that it seems?
|
@joshmarkovic touched dml on #710 I suggest separating those edits from the other corrections. |
8da3878 to
994a9d5
Compare
A query_tag containing a single quote broke every query the adapter
emitted, and allowed injection into the OPTION clause. Escape via dbt's
cross-adapter escape_single_quotes() macro (quote doubling on this
adapter), the same helper the EXEC('...') wrappers here already use.
The underscore-joined name never resolves, so the existence check was always false and the DROP never ran. Use the relation's own quoted rendering (relation.include(database=False)), which OBJECT_ID resolves.
With a unique_key the default strategy emits a MERGE via get_incremental_merge_sql, not delete+insert as the comment claimed.
README documented dbt_sqlserver_use_default_schema_concat twice with conflicting flags-vs-vars guidance; merged into one section matching the code (behavior flag primary, vars fallback).
black/isort pre-commit hooks targeted py39 while requires-python is >=3.10 and the manual black-check already used py310; aligned both.
The clean target had a .PHONY declaration and recipe lines but no 'clean:' rule line, so 'make clean' did nothing.
994a9d5 to
b5e142b
Compare
|
@axellpadilla, thanks for the review! I removed the I rebased onto the latest |
|
@joshmarkovic can you send the xact abort changes pr to #710 branch please, I think this is safer at least without carefully inspecting implications, but you can also justify just keeping it on after checking all flow after this DML + post-hook (in-transaction, post-commit), considering flag enabled in #710. |
Batch of small, independent correctness fixes. Every change was verified against a live SQL Server 2022 container (
devops/server.Dockerfileimage): the relevant functional suites (test_query_options,test_index,test_data_types) pass, unit tests pass 110/110, and each fix has a targeted reproduction described below.1.
dbt initsuggested the Postgres portIssue:
profile_template.ymlshippedport: default: 5432— the Postgres port — so everydbt inituser accepting defaults got a non-working profile.Solution: Default changed to 1433. Verified through dbt's own
InitTask.generate_target_from_inputcode path: accepting the default yieldsport: 1433(int).2. Python
floatmapped to SQL ServerbigintIssue: The
datatypesmapping insqlserver_constants.py(consumed bySQLServerConnectionManager.data_type_code_to_nameto report column type names for query metadata) translated Pythonfloatto"bigint".Solution: Map it to
"float". Verified live: acast(1.5 as float)result column produces cursortype_code = <class 'float'>, which now resolves tofloat.3. A single quote in
query_tagbroke every emitted query (and allowed OPTION-clause injection)Issue:
get_query_options()(and the deprecatedapply_label()) interpolated the user-suppliedquery_tagconfig intoOPTION (LABEL = '...')without escaping. A tag containing'produced a syntax error in every statement the adapter emits for that model; a crafted tag could inject arbitrary text into the OPTION clause.Solution: Escape via dbt's cross-adapter
escape_single_quotes()macro (quote doubling'→''on this adapter; the same helper theEXEC('...')wrappers in this repo already use) at both build sites inmetadata.sql. Verified: a model withquery_tag: "rob's o'clock tag"builds cleanly on both code paths, including statements wrapped inEXEC('...')(where the pre-escaped label composes correctly with the wrapper's own quote doubling), and the fulltest_query_optionsfunctional suite passes (26/26) against a live SQL Server 2022.4. Columnstore-index existence guard was dead code; misleading comment on the default incremental strategy
Issue:
sqlserver__create_clustered_columnstore_indexguarded itsDROP INDEXwithobject_id('<schema>_<table>')— an underscore-joined name that never resolves (verified live: always NULL). TheIF EXISTSbranch could therefore never fire, and re-creating a CCI on a table that already has one fails with "You cannot create more than one clustered index". Separately, the comment inincremental_strategies.sqlclaimed the default strategy with aunique_keyperforms delete+insert, when it actually emits aMERGEviaget_incremental_merge_sql.Solution: Use the relation's own quoted rendering —
relation.include(database=False), which emitsobject_id('"schema"."table"')— inindexes.sql; the guard now finds the existing index and the macro drops + recreates it. Verified live:OBJECT_IDresolves the double-quoted form identically to brackets (including underQUOTED_IDENTIFIER OFF), and re-running the rendered batch against a table with an existing CCI drops and recreates it instead of failing. Comment corrected to say MERGE (separate commit).Known pre-existing limitation, unchanged here: tables built via a
__dbt_tmpintermediate + rename keep the index name<schema>_<table>__dbt_tmp_cci, which the macro's computed name does not match, so the guard cannot protect that case.5. Docs/tooling: duplicated README section, black target py39, broken
make cleandbt_sqlserver_use_default_schema_concattwice, with conflictingflags:-vs-vars:guidance. Merged into a single section matching the actual implementation (schema.sql:61-63: behavior flag first,varsas backwards-compat fallback).blackhook still used--target-version=py39, whilerequires-python >= 3.10and the manualblack-checkhook already targeted py310. Bumped the auto hook to py310 to match. (The original commit also bumped theisorttarget, but chore: consolidate isort, flake8, pycln and absolufy-imports into ruff #707 has since consolidated isort/flake8/pycln/absolufy-imports into ruff onmaster; after rebasing onto that, only the black target bump remains.) Verified black at py310 changes zero files, so no reformatting churn for contributors.cleantarget had a.PHONYdeclaration and recipe lines but theclean:rule line itself was missing, somake cleandid nothing. Restored (and it now shows inmake help).(Each of these three is its own commit.)