Correctness fixes: LABEL escaping, dead columnstore guard, init port, misc #706

Open
joshmarkovic wants to merge 8 commits into dbt-msft:master from joshmarkovic:fix/audit-quick-wins
Conversation

@joshmarkovic

@joshmarkovic joshmarkovic commented Jun 10, 2026

Contributor

Batch of small, independent correctness fixes. Every change was verified against a live SQL Server 2022 container (devops/server.Dockerfile image): the relevant functional suites (test_query_options, test_index, test_data_types) pass, unit tests pass 110/110, and each fix has a targeted reproduction described below.

Note: this batch originally also fixed the sys.types join in the catalog queries (joining on system_type_id, which is not unique, fans out and produces duplicate rows / wrong type names for UDT and sysname columns). That commit was dropped while merging master: #289 (persist-docs) independently fixed the same join, so the fix is already on master and is no longer part of this PR.
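
The fan-out described above can be reproduced without a database. The following is a minimal sketch, not the adapter's catalog query: the rows are a hand-written stand-in for sys.types, the column list is hypothetical, and joining on user_type_id is an assumption about the shape of the fix (the PR text only says that system_type_id is not unique).

```python
# Simplified stand-in for sys.types: (name, system_type_id, user_type_id).
# sysname is an alias of nvarchar, so both rows share system_type_id 231.
SYS_TYPES = [
    ("int", 56, 56),
    ("nvarchar", 231, 231),
    ("sysname", 231, 256),
]

# Hypothetical table columns: (column_name, system_type_id, user_type_id).
COLUMNS = [
    ("id", 56, 56),
    ("object_name", 231, 256),  # declared as sysname
]


def join_on_system_type_id(columns, types):
    """Join on the non-unique key: one column can match several type rows."""
    return [
        (col, type_name)
        for col, sys_id, _ in columns
        for type_name, type_sys_id, _ in types
        if sys_id == type_sys_id
    ]


def join_on_user_type_id(columns, types):
    """Join on the unique key: each column matches exactly one type row."""
    return [
        (col, type_name)
        for col, _, user_id in columns
        for type_name, _, type_user_id in types
        if user_id == type_user_id
    ]


fanned_out = join_on_system_type_id(COLUMNS, SYS_TYPES)
exact = join_on_user_type_id(COLUMNS, SYS_TYPES)
```

With these inputs the first join returns two rows for object_name (one labelled nvarchar, one sysname), while the second returns a single row with the declared type.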

Note: this batch originally also added SET XACT_ABORT ON to the dml-refresh swap (preventing a failed swap from committing an empty target). That commit has been removed from this PR: the transaction/DML behavior is being handled in #710 (dbt_sqlserver_use_dbt_transactions), which modifies the same swap macro. To keep these correctness fixes independent of that work — and avoid a conflict on the same file — the change is dropped here and deferred to #710.


1. dbt init suggested the Postgres port

Issue: profile_template.yml shipped port: default: 5432 — the Postgres port — so every dbt init user accepting defaults got a non-working profile.

Solution: Default changed to 1433. Verified through dbt's own InitTask.generate_target_from_input code path: accepting the default yields port: 1433 (int).

2. Python float mapped to SQL Server bigint

Issue: The datatypes mapping in sqlserver_constants.py (consumed by SQLServerConnectionManager.data_type_code_to_name to report column type names for query metadata) translated Python float to "bigint".

Solution: Map it to "float". Verified live: a cast(1.5 as float) result column produces cursor type_code = <class 'float'>, which now resolves to float.
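
As a rough illustration of the lookup involved, here is a minimal sketch. The DATATYPES table and the type_code_to_name helper are hypothetical: the real mapping in sqlserver_constants.py has more entries and may be keyed differently, and every entry other than float is an assumed placeholder.

```python
# Hypothetical, trimmed-down mapping from Python type names to SQL Server
# type names. Only the float entry reflects the change described above.
DATATYPES = {
    "str": "varchar",    # assumed placeholder entry
    "int": "int",        # assumed placeholder entry
    "float": "float",    # previously mapped to "bigint"
}


def type_code_to_name(type_code: type) -> str:
    """Resolve a DB-API cursor type_code (a Python class) to a type name."""
    return DATATYPES.get(type_code.__name__, "unknown")


resolved = type_code_to_name(float)
```

With the corrected entry, a column whose cursor type_code is the Python float class is reported as float rather than bigint.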

3. A single quote in query_tag broke every emitted query (and allowed OPTION-clause injection)

Issue: get_query_options() (and the deprecated apply_label()) interpolated the user-supplied query_tag config into OPTION (LABEL = '...') without escaping. A tag containing ' produced a syntax error in every statement the adapter emits for that model; a crafted tag could inject arbitrary text into the OPTION clause.

Solution: Escape via dbt's cross-adapter escape_single_quotes() macro (quote doubling, ' → '', on this adapter; the same helper the EXEC('...') wrappers in this repo already use) at both build sites in metadata.sql. Verified: a model with query_tag: "rob's o'clock tag" builds cleanly on both code paths, including statements wrapped in EXEC('...') (where the pre-escaped label composes correctly with the wrapper's own quote doubling), and the full test_query_options functional suite passes (26/26) against a live SQL Server 2022.
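
The quote doubling and its composition with an EXEC('...') wrapper can be sketched in plain Python. This is an illustration only: the real logic lives in Jinja macros, and build_label_option and wrap_in_exec are hypothetical helper names, not functions from the adapter.

```python
def escape_single_quotes(value: str) -> str:
    """Double every single quote, as T-SQL string literals require."""
    return value.replace("'", "''")


def build_label_option(query_tag: str) -> str:
    """Render the OPTION clause with the tag escaped (hypothetical helper)."""
    return f"OPTION (LABEL = '{escape_single_quotes(query_tag)}')"


def wrap_in_exec(sql: str) -> str:
    """Wrap a statement in EXEC('...'), doubling quotes once more."""
    return f"EXEC('{escape_single_quotes(sql)}')"


option = build_label_option("rob's o'clock tag")
wrapped = wrap_in_exec(f"select 1 {option}")
```

The label is escaped once when the OPTION clause is built, and the wrapper then doubles every quote in the already-valid statement, so each layer of string literal remains well-formed.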

4. Columnstore-index existence guard was dead code; misleading comment on the default incremental strategy

Issue: sqlserver__create_clustered_columnstore_index guarded its DROP INDEX with object_id('<schema>_<table>') — an underscore-joined name that never resolves (verified live: always NULL). The IF EXISTS branch could therefore never fire, and re-creating a CCI on a table that already has one fails with "You cannot create more than one clustered index". Separately, the comment in incremental_strategies.sql claimed the default strategy with a unique_key performs delete+insert, when it actually emits a MERGE via get_incremental_merge_sql.

Solution: Use the relation's own quoted rendering — relation.include(database=False), which emits object_id('"schema"."table"') — in indexes.sql; the guard now finds the existing index and the macro drops + recreates it. Verified live: OBJECT_ID resolves the double-quoted form identically to brackets (including under QUOTED_IDENTIFIER OFF), and re-running the rendered batch against a table with an existing CCI drops and recreates it instead of failing. Comment corrected to say MERGE (separate commit).

Known pre-existing limitation, unchanged here: tables built via a __dbt_tmp intermediate + rename keep the index name <schema>_<table>__dbt_tmp_cci, which the macro's computed name does not match, so the guard cannot protect that case.
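
To make the difference between the two guard expressions concrete, here is a minimal sketch that only builds the strings passed to OBJECT_ID. The helper names are hypothetical, and the sketch assumes identifiers that contain no double quotes.

```python
def underscore_joined(schema: str, table: str) -> str:
    """Old guard argument: not a valid object name, so OBJECT_ID gives NULL."""
    return f"{schema}_{table}"


def quoted_two_part(schema: str, table: str) -> str:
    """New guard argument: the shape relation.include(database=False) emits."""
    return f'"{schema}"."{table}"'


def object_id_call(name: str) -> str:
    """Render the OBJECT_ID probe used by the existence guard."""
    return f"object_id('{name}')"


old_guard = object_id_call(underscore_joined("dbo", "orders"))
new_guard = object_id_call(quoted_two_part("dbo", "orders"))
```

Only the second form names the table as a schema-qualified object, which is why the guard can now find an existing index.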

5. Docs/tooling: duplicated README section, black target py39, broken make clean

  • README documented dbt_sqlserver_use_default_schema_concat twice, with conflicting flags:-vs-vars: guidance. Merged into a single section matching the actual implementation (schema.sql:61-63: behavior flag first, vars as backwards-compat fallback).
  • pre-commit: the auto black hook still used --target-version=py39, while requires-python >= 3.10 and the manual black-check hook already targeted py310. Bumped the auto hook to py310 to match. (The original commit also bumped the isort target, but #707, "chore: consolidate isort, flake8, pycln and absolufy-imports into ruff", has since landed on master; after rebasing onto that, only the black target bump remains.) Verified black at py310 changes zero files, so no reformatting churn for contributors.
  • Makefile: the clean target had a .PHONY declaration and recipe lines but the clean: rule line itself was missing, so make clean did nothing. Restored (and it now shows in make help).

(Each of these three is its own commit.)

@joshmarkovic joshmarkovic marked this pull request as ready for review June 11, 2026 15:01
@joshmarkovic joshmarkovic force-pushed the fix/audit-quick-wins branch from c479a29 to 1b56f99 Compare June 12, 2026 13:14
@joshmarkovic joshmarkovic changed the title from "Correctness fixes: dml-refresh data loss, catalog UDT types, LABEL escaping, dead columnstore guard, init port, misc" to "Correctness fixes: dml-refresh data loss, LABEL escaping, dead columnstore guard, init port, misc" Jun 12, 2026
@joshmarkovic joshmarkovic force-pushed the fix/audit-quick-wins branch 2 times, most recently from 4801c2e to 8da3878 Compare June 12, 2026 13:29
@axellpadilla axellpadilla added this to the v1.10.1rc1 milestone Jun 14, 2026

@axellpadilla axellpadilla left a comment

Collaborator

The fixes look correct, especially SET XACT_ABORT ON for the DML refresh swap and escaping query_tag before emitting OPTION (LABEL = ...).

I would still ask for regression tests before merging this part, because these are correctness-sensitive paths: DML refresh rollback on insert failure, query tags containing single quotes, the columnstore index guard against a schema-qualified table, and float type inference. (The last one probably does not need a test of its own; maybe it can be folded into an existing one so we get better correctness coverage. For example, this also improves this part of #702.)

One concern with SET XACT_ABORT ON is that it is a session-level setting. If the dbt adapter reuses the same SQL Server connection for subsequent statements, leaving it enabled could change the behavior of later SQL executed on that connection. For example, a later statement that previously tolerated a recoverable runtime error inside a transaction could instead cause the whole transaction to abort and roll back. That may be desirable, but it should be evaluated as an intentional behavior change rather than left as an implicit side effect.

I would either reset XACT_ABORT after this macro, document why leaving it enabled is safe for this adapter’s connection/session lifecycle, and/or split this into a separate commit so the transaction-behavior change is isolated from the other fixes.

Edit note:

  • One place where I think XACT_ABORT is undesirable is post-hooks: currently the data remains updated even after a post_hook fails. It seems this would change that behavior entirely?

@axellpadilla

Collaborator

@joshmarkovic touched DML in #710, so I suggest separating those edits from the other corrections.

@joshmarkovic joshmarkovic force-pushed the fix/audit-quick-wins branch from 8da3878 to 994a9d5 Compare June 15, 2026 23:29
@joshmarkovic joshmarkovic changed the title from "Correctness fixes: dml-refresh data loss, LABEL escaping, dead columnstore guard, init port, misc" to "Correctness fixes: LABEL escaping, dead columnstore guard, init port, misc" Jun 15, 2026
A query_tag containing a single quote broke every query the adapter
emitted, and allowed injection into the OPTION clause. Escape via dbt's
cross-adapter escape_single_quotes() macro (quote doubling on this
adapter), the same helper the EXEC('...') wrappers here already use.

The underscore-joined name never resolves, so the existence check was
always false and the DROP never ran. Use the relation's own quoted
rendering (relation.include(database=False)), which OBJECT_ID resolves.

With a unique_key the default strategy emits a MERGE via
get_incremental_merge_sql, not delete+insert as the comment claimed.

README documented dbt_sqlserver_use_default_schema_concat twice with
conflicting flags-vs-vars guidance; merged into one section matching
the code (behavior flag primary, vars fallback).

black/isort pre-commit hooks targeted py39 while requires-python is
>=3.10 and the manual black-check already used py310; aligned both.

The clean target had a .PHONY declaration and recipe lines but no
'clean:' rule line, so 'make clean' did nothing.

@joshmarkovic joshmarkovic force-pushed the fix/audit-quick-wins branch from 994a9d5 to b5e142b Compare June 15, 2026 23:36
@joshmarkovic

joshmarkovic commented Jun 15, 2026

Contributor Author

@axellpadilla, thanks for the review!

I removed the SET XACT_ABORT ON / dml-refresh commit from the PR. We can put it in #710, since it edits the same macro and is where transactions are handled.

I rebased onto the latest master to fix pre-commit.ci. Testing seems to be in place: the query-tag, columnstore, and float fixes are already covered. Happy to add anything else, just let me know!

@axellpadilla

axellpadilla commented Jun 16, 2026

Collaborator

@joshmarkovic, can you send the XACT_ABORT changes as a PR to the #710 branch, please? I think the save-and-restore pattern below is safer, at least without carefully inspecting the implications. You could also justify simply leaving it on, after checking the whole flow that follows this DML plus post-hooks (in-transaction and post-commit), with the flag from #710 enabled.

-- 1. Declare a variable to store the original state
DECLARE @OriginalXactAbort INT;

-- 2. Check if the 16384 bit is set in @@OPTIONS
-- If the bitwise AND returns > 0, it was ON (1), otherwise OFF (0)
SET @OriginalXactAbort = CASE WHEN (@@OPTIONS & 16384) > 0 THEN 1 ELSE 0 END;

-- 3. Set XACT_ABORT to your desired state for the critical workload
SET XACT_ABORT ON;

BEGIN TRANSACTION;
    -- Your transactional logic / queries go here
    -- e.g., INSERT INTO MyTable ...
COMMIT TRANSACTION;

-- 4. Restore the original state dynamically
IF @OriginalXactAbort = 1
    SET XACT_ABORT ON;
ELSE
    SET XACT_ABORT OFF;
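
The bit test in step 2 can also be done client-side if the adapter reads @@OPTIONS before the swap. The following is a minimal sketch of that idea under stated assumptions: the helper names are hypothetical and nothing here is part of the adapter; only the bit value 16384 for XACT_ABORT comes from the snippet above.

```python
# Bit that @@OPTIONS sets when XACT_ABORT is ON for the session.
XACT_ABORT_BIT = 16384


def xact_abort_enabled(options_value: int) -> bool:
    """Return True if the XACT_ABORT bit is set in an @@OPTIONS value."""
    return (options_value & XACT_ABORT_BIT) != 0


def restore_statement(was_enabled: bool) -> str:
    """Build the statement that puts XACT_ABORT back to its prior state."""
    return "SET XACT_ABORT ON;" if was_enabled else "SET XACT_ABORT OFF;"


# Example: an @@OPTIONS value without the XACT_ABORT bit set.
before = 5496
restore_sql = restore_statement(xact_abort_enabled(before))
```

Either way, the restore step only runs if the batch reaches it, so a failure that aborts the batch while XACT_ABORT is ON would leave the session setting enabled.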


2 participants