Skip to content

fix(connectors): handle unrecognized types with explicit coverage and diagnostics#3196

Open
atharvalade wants to merge 3 commits into
apache:masterfrom
atharvalade:fix/postgres-unrecognized-types-coerce-string
Open

fix(connectors): handle unrecognized types with explicit coverage and diagnostics#3196
atharvalade wants to merge 3 commits into
apache:masterfrom
atharvalade:fix/postgres-unrecognized-types-coerce-string

Conversation

@atharvalade
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #3175

Rationale

Unrecognized Postgres types silently coerced to String, returning opaque InvalidRecord on failure with no column or type context.

What changed?

The default _ branch in extract_column_value attempted to read any unrecognized Postgres type as Option<String>, failing silently with Error::InvalidRecord and no indication of which column or type caused it.

Added explicit handlers for DATE, TIME, INTERVAL, OID, NAME, BPCHAR, and 13 array types (_BOOL, _INT2, _INT4, _INT8, _FLOAT4, _FLOAT8, _TEXT, _VARCHAR, _CHAR, _BPCHAR, _UUID, _JSON, _JSONB). The default branch now logs a warning with column name and type before attempting String fallback, and returns InvalidRecordValue with a diagnostic message on failure.

Local Execution

  • Passed
  • Pre-commit hooks ran

AI Usage

  1. Opus 4.6
  2. Scaffolding, Writing comments, Writing PR Description
  3. Verified via cargo check, clippy, fmt, and full test suite (24/24 pass)
  4. Yes, all code can be explained

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 25.34435% with 271 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.66%. Comparing base (7bbe10c) to head (161660b).

Files with missing lines Patch % Lines
core/connectors/sources/postgres_source/src/lib.rs 25.34% 269 Missing and 2 partials ⚠️

❌ Your patch check has failed because the patch coverage (25.34%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3196       +/-   ##
=============================================
- Coverage     74.44%   16.66%   -57.79%     
  Complexity      943      943               
=============================================
  Files          1231     1229        -2     
  Lines        120911   105982    -14929     
  Branches      97647    82747    -14900     
=============================================
- Hits          90017    17665    -72352     
- Misses        27943    87881    +59938     
+ Partials       2951      436     -2515     
Components Coverage Δ
Rust Core 0.91% <25.34%> (-74.68%) ⬇️
Java SDK 58.44% <ø> (ø)
C# SDK 69.41% <ø> (-0.52%) ⬇️
Python SDK 81.06% <ø> (ø)
Node SDK 91.53% <ø> (ø)
Go SDK 40.20% <ø> (ø)
Files with missing lines Coverage Δ
core/connectors/sources/postgres_source/src/lib.rs 37.49% <25.34%> (-30.37%) ⬇️

... and 725 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@atharvalade atharvalade changed the title fix(postgres): handle unrecognized types with explicit coverage and diagnostics fix(connectors): handle unrecognized types with explicit coverage and diagnostics Apr 29, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added S-stale Inactive issue or pull request and removed S-stale Inactive issue or pull request labels May 7, 2026
@hubcio
Copy link
Copy Markdown
Contributor

hubcio commented May 14, 2026

/ready

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label May 14, 2026
Copy link
Copy Markdown
Contributor

@slbotbm slbotbm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/author

Comment thread core/connectors/sources/postgres_source/src/lib.rs
Comment thread core/connectors/sources/postgres_source/src/lib.rs Outdated
Copy link
Copy Markdown
Contributor

@slbotbm slbotbm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, the current code also does not handle the following types:

  • TIMETZ
  • TIMETZ[]
  • DATE[]
  • TIME[]
  • TIMESTAMP[]
  • TIMESTAMPTZ[]
  • INTERVAL[]

Comment thread core/connectors/sources/postgres_source/src/lib.rs Outdated
Comment thread core/connectors/sources/postgres_source/src/lib.rs Outdated
Comment thread core/connectors/sources/postgres_source/src/lib.rs Outdated
@slbotbm
Copy link
Copy Markdown
Contributor

slbotbm commented May 14, 2026

/author

@slbotbm
Copy link
Copy Markdown
Contributor

slbotbm commented May 16, 2026

/author

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 16, 2026
@atharvalade atharvalade force-pushed the fix/postgres-unrecognized-types-coerce-string branch from 73fcce0 to bb6236a Compare May 17, 2026 06:44
@atharvalade
Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 17, 2026
Comment on lines +1275 to +1286
_ => {
let column_name = column.name();
warn!(
"Column '{column_name}' has unrecognized Postgres type '{type_name}', \
attempting String conversion"
);
let value: Option<String> = row.try_get(column_index).map_err(|e| {
error!("Failed to read column '{column_name}' (type '{type_name}'): {e}");
Error::InvalidRecordValue(format!(
"column '{column_name}' has unsupported Postgres type '{type_name}'"
))
})?;
Copy link
Copy Markdown
Contributor

@slbotbm slbotbm May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String is only compatible with text-like types, not types like INET, CIDR, MACADDR, MONEY, BIT/VARBIT. This fallback will fail hard for those cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default branch now uses try_get_raw() + as_str() which reads the raw text representation directly from the wire without sqlx type-checking, so it works for any type Postgres can render as text.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change assumes that the bytes will be UTF-8 compatible, and thus can be converted to string using as_str. This may not always be the case, and the connector will fail in those cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for this does not seem to be trivial, and a follow up PR could fix this instead.

Comment on lines +924 to +931
"OID" => {
let value: Option<i32> = row
.try_get(column_index)
.map_err(|_| Error::InvalidRecord)?;
Ok(value
.map(|v| serde_json::Value::from(v as u32 as u64))
.unwrap_or(serde_json::Value::Null))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For OID and OID[], could you write a temporary test and check whether this works? From what I understand, sqlx may reject this since try_get checks compatibility before decoding, and fails on mismatched SQL/Rust types. sqlx::postgres::types::Oid being the native type for OID, i32 would still be rejected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now decoding via sqlx::postgres::types::Oid which is the native sqlx type for OID (wraps u32, implements Decode and PgHasArrayType); OID[] also uses Vec<Option<Oid>> separately from INT4[]

@slbotbm
Copy link
Copy Markdown
Contributor

slbotbm commented May 18, 2026

/author

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 18, 2026
@atharvalade atharvalade force-pushed the fix/postgres-unrecognized-types-coerce-string branch from 97d4024 to f2cec44 Compare May 22, 2026 06:01
@atharvalade
Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 22, 2026
Comment thread Cargo.lock
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did u do this many changes in Cargo.lock? you're only adding chrono dep to postgres_source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased cleanly now, please check

@github-actions github-actions Bot removed the S-waiting-on-review PR is waiting on a reviewer label May 22, 2026
@github-actions github-actions Bot added the S-waiting-on-author PR is waiting on author response label May 22, 2026
@atharvalade atharvalade force-pushed the fix/postgres-unrecognized-types-coerce-string branch from f2cec44 to 43e5006 Compare May 23, 2026 03:26
@atharvalade
Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 23, 2026
Comment on lines 1013 to 1019
"VARCHAR" | "TEXT" | "CHAR" | "NAME" | "BPCHAR" | "TIMETZ" => {
let value: Option<String> = row
.try_get(column_index)
.map_err(|_| Error::InvalidRecord)?;
Ok(value
.map(serde_json::Value::String)
.unwrap_or(serde_json::Value::Null))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIMETZ does not decode to String - it has a dedicated type: PgTimeTz. Current code will fail is TIMETZ is supplied.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIMETZ now decodes via sqlx::postgres::types::PgTimeTz which gives NaiveTime + FixedOffset, formatted as "HH:MM:SS+offset"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default branch now tries as_str() first and on failure falls back to base64-encoding the raw bytes via as_bytes(), so binary-format types like custom composites won't crash the connector

Comment on lines +1275 to +1286
_ => {
let column_name = column.name();
warn!(
"Column '{column_name}' has unrecognized Postgres type '{type_name}', \
attempting String conversion"
);
let value: Option<String> = row.try_get(column_index).map_err(|e| {
error!("Failed to read column '{column_name}' (type '{type_name}'): {e}");
Error::InvalidRecordValue(format!(
"column '{column_name}' has unsupported Postgres type '{type_name}'"
))
})?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change assumes that the bytes will be UTF-8 compatible, and thus can be converted to string using as_str. This may not always be the case, and the connector will fail in those cases.

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 23, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added the S-stale Inactive issue or pull request label May 31, 2026
@atharvalade atharvalade force-pushed the fix/postgres-unrecognized-types-coerce-string branch from 43e5006 to 84cc8be Compare May 31, 2026 17:37
@github-actions github-actions Bot removed the S-stale Inactive issue or pull request label May 31, 2026
@atharvalade atharvalade force-pushed the fix/postgres-unrecognized-types-coerce-string branch from 84cc8be to 1d7bd9b Compare May 31, 2026 17:42
@atharvalade
Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 31, 2026
parts.push(format!(
"{} day{}",
interval.days,
if interval.days.abs() != 1 { "s" } else { "" }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An overflow occurs if days is i32::MIN.

postgres=# SELECT make_interval(days => -2147483648);
  make_interval   
------------------
 -2147483648 days
(1 row)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched from .abs() to .unsigned_abs() which returns u32 and handles i32::MIN without overflow

@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 1, 2026
@atharvalade
Copy link
Copy Markdown
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 3, 2026
Copy link
Copy Markdown
Contributor

@Standing-Man Standing-Man left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. The only issue is that the unknown-type fallback nominally returns a string, but it doesn’t actually go through PostgreSQL’s text conversion, so the output may be unreliable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review PR is waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unrecognized Postgres types silently coerce to String

4 participants