fix(connectors): handle unrecognized types with explicit coverage and diagnostics#3196
Conversation
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (25.34%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## master #3196 +/- ##
=============================================
- Coverage 74.44% 16.66% -57.79%
Complexity 943 943
=============================================
Files 1231 1229 -2
Lines 120911 105982 -14929
Branches 97647 82747 -14900
=============================================
- Hits 90017 17665 -72352
- Misses 27943 87881 +59938
+ Partials 2951 436 -2515
🚀 New features to boost your workflow:
|
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR. Thank you for your contribution! |
|
/ready |
slbotbm
left a comment
There was a problem hiding this comment.
Separately, the current code also does not handle the following types:
- TIMETZ
- TIMETZ[]
- DATE[]
- TIME[]
- TIMESTAMP[]
- TIMESTAMPTZ[]
- INTERVAL[]
|
/author |
|
/author |
73fcce0 to
bb6236a
Compare
|
/ready |
| _ => { | ||
| let column_name = column.name(); | ||
| warn!( | ||
| "Column '{column_name}' has unrecognized Postgres type '{type_name}', \ | ||
| attempting String conversion" | ||
| ); | ||
| let value: Option<String> = row.try_get(column_index).map_err(|e| { | ||
| error!("Failed to read column '{column_name}' (type '{type_name}'): {e}"); | ||
| Error::InvalidRecordValue(format!( | ||
| "column '{column_name}' has unsupported Postgres type '{type_name}'" | ||
| )) | ||
| })?; |
There was a problem hiding this comment.
String is only compatible with text-like types, not types like INET, CIDR, MACADDR, MONEY, BIT/VARBIT. This fallback will fail hard for those cases.
There was a problem hiding this comment.
the default branch now uses try_get_raw() + as_str() which reads the raw text representation directly from the wire without sqlx type-checking, so it works for any type Postgres can render as text.
There was a problem hiding this comment.
Your change assumes that the bytes will be UTF-8 compatible, and thus can be converted to string using as_str. This may not always be the case, and the connector will fail in those cases.
There was a problem hiding this comment.
The fix for this does not seem to be trivial, and a follow up PR could fix this instead.
| "OID" => { | ||
| let value: Option<i32> = row | ||
| .try_get(column_index) | ||
| .map_err(|_| Error::InvalidRecord)?; | ||
| Ok(value | ||
| .map(|v| serde_json::Value::from(v as u32 as u64)) | ||
| .unwrap_or(serde_json::Value::Null)) | ||
| } |
There was a problem hiding this comment.
For OID and OID[], could you write a temporary test and check whether this works? From what I understand, sqlx may reject this since try_get checks compatibility before decoding, and fails on mismatched SQL/Rust types. sqlx::postgres::types::Oid being the native type for OID, i32 would still be rejected.
There was a problem hiding this comment.
now decoding via sqlx::postgres::types::Oid which is the native sqlx type for OID (wraps u32, implements Decode and PgHasArrayType); OID[] also uses Vec<Option<Oid>> separately from INT4[]
|
/author |
97d4024 to
f2cec44
Compare
|
/ready |
There was a problem hiding this comment.
why did u do this many changes in Cargo.lock? you're only adding chrono dep to postgres_source.
There was a problem hiding this comment.
Rebased cleanly now, please check
f2cec44 to
43e5006
Compare
|
/ready |
| "VARCHAR" | "TEXT" | "CHAR" | "NAME" | "BPCHAR" | "TIMETZ" => { | ||
| let value: Option<String> = row | ||
| .try_get(column_index) | ||
| .map_err(|_| Error::InvalidRecord)?; | ||
| Ok(value | ||
| .map(serde_json::Value::String) | ||
| .unwrap_or(serde_json::Value::Null)) |
There was a problem hiding this comment.
TIMETZ does not decode to String - it has a dedicated type: PgTimeTz. Current code will fail is TIMETZ is supplied.
There was a problem hiding this comment.
TIMETZ now decodes via sqlx::postgres::types::PgTimeTz which gives NaiveTime + FixedOffset, formatted as "HH:MM:SS+offset"
There was a problem hiding this comment.
the default branch now tries as_str() first and on failure falls back to base64-encoding the raw bytes via as_bytes(), so binary-format types like custom composites won't crash the connector
| _ => { | ||
| let column_name = column.name(); | ||
| warn!( | ||
| "Column '{column_name}' has unrecognized Postgres type '{type_name}', \ | ||
| attempting String conversion" | ||
| ); | ||
| let value: Option<String> = row.try_get(column_index).map_err(|e| { | ||
| error!("Failed to read column '{column_name}' (type '{type_name}'): {e}"); | ||
| Error::InvalidRecordValue(format!( | ||
| "column '{column_name}' has unsupported Postgres type '{type_name}'" | ||
| )) | ||
| })?; |
There was a problem hiding this comment.
Your change assumes that the bytes will be UTF-8 compatible, and thus can be converted to string using as_str. This may not always be the case, and the connector will fail in those cases.
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either Thank you for your contribution! |
43e5006 to
84cc8be
Compare
84cc8be to
1d7bd9b
Compare
|
/ready |
| parts.push(format!( | ||
| "{} day{}", | ||
| interval.days, | ||
| if interval.days.abs() != 1 { "s" } else { "" } |
There was a problem hiding this comment.
An overflow occurs if days is i32::MIN.
postgres=# SELECT make_interval(days => -2147483648);
make_interval
------------------
-2147483648 days
(1 row)
There was a problem hiding this comment.
switched from .abs() to .unsigned_abs() which returns u32 and handles i32::MIN without overflow
|
/ready |
Standing-Man
left a comment
There was a problem hiding this comment.
LGTM otherwise. The only issue is that the unknown-type fallback nominally returns a string, but it doesn’t actually go through PostgreSQL’s text conversion, so the output may be unreliable.
Which issue does this PR close?
Closes #3175
Rationale
Unrecognized Postgres types silently coerced to String, returning opaque
InvalidRecordon failure with no column or type context.What changed?
The default
_branch inextract_column_valueattempted to read any unrecognized Postgres type asOption<String>, failing silently withError::InvalidRecordand no indication of which column or type caused it.Added explicit handlers for DATE, TIME, INTERVAL, OID, NAME, BPCHAR, and 13 array types (_BOOL, _INT2, _INT4, _INT8, _FLOAT4, _FLOAT8, _TEXT, _VARCHAR, _CHAR, _BPCHAR, _UUID, _JSON, _JSONB). The default branch now logs a warning with column name and type before attempting String fallback, and returns
InvalidRecordValuewith a diagnostic message on failure.Local Execution
AI Usage