investigate: COPY FROM STDIN broken in remote-worker (multitenant) mode #551
Closed
fuziontech wants to merge 1 commit into
Conversation
The COPY FROM STDIN handler writes the spooled CSV to the control plane
pod's local /tmp via os.CreateTemp, then sends `COPY ... FROM '/tmp/...'`
to the executor. In standalone and process-backend topologies the
worker shares that filesystem, so the path resolves. In the remote
(multitenant K8s) backend the worker is a separate pod with its own
filesystem and an emptyDir /data mount; CP only mounts /app/certs.
The worker therefore fails with "IO Error: No files found that match
the pattern" on every Fivetran COPY. There is no shared volume between
CP and worker, so this code path cannot work as-is in remote mode.
This change does not fix the underlying limitation. It:
- Documents the constraint in conn.go where the tempfile is created.
- Wraps the worker's "No files found" error with an explicit message
naming the architectural mismatch, so operators see the cause
immediately instead of repeatedly chasing a phantom IO error.
The fix requires an ingest path that ships bytes to the worker:
DoPut with Arrow IPC, a custom CSV-bytes RPC, or staging through
S3/HTTPFS. See PR description for fix-option analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Superseded by the actual fix in #NEW_PR (real implementation, not just diagnostics). Will update with link.
Superseded by the actual fix in #552 — that PR streams CSV bytes through Flight DoPut to the worker pod instead of relying on a shared filesystem.
TL;DR
`COPY FROM STDIN` from any client (Fivetran, dbt, psql `\copy`, etc.) is broken in the remote-worker / multitenant K8s topology and has been since that backend shipped. Every COPY fails with `IO Error: No files found that match the pattern`. This is architectural, not a transient bug. The control plane writes the spool file to its own pod's filesystem and asks the worker — which is a separate pod with no shared volume — to read from that path.
This PR does not yet fix it. It documents the limitation in code and improves the error so operators stop chasing the symptom. Fix options are discussed below.
Reproduction (production, 2026-05-07 ~20:50 UTC)
From the `posthog_data_import` user, control plane pod `duckgres-67d4574c84-bt6rw`, EKS cluster `posthog-mw-prod-us`: three different staging tables in the same Fivetran sync hit the same error within the same second. Every Fivetran sync since the migration to MTCP has been silently losing rows on this path (the CP returns the COPY error to the client, but Fivetran retries and eventually moves on with the underlying tables empty).
Root cause
`server/conn.go:3848` (now annotated) creates the spool tempfile via `os.CreateTemp("", "duckgres-copy-*.csv")`, i.e. in `$TMPDIR` of the control plane pod, which resolves to `/tmp` in the duckgres-cp container. `server/conn.go:3908` then constructs `COPY <table> FROM '<tmpPath>' ...` and ships it to the executor.

In remote-backend mode, `c.executor` is a `flightclient.FlightExecutor` that forwards the SQL to a worker pod over Arrow Flight SQL `DoPut` with `CommandStatementUpdate` (`duckdbservice/flight_handler.go:508`). The worker calls `session.Conn.ExecContext(ctx, query)` against its local DuckDB, which opens the path against the worker pod's filesystem. The two pods do not share a volume, so the tempfile is invisible to the worker and the COPY fails.
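To make the failure mode concrete, here is a minimal sketch of the spool-then-COPY pattern (simplified; `spoolAndBuildCopy` is an illustrative name, not the actual `conn.go` code):

```go
// Sketch of the control-plane spool path described above. Simplified
// and illustrative; the real duckgres code differs.
package main

import (
	"fmt"
	"os"
)

// spoolAndBuildCopy writes the client's CSV stream to a local tempfile
// and builds the COPY statement that is handed to the executor.
func spoolAndBuildCopy(table string, csv []byte) (string, error) {
	// os.CreateTemp("", ...) resolves against $TMPDIR of *this* pod —
	// the control plane's /tmp, not the worker's.
	f, err := os.CreateTemp("", "duckgres-copy-*.csv")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(csv); err != nil {
		return "", err
	}
	// The path below is only meaningful on the control-plane pod's
	// filesystem; a remote worker pod cannot open it.
	return fmt.Sprintf("COPY %s FROM '%s' (FORMAT CSV)", table, f.Name()), nil
}

func main() {
	sql, err := spoolAndBuildCopy("staging.events", []byte("id,name\n1,a\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(sql)
}
```

The statement is valid SQL either way; the bug only surfaces when the executor that runs it lives on another filesystem.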
The author of `flight_executor.go` actually flagged this:

…but the `COPY FROM STDIN` path in `conn.go` does not detect Flight mode and does not have a batched-INSERT fallback. It just hands a local-path `COPY FROM` to the executor.

Why it works in standalone / process-backend

`/tmp` is shared between the control plane and the executor, so the path resolves.

Why no test caught it

`server/conn_test.go` has thorough coverage of `COPY FROM STDIN` (regex, blob fallback, multi-line quoted fields, etc.), but every test runs against an in-process DuckDB executor. There is no integration test that runs `COPY FROM STDIN` end-to-end through the remote Flight executor against a separate worker process. Adding one would have caught this immediately.

Fix options
In rough order of "cleanest first":
1. Arrow Flight SQL `CommandStatementIngest` (preferred)

Apache Arrow Flight SQL added a dedicated bulk ingest command (Arrow ≥17 / FlightSQL spec rev). The CP would parse the CSV stream into Arrow record batches and `DoPut` them into a transient table on the worker, then run `INSERT … SELECT …` (or use the ingest command's transactional semantics directly). Pros: standard, no custom RPC. Cons: requires confirming the Go bindings support `CommandStatementIngest` and adding the worker handler.

2. Custom DoPut for raw CSV bytes

Add a duckgres-specific FlightDescriptor that streams CSV bytes to the worker; the worker writes them to its own `/tmp` and runs the existing `COPY FROM <local-path>`. Pros: minimal change to the DuckDB-side execution (reuses the optimized parser). Cons: a custom RPC the team has to maintain.

3. Stage to S3 / HTTPFS

The CP writes the CSV to a known S3 prefix (one bucket per org, or global), and rewrites `COPY FROM 'stdin'` to `COPY FROM 's3://…'`. Workers already have S3 credentials for DuckLake. Pros: zero code on the worker. Cons: S3 write+read latency for every COPY (Fivetran ships many small batches), and a cleanup story for orphaned CSVs.

4. Batched INSERT

What the existing `flight_executor.go` comment hints at: parse the CSV in CP, send `INSERT INTO … VALUES (…), (…)` chunks to the worker. Pros: works today over the existing executor. Cons: bypasses DuckDB's parallel CSV parser; ~10–100x slower for big batches, which is exactly what Fivetran sends.

Recommendation: option 1 if `CommandStatementIngest` is wired through in our Arrow version, otherwise option 2.

What this PR contains
- A comment at the tempfile-creation site (`server/conn.go`) describing the limitation and pointing readers to the relevant remote handler.
- A wrapped worker-side "No files found" error naming the architectural mismatch.

No behavioral change to any working topology. No fix yet — that's a follow-up.
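As a rough illustration of fix option 4 above (the batched-INSERT fallback the `flight_executor.go` comment hints at), the CP-side CSV-to-INSERT chunking could look like this. A sketch only: `csvToInsertBatches` is an assumed name, and real code would need type-aware quoting and wiring into the executor interface.

```go
// Sketch of fix option 4 (batched INSERT fallback). Illustrative only;
// production code must handle quoting/escaping per column type.
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// csvToInsertBatches parses CSV (header row first) and emits
// INSERT INTO … VALUES (…), (…) statements of at most batchSize rows,
// suitable for sending to the remote executor one by one.
func csvToInsertBatches(table, data string, batchSize int) ([]string, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) < 2 {
		return nil, nil // header only or empty: nothing to insert
	}
	cols := strings.Join(rows[0], ", ")
	var stmts []string
	for i := 1; i < len(rows); i += batchSize {
		end := i + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		var tuples []string
		for _, r := range rows[i:end] {
			// Naive single-quote escaping; enough for a sketch only.
			q := make([]string, len(r))
			for j, v := range r {
				q[j] = "'" + strings.ReplaceAll(v, "'", "''") + "'"
			}
			tuples = append(tuples, "("+strings.Join(q, ", ")+")")
		}
		stmts = append(stmts, fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
			table, cols, strings.Join(tuples, ", ")))
	}
	return stmts, nil
}

func main() {
	stmts, err := csvToInsertBatches("staging.events", "id,name\n1,a\n2,b\n3,c\n", 2)
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

This shape also makes the "~10–100x slower" concern visible: every value round-trips through SQL text instead of DuckDB's parallel CSV reader.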
Test plan
- `go build ./...` and `go build -tags kubernetes ./...`
- `go test ./server/...` (existing COPY tests still pass; they exercise the in-process path, which works fine)

🤖 Generated with Claude Code