Postgres: bulk-load DataFrames via COPY (same pattern as the MSSQL adapter and others)

# What
PostgresEngineAdapter doesn't override _df_to_source_queries, so DataFrames from Python models get written as literal VALUES batches i.e. 
`SELECT CAST(...) ... FROM (VALUES (...), (...)) AS t(...)` per select_from_values_for_batch_range, (DEFAULT_BATCH_SIZE=10K rows).

Postgres has to parse and plan each batch as one giant statement, which on my workload capped out around 2–6K rows/s (row width and network will move that number, but the per-row parse cost is structural). If you're serving Python model output to Postgres, the write phase ends up dominating the whole plan.

# Proposed Solution
`MSSQLEngineAdapter._df_to_source_queries` ([mssql.py](https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/engine_adapter/mssql.py)) (along with BigQuery, Snowflake, Clickhouse) all already have the shape: create a temp table, bulk-load the DataFrame into it, return a `SourceQuery` selecting `_casted_columns(...)` from the temp table with `cleanup_func=drop_table`, and fall back to `super()._df_to_source_queries(...)` when the connection can't bulk-load.

Postgres has a native bulk mechanism: psycopg2's `cursor.copy_expert("COPY ... FROM STDIN WITH (FORMAT csv, ...)", buf)`. So this is a port of an existing pattern. Same temp-table lifecycle, same fallback philosophy (MSSQL gates on driver capability; this gates on dtype/content safety, see below).

I've got this running as a subclass override in my project (verified against 0.228.0 and 0.234.1) and checked correctness properly: per-table row counts and content checksums are byte-identical to the literal-SQL path across all 81 tables.

Comparision on 81 Python models (~1M rows, prod state creation , local Postgres):

| | literal VALUES | COPY (suggested improvement) |
|---|---|---|
| model batch execution (81 models) | 96s | 7s (~14x) |


## Some findings and considerations

1. Int columns with NULLs: pandas stores nullable ints as floats, so COPY receives 0.0 and rejects it for int columns. We convert those columns to pandas' nullable Int64 first using a hand-picked set of int types, because sqlglot's INTEGER_TYPES constant wrongly includes Postgres's bit-string type.
2.  Errors instead of rounding: a bad value like 1.5 in an INT column now raises an error, where the old path silently rounded it to 2. Failing fast is better, but it's a behaviour change so the tests pin it down.
3. Temp table names being too long: sqlmesh's generated temp-table names exceed Postgres's 63-character limit for long model names (34 of my 81 models were too long 😅 ). We shorten the name while keeping the suffix; the proper fix belongs in sqlmesh's base class.
4. Risingwave's adapter inherits from the Postgres one, so it would silently get the COPY path on a server that may not support that syntax. The PR should pin Risingwave to its current behaviour until someone verifies COPY works there.
5. NULL edge case: NULLs appear to travel as unquoted \N in the CSV. A real string whose whole value is \N would be mistaken for NULL, so any DataFrame containing one just uses the old write path instead.


Happy to contribute to the feature - implementation, unit tests asserting the `copy_expert` call + CSV payload, and whatever integration coverage you need. Some open questions:

1. Default-on like MSSQL, or behind a connection setting?
2. `_get_temp_table`/`MAX_IDENTIFIER_LENGTH` as a precursor PR, or keep the local clamp in the COPY method?
3. Risingwave: override back to base in the same PR, or does someone have COPY-on-Risingwave knowledge that says it's fine?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgres: bulk-load DataFrames via COPY (same pattern as the MSSQL adapter and others) #5830

What

Proposed Solution

Some findings and considerations

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Postgres: bulk-load DataFrames via COPY (same pattern as the MSSQL adapter and others) #5830

Description

What

Proposed Solution

Some findings and considerations

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions