Is your feature request related to a problem or challenge?
DataFusion's Parquet writer only exposes a row-count limit for row group sizing, via ParquetOptions.max_row_group_size (datafusion.execution.parquet.max_row_group_size, default 1M rows). There is no way to bound a row group by bytes.
Row count is often a poor proxy for row group size, because bytes per row vary widely with schema width. The same max_row_group_size = 1M yields a small row group for a narrow schema and a multi-hundred-MB row group for a wide one.
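To make the mismatch concrete, here is a rough back-of-the-envelope sketch. The helper name estimated_row_group_bytes and the per-row sizes are illustrative assumptions, not DataFusion or parquet APIs, and the estimate ignores encoding and compression.

```rust
/// Hypothetical helper (not part of DataFusion or the parquet crate):
/// estimates the uncompressed size, in bytes, of a row group holding
/// `rows` rows at an average of `avg_row_bytes` bytes per row.
fn estimated_row_group_bytes(rows: u64, avg_row_bytes: u64) -> u64 {
    // Saturate instead of overflowing for very large inputs.
    rows.saturating_mul(avg_row_bytes)
}
```

With a limit of 1,000,000 rows, an assumed 16 bytes per row gives roughly 16 MB per row group, while an assumed 400 bytes per row gives roughly 400 MB under the same setting.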
Describe the solution you'd like
Add an optional max_row_group_bytes to ParquetOptions, wired to WriterPropertiesBuilder::set_max_row_group_bytes.
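A minimal sketch of the intended semantics, assuming the row group is closed as soon as either limit is reached. RowGroupLimits and should_flush are hypothetical stand-ins for illustration only; they are not DataFusion's ParquetOptions, and the actual writer may measure bytes and decide when to flush differently.

```rust
/// Hypothetical, simplified model of the two limits discussed above.
/// This is NOT DataFusion's `ParquetOptions`.
#[derive(Debug, Clone, Copy)]
struct RowGroupLimits {
    /// Existing row-count limit.
    max_row_group_size: usize,
    /// Proposed byte limit; `None` keeps today's row-count-only behavior.
    max_row_group_bytes: Option<usize>,
}

impl RowGroupLimits {
    /// Returns true if a row group that has buffered `rows` rows totalling
    /// `bytes` bytes should be closed (assumed "whichever limit hits first").
    fn should_flush(&self, rows: usize, bytes: usize) -> bool {
        let rows_hit = rows >= self.max_row_group_size;
        let bytes_hit = self
            .max_row_group_bytes
            .map_or(false, |limit| bytes >= limit);
        rows_hit || bytes_hit
    }
}
```

Making the field optional keeps existing configurations unchanged: with no byte limit set, only the row-count check applies.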
Describe alternatives you've considered
No response
Additional context
WriterPropertiesBuilder::set_max_row_group_bytes is already available on DataFusion main, so no dependency bump is required. I have an implementation ready (config field, WriterPropertiesBuilder wiring, round-trip tests, and docs) and can open a PR against this issue.