feat(parquetconverter): Add max-num-columns config for automatic parquet sharding#7624

Open
yeya24 wants to merge 1 commit into cortexproject:master from yeya24:feat/parquet-max-columns

@yeya24 yeya24 commented Jun 16, 2026

Contributor

What this PR does

Bumps parquet-common to include PR #131, which adds automatic parquet file sharding based on a configurable column limit.

The parquet-go library has a hard limit of 32767 columns per file. When a TSDB block has more unique label names than this limit, conversion fails. This feature lets users set a lower limit via -parquet-converter.max-num-columns; the converter then automatically shards the output into multiple parquet files when the number of unique label names would exceed that limit.

Changes

  • Bump parquet-common to 5f32460b5373 (merged column sharding support)
  • Add -parquet-converter.max-num-columns flag (default 0 = library default of 32767)
  • Add MaxNumColumns to Config struct and wire into convert options
  • Add TestConvertWithMaxNumColumns integration test
  • Update v1-guarantees.md with experimental feature entry
  • Regenerate docs and JSON schema

How to test

The new test TestConvertWithMaxNumColumns verifies:

  1. With WithMaxNumColumns(20): produces multiple shards when 50 unique label names exist
  2. With WithMaxNumColumns(10000): produces a single shard (no sharding needed)

Run it with:

    go test ./pkg/parquetconverter/... -run TestConvertWithMaxNumColumns -v

@dosubot (bot) added the dependencies, go, and type/feature labels Jun 16, 2026
@yeya24 force-pushed the feat/parquet-max-columns branch from 33fcd02 to bc753c2 Jun 16, 2026 00:29
…uet sharding

Bump parquet-common to include PR #131 which adds automatic parquet
file sharding based on a configurable column limit. The parquet-go
library has a hard limit of 32767 columns; this feature allows users to
set a lower limit and automatically shard into multiple files when the
number of unique label names would exceed it.

Changes:
- Bump parquet-common to 5f32460b5373 (merged column sharding PR)
- Add -parquet-converter.max-num-columns flag (default 0 = library default)
- Add MaxNumColumns to Config struct and wire into convert options
- Add TestConvertWithMaxNumColumns integration test
- Update v1-guarantees.md with experimental feature entry
- Regenerate docs and JSON schema

Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 force-pushed the feat/parquet-max-columns branch from bc753c2 to bf79b91 Jun 16, 2026 00:45

@SungJin1212 left a comment

Member

lgtm

@dosubot (bot) added the lgtm label Jun 16, 2026

Labels

dependencies, go, lgtm, size/M, type/feature

2 participants