Skip to content

Backport of #100420/#100295 - Resolve problems with paths and compatibility problems with Spark in Azure (v2)#1785

Merged
zvonand merged 14 commits into
antalya-26.3from
backport/antalya/iceberg_path_100420
Jun 1, 2026
Merged

Backport of #100420/#100295 - Resolve problems with paths and compatibility problems with Spark in Azure (v2)#1785
zvonand merged 14 commits into
antalya-26.3from
backport/antalya/iceberg_path_100420

Conversation

@arthurpassos
Copy link
Copy Markdown
Collaborator

@arthurpassos arthurpassos commented May 12, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR addresses several issues: fixes inconsistent path handling in Iceberg caused by mixed usage of storage paths and metadata paths; enforces that Iceberg tables write down a table location which is either a URL or an absolute path; adds a fallback for counting file sizes in Azure because some ClickHouse readers don't support byte counting after traversal; version-hint.txt is now handled in a manner compatible with Spark; introduces type-level abstractions that make it harder to mix up path types in the future; adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading/downloading; fixes usage of position deletes, which previously relied on path inference heuristics where that approach is inappropriate

Backport of ClickHouse#100420 and ClickHouse#100295

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9762404674

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

per_file_record_counts.push_back(static_cast<Int64>(*current_file_num_rows));
per_file_byte_sizes.push_back(static_cast<Int64>(file_bytes));
/// todo arthur fix the wrong counter for file bytes, probably by backporting something else
per_file_byte_sizes.push_back(static_cast<Int64>(buffer_bytes));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Store the Azure fallback size in per-file stats

When the write buffer does not report bytes (the Azure path handled just above), buffer_bytes remains 0, so per_file_byte_sizes records 0 even though total_bytes was corrected from object metadata. This value is returned by getDataFileEntries and written to the import/export sidecar, which later becomes the Iceberg manifest file_size_in_bytes, so exported/imported Azure data files can be committed with a zero file size. Use the fallback size for both counters instead of only total_bytes.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

Workflow [PR], commit [fae3086]

…hunk`

In `addRequestedFileLikeStorageVirtualsToChunk`, the `_row_number` handling
block uses `return` instead of `continue` after adding the column to the
chunk. This causes the function to exit the loop early, skipping any
remaining virtual columns (e.g. `_data_lake_snapshot_version`). When a
query requests both `_row_number` and another virtual column after it, the
chunk has fewer columns than expected, resulting in:
"Invalid number of columns in chunk pushed to OutputPort."

The fix was originally in ClickHouse#100116 but was lost during merge because
ClickHouse#100208 (revert of ClickHouse#99163) had reintroduced the `return` on master
after the fix branch had already resolved it via a different code structure.

The regression test `04050_iceberg_virtual_columns_return_bug` is already
on master from ClickHouse#100116.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=f10711f066fd101124e088ce33061de51ebae0e9&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29

ClickHouse#87890

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@arthurpassos arthurpassos changed the title Backport of #100420 - Resolve problems with paths and compatibility problems with Spark in Azure (v2) Backport of #100420/#100295 - Resolve problems with paths and compatibility problems with Spark in Azure (v2) May 13, 2026
@ianton-ru
Copy link
Copy Markdown

Looks like tests test_named_collections_encrypted2 can't be executed multiple times in parallel.

per_file_record_counts.push_back(static_cast<Int64>(*current_file_num_rows));
per_file_byte_sizes.push_back(static_cast<Int64>(file_bytes));
/// todo arthur fix the wrong counter for file bytes, probably by backporting something else
per_file_byte_sizes.push_back(static_cast<Int64>(total_bytes));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is different behavior.
Before only new delta was pushed into per_file_byte_sizes, now - full summa.
Different, if it is not a first iteration and total_bytes was not zero.

@svb-alt svb-alt added the backport Backport label May 16, 2026
@arthurpassos
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1576bb8cd2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

{
String catalog_filename = metadata_path.serialize();
if (!catalog_filename.starts_with(blob_storage_type_name))
catalog_filename = blob_storage_type_name + "://" + blob_storage_namespace_name + "/" + catalog_filename;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Strip leading slashes before catalog URI updates

For catalog-backed S3 tables created with the default write_full_path_in_iceberg_metadata = 0, the metadata location is now normalized to start with /, so FileNamesGenerator produces paths like /warehouse/db/table/metadata/v2.metadata.json. This branch then prepends s3://<bucket>/ without removing that leading slash, updating Glue to s3://<bucket>//warehouse/... while the file was actually written via resolve to warehouse/...; the catalog will point at a non-existent metadata object after the first insert.

Useful? React with 👍 / 👎.

@arthurpassos
Copy link
Copy Markdown
Collaborator Author

@il9ue Hi Daniel. Can you please fix the remaining conflicts? You can identify them by the build failures, they are related to path resolution x IcebergMetadata::truncate. I believe you are the best person to do that as you have more context.

@il9ue
Copy link
Copy Markdown

il9ue commented May 22, 2026

Alright, let me take a look closely.

The path resolution refactor in ClickHouse#100420/ClickHouse#100295 changed `FileNamesGenerator`'s
constructor signature (5 args → 4 args, with `is_transactional` mapped to
`use_uuid_in_metadata` and `config_path` removed) and removed
`generateMetadataName()` in favour of `generateMetadataPathWithInfo()`.
The `generateManifestList` signature also changed: first arg is now
`IcebergPathResolver` instead of `FileNamesGenerator`, and manifest entry
sizes is `std::vector<Int64>` instead of a scalar.

Adapt the `TRUNCATE TABLE` code path (from #1655) to the new API:

- Two-branch `FileNamesGenerator` construction (transactional vs. non-transactional,
  reading location from JSON) → single construction using
  `persistent_components.path_resolver.getTableLocation()` with
  `is_transactional` as `use_uuid_in_metadata`.
- `generateMetadataName()` → `generateMetadataPathWithInfo()`, returning
  `GeneratedMetadataFileWithInfo { path, version, compression_method }`.
- Storage paths derived via `path_resolver.resolve(path)` where raw strings
  were previously returned.
- Catalog filename via `path_resolver.resolveForCatalog(path)`, replacing the
  manual `location + metadata_name` concatenation.
- `generateManifestList(filename_generator, ...)` → `generateManifestList(path_resolver, ...)`.

Behavior of truncate is preserved: clears data files by writing a new
snapshot with empty manifests and updates the catalog pointer. The new
`IcebergPathResolver` abstraction handles both transactional (full URI)
and non-transactional (bare path) cases transparently.

Refs: #1785, #1655
@il9ue
Copy link
Copy Markdown

il9ue commented May 26, 2026

@arthurpassos
Pushed a fix for the IcebergMetadata::truncate conflicts in 2c83074.

  • The collision was with the new FileNamesGenerator API (5→4 args, generateMetadataName → generateMetadataPathWithInfo) and the changed generateManifestList signature.
  • Truncate now routes both storage and catalog paths through IcebergPathResolver instead of branching on is_transactional.
  • Build is clean locally; leaving the MultipleFileWriter byte-size and IcebergPath leading-slash review items to you since those are from the upstream merge rather than the truncate collision.

@arthurpassos
Copy link
Copy Markdown
Collaborator Author

@arthurpassos Pushed a fix for the IcebergMetadata::truncate conflicts in 2c83074.

  • The collision was with the new FileNamesGenerator API (5→4 args, generateMetadataName → generateMetadataPathWithInfo) and the changed generateManifestList signature.
  • Truncate now routes both storage and catalog paths through IcebergPathResolver instead of branching on is_transactional.
  • Build is clean locally; leaving the MultipleFileWriter byte-size and IcebergPath leading-slash review items to you since those are from the upstream merge rather than the truncate collision.

04034_iceberg_spark_style_location is failing. Could it be related?

@ianton-ru
Copy link
Copy Markdown

04034_iceberg_spark_style_location is failing. Could it be related?

It's related. Test passed on antalya-26.3 and 100% failed on this branch, not flaky.

@ianton-ru
Copy link
Copy Markdown

@arthurpassos @il9ue
Difference is the next.
antalya-26.3, in .../metadata/v2.metadata.json

....
"location": "04034_iceberg_spark/04034_iceberg_spark_style_location_test_7f25o3e5/",
....
"snapshots": [{"manifest-list": "/04034_iceberg_spark/04034_iceberg_spark_style_location_test_7f25o3e5/metadata/snap-1259861766-2-a4925325-83eb-492e-8a08-be414b3d64c1.avro",
....

location does not start with '/', manifest-list starts, and script does not change manifest-list value.
With current PR both, location and manifest-list start with slash character, and test script rewrites manifest-list on something like s3a://spark-bucket/warehouse/db/spark_table/metadata/snap-1120247156-2-d24c449e-1c9f-488e-b351-be95f61296b5.avro.

I don't know where is the bug - in PR, in test, or need to port some other PR too.

@arthurpassos
Copy link
Copy Markdown
Collaborator Author

@arthurpassos @il9ue Difference is the next. antalya-26.3, in .../metadata/v2.metadata.json

....
"location": "04034_iceberg_spark/04034_iceberg_spark_style_location_test_7f25o3e5/",
....
"snapshots": [{"manifest-list": "/04034_iceberg_spark/04034_iceberg_spark_style_location_test_7f25o3e5/metadata/snap-1259861766-2-a4925325-83eb-492e-8a08-be414b3d64c1.avro",
....

location does not start with '/', manifest-list starts, and script does not change manifest-list value. With current PR both, location and manifest-list start with slash character, and test script rewrites manifest-list on something like s3a://spark-bucket/warehouse/db/spark_table/metadata/snap-1120247156-2-d24c449e-1c9f-488e-b351-be95f61296b5.avro.

I don't know where is the bug - in PR, in test, or need to port some other PR too.

Thanks for the investigation. I don't have time to look at it today, but I will try to investigate it as well tomorrow

@ianton-ru
Copy link
Copy Markdown

test_storage_iceberg_with_spark/test_remote_initiator.py::test_remote_initiator_after_non_remote is unstable, ignore it

Copy link
Copy Markdown

@ianton-ru ianton-ru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zvonand zvonand merged commit 76ee3a1 into antalya-26.3 Jun 1, 2026
642 of 686 checks passed
@CarlosFelipeOR
Copy link
Copy Markdown
Collaborator

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1785 (Iceberg path/metadata backport scope):

Confirmed defects:

High: Malformed catalog metadata location for pre-qualified Azure-style URIs
Impact: Transactional catalog commits can store an invalid metadata location (e.g. azure://<container>/wasb://...), which can make subsequent table refresh/read paths fail.
Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergPath.h / IcebergPathResolver::resolveForCatalog
Trigger: Any transactional-catalog update path (INSERT/mutation/truncate/expire_snapshots) where metadata_info.path is already URI-qualified with a scheme not equal to storage getTypeName (realistic for Spark-written Azure tables using wasb:///abfss://).
Why defect: resolveForCatalog uses starts_with(blob_storage_type_name) instead of checking whether the metadata path is already a full URI, so valid pre-qualified paths are re-prefixed.

String resolveForCatalog(const IcebergPathFromMetadata & metadata_path) const
{
    String catalog_filename = metadata_path.serialize();
    if (!catalog_filename.starts_with(blob_storage_type_name))
        catalog_filename = blob_storage_type_name + "://" + blob_storage_namespace_name + "/" + catalog_filename;
    return catalog_filename;
}

Fix direction (short): In resolveForCatalog, detect an existing URI scheme (<scheme>://) and return as-is; only prefix bare object keys/paths.
Regression test direction (short): Add resolveForCatalog unit coverage for azure type with metadata paths wasb://... and abfss://... (must stay unchanged), plus a relative-path case (must be prefixed).

Coverage summary:

Scope reviewed: End-to-end changed call graph in Iceberg metadata/write paths (FileNamesGenerator, IcebergPathResolver, IcebergWrites, Mutations, IcebergMetadata, manifest read/write, virtual-column plumbing).
Categories failed: Catalog path normalization for already-qualified metadata URIs in transactional commit paths.
Categories passed: Path-type propagation (IcebergPathFromMetadata) through manifest/snapshot flows; version-hint numeric resolution fallback; mutation virtual-column flow (_row_number control-flow fix and _iceberg_metadata_file_path propagation); manifest-size accounting fallbacks.
Assumptions/limits: Static audit only in this run (no runtime execution of integration suites / no live transactional Azure catalog E2E).

@CarlosFelipeOR CarlosFelipeOR added the verified Approved for release label Jun 1, 2026
@CarlosFelipeOR
Copy link
Copy Markdown
Collaborator

CarlosFelipeOR commented Jun 2, 2026

QA Verification — PR #1785 (Antalya 26.3 Iceberg path handling)

Verdict

Approve from QA.

The PR achieves its scope (Iceberg path handling, Azure file-size fallback, Spark version-hint.txt, position deletes) and introduces no collateral regression. See the conclusion at the bottom for the full reasoning.


CI triage

Failures justification

Stateless and Integration tests

From Checks New Fails in this run, there are 8 entries (5 integration + 3 stateless), all non-blocking for this PR:

  • test_storage_iceberg_with_spark/test_remote_initiator.py::test_remote_initiator_after_non_remote[...] (5 cases in Integration tests (amd_asan, targeted))

    • Already known and fixed in PR Fix test_remote_initiator_after_non_remote #1856.
    • Passes in other relevant jobs (amd_binary, arm_binary, and amd_asan non-targeted splits).
    • CI DB history confirms this is not new to this PR (126 fails across 12 PRs in the last 120 days).
  • 00157_cache_dictionary (Stateless tests (amd_debug, sequential)) — pre-existing flaky; CI DB shows 180 fails across 50 PRs since March 2026.

  • 03760_backup_tar_archive (Stateless tests (arm_asan, azure, sequential, 1/2)) — pre-existing flaky on arm/azure stateless lanes; 88 fails across 32 PRs since April 2026.

  • Scraping system tables (Stateless tests (arm_asan, azure, sequential, 1/2)) — pre-existing intermittent meta-check; 64 fails across 34 PRs since March 2026.

Regression tests

Failing regression jobs in this PR run: iceberg_2 (aarch64), swarms, tiered_storage, settings, version.

  • swarms, settings, version: known QA-tracked issues (already fixed by QA, not re-investigated).
  • tiered_storage: known flaky, failing before this PR.
  • iceberg_2 (aarch64): system.query_log flush race; addressed by clickhouse-regression commit abaadb5 (SYSTEM FLUSH LOGS in metrics wait step). The release lane (Regression release iceberg_2) is green on this PR (1578 scenarios, 5 xfail).

iceberg/export partition is a regression suite relevant to this PR but was not executed in this CI run (workflow filter excluded it). It was executed separately here: Regression run. 5 failure types were observed:

  1. /iceberg/export partition/no catalog/system monitoring/every system․replicated_partition_exports column is populated on success — Fails because last_exception was intentionally removed from system.replicated_partition_exports in 26.3. Test needs to be updated to the new schema.

  2. /iceberg/export partition/no catalog/schema evolution/rejected﹕ RENAME COLUMN on iceberg destination — Fails because RENAME COLUMN is intentionally allowed now (commit f33a1b09). Outdated expectation.

  3. /iceberg/export partition/no catalog/concurrent writes/duplicate EXPORT inside one ALTER commits at most once — Duplicate-export error code intentionally changed in 26.3; test still asserts the old code.

  4. /iceberg/export partition/no catalog/transactions/duplicate export within TTL is rejected — Same as Update README.md #3.

  5. /iceberg/export partition/no catalog/manifest integrity/external iceberg reader round-trips exported dataNot a PR 1785 regression. The PR's path fix lets PyIceberg actually open the exported files now (this test was previously XFail because data-file paths lacked the s3:// scheme). Opening surfaces a separate, pre-existing limitation: the CH Iceberg write path emits Parquet without Iceberg field-ids and without schema.name-mapping.default on the table, causing a strict-reader ValueError. Not export-partition-specific: the same MultipleFileWriter is used by plain INSERT INTO <iceberg>, so the gap belongs to the broader CH Iceberg write path, not to this PR.


Tests to validate the functionality

New / edited tests in the PR

All new and updated tests introduced by this PR passed across the relevant integration lanes (amd_binary, amd_asan, db disk, old analyzer, arm_binary, distributed plan):

Regression coverage in clickhouse-regression

  • Regression release iceberg_1 (green): 489 scenarios, 9 xfail.
  • Regression release iceberg_2 (green): 1578 scenarios, 5 xfail. The aarch64 lane failure was a flush race, not a server regression (fix in clickhouse-regression abaadb5).
  • iceberg/export partition: see the separate run above. Among the path-related xfails this PR was expected to address, /iceberg/export partition/no catalog/manifest integrity/data file paths live under the table prefix moved from XFail to PASS, which corroborates the type-level path abstraction is correctly writing absolute URIs into manifest entries.

Conclusion

The PR's correctness is established by the combination of three independent sources of evidence:

  1. Direct coverage of every change vector by the PR's own new tests, exercised end-to-end against real external engines:

    PR change Test that exercises it
    Path abstraction (metadata vs storage) test_storage_iceberg_interoperability_azure and ..._local — CH↔Spark write/read/append on Azure and local. If paths weren't consistent, Spark couldn't open CH-written tables (or vice-versa).
    Azure file-size fallback All 4 interoperability_azure tests (they read CH-written data on Azure).
    Spark version-hint.txt test_writes_create_version_hint.py.
    Position deletes test_spark_delete_ch_read, test_ch_delete_spark_read (both Azure and local).

    All passed across amd_binary, amd_asan (db disk, old analyzer), and arm_binary (distributed plan).

  2. No collateral regression on the broader Iceberg surface: Regression release iceberg_1 and iceberg_2 are green on this PR (489 + 1578 scenarios). Iceberg integration suites that don't touch the PR's change vectors (test_storage_iceberg_concurrent, test_named_collections_encrypted2) pass. Every CI failure in this run was classified as known/pre-existing, unrelated to PR Backport of #100420/#100295 - Resolve problems with paths and compatibility problems with Spark in Azure (v2) #1785, or addressed elsewhere.

  3. Corroborating signal on the specific symptom the PR targets: the path-related xfail /iceberg/export partition/.../data file paths live under the table prefix moved from XFail to PASS, directly confirming the new IcebergPath/IcebergPathResolver abstraction writes absolute URIs into manifest entries.

Taken together — direct cross-engine coverage of each change vector + no collateral on existing suites + corroborating xfail flip — the PR meets its objective (path consistency, Azure compatibility, Spark interop) without introducing regression. ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants