From 973add93a5565bf3dd6326a6c428b5f4f7f24fb2 Mon Sep 17 00:00:00 2001
From: Andrew Xie <dev@xie.is>
Date: Thu, 11 Jun 2026 15:24:29 -0400
Subject: [PATCH] docs: Document using Spark rewrite_table_path to copy table
 metadata

---
 docs/rewrite-table-path.md | 70 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 docs/rewrite-table-path.md

diff --git a/docs/rewrite-table-path.md b/docs/rewrite-table-path.md
new file mode 100644
index 0000000..72d749a
--- /dev/null
+++ b/docs/rewrite-table-path.md
@@ -0,0 +1,70 @@
+# Rewriting table paths using Spark
+
+Copying a table to a new location in S3 requires rewriting metadata files since they contain absolute paths.
+This can be done using the [rewrite_table_path](https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_table_path) procedure in spark-iceberg.
+
+## Configure Spark
+
+Spark must be configured to connect to the REST catalog that holds your source Iceberg table.
+The file [examples/docker-compose/docker-compose-spark-iceberg.yaml](/examples/docker-compose/docker-compose-spark-iceberg.yaml)
+in this repo is a simple Docker Compose file that launches spark-iceberg.
+
+Modify the following values under `configs -> spark-defaults.conf`:
+- Set `spark.sql.catalog.default.uri` to your catalog URI
+- Set `spark.sql.catalog.default.header.authorization` to your bearer token (e.g. `bearer your-token`)
+- Set `spark.sql.catalog.default.warehouse` to the path of your catalog warehouse (e.g. `s3://your-warehouse-iceberg`)
+- Remove `spark.sql.catalog.default.s3.endpoint`; it is only used for the default MinIO configuration
+- Set `spark.sql.catalog.default.s3.access-key` to your S3 access key
+- Set `spark.sql.catalog.default.s3.secret-key` to your S3 secret access key
+- Add `spark.sql.catalog.default.s3.session-token`, set to your S3 session token, if you use one
+- Set `spark.sql.catalog.default.client.region` to your S3 region
+- Set `spark.sql.catalog.default.s3.ssl-enabled` to `true`
+
+## Launch spark-sql
+
+After modifying the config values above, bring up the container:
+
+```
+cd ../examples/docker-compose
+docker compose -f docker-compose-spark-iceberg.yaml up -d
+```
+
+Then launch spark-sql inside the container with the additional S3A options below, replacing the placeholders with valid S3 credentials.
+
+```
+docker exec -it spark-iceberg spark-sql --packages org.apache.hadoop:hadoop-aws:3.3.4 \
+  --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
+  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
+  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY \
+  --conf spark.hadoop.fs.s3a.session.token=YOUR_SESSION_TOKEN
+```
+
+## Run procedure
+
+Run the `rewrite_table_path` procedure to copy the Iceberg table metadata files.
+Every absolute path that starts with `source_prefix` has that prefix replaced by `target_prefix`.
+The `staging_location` is the location where the copied metadata files are written.
+
+```
+CALL default.system.rewrite_table_path(
+  table => 'ns.table_name',
+  source_prefix => 's3://your-warehouse-iceberg/ns/table_name',
+  target_prefix => 's3://your-warehouse-iceberg/new_ns/table_name_rewritten',
+  staging_location => 's3://your-warehouse-iceberg/new_ns/table_name_rewritten/metadata');
+```
+
+See the [Iceberg docs](https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_table_path) for info on
+other arguments.
+
+After copying the metadata, you must also copy the data files. This can be a simple copy using an external tool like the
+AWS CLI. The data files should keep the same directory structure relative to the source and target prefixes.
+
+Finally, register the copied table with the ice-rest-catalog using ice. This can be either the same catalog
+or a different one (for example, if the table was copied to a different bucket).
+
+```
+ice insert new_ns.table_name_rewritten -p 's3://your-warehouse-iceberg/new_ns/table_name_rewritten/data/*.parquet' --no-copy --thread-count=10
+```
+
+If the table is partitioned, add the `--partition` flag with the partition scheme as JSON and change the `s3://` path to
+match the path of your data files.