apache · andygrove · May 28, 2026 · mbutrovich · May 29, 2026 · mbutrovich
diff --git a/README.md b/README.md
@@ -35,10 +35,12 @@ under the License.
 
 <img src="docs/source/_static/images/DataFusionComet-Logo-Light.png" width="512" alt="logo"/>
 
-Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful
-[Apache DataFusion] query engine. Comet is designed to significantly enhance the
-performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
-Spark ecosystem without requiring any code changes.
+Apache DataFusion Comet is a high-performance accelerator for Apache Spark. Comet keeps Spark queries
+**Arrow-native end-to-end**: operators, expressions, shuffle, and broadcast all stay in Apache Arrow
+columnar format, avoiding the per-row overhead of Spark's row-based engine. Within the Arrow-native
+pipeline, operators and expressions execute as Rust code (via the [Apache DataFusion] query engine)
+or as JVM code that operates directly on Arrow batches. Comet integrates with the Spark ecosystem
+without requiring any code changes.
 
 **Comet provides a ~2x speedup for TPC-DS @ SF 1000 (1TB), resulting in ~50% cost savings.**
 
@@ -58,17 +60,22 @@ See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contribut
 
 ## What Comet Accelerates
 
-Comet replaces Spark operators and expressions with native Rust implementations that run on Apache DataFusion.
-It uses Apache Arrow for zero-copy data transfer between the JVM and native code.
+Comet replaces Spark operators and expressions with implementations that consume and produce Apache Arrow
+batches. Most run as native Rust code on top of Apache DataFusion; some run as JVM code over Arrow batches.
+Either way the work stays in the Comet pipeline without falling back to Spark's row-based engine.
-Either way the work stays in the Comet pipeline without falling back to Spark's row-based engine.
+Either way, query execution stays in the Comet pipeline without falling back to Spark's row-based engine.
 
 - **Parquet scans**: native Parquet reader integrated with Spark's query planner
 - **Apache Iceberg**: accelerated Parquet scans when reading Iceberg tables from Spark
   (see the [Iceberg guide](https://datafusion.apache.org/comet/user-guide/iceberg.html))
-- **Shuffle**: native columnar shuffle with support for hash and range partitioning
+- **Shuffle**: Arrow-IPC columnar shuffle with support for hash and range partitioning, in a native Rust
+  implementation paired with a JVM fallback for unsupported partition key types
 - **Expressions**: hundreds of supported Spark expressions across math, string, datetime, array,
   map, JSON, hash, and predicate categories
 - **Aggregations**: hash aggregate with support for `FILTER (WHERE ...)` clauses
 - **Joins**: hash join, sort-merge join, and broadcast join
+- **Scala/Java UDFs**: experimental support for keeping Scala/Java scalar UDFs in the Comet pipeline
+  via Spark's whole-stage codegen (see the
+  [Scala UDF guide](https://datafusion.apache.org/comet/user-guide/scala_java_udfs.html))
 
 For the authoritative lists, see the [supported expressions](https://datafusion.apache.org/comet/user-guide/expressions.html)
 and [supported operators](https://datafusion.apache.org/comet/user-guide/operators.html) pages.