
Commit f15e76f

Address review comments
1 parent 766e002 commit f15e76f

2 files changed

Lines changed: 9 additions & 6 deletions

src/components/fundable/descriptions/ParquetNullOptimizations.md

Lines changed: 7 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 Apache Parquet is an open source, column-oriented data file format designed for
 efficient data storage and retrieval. Together with Apache Arrow for in-memory data,
-it has become for the de facto standard for efficient columnar analytics.
+it has become for the *de facto* standard for efficient columnar analytics.
 
 While Parquet and Arrow are most often used together, they have incompatible physical
 representations of data with optional values: data where some values can be
@@ -24,11 +24,14 @@ for flat (non-nested) data:
 2. avoiding decoding definition levels entirely when a data page's statistics shows
 it cannot contain any nulls (or, conversely, when it cannot contain any non-null values).
 
-This work can optionally be extended so as to apply to schemas with moderate amounts
-of nesting.
+As a subsequent task, these optimizations may be extended so as to apply to schemas
+with moderate amounts of nesting.
+
+This work will benefit to applications using Arrow C++ or any of its language
+bindings (such as PyArrow, R-Arrow...).
 
 Depending on the typology of Parquet data, this could make Parquet reading 2x
-faster, even more in some cases. If you are ensure whether your workload could
+faster, even more in some cases. If you are unsure whether your workload could
 benefit, we can discuss this based on sample Parquet files you provide us.
 
 ##### Are you interested in this project? Either entirely or partially, contact us for more information on how to help us fund it

src/components/fundable/projectsDetails.ts

Lines changed: 2 additions & 2 deletions
@@ -129,9 +129,9 @@ export const fundableProjectsDetails = {
 },
 {
 category: "Apache Arrow and Parquet",
-title: "Parquet C++ reader optimizations",
+title: "Parquet reader optimizations",
 pageName: "ParquetNullOptimizations",
-shortDescription: "Converting Parquet optional values to nullable Arrow data is often a performance bottleneck.",
+shortDescription: "Converting Parquet optional values to nullable Arrow data is often a performance bottleneck. We will optimize that step for the most common cases.",
 description: ParquetNullOptimizationsMD,
 price: "TBD",
 maxNbOfFunders: 1,
