|
| 1 | +#### Overview |
| 2 | + |
| 3 | +Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics. |
| 4 | + |
| 5 | +Representation of string and binary data in Arrow traditionally uses the Binary layout, where the entire string data resides in a separate buffer that is accessed using indirect indexing from a buffer of offsets. |
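To make the indirection concrete, here is a minimal pure-Python sketch of the offsets-plus-data arrangement described above. It is an illustration, not Arrow's implementation: the helper names `encode_binary` and `get_binary` are hypothetical, offsets are assumed to be little-endian 32-bit integers, and the validity bitmap used for nulls is left out.

```python
import struct

def encode_binary(values):
    """Build an (offsets, data) buffer pair from a list of byte strings."""
    offsets = [0]
    data = bytearray()
    for v in values:
        data += v
        # Each offset marks where the next value starts in the data buffer.
        offsets.append(len(data))
    # n values need n + 1 offsets (assumed here to be little-endian int32).
    return struct.pack(f"<{len(offsets)}i", *offsets), bytes(data)

def get_binary(offsets_buf, data_buf, i):
    """Read value i: fetch two adjacent offsets, then slice the data buffer."""
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)
    return data_buf[start:end]

values = [b"arrow", b"", b"columnar"]
offsets_buf, data_buf = encode_binary(values)
decoded = [get_binary(offsets_buf, data_buf, i) for i in range(len(values))]
```

Every read goes through the offsets buffer first and only then into the data buffer, which is the indirect indexing the paragraph refers to.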
| 6 | + |
| 7 | +Recently, the Arrow project added the Binary View layout, a more efficient layout inspired by modern execution engines, where each string is described by a fixed-size view that stores its length and its first bytes inline. This allows short strings to be read and processed directly, without an additional indirection into a separate data buffer. |
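The following pure-Python sketch illustrates the idea behind the view-based layout. It assumes the 16-byte view structure from the Arrow columnar format: a 32-bit length, followed either by up to 12 bytes of inline data, or by a 4-byte prefix, a buffer index, and an offset into a separate data buffer. The helper names `encode_views` and `get_view` are hypothetical, a single data buffer is used for simplicity, and nulls are ignored.

```python
import struct

INLINE_MAX = 12  # assumed inline threshold: values up to 12 bytes fit in the view

def encode_views(values):
    """Build a views buffer plus a list of data buffers (only one is used here)."""
    views = bytearray()
    data = bytearray()
    for v in values:
        if len(v) <= INLINE_MAX:
            # Short value: length + bytes stored inline ("12s" zero-pads).
            views += struct.pack("<i12s", len(v), v)
        else:
            # Long value: length + 4-byte prefix + buffer index + offset.
            views += struct.pack("<i4sii", len(v), v[:4], 0, len(data))
            data += v
    return bytes(views), [bytes(data)]

def get_view(views_buf, data_bufs, i):
    """Read value i; short values never touch the data buffers."""
    base = 16 * i
    (length,) = struct.unpack_from("<i", views_buf, base)
    if length <= INLINE_MAX:
        return views_buf[base + 4 : base + 4 + length]
    _prefix, buf_index, offset = struct.unpack_from("<4sii", views_buf, base + 4)
    return data_bufs[buf_index][offset : offset + length]

values = [b"short", b"exactly12byt", b"a string longer than twelve bytes"]
views_buf, data_bufs = encode_views(values)
decoded = [get_view(views_buf, data_bufs, i) for i in range(len(values))]
```

In this sketch only the third value is written to the data buffer. The first two are recovered from the views buffer alone, and the stored prefix of a long value can let some comparisons be decided without following the offset.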
| 8 | + |
| 9 | +However, while basic support is present, Binary View is not yet supported by all Arrow components. |
| 10 | + |
| 11 | +We propose to finish implementing support for Binary View and String View types in all components of Arrow C++: |
| 12 | + |
| 13 | +* scalar compute kernels: |
| 14 | + - `equal`, `less_equal`, etc. |
| 15 | + - `is_in`, `index_in` |
| 16 | + - `ascii_*`, `binary_*`, `utf8_*` |
| 17 | + - `string_is_ascii` |
| 18 | + - `count_substring` |
| 19 | + - `extract_regex`, `extract_regex_span` |
| 20 | + - `split_pattern`, `split_pattern_regex` |
| 21 | + - `coalesce` |
| 22 | + |
| 23 | +* vector compute kernels: |
| 24 | + - `take`, `filter`, `scatter` |
| 25 | + - `run_end_encode`, `run_end_decode` |
| 26 | + - `sort_indices`, `rank`, `rank_normal`, `rank_quantile` |
| 27 | + - `partition_nth_indices` |
| 28 | + - `select_k_unstable` |
| 29 | + - `replace_with_mask` |
| 30 | + - `fill_null_forward`, `fill_null_backward`, `drop_null` |
| 31 | + |
| 32 | +* aggregate compute kernels: |
| 33 | + - `count_distinct` |
| 34 | + - `first`, `last`, `min`, `max` |
| 35 | + - `index` |
| 36 | + |
| 37 | +* CSV reader and writer |
| 38 | + |
| 39 | +* ORC reader and writer |
| 40 | + |
| 41 | +Funders can decide to fund the entire package, or choose the components they are interested in. |
| 42 | + |
| 43 | +##### Interested in funding this project, in whole or in part? Contact us for more information on how to help. |