Skip to content

[CALCITE-4460] Support custom delimiter when parsing CSV tables#4893

Open
Diveyam-Mishra wants to merge 2 commits intoapache:mainfrom
Diveyam-Mishra:CALCITE-4460
Open

[CALCITE-4460] Support custom delimiter when parsing CSV tables#4893
Diveyam-Mishra wants to merge 2 commits intoapache:mainfrom
Diveyam-Mishra:CALCITE-4460

Conversation

@Diveyam-Mishra
Copy link
Copy Markdown

Jira Link

CALCITE-4460

Changes Proposed

This PR adds support for custom field delimiters (separators) in the Calcite file adapter. Previously, the adapter hardcoded comma (,) as the only separator, preventing use of other common delimiters like pipes (|) or tabs.

What changed

  • CsvTableFactory: Parses an optional "separator" operand from the model JSON. Validates it is exactly one character; rejects multi-character values with IllegalArgumentException.
  • CsvTable: Added separator field and a new constructor that accepts it. Passes separator to CsvEnumerator.deduceRowType().
  • CsvTranslatableTable: Added separator constructor; passes separator to CsvEnumerator in project().
  • CsvEnumerator: Added separator parameter to constructors, openCsv(), and deduceRowType(). Uses CSVReader(reader, separator).
  • CsvStreamReader: Added package-private constructor CsvStreamReader(Source, char).

All existing constructors default separator to CSVParser.DEFAULT_SEPARATOR (comma), preserving full backward compatibility.

Scope

Per guidance from @julianhyde, changes are limited to the file adapter only. The CSV adapter (example code) is kept simple and unchanged.

Example configuration

{
  "version": "1.0",
  "defaultSchema": "TEST",
  "schemas": [
    {
      "name": "TEST",
      "tables": [
        {
          "name": "MY_TABLE",
          "type": "custom",
          "factory": "org.apache.calcite.adapter.file.CsvTableFactory",
          "operand": {
            "file": "data.psv",
            "separator": "|"
          }
        }
      ]
    }
  ]
}

Tests added
testCsvCustomSeparatorPipe — reads pipe-delimited data, verifies correct parsing
testCsvCustomSeparatorInvalidMultiChar — verifies multi-character separator is rejected
testCsvDefaultSeparatorBackwardCompat — verifies omitting separator defaults to comma

Copilot AI review requested due to automatic review settings April 19, 2026 05:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for configuring a custom field separator when parsing CSV tables in Calcite’s file adapter, enabling non-comma delimited files (e.g., |, \t) via model JSON while preserving comma as the default.

Changes:

  • Add "separator" operand parsing and validation in CsvTableFactory, defaulting to CSVParser.DEFAULT_SEPARATOR when omitted.
  • Thread the separator through CsvTable/CsvTranslatableTable into CsvEnumerator, including row-type deduction and CSV reader creation.
  • Add tests and fixtures covering pipe delimiter support, invalid multi-character separators, and default behavior compatibility.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
file/src/main/java/org/apache/calcite/adapter/file/CsvTableFactory.java Parses/validates optional separator operand and passes it into table construction.
file/src/main/java/org/apache/calcite/adapter/file/CsvTable.java Stores separator and uses it for row-type deduction and field type inference.
file/src/main/java/org/apache/calcite/adapter/file/CsvTranslatableTable.java Passes separator into CsvEnumerator during projection execution.
file/src/main/java/org/apache/calcite/adapter/file/CsvEnumerator.java Adds separator-aware constructors and uses it in openCsv and deduceRowType.
file/src/main/java/org/apache/calcite/adapter/file/CsvStreamReader.java Adds a separator-accepting constructor for streaming reads.
file/src/test/java/org/apache/calcite/adapter/file/FileAdapterTest.java Adds coverage for custom separator, invalid separator, and default behavior.
file/src/test/resources/custom-separator.json New model demonstrating `"separator": "
file/src/test/resources/sales-csv/PIPE_DELIMITED.csv New pipe-delimited CSV fixture for tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dssysolyatin
Copy link
Copy Markdown
Contributor

It would be worth documenting this new feature. site/_docs/file_adapter.md could be a place to add this information

DEPTNO:int|NAME:string
10|"Sales"
20|"Marketing"
30|"Accounts"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test record containing an escape character? I'd like to verify whether the custom delimiter and escape character function correctly in combination. Additionally, I had another thought: regarding support for custom escape characters—I'm not sure if Calcite supports this, though naturally, this doesn't necessarily need to be implemented in the current PR. Overall, this PR looks good.

@julianhyde
Copy link
Copy Markdown
Contributor

I notice that you added duplicate constructors for CsvTable etc. Since the constructors are package-private, you don't need to ensure backwards compatibility. Just add an extra parameter (the separator character).

You should probably change user documentation.

Make sure you test escaping/quoting with the custom separator.

+1 to merge when the above issues are fixed.

@xiedeyantu
Copy link
Copy Markdown
Member

xiedeyantu commented Apr 22, 2026

I have approved the CI, you can take a look at the detail.

@Diveyam-Mishra
Copy link
Copy Markdown
Author

Diveyam-Mishra commented Apr 22, 2026

I think this is done from my end now U can check @xiedeyantu
If there is anything left please let me know

@Override public Enumerator<@Nullable Object[]> enumerator() {
return new CsvEnumerator<>(source, cancelFlag, false, null,
CsvEnumerator.arrayConverter(fieldTypes, fields, false));
CsvEnumerator.arrayConverter(fieldTypes, fields, false), ',');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a default parameter of ',' without modifying so many constructors, and have the constructors without the "char separator" also call the constructors with the "char separator"?

Comment thread site/_docs/adapter.md Outdated
* [Elasticsearch adapter](elasticsearch_adapter.html)
(<a href="{{ site.apiRoot }}/org/apache/calcite/adapter/elasticsearch/package-summary.html">calcite-elasticsearch</a>)
* [File adapter](file_adapter.html) (<a href="{{ site.apiRoot }}/org/apache/calcite/adapter/file/package-summary.html">calcite-file</a>)
* [File adapter](file_adapter.html) (<a href="{{ site.apiRoot }}/org/apache/calcite/adapter/file/package-summary.html">calcite-file</a>), including support for custom CSV delimiters
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove ", including support for custom CSV delimiters".

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 1, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants