Fix PPL CalciteException for non-ASCII string literals (e.g. Chinese characters) #5504
visitLiteral() built VARCHAR/CHAR types using typeFactory.createSqlType(SqlTypeName.VARCHAR) without specifying a charset. Calcite defaults to ISO-8859-1, which cannot encode non-Latin characters, causing a CalciteException at query time.

Fix: explicitly create the type with UTF-8 charset and IMPLICIT collation via typeFactory.createTypeWithCharsetAndCollation() for both the CHAR(1) and VARCHAR branches of the STRING literal case.

This is a regression introduced in 3.6.0 when the PPL/Calcite integration was added. SQL queries were unaffected because the SQL path uses a different literal-building flow.

Fixes opensearch-project/OpenSearch#21880

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
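The encoding limitation behind the root cause can be reproduced with the JDK alone. The sketch below is an illustration only, not code from this PR or from Calcite: the class name `CharsetEncodingCheck` and the helper `canEncode` are hypothetical, and the Chinese sample string is written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CharsetEncodingCheck {

    /** Returns true if every character of text can be encoded in the given charset. */
    static boolean canEncode(String text, Charset charset) {
        // Create a fresh encoder per call; CharsetEncoder instances are not thread-safe.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // U+672A U+5904 U+7F6E: the Chinese sample literal used in the PR's regression test.
        String chinese = "\u672a\u5904\u7f6e";
        String ascii = "hello";

        System.out.println(canEncode(ascii, StandardCharsets.ISO_8859_1));   // true
        System.out.println(canEncode(chinese, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(chinese, StandardCharsets.UTF_8));      // true
    }
}
```

A type whose charset is ISO-8859-1 therefore has no valid representation for such a literal, whereas UTF-8 can represent any well-formed Unicode string.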
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an explicit UTF-8 charset/collation when producing Calcite string literals to prevent non-ASCII literals from throwing, and introduces a regression test for the reported failure.
Changes:
- Update `visitLiteral` to build `CHAR`/`VARCHAR` types with `UTF-8` charset and implicit collation.
- Add a regression test covering Chinese/Arabic literals and the `CHAR(1)` path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java | Forces UTF-8 charset/collation for string literals to avoid Calcite NlsString rejection of non-ASCII. |
| core/src/test/java/org/opensearch/sql/calcite/CalciteRexNodeVisitorTest.java | Adds regression coverage for non-ASCII string literal visitation and CHAR(1) behavior. |
PR Reviewer Guide 🔍 (Review updated until commit cd5d733)
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Latest suggestions up to cd5d733
Previous suggestions
Suggestions up to commit 9e379cd
- Remove unused realRexBuilder variable (context.rexBuilder is already a real ExtendedRexBuilder backed by TYPE_FACTORY via the constructor)
- Add charset assertions to verify resulting RelDataType carries UTF-8, so future accidental charset drops are caught
- Remove unused RexBuilder import

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Persistent review updated to latest commit cd5d733
Summary
Hi @dai-chen
PPL queries containing non-ASCII string literals (Chinese, Arabic, etc.) fail with a `CalciteException` on OpenSearch 3.6.0, while the identical query worked on 3.1 and the equivalent SQL query works fine on 3.6.0.

Root cause: In `CalciteRexNodeVisitor.visitLiteral()`, the `STRING` case builds a `VARCHAR`/`CHAR` type using `typeFactory.createSqlType(SqlTypeName.VARCHAR)` without specifying a charset. Calcite defaults to ISO-8859-1, which cannot encode non-Latin characters, causing the exception inside `RexBuilder.makeLiteral()` → `NlsString.<init>()`.

Fix: Explicitly create the type with UTF-8 charset and `IMPLICIT` collation via `typeFactory.createTypeWithCharsetAndCollation()` for both the `CHAR(1)` and `VARCHAR` branches of the `STRING` literal case.

Changes
- `CalciteRexNodeVisitor.java`: `CHAR`/`VARCHAR` types for string literals
- `CalciteRexNodeVisitorTest.java`

Test plan
- `testVisitLiteralNonAsciiStringDoesNotThrow`: verifies Chinese (未处置), Arabic (مرحبا), and single non-ASCII char (中) literals build successfully without throwing `CalciteException`
- `CalciteRexNodeVisitorTest` tests continue to pass

Fixes opensearch-project/OpenSearch#21880
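For readers unfamiliar with the Calcite calls named in the fix, here is a minimal standalone sketch of building a UTF-8 `VARCHAR` type and a literal of that type. It is not the PR's code. It assumes Apache Calcite is on the classpath and uses Calcite's stock `SqlTypeFactoryImpl` and `RexBuilder` instead of the project's own type factory and `ExtendedRexBuilder`. The class `Utf8LiteralSketch` and its two helpers are hypothetical names, and only the `VARCHAR` branch is shown.

```java
import java.nio.charset.StandardCharsets;

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.SqlCollation;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;

// Hypothetical sketch class; the real change lives in CalciteRexNodeVisitor.visitLiteral().
public class Utf8LiteralSketch {

    /** Builds a VARCHAR type that carries UTF-8 and IMPLICIT collation explicitly. */
    static RelDataType utf8Varchar(RelDataTypeFactory typeFactory) {
        // createSqlType alone would pick up the factory's default charset.
        RelDataType base = typeFactory.createSqlType(SqlTypeName.VARCHAR);
        return typeFactory.createTypeWithCharsetAndCollation(
            base, StandardCharsets.UTF_8, SqlCollation.IMPLICIT);
    }

    /** Creates a string literal typed as UTF-8 VARCHAR. */
    static RexNode stringLiteral(RexBuilder rexBuilder, String value) {
        RelDataType type = utf8Varchar(rexBuilder.getTypeFactory());
        // allowCast = true lets the builder wrap the literal in a cast if needed.
        return rexBuilder.makeLiteral(value, type, true);
    }

    public static void main(String[] args) {
        // Assumption: the stock Calcite type factory stands in for the project's own.
        RelDataTypeFactory typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
        RexBuilder rexBuilder = new RexBuilder(typeFactory);

        // U+672A U+5904 U+7F6E: the Chinese sample literal from the test plan.
        RexNode literal = stringLiteral(rexBuilder, "\u672a\u5904\u7f6e");
        System.out.println(literal.getType().getCharset());
    }
}
```

Checking the charset on the resulting type, as the updated regression test does, guards against the charset being dropped again by a later refactor.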