Skip to content

Fix/array type ip binary#5516

Open
vinaykpud wants to merge 2 commits into
opensearch-project:mainfrom
vinaykpud:fix/array_type_ip_binary
Open

Fix/array type ip binary#5516
vinaykpud wants to merge 2 commits into
opensearch-project:mainfrom
vinaykpud:fix/array_type_ip_binary

Conversation

@vinaykpud
Copy link
Copy Markdown
Contributor

@vinaykpud vinaykpud commented Jun 4, 2026

Description

Companion fix for the LIST nullability change in opensearch-project/OpenSearch#21997. After that fix, stats list(<field>) returns HTTP 200 for 10 of 12 element types — but list(ip_value) and list(binary_value) still fail with HTTP 400 ExpressionEvaluationException: unsupported object class [B. This PR fixes those two.

Root cause

Two layers, each correct in isolation:

  1. PPLBuiltinOperators.LIST was registered with return type PPLReturnTypes.STRING_ARRAY, which always returns ARRAY<VARCHAR> regardless of input element type — type information is lost.
  2. The UDT-aware response-boundary dispatch from Added UDT for IP and Binary support #5463 (AnalyticsExecutionEngine.toExprValue) checks type instanceof IpType / BinaryType per cell. With the column type collapsed to ARRAY<VARCHAR> it never matches, so per-element byte[] values fall through to ExprValueUtils.fromObjectValue, which has no byte[] case and throws.

Fix

Two small changes:

  1. PPLBuiltinOperators.LIST: switch return-type inference from STRING_ARRAY to ARG0_ARRAY, so the column type at the response boundary becomes ARRAY<IpType> / ARRAY<BinaryType> (matching the existing TAKE registration).
  2. AnalyticsExecutionEngine.toExprValue: when the column is ARRAY<IpType> / ARRAY<BinaryType>, recurse element-wise and apply the same canonical-IP-string / Base64 conversion already used for scalar IpType / BinaryType cells. Extracted the two byte[] converters into helpers (ipBytesToExprValue, binaryBytesToExprValue) so the scalar and array paths share them.

Testing

Query Before After
stats list(ip_value) HTTP 400 unsupported object class [B [["127.0.0.1"]]
stats list(binary_value) HTTP 400 unsupported object class [B [["U29tZSBiaW5hcnkgYmxvYg=="]]
stats list(<other 10 types>) HTTP 200 HTTP 200 (no regression)
`| fields ip_value, binary_value` HTTP 200 (#5463 path) HTTP 200 (no regression)

Out of scope

  • VALUES aggregate registration still uses STRING_ARRAY. Same fix would apply if anyone exercises values(ip_value) / values(binary_value); flagging for follow-up.
  • ListAggFunction.java still does String.valueOf(byte[]) — latent on the V3 (non-analytics-engine) path. Analytics path uses DataFusion's LOCAL_ARRAY_AGG_OP instead, so it doesn't fire here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit dad3901.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java111highThe entire body of isAnalyticsIndex() has been commented out and replaced with an unconditional 'return true'. This is in production source code (not test code), and it bypasses all routing logic — the cluster-level composite setting check, the per-index pluggable dataformat check, and the null/empty query guard. Every incoming query is now unconditionally routed through the analytics (Calcite) execution engine regardless of index configuration or cluster settings. This eliminates operator control over which execution path handles queries and could be used to exploit behavioral or security differences in the analytics execution path.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 1 | Medium: 0 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@vinaykpud vinaykpud force-pushed the fix/array_type_ip_binary branch from b1f89a9 to b567be6 Compare June 4, 2026 19:07
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit b491034)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

When processing arrays with mixed element types (e.g., a list containing both byte[] and non-byte[] elements), the code falls back to ExprValueUtils.fromObjectValue(elem) for non-byte[] elements. If fromObjectValue cannot handle those element types, it will throw an exception. This can occur if the array contains unexpected object types that are neither null, byte[], nor types supported by fromObjectValue.

if (component instanceof IpType) {
  List<ExprValue> elems = new ArrayList<>(list.size());
  for (Object elem : list) {
    if (elem == null) {
      elems.add(ExprValueUtils.nullValue());
    } else if (elem instanceof byte[] eb) {
      elems.add(ipBytesToExprValue(eb));
    } else {
      elems.add(ExprValueUtils.fromObjectValue(elem));
    }
  }
  return new ExprCollectionValue(elems);
} else if (component instanceof BinaryType) {
  List<ExprValue> elems = new ArrayList<>(list.size());
  for (Object elem : list) {
    if (elem == null) {
      elems.add(ExprValueUtils.nullValue());
    } else if (elem instanceof byte[] eb) {
      elems.add(binaryBytesToExprValue(eb));
    } else {
      elems.add(ExprValueUtils.fromObjectValue(elem));
    }
  }
  return new ExprCollectionValue(elems);

@vinaykpud vinaykpud force-pushed the fix/array_type_ip_binary branch from b567be6 to 69d24da Compare June 4, 2026 19:08
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

PR Code Suggestions ✨

Latest suggestions up to b491034

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for component type

Add a null check for type.getComponentType() before using it. If the type is a list
but doesn't have a component type, this could result in a NullPointerException when
checking component instanceof IpType.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [235-261]

 if (value instanceof List<?> list) {
   RelDataType component = type.getComponentType();
+  if (component == null) {
+    return ExprValueUtils.fromObjectValue(value);
+  }
   if (component instanceof IpType) {
     List<ExprValue> elems = new ArrayList<>(list.size());
     for (Object elem : list) {
       if (elem == null) {
         elems.add(ExprValueUtils.nullValue());
       } else if (elem instanceof byte[] eb) {
         elems.add(ipBytesToExprValue(eb));
       } else {
         elems.add(ExprValueUtils.fromObjectValue(elem));
       }
     }
     return new ExprCollectionValue(elems);
   } else if (component instanceof BinaryType) {
     ...
   }
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check for component before using it prevents potential NullPointerException when type.getComponentType() returns null. This is a defensive programming practice that improves robustness, though the likelihood of encountering this scenario depends on the upstream type system guarantees.

Medium
General
Extract duplicate list conversion logic

The duplicate code blocks for IpType and BinaryType array handling differ only in
the conversion function called. Extract this logic into a helper method that accepts
a conversion function to eliminate code duplication and improve maintainability.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [238-260]

-List<ExprValue> elems = new ArrayList<>(list.size());
-for (Object elem : list) {
-  if (elem == null) {
-    elems.add(ExprValueUtils.nullValue());
-  } else if (elem instanceof byte[] eb) {
-    elems.add(ipBytesToExprValue(eb));
-  } else {
-    elems.add(ExprValueUtils.fromObjectValue(elem));
+return convertListToExprCollection(list, this::ipBytesToExprValue);
+
+// Helper method:
+private static ExprCollectionValue convertListToExprCollection(
+    List<?> list, Function<byte[], ExprValue> byteArrayConverter) {
+  List<ExprValue> elems = new ArrayList<>(list.size());
+  for (Object elem : list) {
+    if (elem == null) {
+      elems.add(ExprValueUtils.nullValue());
+    } else if (elem instanceof byte[] eb) {
+      elems.add(byteArrayConverter.apply(eb));
+    } else {
+      elems.add(ExprValueUtils.fromObjectValue(elem));
+    }
   }
+  return new ExprCollectionValue(elems);
 }
-return new ExprCollectionValue(elems);
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies code duplication between IpType and BinaryType list handling blocks. Extracting this into a helper method would improve maintainability and reduce code duplication, though the impact is moderate since the duplicated code is relatively small and localized.

Low

Previous suggestions

Suggestions up to commit 2734707
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for component type

Add a null check for component before using instanceof to prevent potential
NullPointerException. The getComponentType() method may return null for non-array
types, which would cause the instanceof checks to fail unexpectedly.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [235-261]

 if (value instanceof List<?> list) {
   RelDataType component = type.getComponentType();
-  if (component instanceof IpType) {
-    List<ExprValue> elems = new ArrayList<>(list.size());
-    for (Object elem : list) {
-      if (elem == null) {
-        elems.add(ExprValueUtils.nullValue());
-      } else if (elem instanceof byte[] eb) {
-        elems.add(ipBytesToExprValue(eb));
-      } else {
-        elems.add(ExprValueUtils.fromObjectValue(elem));
+  if (component != null) {
+    if (component instanceof IpType) {
+      List<ExprValue> elems = new ArrayList<>(list.size());
+      for (Object elem : list) {
+        if (elem == null) {
+          elems.add(ExprValueUtils.nullValue());
+        } else if (elem instanceof byte[] eb) {
+          elems.add(ipBytesToExprValue(eb));
+        } else {
+          elems.add(ExprValueUtils.fromObjectValue(elem));
+        }
       }
+      return new ExprCollectionValue(elems);
+    } else if (component instanceof BinaryType) {
+      ...
     }
-    return new ExprCollectionValue(elems);
-  } else if (component instanceof BinaryType) {
-    ...
   }
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check for component before using instanceof is a valid defensive programming practice. The getComponentType() method could return null for non-array types, and while the code may work without it in the current context, the check prevents potential NullPointerException and makes the code more robust.

Medium
General
Extract duplicated list conversion logic

Extract the duplicated list conversion logic for IpType and BinaryType into a
separate helper method. This reduces code duplication and improves maintainability
since both branches have identical structure with only the conversion function
differing.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [238-260]

-List<ExprValue> elems = new ArrayList<>(list.size());
-for (Object elem : list) {
-  if (elem == null) {
-    elems.add(ExprValueUtils.nullValue());
-  } else if (elem instanceof byte[] eb) {
-    elems.add(ipBytesToExprValue(eb));
-  } else {
-    elems.add(ExprValueUtils.fromObjectValue(elem));
+return convertListToExprCollection(list, this::ipBytesToExprValue);
+
+// Helper method:
+private static ExprCollectionValue convertListToExprCollection(
+    List<?> list, Function<byte[], ExprValue> byteArrayConverter) {
+  List<ExprValue> elems = new ArrayList<>(list.size());
+  for (Object elem : list) {
+    if (elem == null) {
+      elems.add(ExprValueUtils.nullValue());
+    } else if (elem instanceof byte[] eb) {
+      elems.add(byteArrayConverter.apply(eb));
+    } else {
+      elems.add(ExprValueUtils.fromObjectValue(elem));
+    }
   }
+  return new ExprCollectionValue(elems);
 }
-return new ExprCollectionValue(elems);
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies code duplication between the IpType and BinaryType list conversion branches. Extracting this into a helper method would improve maintainability and reduce duplication, though the impact is moderate since the duplicated code is localized and relatively simple.

Low
Suggestions up to commit 69d24da
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for component type

Add null check for component before using instanceof checks. If
type.getComponentType() returns null for non-array types, the subsequent instanceof
checks could behave unexpectedly or mask errors.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [235-261]

 if (value instanceof List<?> list) {
   RelDataType component = type.getComponentType();
-  if (component instanceof IpType) {
-    List<ExprValue> elems = new ArrayList<>(list.size());
-    for (Object elem : list) {
-      if (elem == null) {
-        elems.add(ExprValueUtils.nullValue());
-      } else if (elem instanceof byte[] eb) {
-        elems.add(ipBytesToExprValue(eb));
-      } else {
-        elems.add(ExprValueUtils.fromObjectValue(elem));
+  if (component != null) {
+    if (component instanceof IpType) {
+      List<ExprValue> elems = new ArrayList<>(list.size());
+      for (Object elem : list) {
+        if (elem == null) {
+          elems.add(ExprValueUtils.nullValue());
+        } else if (elem instanceof byte[] eb) {
+          elems.add(ipBytesToExprValue(eb));
+        } else {
+          elems.add(ExprValueUtils.fromObjectValue(elem));
+        }
       }
+      return new ExprCollectionValue(elems);
+    } else if (component instanceof BinaryType) {
+      List<ExprValue> elems = new ArrayList<>(list.size());
+      for (Object elem : list) {
+        if (elem == null) {
+          elems.add(ExprValueUtils.nullValue());
+        } else if (elem instanceof byte[] eb) {
+          elems.add(binaryBytesToExprValue(eb));
+        } else {
+          elems.add(ExprValueUtils.fromObjectValue(elem));
+        }
+      }
+      return new ExprCollectionValue(elems);
     }
-    return new ExprCollectionValue(elems);
-  } else if (component instanceof BinaryType) {
-    List<ExprValue> elems = new ArrayList<>(list.size());
-    for (Object elem : list) {
-      if (elem == null) {
-        elems.add(ExprValueUtils.nullValue());
-      } else if (elem instanceof byte[] eb) {
-        elems.add(binaryBytesToExprValue(eb));
-      } else {
-        elems.add(ExprValueUtils.fromObjectValue(elem));
-      }
-    }
-    return new ExprCollectionValue(elems);
   }
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check for component before instanceof checks is a valid defensive programming practice. If getComponentType() returns null for non-array types, the current code would skip the special handling and fall through to fromObjectValue, which is reasonable but could mask issues. The suggestion improves robustness.

Medium
Suggestions up to commit b567be6
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for component type

Add null check for component before using instanceof checks. If
type.getComponentType() returns null for non-array types, the code will proceed
without proper validation, potentially causing unexpected behavior.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [234-260]

 if (value instanceof List<?> list) {
   RelDataType component = type.getComponentType();
-  if (component instanceof IpType) {
-    List<ExprValue> elems = new ArrayList<>(list.size());
-    for (Object elem : list) {
-      if (elem == null) {
-        elems.add(ExprValueUtils.nullValue());
-      } else if (elem instanceof byte[] eb) {
-        elems.add(ipBytesToExprValue(eb));
-      } else {
-        elems.add(ExprValueUtils.fromObjectValue(elem));
+  if (component != null) {
+    if (component instanceof IpType) {
+      List<ExprValue> elems = new ArrayList<>(list.size());
+      for (Object elem : list) {
+        if (elem == null) {
+          elems.add(ExprValueUtils.nullValue());
+        } else if (elem instanceof byte[] eb) {
+          elems.add(ipBytesToExprValue(eb));
+        } else {
+          elems.add(ExprValueUtils.fromObjectValue(elem));
+        }
       }
+      return new org.opensearch.sql.data.model.ExprCollectionValue(elems);
+    } else if (component instanceof BinaryType) {
+      List<ExprValue> elems = new ArrayList<>(list.size());
+      for (Object elem : list) {
+        if (elem == null) {
+          elems.add(ExprValueUtils.nullValue());
+        } else if (elem instanceof byte[] eb) {
+          elems.add(binaryBytesToExprValue(eb));
+        } else {
+          elems.add(ExprValueUtils.fromObjectValue(elem));
+        }
+      }
+      return new org.opensearch.sql.data.model.ExprCollectionValue(elems);
     }
-    return new org.opensearch.sql.data.model.ExprCollectionValue(elems);
-  } else if (component instanceof BinaryType) {
-    List<ExprValue> elems = new ArrayList<>(list.size());
-    for (Object elem : list) {
-      if (elem == null) {
-        elems.add(ExprValueUtils.nullValue());
-      } else if (elem instanceof byte[] eb) {
-        elems.add(binaryBytesToExprValue(eb));
-      } else {
-        elems.add(ExprValueUtils.fromObjectValue(elem));
-      }
-    }
-    return new org.opensearch.sql.data.model.ExprCollectionValue(elems);
   }
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check for component before using instanceof checks is a valid defensive programming practice. If type.getComponentType() returns null for non-array types, the current code would skip the type-specific handling and fall through to the default path, which is likely the intended behavior. However, making this explicit improves code clarity and prevents potential NPEs if the logic changes.

Medium
General
Extract duplicated list conversion logic

Extract the duplicated list conversion logic into a separate helper method. The same
pattern appears for both IpType and BinaryType, differing only in the conversion
function used for byte arrays.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [237-247]

-List<ExprValue> elems = new ArrayList<>(list.size());
-for (Object elem : list) {
-  if (elem == null) {
-    elems.add(ExprValueUtils.nullValue());
-  } else if (elem instanceof byte[] eb) {
-    elems.add(ipBytesToExprValue(eb));
-  } else {
-    elems.add(ExprValueUtils.fromObjectValue(elem));
-  }
-}
-return new org.opensearch.sql.data.model.ExprCollectionValue(elems);
+return convertListToExprCollection(list, AnalyticsExecutionEngine::ipBytesToExprValue);
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies code duplication between the IpType and BinaryType list conversion blocks. Extracting this into a helper method would improve maintainability and reduce code duplication. However, the improved_code snippet only shows the call site without the helper method implementation, and the score is moderate since this is a code quality improvement rather than a bug fix.

Low

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Persistent review updated to latest commit 69d24da

@vinaykpud vinaykpud force-pushed the fix/array_type_ip_binary branch from 69d24da to 2734707 Compare June 4, 2026 19:15
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Persistent review updated to latest commit 2734707

@vinaykpud vinaykpud added enhancement New feature or request PPL Piped processing language labels Jun 4, 2026
@ahkcs
Copy link
Copy Markdown
Collaborator

ahkcs commented Jun 4, 2026

Please take a look at the CI failures

@vinaykpud vinaykpud force-pushed the fix/array_type_ip_binary branch from 2734707 to dad3901 Compare June 4, 2026 21:11
@vinaykpud vinaykpud closed this Jun 4, 2026
@vinaykpud vinaykpud reopened this Jun 4, 2026
vinaykpud added 2 commits June 5, 2026 00:35
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@vinaykpud vinaykpud force-pushed the fix/array_type_ip_binary branch from dad3901 to b491034 Compare June 5, 2026 00:36
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Persistent review updated to latest commit b491034

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants