Search before asking
I searched for "view schema drift index out of bounds" and "external catalog view schema change" in the issue tracker and did not find a duplicate.
Doris Version
master (confirmed in LogicalView.java as of commit aa91628)
What's Wrong
Querying a Doris view that was created with SELECT * over an external (Hive) catalog table fails with an IndexOutOfBoundsException after:
- Adding a new column to the underlying Hive table (ALTER TABLE … ADD COLUMNS), and
- Refreshing the base table metadata in Doris (REFRESH TABLE <base_table>).
The view itself is not refreshed; only the base table is refreshed.
Error:
ERROR 1105 (HY000): errCode = 2, detailMessage = Index 3 out of bounds for length 3
Reproducer
-- Step 1: Create Hive table (3 columns + partition)
-- (executed in Hive)
CREATE TABLE test.test_view_schema_drift (
id bigint,
name string,
age string
)
PARTITIONED BY (dt string)
STORED AS PARQUET;
-- Step 2: In Doris (Hive catalog context)
SWITCH hive;
DESCRIBE test.test_view_schema_drift; -- shows 3 non-partition columns
CREATE VIEW test.test_view AS
SELECT * FROM test.test_view_schema_drift
WHERE dt = date_sub(current_date(), 1);
SELECT * FROM test.test_view WHERE 1=0;
-- OK: returns 3 columns (empty result)
-- Step 3: Add a column in Hive
ALTER TABLE test.test_view_schema_drift ADD COLUMNS (score string COMMENT 'new col');
-- Step 4: Refresh base table in Doris
SWITCH hive;
REFRESH TABLE test.test_view_schema_drift;
SELECT * FROM test.test_view_schema_drift WHERE 1=0;
-- OK: returns 4 columns now
SELECT * FROM test.test_view WHERE 1=0;
-- FAIL: Index 3 out of bounds for length 3
Root Cause
LogicalView.computeOutput() iterates over childOutput (the output of the re-analyzed view body, which reflects the refreshed 4-column base table). For each slot it calls view.getFullSchema().get(i).
view.getFullSchema() is derived from the view's metadata in the Hive metastore, which was created when the base table had 3 columns. Since only REFRESH TABLE base_table was called (not REFRESH TABLE view), the view's stored schema still has 3 columns. When i = 3, get(3) throws IndexOutOfBoundsException.
// LogicalView.java – before fix
for (int i = 0; i < childOutput.size(); i++) { // childOutput.size() = 4
    ...
    if (CollectionUtils.isEmpty(view.getFullSchema())) {
        qualified = originSlot.withQualifier(fullQualifiers);
    } else {
        // BUG: view.getFullSchema().size() == 3, crashes at i == 3
        qualified = originSlot.withOneLevelTableAndColumnAndQualifier(
                view, view.getFullSchema().get(i), fullQualifiers);
    }
}
The isEmpty() guard added in #40715 handles null/empty fullSchema but not the under-sized case introduced by schema drift.
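To make the failure mode concrete outside of Doris, here is a minimal, self-contained sketch of the pre-fix loop. It is a hypothetical simplification, not the actual LogicalView code: slots and schema columns are modeled as plain strings, the names ViewSchemaDriftBefore and qualifyBeforeFix are made up for illustration, and prefixing "test_view." stands in for applying the view qualifier. With the 3-column stored schema and 4-slot child output from the reproducer, the loop throws on the fourth slot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the loop in LogicalView.computeOutput().
// Slots and schema columns are plain strings, not Doris Slot/Column objects.
public class ViewSchemaDriftBefore {

    // Mirrors the pre-fix guard: only a null/empty schema takes the fallback path.
    static List<String> qualifyBeforeFix(List<String> childOutput, List<String> fullSchema) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < childOutput.size(); i++) {
            String originSlot = childOutput.get(i);
            if (fullSchema == null || fullSchema.isEmpty()) {
                // Stand-in for originSlot.withQualifier(fullQualifiers).
                result.add("test_view." + originSlot);
            } else {
                // Stand-in for withOneLevelTableAndColumnAndQualifier(view, fullSchema.get(i), ...).
                // Throws IndexOutOfBoundsException once i reaches fullSchema.size().
                result.add("test_view." + fullSchema.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Schema stored for the view at creation time (3 columns, as in the report).
        List<String> viewSchema = new ArrayList<>(List.of("id", "name", "age"));
        // Re-analyzed view body after REFRESH TABLE on the base table (4 columns).
        List<String> childOutput = new ArrayList<>(List.of("id", "name", "age", "score"));
        try {
            qualifyBeforeFix(childOutput, viewSchema);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

On recent JDKs, ArrayList.get reports this as "Index 3 out of bounds for length 3", the same text as the error in this report.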
Expected Behavior
Querying the view should not throw. The new column (added after view creation) should appear in the result set with its qualifier correctly applied (falling back to withQualifier() as the existing null-guard branch does).
Impact / Workaround
Impact: Any user who
- Creates a Doris VIEW (using SELECT *) on an external catalog table, AND
- Later adds columns to the base table and calls REFRESH TABLE <base_table>
will hit this crash when querying the view.
Workaround: Execute REFRESH TABLE <view> (or DROP VIEW + CREATE VIEW) after adding columns, so that the view's stored schema is also refreshed before querying.
Proposed Fix
Extend the guard condition to also cover i >= fullSchema.size():
List<Column> fullSchema = view.getFullSchema();
if (CollectionUtils.isEmpty(fullSchema) || i >= fullSchema.size()) {
    qualified = originSlot.withQualifier(fullQualifiers);
} else {
    qualified = originSlot
            .withOneLevelTableAndColumnAndQualifier(view, fullSchema.get(i), fullQualifiers);
}
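The effect of the extended guard can be checked with the same kind of simplified model. This is again a hypothetical sketch rather than the Doris implementation: the names ViewSchemaDriftFixed and qualifyWithFix are illustrative, strings stand in for Slot and Column objects, and the "test_view." prefix stands in for the qualifier methods. Slots beyond the end of the stored schema fall back to the qualifier-only path instead of indexing past the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the proposed guard in LogicalView.computeOutput().
public class ViewSchemaDriftFixed {

    static List<String> qualifyWithFix(List<String> childOutput, List<String> fullSchema) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < childOutput.size(); i++) {
            String originSlot = childOutput.get(i);
            if (fullSchema == null || fullSchema.isEmpty() || i >= fullSchema.size()) {
                // Fallback (stand-in for withQualifier): keep the slot's own name,
                // apply only the view qualifier. Covers columns added after view creation.
                result.add("test_view." + originSlot);
            } else {
                // Normal path (stand-in for withOneLevelTableAndColumnAndQualifier):
                // take the column from the stored view schema at the same position.
                result.add("test_view." + fullSchema.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> viewSchema = List.of("id", "name", "age");
        List<String> childOutput = List.of("id", "name", "age", "score");
        // Prints [test_view.id, test_view.name, test_view.age, test_view.score]
        System.out.println(qualifyWithFix(childOutput, viewSchema));
    }
}
```

The extra condition only prevents the crash. Because slots are still matched to stored columns by position, the fallback relies on new columns being appended after the existing ones, as ADD COLUMNS does in the reproducer; refreshing or recreating the view remains the way to bring its stored schema back in sync.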
PR: (to be linked after submission)