[Bug] SELECT * FROM <view> throws "Index N out of bounds for length N" after ALTER TABLE ADD COLUMNS + REFRESH TABLE on the underlying Hive table #64006

@zhaorongsheng

Search before asking

I searched for "view schema drift index out of bounds" and "external catalog view schema change" in the issue tracker and did not find a duplicate.

Doris Version

master (confirmed in LogicalView.java as of commit aa91628)

What's Wrong

Querying a Doris view that was created with SELECT * over an external (Hive) catalog table fails with an IndexOutOfBoundsException after:

  1. Adding a new column to the underlying Hive table (ALTER TABLE … ADD COLUMNS)
  2. Refreshing the base table metadata in Doris (REFRESH TABLE <base_table>)

The view itself is not refreshed; only the base table is refreshed.

Error:

ERROR 1105 (HY000): errCode = 2, detailMessage = Index 3 out of bounds for length 3

Reproducer

-- Step 1: Create Hive table (3 columns + partition)
-- (executed in Hive)
CREATE TABLE test.test_view_schema_drift (
  id     bigint,
  name   string,
  age    string
)
PARTITIONED BY (dt string)
STORED AS PARQUET;

-- Step 2: In Doris (Hive catalog context)
SWITCH hive;
DESCRIBE test.test_view_schema_drift;   -- shows 3 non-partition columns

CREATE VIEW test.test_view AS
  SELECT * FROM test.test_view_schema_drift
  WHERE dt = date_sub(current_date(), 1);

SELECT * FROM test.test_view WHERE 1=0;
-- OK: returns 3 columns (empty result)

-- Step 3: Add a column in Hive
ALTER TABLE test.test_view_schema_drift ADD COLUMNS (score string COMMENT 'new col');

-- Step 4: Refresh base table in Doris
SWITCH hive;
REFRESH TABLE test.test_view_schema_drift;

SELECT * FROM test.test_view_schema_drift WHERE 1=0;
-- OK: returns 4 columns now

SELECT * FROM test.test_view WHERE 1=0;
-- FAIL: Index 3 out of bounds for length 3

Root Cause

LogicalView.computeOutput() iterates over childOutput (the output of the re-analyzed view body, which reflects the refreshed 4-column base table). For each slot it calls view.getFullSchema().get(i).

view.getFullSchema() is derived from the view's metadata in the Hive metastore, which was created when the base table had 3 columns. Since only REFRESH TABLE base_table was called (not REFRESH TABLE view), the view's stored schema still has 3 columns. When i = 3, get(3) throws IndexOutOfBoundsException.

// LogicalView.java – before fix
for (int i = 0; i < childOutput.size(); i++) {   // childOutput.size() = 4
    ...
    if (CollectionUtils.isEmpty(view.getFullSchema())) {
        qualified = originSlot.withQualifier(fullQualifiers);
    } else {
        // BUG: view.getFullSchema().size() == 3, crashes at i == 3
        qualified = originSlot.withOneLevelTableAndColumnAndQualifier(
            view, view.getFullSchema().get(i), fullQualifiers);
    }
}

The isEmpty() guard added in #40715 handles null/empty fullSchema but not the under-sized case introduced by schema drift.
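The mismatch can be shown in isolation, without any Doris classes. The sketch below is only an illustration: SchemaDriftDemo, describeFailure, and the plain String lists are hypothetical stand-ins for childOutput and view.getFullSchema(), assuming the re-analyzed view body yields four slots while the stored schema still holds three columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the loop in LogicalView.computeOutput();
// none of these names exist in Doris.
class SchemaDriftDemo {
    // Walks childOutput by position and looks up the stored column at the
    // same index, as the unguarded branch does. Returns "ok" when every
    // lookup succeeds, otherwise the exception message.
    static String describeFailure(List<String> childOutput, List<String> storedSchema) {
        try {
            for (int i = 0; i < childOutput.size(); i++) {
                storedSchema.get(i); // fails once i reaches storedSchema.size()
            }
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Four slots after the base table refresh, three columns in the stale view schema.
        List<String> childOutput = new ArrayList<>(Arrays.asList("id", "name", "age", "score"));
        List<String> storedSchema = new ArrayList<>(Arrays.asList("id", "name", "age"));
        System.out.println(describeFailure(childOutput, storedSchema));
    }
}
```

When both lists have the same length, describeFailure returns "ok". The failure appears only when the stored schema is shorter than the child output.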

Expected Behavior

Querying the view should not throw. The new column (added after view creation) should appear in the result set with its qualifier correctly applied (falling back to withQualifier() as the existing null-guard branch does).

Impact / Workaround

Impact: Any user who

  1. Creates a Doris VIEW (using SELECT *) on an external catalog table, AND
  2. Later adds columns to the base table and calls REFRESH TABLE <base_table>

will hit this crash when querying the view.

Workaround: Execute REFRESH TABLE <view> (or DROP VIEW + CREATE VIEW) after adding columns, so that the view's stored schema is also refreshed before querying.

Proposed Fix

Extend the guard condition to also cover i >= fullSchema.size():

List<Column> fullSchema = view.getFullSchema();
if (CollectionUtils.isEmpty(fullSchema) || i >= fullSchema.size()) {
    qualified = originSlot.withQualifier(fullQualifiers);
} else {
    qualified = originSlot
            .withOneLevelTableAndColumnAndQualifier(view, fullSchema.get(i), fullQualifiers);
}
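The behavior of the extended guard can be sketched outside Doris as well. This is a simplified model, not the actual patch: GuardedQualifySketch, qualifyAll, and the "column-bound:" / "qualifier-only:" string tags are hypothetical and merely mark which branch (withOneLevelTableAndColumnAndQualifier versus withQualifier) each slot would take.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the guarded loop; strings replace Slot and Column.
class GuardedQualifySketch {
    static List<String> qualifyAll(List<String> childOutput, List<String> fullSchema) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < childOutput.size(); i++) {
            String slot = childOutput.get(i);
            // Same condition as the proposed fix: a null, empty, or too-short
            // schema falls back to the qualifier-only branch.
            if (fullSchema == null || fullSchema.isEmpty() || i >= fullSchema.size()) {
                result.add("qualifier-only:" + slot);
            } else {
                result.add("column-bound:" + fullSchema.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> childOutput = Arrays.asList("id", "name", "age", "score");
        List<String> staleSchema = Arrays.asList("id", "name", "age");
        // The first three slots bind to stored columns; the drifted one falls back.
        System.out.println(qualifyAll(childOutput, staleSchema));
    }
}
```

Under this model, the columns known to the view keep their column binding, and only slots beyond the stored schema take the fallback, which matches the expected behavior described above.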

PR: (to be linked after submission)
