
fix: admin-adjustable max columns with a clear overflow message #4669

Open
Ma77Ball wants to merge 12 commits into apache:main from Ma77Ball:fix/adjustable_univocity

Conversation

@Ma77Ball
Contributor

@Ma77Ball Ma77Ball commented May 2, 2026

What changes were proposed in this PR?

  • Admin-adjustable max columns. Adds csv_parser_max_columns and result_table_columns_per_batch to Admin → Settings → Result Panel. The CSV scan operator and the result table read these from site_settings at runtime;
    defaults live in default.conf.
  • Readable overflow error. When a CSV row exceeds the limit, the operator now throws RuntimeException("Max columns of N exceeded.") instead of letting Univocity's raw stack trace through. Detection inspects the cause
    (ArrayIndexOutOfBoundsException) and parses the index from both the Java 8 and Java 9+ message formats. Both formats are needed because Univocity's own hint string is silently dropped on Java 9+.
  • Refactor. Extracts the repeated "read int/long from site_settings with fallback" pattern into common/dao/SiteSettings, replacing inline jOOQ in ConfigResource, CSVScanSourceOpExec, and DatasetResource.
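A minimal sketch of the "read int/long from site_settings with fallback" helper described above, assuming the fallback applies when the row is missing, when the stored value does not parse, or when the lookup itself fails. SiteSettingsSketch, the Lookup function type, and the getInt/getLong signatures are hypothetical; the PR's common/dao/SiteSettings queries the table via jOOQ, which is abstracted here as a plain function so the sketch runs on its own.

```scala
import scala.util.Try

// Hypothetical sketch of a "read int/long from site_settings with fallback" helper.
// The real helper queries site_settings through jOOQ; here the table lookup is a
// plain function so the fallback behavior can be shown without a database.
object SiteSettingsSketch {
  /** Returns the raw string stored under `key`, or None if the row is absent. */
  type Lookup = String => Option[String]

  /** Reads `key` as an Int, falling back on a missing row, bad value, or lookup failure. */
  def getInt(lookup: Lookup, key: String, fallback: Int): Int =
    read(lookup, key)(_.toInt).getOrElse(fallback)

  /** Same as getInt, for Long-valued settings. */
  def getLong(lookup: Lookup, key: String, fallback: Long): Long =
    read(lookup, key)(_.toLong).getOrElse(fallback)

  // Try(...) around the lookup turns a failing query into None; the inner Try
  // turns an unparsable stored value into None as well.
  private def read[A](lookup: Lookup, key: String)(parse: String => A): Option[A] =
    Try(lookup(key)).toOption.flatten.flatMap(raw => Try(parse(raw.trim)).toOption)
}
```

A caller such as the CSV scan operator would then do something like `getInt(lookup, "csv_parser_max_columns", defaultFromConf)`, where `defaultFromConf` (a hypothetical name) holds the value read from default.conf. Keeping the lookup behind a function type is only a convenience of this sketch; it makes the fallback cases easy to test without a database.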

Any related issues, documentation, discussions?

closes #4589
Related #3825

How was this PR tested?

  • New CSVScanSourceOpExecSpec: happy path, end-of-input, overflow translation against a real parser, and isColumnOverflow cases for both AIOOBE message formats.
  • Expanded admin-settings.component.spec.ts and result-table-frame.component.spec.ts for load/save/reset and the columnLimit fallback chain.
  • sbt Test/compile clean across DAO, ConfigService, WorkflowOperator, FileService. Frontend specs need Node ≥ 20.19 to run.
  • Manual: lowered the limit, ran a workflow on a CSV that exceeded it, and confirmed the result panel shows "Max columns of N exceeded."
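The overflow translation and the isColumnOverflow cases listed above can be illustrated with a rough Scala sketch. It assumes the check walks the exception's cause chain looking for an ArrayIndexOutOfBoundsException whose index is at least the configured limit, and that the two message shapes are a bare index (e.g. `512`, the older format) and `Index 512 out of bounds for length 512` (the newer format). ColumnOverflowSketch, parseIndex, translate, and the depth guard are hypothetical names and choices, not the PR's actual code.

```scala
import scala.annotation.tailrec
import scala.util.Try

// Hypothetical sketch: recognize a column overflow from the wrapped
// ArrayIndexOutOfBoundsException and replace it with a readable error.
object ColumnOverflowSketch {
  // Newer JVM message shape, e.g. "Index 512 out of bounds for length 512".
  private val ModernFormat = """Index (\d+) out of bounds for length \d+""".r
  // Older JVM message shape: only the offending index, e.g. "512".
  private val LegacyFormat = """(\d+)""".r
  // Upper bound on how many causes to inspect, in case of a cyclic cause chain.
  private val MaxCauseDepth = 16

  /** Extracts the offending array index from an AIOOBE message in either shape. */
  def parseIndex(message: String): Option[Int] =
    Option(message).map(_.trim).flatMap {
      case ModernFormat(idx) => Try(idx.toInt).toOption
      case LegacyFormat(idx) => Try(idx.toInt).toOption
      case _                 => None
    }

  /** True if some exception in the cause chain is an AIOOBE at index >= maxColumns. */
  @tailrec
  def isColumnOverflow(t: Throwable, maxColumns: Int, depth: Int = 0): Boolean =
    t match {
      case null                        => false
      case _ if depth >= MaxCauseDepth => false
      case e: ArrayIndexOutOfBoundsException
          if parseIndex(e.getMessage).exists(_ >= maxColumns) => true
      case other => isColumnOverflow(other.getCause, maxColumns, depth + 1)
    }

  /** Replaces a column overflow with a readable error; other failures pass through unchanged. */
  def translate(t: Throwable, maxColumns: Int): Throwable =
    if (isColumnOverflow(t, maxColumns))
      new RuntimeException(s"Max columns of $maxColumns exceeded.", t)
    else t
}
```

Here translate keeps the original exception as the cause, so the Univocity stack trace stays available in server logs while the result panel can show only the short message; whether the PR attaches the cause is not stated in the description.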

Was this PR authored or co-authored using generative AI tooling?

Co-authored with: Claude Opus 4.7

@github-actions github-actions Bot added the fix, frontend (Changes related to the frontend GUI), service, and common labels May 2, 2026
@Ma77Ball Ma77Ball changed the title from "admin-adjustable max columns with a clear overflow message" to "fix: admin-adjustable max columns with a clear overflow message" May 2, 2026
Contributor

@aglinxinyuan aglinxinyuan left a comment


Why do we want to make this Admin-adjustable? I don't feel this makes much sense.

It should be either fixed at a large number or user-adjustable.

@Yicong-Huang
Contributor

#3825 is already closed?

@chenlica
Contributor

chenlica commented May 2, 2026

We want this parameter to be adjustable by the admins. I am OK with fixing it.

@chenlica
Contributor

chenlica commented May 2, 2026

@kunwp1 @SarahAsad23 Feel free to chime in based on a related use case with a collaborator.

@codecov-commenter

codecov-commenter commented May 2, 2026

Codecov Report

❌ Patch coverage is 51.11111% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.52%. Comparing base (9e8ddfc) to head (bfc87e3).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...mponent/admin/settings/admin-settings.component.ts | 46.66% | 8 Missing ⚠️ |
| ...operator/source/scan/csv/CSVScanSourceOpExec.scala | 64.70% | 4 Missing and 2 partials ⚠️ |
| ...ain/scala/org/apache/texera/dao/SiteSettings.scala | 0.00% | 5 Missing ⚠️ |
| ...pache/texera/service/resource/ConfigResource.scala | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4669      +/-   ##
============================================
- Coverage     43.53%   43.52%   -0.02%     
+ Complexity     2084     2080       -4     
============================================
  Files           957      958       +1     
  Lines         34077    34110      +33     
  Branches       3753     3755       +2     
============================================
+ Hits          14836    14846      +10     
- Misses        18445    18463      +18     
- Partials        796      801       +5     
| Flag | Coverage Δ |
| --- | --- |
| access-control-service | 28.12% <ø> (ø) |
| amber | 41.93% <56.00%> (-0.02%) ⬇️ |
| computing-unit-managing-service | 0.00% <ø> (ø) |
| config-service | 0.00% <0.00%> (ø) |
| file-service | 32.18% <100.00%> (-1.06%) ⬇️ |
| frontend | 35.31% <46.66%> (+0.02%) ⬆️ |
| workflow-compiling-service | 47.72% <ø> (ø) |


@aglinxinyuan
Contributor

Let's make it a large fixed number, unless there is a disadvantage for that.

@Ma77Ball
Contributor Author

Ma77Ball commented May 2, 2026

The disadvantage is that Univocity allocates memory per parser based on that number (10,000 is negligible, but something like 1,000,000 is not when you want to scale the platform).

@Ma77Ball
Contributor Author

Ma77Ball commented May 2, 2026

#3825 was closed after we decided in a meeting to keep the limit static, as @aglinxinyuan suggested (that PR only fixed the display of CSVs with more columns in the result panel). The problem is that users are hitting the 512-column limit and need it increased.

@aglinxinyuan
Contributor

This makes sense to me now. Why do we also need the result table's batch size to be adjustable? Can we just pick a fixed number?

@github-actions github-actions Bot added the platform (Non-amber Scala service paths) label May 2, 2026
@Ma77Ball
Contributor Author

Ma77Ball commented May 3, 2026

I added that setting because the Result Panel settings menu seemed a little empty, and I thought it might be a useful feature for users. I can remove it if it belongs in a different PR or isn't needed.

@aglinxinyuan
Contributor

Let's not overcomplicate the settings.

@chenlica
Contributor

chenlica commented May 3, 2026

Agreed. Let's take a simple solution.

@Ma77Ball
Contributor Author

Ma77Ball commented May 3, 2026

@aglinxinyuan, please review again. I removed the other setting, the result table's columns-per-batch option.


Labels

common, fix, frontend (Changes related to the frontend GUI), platform (Non-amber Scala service paths), service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CSV File Scan cannot read a csv with more than 512 columns

5 participants