Improve error message when column_name mismatches the dataset#3808

Open
aireenmei wants to merge 1 commit into main from aireen/grain_error

Conversation

@aireenmei
Collaborator

@aireenmei aireenmei commented May 4, 2026

Description

Currently, when the train_data_columns or eval_data_columns set by the user do not match the columns in the dataset, the error message is unclear: it does not point to the available columns or to the related flags. This PR improves the message.

Tests

Manually tested.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@aireenmei aireenmei force-pushed the aireen/grain_error branch from 0469cdf to 5a18ee2 Compare May 4, 2026 21:13
@aireenmei aireenmei changed the title from "Improve error handling in Grain pipeline" to "Improve error message when column_name mismatches the dataset" May 4, 2026
@aireenmei aireenmei force-pushed the aireen/grain_error branch from 5a18ee2 to 7594af7 Compare May 5, 2026 00:03
@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 56.25000% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxtext/input_pipeline/input_pipeline_utils.py | 61.53% | 2 Missing and 3 partials ⚠️ |
| src/maxtext/input_pipeline/tfds_data_processing.py | 33.33% | 1 Missing and 1 partial ⚠️ |

📢 Thoughts on this report? Let us know!

@aireenmei aireenmei force-pushed the aireen/grain_error branch from 7594af7 to d4f2343 Compare May 5, 2026 00:19
@aireenmei aireenmei marked this pull request as ready for review May 5, 2026 00:20
@github-actions

github-actions Bot commented May 5, 2026

🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


@github-actions github-actions Bot left a comment


## 📋 Review Summary

The Pull Request successfully improves the error messages when dataset columns do not match the user-provided configuration. This is a valuable addition for usability, as it provides clear feedback on available columns and how to resolve the mismatch.

## 🔍 General Feedback

  • Consistency: The error reporting logic is applied consistently across TFRecord and TFDS input pipelines.
  • Exception Types: I've suggested using ValueError instead of KeyError to align with the existing exception handling patterns in the MaxText input pipeline.
  • Error Formatting: A minor improvement was suggested to handle potential duplicate missing columns and provide a slightly more descriptive error message.
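The feedback above can be illustrated with a minimal sketch. It is not the code in this PR: the helper name `check_data_columns`, its parameters, and the message wording are hypothetical, chosen only to show a `ValueError` that deduplicates the missing columns, lists the available ones, and names the related flag.

```python
from typing import Iterable, Sequence


def check_data_columns(
    requested: Sequence[str],
    available: Iterable[str],
    flag_name: str = "train_data_columns",
) -> None:
  """Hypothetical helper: raise ValueError if a requested column is absent."""
  available_sorted = sorted(set(available))
  # dict.fromkeys drops duplicate names while keeping first-seen order.
  missing = [c for c in dict.fromkeys(requested) if c not in available_sorted]
  if missing:
    raise ValueError(
        f"Column(s) {missing} set in '{flag_name}' were not found in the dataset. "
        f"Available columns: {available_sorted}. "
        f"Update '{flag_name}' to match the dataset."
    )
```

Under these assumptions, calling `check_data_columns(["text", "text"], ["inputs", "targets"])` raises a `ValueError` that reports `text` once and lists `inputs` and `targets` as the available columns.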

Collaborator

@RissyRan RissyRan left a comment


Thanks!

@aireenmei aireenmei force-pushed the aireen/grain_error branch from d4f2343 to 29371f8 Compare May 5, 2026 01:14

2 participants