Improve error message when column_name mismatches the dataset#3808
Open
🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
The Pull Request successfully improves the error messages when dataset columns do not match the user-provided configuration. This is a valuable addition for usability, as it provides clear feedback on available columns and how to resolve the mismatch.
🔍 General Feedback
- Consistency: The error reporting logic is applied consistently across TFRecord and TFDS input pipelines.
- Exception Types: I've suggested using `ValueError` instead of `KeyError` to align with the existing exception handling patterns in the MaxText input pipeline.
- Error Formatting: A minor improvement was suggested to handle potential duplicate missing columns and provide a slightly more descriptive error message.
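To make the feedback above concrete, here is a minimal sketch of the kind of check being discussed. It is not the code in this PR: the helper name `check_data_columns` and the exact message wording are hypothetical, and it only assumes that the requested columns and the dataset's column names are available as plain strings.

```python
from typing import Iterable, Sequence


def check_data_columns(
    requested: Sequence[str],
    available: Iterable[str],
    flag_name: str,
) -> None:
    """Raise ValueError if any requested column is absent from the dataset.

    Hypothetical helper for illustration; not part of MaxText.
    """
    available_set = set(available)
    # dict.fromkeys de-duplicates while keeping the order the user supplied,
    # so a column listed twice is reported only once.
    missing = [col for col in dict.fromkeys(requested) if col not in available_set]
    if missing:
        raise ValueError(
            f"Column(s) {missing} set by {flag_name} were not found in the dataset. "
            f"Available columns: {sorted(available_set)}. "
            f"Please update {flag_name} to match the dataset."
        )


# Usage: the duplicate "text" entry is reported once, and the message names
# both the available columns and the flag to change.
try:
    check_data_columns(["text", "text"], ["inputs", "targets"], "train_data_columns")
except ValueError as err:
    print(err)
```

Raising `ValueError` rather than `KeyError` also keeps the message readable, since `KeyError` displays the `repr` of its argument and would wrap the whole message in quotes.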
Description
Currently, when the train_data_columns or eval_data_columns set by the user do not match the columns in the dataset, the error message is unclear: it points to neither the available columns nor the related flags. This PR improves the message.
Tests
Manually tested.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.