🎯 Goal (What & Why)
Once PR #264 is merged, we will have a separate `evaluate` command that uses the training config to evaluate a trained model by loading its latest checkpoint.
However, we also need to be able to evaluate arbitrary models, potentially with automatic downloading from the Hugging Face Hub for all supported models.
Here, let's discuss the features we need for such a command and what the configuration file should look like.
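As a starting point for the config discussion, here is a minimal sketch of what a standalone evaluation config could look like, decoupled from the training config. Every name here (`ModelSource`, `EvaluateConfig`, `hub_repo_id`, `model_type`, `datasets`, `batch_size`) is hypothetical and not part of the existing codebase. The core idea is that the model comes from exactly one source: either a local checkpoint or a Hugging Face Hub repository.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelSource:
    """Where the model to evaluate comes from (hypothetical schema).

    Exactly one of `checkpoint_path` or `hub_repo_id` must be set.
    """
    checkpoint_path: Optional[str] = None  # local checkpoint directory
    hub_repo_id: Optional[str] = None      # e.g. "org/model-name" on the Hugging Face Hub
    revision: str = "main"                 # Hub branch, tag, or commit (ignored for local paths)

    def validate(self) -> None:
        # Reject both "neither set" and "both set".
        if (self.checkpoint_path is None) == (self.hub_repo_id is None):
            raise ValueError("set exactly one of checkpoint_path or hub_repo_id")

    @property
    def is_remote(self) -> bool:
        return self.hub_repo_id is not None


@dataclass
class EvaluateConfig:
    """Standalone evaluation config, independent of any training config (hypothetical)."""
    model: ModelSource
    model_type: str                        # must name one of the supported architectures
    datasets: list[str] = field(default_factory=list)
    batch_size: int = 8

    def validate(self, supported_model_types: set[str]) -> None:
        self.model.validate()
        if self.model_type not in supported_model_types:
            raise ValueError(f"unsupported model_type: {self.model_type!r}")
        if not self.datasets:
            raise ValueError("at least one evaluation dataset is required")
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
```

For a remote source, the command could fetch the weights before loading them, for example with `huggingface_hub.snapshot_download(repo_id=..., revision=...)`, and then follow the same loading path as a local checkpoint. Keeping the source choice explicit and mutually exclusive should make the "evaluate my latest checkpoint" and "evaluate an arbitrary Hub model" cases share one code path.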
🚀 Execution Plan
(This section may start as an incomplete draft but must be defined before implementation begins.)
Step 1: What is the smallest working version?
(Describe the simplest way to implement this feature with minimal effort.)
Step 2: What additional optimizations are possible (but optional)?
(List potential refinements that can be added in later PRs if needed.)
📌 Acceptance Criteria (Must-Haves for Completion)
- The feature must be functional and tested.
- The implementation must be documented in practical terms.
- The PR must include a performance/impact summary.
- No refactors unless directly necessary for feature completion.
🛠️ Project Management
- `Estimate` field (in days) in the GitHub project.
- `Size` field to categorize the PR size (Small/Medium/Large).