Add Kwai-Klear SWE-smith mini agent trajectories dataset (#178) #190
Closed
neubig wants to merge 1 commit into
Co-authored-by: openhands <openhands@all-hands.dev>
Closes #178
This PR was created by an AI agent (OpenHands) on behalf of the user.
Summary
- Adds the `Kwai-Klear/SWE-smith-mini_swe_agent_plus-trajectories-66k` dataset.
- Adds `sample_raw.json`, `sample_std.json`, and `sample_sft.json` generated from the conversion pipeline.
- Preserves `function_call` and `observation` roles instead of rewriting them to legacy role names.
Dataset Source
- ... `train`
Files Added
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/README.md`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/requirements.txt`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/extract_raw.py`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/schema_raw.py`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/raw_to_standardized.py`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/sample_raw.json`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/sample_std.json`
- `datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k/sample_sft.json`
Schema Mapping Summary
- `system` messages are omitted because the OpenHands SFT converter supplies the target system prompt.
- The first `user` message maps to a `TextObservation` with source `user`.
- Subsequent `user` messages map to `TextObservation` with source `environment` because they are command execution results.
- `assistant` messages containing a fenced bash command map to `CodeAction(language="bash")`; the `THOUGHT:` text before the command is preserved as the action description.
- `assistant` messages without a fenced bash command map to `MessageAction`.
- Trajectories with the `MINI_SWE_AGENT_FINAL_OUTPUT` submission command receive a final success observation and `<finish>` message for OpenHands SFT conversion.
Design Decisions
- ... `resolved` field.
  - Chosen approach: Treat trajectories with the explicit `MINI_SWE_AGENT_FINAL_OUTPUT` submission command as completed and append the final ADP success/finish events.
  - Example: The sample trajectories end with `echo MINI_SWE_AGENT_FINAL_OUTPUT && git add -A && git diff --cached`.
  - Alternatives rejected: Adding a finish event to every row unconditionally could mislabel interrupted rows; omitting finish events would produce samples without a terminal completion signal.
- `user` messages include both the initial task and later shell outputs.
  - Chosen approach: Map only the first user message to source `user`; map subsequent user messages to source `environment`.
  - Example: `<returncode>0</returncode>...` messages become environment observations.
  - Alternatives rejected: Treating every user message as source `user` loses the command/observation alternation.
- Assistant turns contain a `THOUGHT:` section and a fenced bash block rather than structured tool calls.
  - Chosen approach: Extract the fenced command as `CodeAction(language="bash")` and keep the thought text in `description`.
  - Example: THOUGHT: ... ```bash\nfind ...\n``` becomes a bash code action.
  - Alternatives rejected: Keeping the entire assistant turn as `MessageAction` would lose executable structure.
- ... `function_call` and `observation` roles after producing them.
  - Chosen approach: Preserve those roles to match current ADP SFT validation.
  - Example: Generated bash calls remain `from: function_call`.
  - Alternatives rejected: Post-processing generated JSON would not be reproducible from the converter.
Tests Run
- `python -m pytest tests/test_dataset_structure.py -v`
- `python -m pytest tests/test_raw_schemas.py -v -k kwai`
- `python -m pytest tests/test_standardized_schemas.py -v -k kwai`
- `python -m pytest tests/test_std_to_sft_conversion.py -v -k kwai`
- `python -m pytest tests/test_datasets_from_parameter.py -v`
- `python -m pytest tests/test_std_to_sft_from_parameter_simple.py tests/test_std_to_sft_structure.py -v`
- `python -m ruff check agents/openhands/std_to_sft.py datasets/kwai-klear_swe-smith-mini_swe_agent_plus-trajectories-66k`
Known Limitations
- Installing `requirements.txt` on this Python 3.13 environment failed because `browsergym-core` attempted to build an incompatible `greenlet`; focused dependencies were installed to run the relevant validation commands.
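For reviewers who want the rules under Schema Mapping Summary and Design Decisions in one place, here is a minimal sketch of the role mapping. It is not the code in `raw_to_standardized.py`. The `Event` dataclass, the `convert` function, and the `{"role", "content"}` message shape are hypothetical stand-ins for the ADP classes and raw schema, and keeping the `THOUGHT:` prefix inside the description is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

FINAL_MARKER = "MINI_SWE_AGENT_FINAL_OUTPUT"
FENCE = "`" * 3  # built programmatically so this example can sit inside a fence
# First fenced bash block in an assistant turn; DOTALL lets the command span lines.
BASH_BLOCK = re.compile(FENCE + r"bash\s*\n(.*?)" + FENCE, re.DOTALL)


@dataclass
class Event:
    """Hypothetical stand-in for the ADP event classes named in this PR."""
    kind: str                          # "TextObservation", "CodeAction", "MessageAction"
    content: str
    source: Optional[str] = None       # "user" or "environment" (observations only)
    description: Optional[str] = None  # thought text (code actions only)


def convert(messages):
    """Map raw {"role", "content"} messages to events; also report submission."""
    events = []
    seen_user = False
    submitted = False
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            continue  # dropped: the SFT converter supplies the system prompt
        if role == "user":
            # Only the first user message is the task; later ones are shell output.
            source = "environment" if seen_user else "user"
            seen_user = True
            events.append(Event("TextObservation", content, source=source))
        elif role == "assistant":
            match = BASH_BLOCK.search(content)
            if match is None:
                # No fenced command: keep the turn as a plain message.
                events.append(Event("MessageAction", content))
                continue
            command = match.group(1).strip()
            thought = content[:match.start()].strip()
            events.append(Event("CodeAction", command, description=thought))
            submitted = submitted or FINAL_MARKER in command
    return events, submitted
```

A caller would append the final success observation and `<finish>` message only when `submitted` is true, which mirrors the decision not to add finish events to every row unconditionally.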