Add jupyter-agent dataset converter (#175)#197
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟢 Good taste - Exemplary dataset addition with comprehensive documentation and justified shared converter improvements.
Strengths:
- ✅ Complete file structure: all required files present (README, extract_raw.py, raw_to_standardized.py, schema_raw.py, all sample JSONs)
- ✅ Thorough design decision catalog: 4 decisions documented with ambiguity, chosen approach, examples, and rejected alternatives
- ✅ Proper schema mapping: final_answer → MessageAction with
<finish>tags (no api.py needed), tool → TextObservation(source="environment"), correct observation sources - ✅ Justified shared changes: Fixes real bug where function_call/observation roles were incorrectly rewritten to gpt/human; adds lazy-loading of browser dependencies
- ✅ All validation tests pass: dataset_structure, raw_schemas, standardized_schemas, std_to_sft_conversion, datasets_from_parameter
- ✅ Deterministic pipeline with reproducible commands shown
- ✅ No extraneous files (no full_*.json, temp files, or scratch JSON)
Minor documentation note: For AI-generated PRs, consider adding the conversation URL (e.g., https://app.all-hands.dev/conversations/{id}) to help reviewers trace the work.
[RISK ASSESSMENT]
🟢 LOW - New dataset addition with bug fixes to shared converter. Changes improve correctness (role preservation aligns with repository conventions) and robustness (lazy browser loading prevents import errors). Backward compatibility verified via test_datasets_from_parameter.py passing.
VERDICT: ✅ Ready to merge
KEY INSIGHT: The converter changes fix a subtle but important bug where the repository's stated conventions (function_call/observation roles) were being violated by the implementation - excellent alignment of code with design intent.
This review was generated by an AI agent (OpenHands).
…upyter-agent-dataset
Use a precise optional-browser import fallback and keep the role-preservation test compatible with the branch's browsing flag.\n\nCo-authored-by: openhands <openhands@all-hands.dev>
|
I merged current The PR checks are green after the cleanup. This comment was created by an AI agent (OpenHands) on behalf of the user. |
Co-authored-by: openhands <openhands@all-hands.dev>
The previous tip of this PR carried a copy of the 'make OpenHands browser tools optional' refactor in agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake. The same diff was duplicated on #193 (CodeScout). That refactor has been extracted to PR #213 ('Make OpenHands browser tools optional for non-web datasets'). Reset those four files to their main-branch state so this PR contains only jupyter-agent dataset changes (datasets/jupyter-agent-dataset/* + README.md + agents/openhands/DATASETS.md catalog row). Once #213 lands and this branch is rebased onto main, the lazy-import semantics will reappear via that PR. Co-authored-by: openhands <openhands@all-hands.dev>
The previous tip of this PR carried a copy of the 'make OpenHands browser tools optional' refactor in agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake. The same diff was duplicated on #197 (jupyter-agent). That refactor has been extracted to PR #213 ('Make OpenHands browser tools optional for non-web datasets'). Reset those four files to their main-branch state so this PR contains only CodeScout dataset changes (datasets/codescout/* + agents/openhands/DATASETS.md catalog row). Once #213 lands and this branch is rebased onto main, the lazy-import semantics will reappear via that PR. Co-authored-by: openhands <openhands@all-hands.dev>
|
I've extracted the lazy-import refactor that was sharing this PR's diff with #193 into its own PR #213 ("Make OpenHands browser tools optional for non-web datasets"). This PR has been force-pushed to drop Please merge #213 first; this branch should then rebase cleanly on This comment was posted by an AI agent (OpenHands) on behalf of the user. |
* Make OpenHands browser tools optional for non-web datasets Two changes to the OpenHands agent pipeline let non-web dataset converters run on machines that do not have browsergym installed: 1. agents/openhands/system_prompt/tools/__init__.py wraps the 'from .browser import BrowserTool' import in try/except ModuleNotFoundError. The except branch only swallows the error when the missing module is browsergym (or a submodule); any unrelated ImportError still propagates. The BrowserTool name is bound to None when browsergym is unavailable. 2. agents/openhands/system_prompt/system.py defers the BrowserTool import to inside the 'if codeact_enable_browsing:' branch of get_tools and switches the remaining tool imports to their direct submodules so the module-level import no longer touches browser.py. 3. agents/openhands/std_to_sft.py lazy-loads scripts.html_to_axtree.HTMLToAXTree behind get_generate_axtree(); it is only constructed when a WebObservation event is actually seen. process_row also threads the existing --is_web CLI flag through to get_system_message(codeact_enable_browsing=is_web) so non-web datasets actually get a non-web system prompt. 4. tests/test_openhands_sft_role_preservation.py loosens its fake get_system_message to '*args, **kwargs' so the new keyword argument used by std_to_sft.py does not break the fake. 5. A new regression test tests/test_optional_browser.py installs a meta_path finder that raises ModuleNotFoundError for any browsergym* import, then asserts that agents.openhands.system_prompt.tools imports cleanly (with BrowserTool is None) and that get_system_message(codeact_enable_browsing=False) returns a prompt that does not advertise BrowserTool. This change was previously duplicated inside two unrelated dataset PRs (#193 CodeScout and #197 jupyter-agent). Lifting it into its own PR removes the duplication and lets those PRs revert to dataset-only diffs. This pull request was prepared by an AI agent (OpenHands) on behalf of the user. Co-authored-by: openhands <openhands@all-hands.dev> * Skip optional-browser test if litellm is unavailable The previous version of tests/test_optional_browser.py reloaded agents.openhands.system_prompt.tools by monkeypatching sys.meta_path in-process. That fails in CI because the workflow's requirements.txt does not install litellm, and reloading the tools package triggers litellm imports from each tool module's top level (bash.py, finish.py, etc.). Two changes: 1. Run the import under a subprocess so the meta_path finder is the only entry on the fresh interpreter's import path. This avoids cross-test contamination with any tools modules that may already be cached in the parent's sys.modules. 2. Add a pytest.importorskip('litellm') guard. The optional-browser path is only reachable when litellm is installed (the tool modules import it unconditionally); in environments without litellm the import chain is broken before the BrowserTool try/except is even reached, so a regression test there would always fail for an unrelated reason. Co-authored-by: openhands <openhands@all-hands.dev> * Use PEP 451 find_spec/exec_module in optional-browser test finder Address inline review on #213: replace the legacy PEP 302 find_module/load_module pair with the modern PEP 451 find_spec/create_module/exec_module trio. The legacy interface is deprecated since Python 3.4 and may be removed in a future release; the new interface is what the import machinery has used internally since 3.4 and is forward-compatible. Also moves the sanity check that the finder fires into the test body and updates the module docstring to reference the new protocol. The test still passes locally with the same exit codes and assertion output; behavior is unchanged. Co-authored-by: openhands <openhands@all-hands.dev> --------- Co-authored-by: openhands <openhands@all-hands.dev>
Closes #175
This PR was created by an AI agent (OpenHands) on behalf of the user.
Summary
Adds an Agent Data Protocol converter for
jupyter-agent/jupyter-agent-dataset, including generated sample files for the raw, standardized, and OpenHands SFT stages.Dataset Source
thinkingandnon_thinkingsplits with 51,389 examples each. This converter defaults to thenon_thinkingsplit for samples because it has the same tool-call structure without thinking tags.Files Added/Updated
datasets/jupyter-agent-dataset/README.mddatasets/jupyter-agent-dataset/extract_raw.pydatasets/jupyter-agent-dataset/schema_raw.pydatasets/jupyter-agent-dataset/raw_to_standardized.pydatasets/jupyter-agent-dataset/requirements.txtsample_raw.json,sample_std.json, andsample_sft.jsonfunction_call/observationroles and to avoid importing browser dependencies for non-web conversion.Schema Mapping Summary
usermessages ->TextObservation(source="user")assistantmessages withadd_and_execute_jupyter_code_cell-> PythonCodeAction, preserving the assistant prose asdescriptiontoolmessages ->TextObservation(source="environment")assistantmessages withfinal_answer->MessageActionwith<finish> ... </finish>so the OpenHands SFT converter produces afinishtool callquestion,answer,executor_type,files_used, andpackages_usedis stored in trajectorydetailsDesign Decisions
Ambiguity: The source dataset provides two splits,
thinkingandnon_thinking.non_thinkingby default for ADP samples.extract_raw.pydefaultsJUPYTER_AGENT_SPLITtonon_thinking.thinkingwould preserve extra thinking-tag formatting that is not needed to validate the core tool-call mapping.Ambiguity: Jupyter execution appears as an OpenAI-style assistant tool call named
add_and_execute_jupyter_code_cell.CodeActionso the shared OpenHands converter maps it toexecute_ipython_cell.arguments.code = "import pandas as pd"becomesCodeAction(language="python", content="import pandas as pd", description=...).Ambiguity: Final answers are represented as a source tool call named
final_answerrather than plain assistant text.final_answer.answerto a<finish>MessageAction.final_answer(answer="453")becomes<finish> 453 </finish>in standardized data and afinishfunction call in SFT.final_answeras a customApiActionwould require an unnecessary dataset-local API and would not align with OpenHands completion conventions.Ambiguity: The raw
original_notebookfield can be very large.MAX_ORIGINAL_NOTEBOOK_CHARS=500000during sample generation to avoid oversized fixtures.original_notebookwould make the raw samples less faithful to the source schema.Ambiguity: The shared SFT converter rewrote
function_callmessages togpt, which conflicts with the repository convention that function-call syntax must usefrom: function_call.function_callandobservationroles in the shared converter output.from: function_callfollowed byfrom: observationinsample_sft.json.Tests Run
python -m ruff check agents/openhands/std_to_sft.py agents/openhands/system_prompt/system.py agents/openhands/system_prompt/tools/__init__.py datasets/jupyter-agent-datasetpython -m ruff format --check agents/openhands/std_to_sft.py agents/openhands/system_prompt/system.py agents/openhands/system_prompt/tools/__init__.py datasets/jupyter-agent-datasetpython -m pytest tests/test_dataset_structure.py tests/test_raw_schemas.py::test_sample_raw_against_schema[jupyter-agent-dataset] 'tests/test_standardized_schemas.py::test_sample_standardized_against_schema[/workspace/project/agent-data-protocol/datasets/jupyter-agent-dataset/sample_std.json]' tests/test_std_to_sft_conversion.py::test_std_to_sft_conversion[jupyter-agent-dataset] tests/test_datasets_from_parameter.py -qpython -m pytest tests/test_std_to_sft_action_function.py tests/test_std_to_sft_from_parameter_simple.py tests/test_std_to_sft_structure.py -qKnown Limitations
extract_raw.pystreams with the Hugging Facedatasetspackage when available and falls back to the Hugging Face dataset viewer API for small samples.