Add jupyter-agent dataset converter (#175) by neubig · Pull Request #197 · neulab/agent-data-protocol

neubig · 2026-05-14T04:02:44Z

Closes #175

This PR was created by an AI agent (OpenHands) on behalf of the user.

Summary

Adds an Agent Data Protocol converter for jupyter-agent/jupyter-agent-dataset, including generated sample files for the raw, standardized, and OpenHands SFT stages.

Dataset Source

Source: https://huggingface.co/datasets/jupyter-agent/jupyter-agent-dataset
Source code: https://github.com/huggingface/jupyter-agent
License: Apache-2.0
Size/split: the dataset card lists thinking and non_thinking splits with 51,389 examples each. This converter defaults to the non_thinking split for samples because it has the same tool-call structure without thinking tags.

Files Added/Updated

Added datasets/jupyter-agent-dataset/README.md
Added datasets/jupyter-agent-dataset/extract_raw.py
Added datasets/jupyter-agent-dataset/schema_raw.py
Added datasets/jupyter-agent-dataset/raw_to_standardized.py
Added datasets/jupyter-agent-dataset/requirements.txt
Added generated sample_raw.json, sample_std.json, and sample_sft.json
Updated the shared OpenHands SFT converter to preserve function_call/observation roles and to avoid importing browser dependencies for non-web conversion.

Schema Mapping Summary

Raw user messages -> TextObservation(source="user")
Raw assistant messages with add_and_execute_jupyter_code_cell -> Python CodeAction, preserving the assistant prose as description
Raw tool messages -> TextObservation(source="environment")
Raw assistant messages with final_answer -> MessageAction with <finish> ... </finish> so the OpenHands SFT converter produces a finish tool call
Metadata such as question, answer, executor_type, files_used, and packages_used is stored in trajectory details

Design Decisions

Ambiguity: The source dataset provides two splits, thinking and non_thinking.
- Chosen approach: Use non_thinking by default for ADP samples.
- Example: extract_raw.py defaults JUPYTER_AGENT_SPLIT to non_thinking.
- Alternatives rejected: Sampling thinking would preserve extra thinking-tag formatting that is not needed to validate the core tool-call mapping.
Ambiguity: Jupyter execution appears as an OpenAI-style assistant tool call named add_and_execute_jupyter_code_cell.
- Chosen approach: Convert it to a Python CodeAction so the shared OpenHands converter maps it to execute_ipython_cell.
- Example: A tool call with arguments.code = "import pandas as pd" becomes CodeAction(language="python", content="import pandas as pd", description=...).
- Alternatives rejected: Adding a dataset-specific API would duplicate the existing notebook execution path in the shared OpenHands SFT converter.
Ambiguity: Final answers are represented as a source tool call named final_answer rather than plain assistant text.
- Chosen approach: Convert final_answer.answer to a <finish> MessageAction.
- Example: final_answer(answer="453") becomes <finish> 453 </finish> in standardized data and a finish function call in SFT.
- Alternatives rejected: Preserving final_answer as a custom ApiAction would require an unnecessary dataset-local API and would not align with OpenHands completion conventions.
Ambiguity: The raw original_notebook field can be very large.
- Chosen approach: Keep the field in raw samples for fidelity, but use MAX_ORIGINAL_NOTEBOOK_CHARS=500000 during sample generation to avoid oversized fixtures.
- Example: The committed sample keeps three source rows with notebook metadata under that threshold.
- Alternatives rejected: Dropping original_notebook would make the raw samples less faithful to the source schema.
Ambiguity: The shared SFT converter rewrote function_call messages to gpt, which conflicts with the repository convention that function-call syntax must use from: function_call.
- Chosen approach: Preserve function_call and observation roles in the shared converter output.
- Example: Jupyter code execution now appears as from: function_call followed by from: observation in sample_sft.json.
- Alternatives rejected: Hand-editing the generated SFT sample would not be reproducible from the pipeline.

Tests Run

python -m ruff check agents/openhands/std_to_sft.py agents/openhands/system_prompt/system.py agents/openhands/system_prompt/tools/__init__.py datasets/jupyter-agent-dataset
python -m ruff format --check agents/openhands/std_to_sft.py agents/openhands/system_prompt/system.py agents/openhands/system_prompt/tools/__init__.py datasets/jupyter-agent-dataset
python -m pytest tests/test_dataset_structure.py tests/test_raw_schemas.py::test_sample_raw_against_schema[jupyter-agent-dataset] 'tests/test_standardized_schemas.py::test_sample_standardized_against_schema[/workspace/project/agent-data-protocol/datasets/jupyter-agent-dataset/sample_std.json]' tests/test_std_to_sft_conversion.py::test_std_to_sft_conversion[jupyter-agent-dataset] tests/test_datasets_from_parameter.py -q
python -m pytest tests/test_std_to_sft_action_function.py tests/test_std_to_sft_from_parameter_simple.py tests/test_std_to_sft_structure.py -q

Known Limitations

The full source dataset is large; extract_raw.py streams with the Hugging Face datasets package when available and falls back to the Hugging Face dataset viewer API for small samples.
The original Kaggle input files referenced by the trajectories are not redistributed in this repository.

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions

🟢 Good taste - Exemplary dataset addition with comprehensive documentation and justified shared converter improvements.

Strengths:

✅ Complete file structure: all required files present (README, extract_raw.py, raw_to_standardized.py, schema_raw.py, all sample JSONs)
✅ Thorough design decision catalog: 4 decisions documented with ambiguity, chosen approach, examples, and rejected alternatives
✅ Proper schema mapping: final_answer → MessageAction with <finish> tags (no api.py needed), tool → TextObservation(source="environment"), correct observation sources
✅ Justified shared changes: Fixes real bug where function_call/observation roles were incorrectly rewritten to gpt/human; adds lazy-loading of browser dependencies
✅ All validation tests pass: dataset_structure, raw_schemas, standardized_schemas, std_to_sft_conversion, datasets_from_parameter
✅ Deterministic pipeline with reproducible commands shown
✅ No extraneous files (no full_*.json, temp files, or scratch JSON)

Minor documentation note: For AI-generated PRs, consider adding the conversation URL (e.g., https://app.all-hands.dev/conversations/{id}) to help reviewers trace the work.

[RISK ASSESSMENT]
🟢 LOW - New dataset addition with bug fixes to shared converter. Changes improve correctness (role preservation aligns with repository conventions) and robustness (lazy browser loading prevents import errors). Backward compatibility verified via test_datasets_from_parameter.py passing.

VERDICT: ✅ Ready to merge

KEY INSIGHT: The converter changes fix a subtle but important bug where the repository's stated conventions (function_call/observation roles) were being violated by the implementation - excellent alignment of code with design intent.

This review was generated by an AI agent (OpenHands).

…upyter-agent-dataset

Use a precise optional-browser import fallback and keep the role-preservation test compatible with the branch's browsing flag.\n\nCo-authored-by: openhands <openhands@all-hands.dev>

neubig · 2026-05-16T02:44:50Z

I merged current main into this branch, resolved the shared converter update, and tightened the optional browser import fallback to avoid masking unrelated import errors. I also made the shared role-preservation test compatible with this branch's codeact_enable_browsing call.

The PR checks are green after the cleanup.

This comment was created by an AI agent (OpenHands) on behalf of the user.

Co-authored-by: openhands <openhands@all-hands.dev>

The previous tip of this PR carried a copy of the 'make OpenHands browser tools optional' refactor in agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake. The same diff was duplicated on #193 (CodeScout). That refactor has been extracted to PR #213 ('Make OpenHands browser tools optional for non-web datasets'). Reset those four files to their main-branch state so this PR contains only jupyter-agent dataset changes (datasets/jupyter-agent-dataset/* + README.md + agents/openhands/DATASETS.md catalog row). Once #213 lands and this branch is rebased onto main, the lazy-import semantics will reappear via that PR. Co-authored-by: openhands <openhands@all-hands.dev>

The previous tip of this PR carried a copy of the 'make OpenHands browser tools optional' refactor in agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake. The same diff was duplicated on #197 (jupyter-agent). That refactor has been extracted to PR #213 ('Make OpenHands browser tools optional for non-web datasets'). Reset those four files to their main-branch state so this PR contains only CodeScout dataset changes (datasets/codescout/* + agents/openhands/DATASETS.md catalog row). Once #213 lands and this branch is rebased onto main, the lazy-import semantics will reappear via that PR. Co-authored-by: openhands <openhands@all-hands.dev>

neubig · 2026-05-18T01:04:18Z

I've extracted the lazy-import refactor that was sharing this PR's diff with #193 into its own PR #213 ("Make OpenHands browser tools optional for non-web datasets"). This PR has been force-pushed to drop agents/openhands/system_prompt/tools/__init__.py, agents/openhands/system_prompt/system.py, agents/openhands/std_to_sft.py, and the tests/test_openhands_sft_role_preservation.py fake — those files are now back at their main-branch state here, and the new diff is dataset-only (plus the agents/openhands/DATASETS.md catalog row).

Please merge #213 first; this branch should then rebase cleanly on main with the lazy-import semantics reappearing via that PR.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

* Make OpenHands browser tools optional for non-web datasets Two changes to the OpenHands agent pipeline let non-web dataset converters run on machines that do not have browsergym installed: 1. agents/openhands/system_prompt/tools/__init__.py wraps the 'from .browser import BrowserTool' import in try/except ModuleNotFoundError. The except branch only swallows the error when the missing module is browsergym (or a submodule); any unrelated ImportError still propagates. The BrowserTool name is bound to None when browsergym is unavailable. 2. agents/openhands/system_prompt/system.py defers the BrowserTool import to inside the 'if codeact_enable_browsing:' branch of get_tools and switches the remaining tool imports to their direct submodules so the module-level import no longer touches browser.py. 3. agents/openhands/std_to_sft.py lazy-loads scripts.html_to_axtree.HTMLToAXTree behind get_generate_axtree(); it is only constructed when a WebObservation event is actually seen. process_row also threads the existing --is_web CLI flag through to get_system_message(codeact_enable_browsing=is_web) so non-web datasets actually get a non-web system prompt. 4. tests/test_openhands_sft_role_preservation.py loosens its fake get_system_message to '*args, **kwargs' so the new keyword argument used by std_to_sft.py does not break the fake. 5. A new regression test tests/test_optional_browser.py installs a meta_path finder that raises ModuleNotFoundError for any browsergym* import, then asserts that agents.openhands.system_prompt.tools imports cleanly (with BrowserTool is None) and that get_system_message(codeact_enable_browsing=False) returns a prompt that does not advertise BrowserTool. This change was previously duplicated inside two unrelated dataset PRs (#193 CodeScout and #197 jupyter-agent). Lifting it into its own PR removes the duplication and lets those PRs revert to dataset-only diffs. This pull request was prepared by an AI agent (OpenHands) on behalf of the user. Co-authored-by: openhands <openhands@all-hands.dev> * Skip optional-browser test if litellm is unavailable The previous version of tests/test_optional_browser.py reloaded agents.openhands.system_prompt.tools by monkeypatching sys.meta_path in-process. That fails in CI because the workflow's requirements.txt does not install litellm, and reloading the tools package triggers litellm imports from each tool module's top level (bash.py, finish.py, etc.). Two changes: 1. Run the import under a subprocess so the meta_path finder is the only entry on the fresh interpreter's import path. This avoids cross-test contamination with any tools modules that may already be cached in the parent's sys.modules. 2. Add a pytest.importorskip('litellm') guard. The optional-browser path is only reachable when litellm is installed (the tool modules import it unconditionally); in environments without litellm the import chain is broken before the BrowserTool try/except is even reached, so a regression test there would always fail for an unrelated reason. Co-authored-by: openhands <openhands@all-hands.dev> * Use PEP 451 find_spec/exec_module in optional-browser test finder Address inline review on #213: replace the legacy PEP 302 find_module/load_module pair with the modern PEP 451 find_spec/create_module/exec_module trio. The legacy interface is deprecated since Python 3.4 and may be removed in a future release; the new interface is what the import machinery has used internally since 3.4 and is forward-compatible. Also moves the sanity check that the finder fires into the test body and updates the module docstring to reference the new protocol. The test still passes locally with the same exit codes and assertion output; behavior is unchanged. Co-authored-by: openhands <openhands@all-hands.dev> --------- Co-authored-by: openhands <openhands@all-hands.dev>

Add jupyter-agent dataset converter

adc14cb

Co-authored-by: openhands <openhands@all-hands.dev>

neubig added the review-this Trigger the OpenHands PR review workflow label May 15, 2026 — with OpenHands AI

github-actions Bot approved these changes May 15, 2026

View reviewed changes

openhands-agent added 2 commits May 16, 2026 02:36

Merge remote-tracking branch 'origin/main' into openhands/issue-175-j…

eaeb47a

…upyter-agent-dataset

Tighten browser import fallback for jupyter-agent

58ff9b7

Use a precise optional-browser import fallback and keep the role-preservation test compatible with the branch's browsing flag.\n\nCo-authored-by: openhands <openhands@all-hands.dev>

openhands-agent added 2 commits May 17, 2026 20:00

Merge branch 'main' into openhands/issue-175-jupyter-agent-dataset

87ba552

Merge main, regenerate sample_std.json with schema_version 1.1.0

adfe9ab

Co-authored-by: openhands <openhands@all-hands.dev>

neubig mentioned this pull request May 18, 2026

Make OpenHands browser tools optional for non-web datasets #213

Merged

neubig mentioned this pull request May 18, 2026

Integrate CodeScout dataset for issue #184 #193

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add jupyter-agent dataset converter (#175)#197

Add jupyter-agent dataset converter (#175)#197
neubig wants to merge 6 commits into
mainfrom
openhands/issue-175-jupyter-agent-dataset

neubig commented May 14, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

neubig commented May 16, 2026

Uh oh!

neubig commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

neubig commented May 14, 2026

Summary

Dataset Source

Files Added/Updated

Schema Mapping Summary

Design Decisions

Tests Run

Known Limitations

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

neubig commented May 16, 2026

Uh oh!

neubig commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants