
Make OpenHands browser tools optional for non-web datasets#213

Merged: neubig merged 3 commits into main from refactor-optional-browser on May 18, 2026
Conversation

@neubig
Contributor

@neubig neubig commented May 18, 2026

Summary

Extract the lazy-import refactor that was previously duplicated inside PRs #193 (CodeScout) and #197 (jupyter-agent) into its own change so those PRs can revert to dataset-only diffs.

Motivation

Non-web datasets (CodeScout, jupyter-agent, etc.) currently cannot run agents/openhands/std_to_sft.py on environments that do not have browsergym installed, because:

  • agents/openhands/system_prompt/tools/__init__.py does an unconditional from .browser import BrowserTool and browser.py imports browsergym at module load.
  • agents/openhands/std_to_sft.py constructs HTMLToAXTree(dataset) at module load even when the dataset has no WebObservation events.

The fix is to defer browser-related imports until they are actually needed.
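
A minimal sketch of that deferral, assuming a simplified stand-in for get_tools (the string tool names and the browser_module parameter are illustrative, not the repository's code): the optional dependency is imported inside the branch that needs it, so callers that never enable browsing never trigger the import.

```python
import importlib


def get_tools(enable_browsing: bool, browser_module: str = "browsergym") -> list[str]:
    """Return tool names, importing the optional module only on request.

    Hypothetical stand-in for the real get_tools: it returns plain strings
    instead of tool objects so the sketch stays self-contained.
    """
    tools = ["bash", "finish"]
    if enable_browsing:
        # Deferred import: only reached when browsing is enabled, so a
        # missing optional dependency cannot break the non-web path.
        importlib.import_module(browser_module)
        tools.append("browser")
    return tools


# The non-web path never touches the optional module, even if it is absent.
print(get_tools(enable_browsing=False, browser_module="surely_missing_module_xyz"))
```

With enable_browsing=True and a missing module, the same call raises ModuleNotFoundError, which is the behavior a web dataset would want.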

Changes

  • agents/openhands/system_prompt/tools/__init__.py — Wrap the BrowserTool re-export in try/except ModuleNotFoundError. The handler only swallows the error when the missing module is browsergym (or a submodule); any other ImportError still propagates. BrowserTool is bound to None when browsergym is unavailable.
  • agents/openhands/system_prompt/system.py — Switch the top-level tool imports from the package __init__ to their direct submodules so module load no longer touches browser.py. Defer from agents.openhands.system_prompt.tools.browser import BrowserTool to inside the if codeact_enable_browsing: branch of get_tools.
  • agents/openhands/std_to_sft.py — Lazy-load scripts.html_to_axtree.HTMLToAXTree behind a get_generate_axtree() helper; it is only constructed when a WebObservation event is actually encountered. Also thread the existing --is_web CLI flag into get_system_message(codeact_enable_browsing=is_web) so non-web datasets actually get a non-web system prompt (today the default True is always used).
  • tests/test_openhands_sft_role_preservation.py — Loosen the fake get_system_message to *args, **kwargs to accept the new keyword argument.
  • tests/test_optional_browser.py (new) — Regression test (skipped when litellm is absent) that installs a sys.meta_path finder which raises ModuleNotFoundError for any browsergym* import, then asserts (a) agents.openhands.system_prompt.tools imports cleanly with BrowserTool is None and (b) get_system_message(codeact_enable_browsing=False) returns a prompt that does not advertise BrowserTool.
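
The narrow try/except described for tools/__init__.py can be sketched as follows. This is an illustration under assumptions, not the merged code: import_optional is a hypothetical helper, and checking the exception's name attribute is one plausible way to tell a missing optional dependency apart from an unrelated missing module.

```python
import importlib


def import_optional(module_name: str, optional_root: str):
    """Import module_name; return None only if optional_root is what is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # Swallow the error only for the optional dependency or its submodules.
        if missing == optional_root or missing.startswith(optional_root + "."):
            return None
        raise  # Any unrelated missing module is a real bug and must surface.


# Hypothetical package names; neither is expected to be installed.
print(import_optional("hypothetical_optional_pkg_xyz.core", "hypothetical_optional_pkg_xyz"))
```

Matching on the exact root or a dotted prefix avoids accidentally swallowing errors for an unrelated package whose name merely starts with the same characters.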

Validation

python -m pytest tests/ → 183 passed, 12 skipped, 4 warnings.

Evidence — end-to-end conversion of a non-web dataset without browsergym

Driver script (full source below): installs a sys.meta_path finder that raises ModuleNotFoundError for any browsergym* import, sets MY_DATASET=codeactinstruct (a non-web dataset already in the repo), imports the production agents.openhands.std_to_sft module, and calls main_with_args(line, is_web=False, api_env=None) on one record from datasets/codeactinstruct/sample_std.json.

Control run — on main (without this PR)

[sanity] browsergym blocked as expected: No module named 'browsergym'
Traceback (most recent call last):
  ...
  File ".../agents/openhands/std_to_sft.py", line 14, in <module>
    from agents.openhands.system_prompt.system import get_system_message
  File ".../agents/openhands/system_prompt/system.py", line 3, in <module>
    from agents.openhands.system_prompt.tools import (
  File ".../agents/openhands/system_prompt/tools/__init__.py", line 2, in <module>
    from .browser import BrowserTool
  File ".../agents/openhands/system_prompt/tools/browser.py", line 1, in <module>
    from browsergym.core.action.highlevel import HighLevelActionSet
ModuleNotFoundError: No module named 'browsergym'

→ The pipeline fails to import; converter is unusable.

Treatment run — on this branch (refactor-optional-browser)

[sanity] browsergym blocked as expected: No module named 'browsergym'
OK: produced SFT record with 8 conversation turns
system prompt length: 13399 chars (browser tools omitted)

→ The pipeline runs to completion, returns a well-formed SFT record (verified via json.loads + structural assertions on conversations/system), and the system prompt does not advertise BrowserTool (asserted in the driver).

Driver script

"""Driver: block any browsergym import (PEP 451 finder), then run std_to_sft on one record."""
import importlib.machinery
import sys


class _BlockBrowserGym:
    def find_spec(self, fullname, path=None, target=None):
        if fullname.startswith("browsergym"):
            return importlib.machinery.ModuleSpec(fullname, self)
        return None

    def create_module(self, spec):
        return None

    def exec_module(self, module):
        raise ModuleNotFoundError(
            f"No module named {module.__name__!r}", name=module.__name__
        )


sys.meta_path.insert(0, _BlockBrowserGym())

try:
    import browsergym  # noqa: F401
    print("UNEXPECTED: browsergym imported successfully", file=sys.stderr)
    sys.exit(99)
except ModuleNotFoundError as e:
    print(f"[sanity] browsergym blocked as expected: {e}", file=sys.stderr)

import os
os.environ["MY_DATASET"] = "codeactinstruct"

import importlib
std_to_sft = importlib.import_module("agents.openhands.std_to_sft")

import json
with open("datasets/codeactinstruct/sample_std.json") as f:
    sample = json.load(f)
record_line = json.dumps(sample[0])
out = std_to_sft.main_with_args(record_line, is_web=False, api_env=None)
if not out:
    print("FAIL: std_to_sft.main_with_args returned no output", file=sys.stderr)
    sys.exit(1)
parsed = json.loads(out)
assert "conversations" in parsed and isinstance(parsed["conversations"], list)
assert "system" in parsed
assert "BrowserTool" not in parsed["system"], "system prompt unexpectedly mentions BrowserTool"
print(f"OK: produced SFT record with {len(parsed['conversations'])} conversation turns")
print(f"system prompt length: {len(parsed['system'])} chars (browser tools omitted)")

Follow-up

Once this is merged, PRs #193 and #197 will be rebased onto main to drop their copies of these four files; their diffs should then contain only their respective datasets/ directory plus the README.md/agents/openhands/DATASETS.md catalog entries (already done — see #197 and #193).


This PR was prepared by an AI agent (OpenHands) on behalf of the user. Originating conversation context is available to the requester.

Five changes to the OpenHands agent pipeline let non-web dataset
converters run on machines that do not have browsergym installed:

1. agents/openhands/system_prompt/tools/__init__.py wraps the
   'from .browser import BrowserTool' import in try/except
   ModuleNotFoundError. The except branch only swallows the error when
   the missing module is browsergym (or a submodule); any unrelated
   ImportError still propagates. The BrowserTool name is bound to None
   when browsergym is unavailable.

2. agents/openhands/system_prompt/system.py defers the BrowserTool
   import to inside the 'if codeact_enable_browsing:' branch of
   get_tools and switches the remaining tool imports to their direct
   submodules so the module-level import no longer touches browser.py.

3. agents/openhands/std_to_sft.py lazy-loads
   scripts.html_to_axtree.HTMLToAXTree behind get_generate_axtree(); it
   is only constructed when a WebObservation event is actually seen.
   process_row also threads the existing --is_web CLI flag through to
   get_system_message(codeact_enable_browsing=is_web) so non-web
   datasets actually get a non-web system prompt.

4. tests/test_openhands_sft_role_preservation.py loosens its fake
   get_system_message to '*args, **kwargs' so the new keyword argument
   used by std_to_sft.py does not break the fake.

5. A new regression test tests/test_optional_browser.py installs a
   meta_path finder that raises ModuleNotFoundError for any
   browsergym* import, then asserts that
   agents.openhands.system_prompt.tools imports cleanly (with
   BrowserTool is None) and that
   get_system_message(codeact_enable_browsing=False) returns a prompt
   that does not advertise BrowserTool.
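
The lazy-load helper in item 3 follows a common build-on-first-use pattern. A minimal sketch, assuming a stand-in class in place of HTMLToAXTree (in the real module the import of scripts.html_to_axtree would presumably also sit inside the helper):

```python
_generate_axtree = None  # Built on first use, then reused.


class ExpensiveConverter:
    """Hypothetical stand-in for HTMLToAXTree; counts how often it is built."""

    instances = 0

    def __init__(self, dataset: str):
        ExpensiveConverter.instances += 1
        self.dataset = dataset


def get_generate_axtree(dataset: str = "example"):
    """Return the shared converter, constructing it only when first needed."""
    global _generate_axtree
    if _generate_axtree is None:
        _generate_axtree = ExpensiveConverter(dataset)
    return _generate_axtree
```

A dataset with no WebObservation events never calls the helper, so the converter (and anything it imports) is never constructed.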

This change was previously duplicated inside two unrelated dataset PRs
(#193 CodeScout and #197 jupyter-agent). Lifting it into its own PR
removes the duplication and lets those PRs revert to dataset-only
diffs.

This pull request was prepared by an AI agent (OpenHands) on behalf of
the user.

Co-authored-by: openhands <openhands@all-hands.dev>
neubig pushed a commit that referenced this pull request May 18, 2026
The previous tip of this PR carried a copy of the
'make OpenHands browser tools optional' refactor in
agents/openhands/system_prompt/tools/__init__.py,
agents/openhands/system_prompt/system.py,
agents/openhands/std_to_sft.py, and the
tests/test_openhands_sft_role_preservation.py fake. The same diff was
duplicated on #193 (CodeScout). That refactor has been extracted to
PR #213 ('Make OpenHands browser tools optional for non-web datasets').

Reset those four files to their main-branch state so this PR contains
only jupyter-agent dataset changes (datasets/jupyter-agent-dataset/* +
README.md + agents/openhands/DATASETS.md catalog row). Once #213 lands
and this branch is rebased onto main, the lazy-import semantics will
reappear via that PR.

Co-authored-by: openhands <openhands@all-hands.dev>
neubig pushed a commit that referenced this pull request May 18, 2026
The previous tip of this PR carried a copy of the
'make OpenHands browser tools optional' refactor in
agents/openhands/system_prompt/tools/__init__.py,
agents/openhands/system_prompt/system.py,
agents/openhands/std_to_sft.py, and the
tests/test_openhands_sft_role_preservation.py fake. The same diff was
duplicated on #197 (jupyter-agent). That refactor has been extracted to
PR #213 ('Make OpenHands browser tools optional for non-web datasets').

Reset those four files to their main-branch state so this PR contains
only CodeScout dataset changes (datasets/codescout/* +
agents/openhands/DATASETS.md catalog row). Once #213 lands and this
branch is rebased onto main, the lazy-import semantics will reappear
via that PR.

Co-authored-by: openhands <openhands@all-hands.dev>

@github-actions github-actions Bot left a comment

🟡 Acceptable — The lazy-import refactor is clean and correct. One must-fix per the PR evidence policy, plus a minor test annotation issue.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Comment thread tests/test_optional_browser.py Outdated
            return self
        return None

    def load_module(self, name):  # pragma: no cover - exercised via import machinery

🟡 Suggestion: load_module IS exercised by the import machinery when find_module returns self — that's exactly the path that raises ModuleNotFoundError and exercises the try/except in __init__.py. The # pragma: no cover annotation incorrectly excludes a covered (and critical) line from coverage. Remove it.

Suggested change
- def load_module(self, name):  # pragma: no cover - exercised via import machinery
+ def load_module(self, name):

generate_axtree = None


def get_generate_axtree():

🟠 Important (PR description): The PR description's Validation section only shows pytest output. Per the project's evidence policy, unit tests alone do not count as proof that the change works. Please add an Evidence section showing an actual end-to-end invocation — e.g. running std_to_sft.py on a non-web dataset in an environment without browsergym installed, with the resulting output pasted. A link to the originating OpenHands conversation (https://app.all-hands.dev/conversations/{id}) would also satisfy this requirement.

The previous version of tests/test_optional_browser.py reloaded
agents.openhands.system_prompt.tools by monkeypatching sys.meta_path
in-process. That fails in CI because the workflow's requirements.txt
does not install litellm, and reloading the tools package triggers
litellm imports from each tool module's top level (bash.py, finish.py,
etc.).

Two changes:

1. Run the import in a subprocess so the meta_path finder is installed
   in a fresh interpreter before any tools module is imported. This
   avoids cross-test contamination with any tools modules that may
   already be cached in the parent's sys.modules.

2. Add a pytest.importorskip('litellm') guard. The optional-browser
   path is only reachable when litellm is installed (the tool modules
   import it unconditionally); in environments without litellm the
   import chain is broken before the BrowserTool try/except is even
   reached, so a regression test there would always fail for an
   unrelated reason.
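
The subprocess approach in item 1 can be sketched like this. It is a hedged illustration, not the test file itself: blocked_pkg is a hypothetical package name, and the preamble and body strings are assumptions about how such a test might be laid out.

```python
import subprocess
import sys
import textwrap

# Preamble executed first in the child interpreter: install a blocking finder.
PREAMBLE = textwrap.dedent(
    """
    import importlib.machinery
    import sys

    class _Block:
        def find_spec(self, fullname, path=None, target=None):
            if fullname == "blocked_pkg" or fullname.startswith("blocked_pkg."):
                return importlib.machinery.ModuleSpec(fullname, self)
            return None

        def create_module(self, spec):
            return None

        def exec_module(self, module):
            raise ModuleNotFoundError(
                f"No module named {module.__name__!r}", name=module.__name__
            )

    sys.meta_path.insert(0, _Block())
    """
)

# Body under test: runs with an empty module cache apart from the preamble.
BODY = textwrap.dedent(
    """
    try:
        import blocked_pkg
    except ModuleNotFoundError as exc:
        print("blocked:", exc.name)
    """
)

result = subprocess.run(
    [sys.executable, "-c", PREAMBLE + BODY],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout.strip())
```

Because the child starts from a clean sys.modules, nothing imported earlier in the parent test session can mask the blocked import.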

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig
Contributor Author

neubig commented May 18, 2026

Addressed both review comments:

1. # pragma: no cover annotation on _BlockBrowserGym.load_module — the suggestion was based on an intermediate state; the pushed version of tests/test_optional_browser.py does not contain that annotation. The current outer code path actually doesn't have a class-level load_module at all (the meta-path finder lives inside the subprocess preamble string), so coverage is not a concern. Verified with grep -n pragma tests/test_optional_browser.py → no matches.

2. Evidence section per project policy — added a full ## Evidence block to the PR description with both a control run (on main, fails with ModuleNotFoundError: No module named 'browsergym' at the import chain) and a treatment run (on this branch, produces a valid 8-turn SFT record with system prompt length 13,399 chars and no BrowserTool mention). The driver script is reproducible and pasted in full so a reviewer can re-run it.

Please re-review.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

@neubig neubig added the review-this Trigger the OpenHands PR review workflow label May 18, 2026

@github-actions github-actions Bot left a comment

🟡 Acceptable — The lazy-import refactor is clean and correct. The unresolved evidence thread on std_to_sft.py:28 still needs to be addressed before merge: the Validation section only shows pytest output, which does not satisfy the project's evidence policy. Please add an Evidence section with an actual end-to-end invocation (e.g. running std_to_sft.py --is_web=no through the pipeline on a non-web dataset) and its output.

One minor new note on the test blocker below.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Comment thread tests/test_optional_browser.py Outdated
Comment on lines +47 to +53
    def find_module(self, name, path=None):
        if name.startswith("browsergym"):
            return self
        return None

    def load_module(self, name):
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)

🟡 Suggestion: _BlockBrowserGym uses the legacy PEP 302 find_module/load_module interface, which has been deprecated since Python 3.4. Import-system support for find_module was removed in Python 3.12, so a finder that defines only this pair is skipped on 3.12 and later. The PEP 451 find_spec/exec_module protocol works on current and future Python versions:

def find_spec(self, fullname, path, target=None):
    if fullname.startswith("browsergym"):
        import importlib.machinery
        return importlib.machinery.ModuleSpec(fullname, self)
    return None

def create_module(self, spec):
    return None

def exec_module(self, module):
    raise ModuleNotFoundError(f"No module named {module.__name__!r}", name=module.__name__)

Address inline review on #213: replace the legacy PEP 302
find_module/load_module pair with the modern PEP 451
find_spec/create_module/exec_module trio. The legacy interface has been
deprecated since Python 3.4, and import-system support for find_module
was removed in Python 3.12; the new interface is what the import
machinery has used internally since 3.4 and is forward-compatible. Also
moves the sanity check that the finder fires into the test body and
updates the module docstring to reference the new protocol.

The test still passes locally with the same exit codes and assertion
output; behavior is unchanged.
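
For in-process use, the same PEP 451 finder can be wrapped in a context manager so it is always removed afterwards. This is a sketch under assumptions: BlockPrefix, blocked, and the package name hypothetical_blocked_pkg are illustrative and do not appear in the PR.

```python
import contextlib
import importlib
import importlib.machinery
import sys


class BlockPrefix:
    """PEP 451 meta-path finder and loader that refuses one package tree."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.prefix or fullname.startswith(self.prefix + "."):
            return importlib.machinery.ModuleSpec(fullname, self)
        return None  # Let the remaining finders handle everything else.

    def create_module(self, spec):
        return None  # Fall back to default module creation.

    def exec_module(self, module):
        raise ModuleNotFoundError(
            f"No module named {module.__name__!r}", name=module.__name__
        )


@contextlib.contextmanager
def blocked(prefix: str):
    """Block imports of prefix (and submodules) for the duration of the block."""
    finder = BlockPrefix(prefix)
    sys.meta_path.insert(0, finder)
    try:
        yield finder
    finally:
        sys.meta_path.remove(finder)


with blocked("hypothetical_blocked_pkg"):
    try:
        importlib.import_module("hypothetical_blocked_pkg.sub")
    except ModuleNotFoundError as exc:
        print("blocked:", exc.name)
```

Note that this only affects imports that have not already been cached in sys.modules, which is one reason the test in this PR runs in a subprocess instead.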

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig
Contributor Author

neubig commented May 18, 2026

Addressed the new review:

PEP 451 modernization (tests/test_optional_browser.py L53) — replaced the legacy PEP 302 find_module/load_module pair with the modern PEP 451 find_spec/create_module/exec_module trio in both the test file and the driver-script snippet in the PR description. Updated docstring to reference the new protocol. Test still passes locally with 2/2 green.

Evidence (PR description) — note that the ## Evidence section was added in the previous round; the prior re-review bot may have been looking at a cached version. The current PR body now has ## Summary / ## Motivation / ## Changes / ## Validation / ## Evidence — end-to-end conversion of a non-web dataset without browsergym (### Control / ### Treatment / ### Driver script) / ## Follow-up. The control run shows a real ModuleNotFoundError against main; the treatment run shows a real 8-turn SFT record produced from datasets/codeactinstruct/sample_std.json.

Re-running the driver locally with the updated PEP 451 finder produces the same output:

[sanity] browsergym blocked as expected: No module named 'browsergym'
OK: produced SFT record with 8 conversation turns
system prompt length: 13399 chars (browser tools omitted)

Re-applying review-this label to trigger another review.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

@neubig
Contributor Author

neubig commented May 18, 2026

Addressed both points from the latest review:

  1. PEP 451 modernization (tests/test_optional_browser.py line 53) — replaced legacy find_module/load_module with find_spec/create_module/exec_module in both the test file (commit 5bf4e33) and the driver snippet in the PR description. Test still green locally (2/2).
  2. Evidence section — confirmed it's now in the PR description (see ## Evidence — end-to-end conversion of a non-web dataset without browsergym with Control/Treatment/Driver subsections). Earlier re-review may have hit a cached body.

Re-running the modernized driver locally:

[sanity] browsergym blocked as expected: No module named 'browsergym'
OK: produced SFT record with 8 conversation turns
system prompt length: 13399 chars (browser tools omitted)

This comment was posted by an AI agent (OpenHands) on behalf of the user.

@neubig neubig added review-this Trigger the OpenHands PR review workflow and removed review-this Trigger the OpenHands PR review workflow labels May 18, 2026

@github-actions github-actions Bot left a comment

🟢 Good taste — All three previously unresolved threads are resolved in the current code:

  • PEP 451 protocol: test_optional_browser.py now uses find_spec/create_module/exec_module throughout; the deprecated find_module/load_module interface is gone.
  • # pragma: no cover: No such annotation exists in the current file.
  • Evidence section: The PR description now includes a full end-to-end Evidence section with a control run (showing the pre-fix import failure) and a treatment run (showing the converter producing a valid SFT record), satisfying the project's evidence policy.

The lazy-import refactor is clean and correct. No new issues found.

This review was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig merged commit 40b0489 into main May 18, 2026
6 of 7 checks passed
@neubig neubig deleted the refactor-optional-browser branch May 18, 2026 02:31
neubig pushed a commit that referenced this pull request May 18, 2026
Brings in the shared OpenHands SFT converter changes (lazy-load
browser tools, preserve function_call/observation roles) from
PR #213, which this branch previously depended on.

Co-authored-by: openhands <openhands@all-hands.dev>
neubig added a commit that referenced this pull request May 23, 2026
* Add jupyter-agent dataset converter

Co-authored-by: openhands <openhands@all-hands.dev>

* Tighten browser import fallback for jupyter-agent

Use a precise optional-browser import fallback and keep the role-preservation test compatible with the branch's browsing flag.

Co-authored-by: openhands <openhands@all-hands.dev>

* Merge main, regenerate sample_std.json with schema_version 1.1.0

Co-authored-by: openhands <openhands@all-hands.dev>

* Drop shared lazy-import refactor; depend on #213

The previous tip of this PR carried a copy of the
'make OpenHands browser tools optional' refactor in
agents/openhands/system_prompt/tools/__init__.py,
agents/openhands/system_prompt/system.py,
agents/openhands/std_to_sft.py, and the
tests/test_openhands_sft_role_preservation.py fake. The same diff was
duplicated on #193 (CodeScout). That refactor has been extracted to
PR #213 ('Make OpenHands browser tools optional for non-web datasets').

Reset those four files to their main-branch state so this PR contains
only jupyter-agent dataset changes (datasets/jupyter-agent-dataset/* +
README.md + agents/openhands/DATASETS.md catalog row). Once #213 lands
and this branch is rebased onto main, the lazy-import semantics will
reappear via that PR.

Co-authored-by: openhands <openhands@all-hands.dev>

---------

Co-authored-by: openhands <openhands@all-hands.dev>
neubig added a commit that referenced this pull request May 23, 2026
* Integrate CodeScout dataset

Co-authored-by: openhands <openhands@all-hands.dev>

* Tighten optional browser import handling

Only suppress missing browsergym dependencies when browser tools are unavailable, and keep the role-preservation test compatible with the branch's browsing flag.

Co-authored-by: openhands <openhands@all-hands.dev>

* Merge main, regenerate sample_std.json with schema_version 1.1.0

Co-authored-by: openhands <openhands@all-hands.dev>

* Drop shared lazy-import refactor; depend on #213

The previous tip of this PR carried a copy of the
'make OpenHands browser tools optional' refactor in
agents/openhands/system_prompt/tools/__init__.py,
agents/openhands/system_prompt/system.py,
agents/openhands/std_to_sft.py, and the
tests/test_openhands_sft_role_preservation.py fake. The same diff was
duplicated on #197 (jupyter-agent). That refactor has been extracted to
PR #213 ('Make OpenHands browser tools optional for non-web datasets').

Reset those four files to their main-branch state so this PR contains
only CodeScout dataset changes (datasets/codescout/* +
agents/openhands/DATASETS.md catalog row). Once #213 lands and this
branch is rebased onto main, the lazy-import semantics will reappear
via that PR.

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: emit reward on final action, drop tools/reward_dict from details

Address review feedback on PR #193:

1. The per-trajectory reward signal is now attached to the agent's final
   action (the localization_finish MessageAction) as content[-1].reward
   using the upstream multilevel_localization_f1_reward. CodeScout
   localization is single-step from a reward perspective: the agent
   inspects the repo, then emits one localization_finish action; that
   action's reward is the F1 score across file/module/entity
   granularity. The component sub-rewards (file/module/entity/multiturn)
   are not propagated since they aren't the canonical reward signal the
   standardized schema needs to carry.

   Care is taken to attach the reward to the last Action (walking content
   from the end), not to the last content item, because some raw rollouts
   include a trailing tool-response observation that echoes the locations.
   The reward semantically belongs on the agent's action that produced
   the locations, not on the environment's echo.
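
   The walk-from-the-end rule can be sketched as follows. The Action and Observation dataclasses here are simplified, hypothetical stand-ins for the schema types, and attach_final_reward is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    text: str
    reward: Optional[float] = None


@dataclass
class Observation:
    text: str


def attach_final_reward(content: list, reward: float) -> bool:
    """Set reward on the last Action in content; return False if none exists."""
    for item in reversed(content):
        if isinstance(item, Action):
            item.reward = reward
            return True
    return False


trajectory = [
    Action("inspect repo"),
    Observation("file listing"),
    Action("localization_finish"),
    Observation("echo of submitted locations"),  # trailing echo, not an Action
]
attach_final_reward(trajectory, 0.5)
print(trajectory[2].reward, trajectory[0].reward)
```

   The trailing observation is skipped, so the reward lands on the finish action rather than on the environment's echo.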

2. details no longer contains 'tools' (the per-instance OpenAI tool
   schema is dataset-wide infrastructure, not a per-trajectory detail)
   or 'reward_dict' (subsumed by the per-step reward field above). The
   trajectory's details dict now contains only fields that vary by
   trajectory: source_dataset, source_config, source_split, row_id, and
   (when present) instance_id, step, rollout_number.

3. Regenerated sample_std.json and sample_sft.json. Updated README to
   document the reward mapping.

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: drop unnecessary HF datasets import workaround

The repo root has no datasets/__init__.py, so 'from datasets import
load_dataset' resolves to the installed HuggingFace datasets package
even with the repo root on PYTHONPATH (namespace-package fallthrough).
Every other dataset extractor uses the plain form; codescout was
diverging unnecessarily.

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: fix tool-call/observation pairing when id is None

The storage side used `tool_call_id or f"tool_{index}"` as the
fallback key while the lookup used `tool_call.id or ""`. When an
upstream message lacked a tool_call_id, storage created a unique
positional key while the lookup queried "", silently missing the
match. The observation then fell through to the unmatched-fallback
loop and appeared after all paired action/observation pairs instead
of immediately after its producing action.

Aligning both sides on "" lets a single None-id pair still pair
in place (the only case observed). The committed CodeScout samples
always carry tool_call ids, so sample_std.json regenerates byte-
identically; this only affects future or pathological data.

Verified with a constructed None-id record: observation now appears
immediately after its code_action, not at the end of the trajectory.
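
The mismatch can be reproduced in a few lines. This is a sketch with hypothetical helper names; only the two fallback expressions are taken from the description above.

```python
from collections import OrderedDict


def storage_key_before(tool_call_id, index):
    # Old storage side: a unique positional key when the id is missing.
    return tool_call_id or f"tool_{index}"


def storage_key_after(tool_call_id):
    # Fixed storage side: the same fallback the lookup uses.
    return tool_call_id or ""


def lookup_key(tool_call_id):
    return tool_call_id or ""


tool_messages = [{"tool_call_id": None, "content": "ls output"}]

before = OrderedDict(
    (storage_key_before(m["tool_call_id"], i), m["content"])
    for i, m in enumerate(tool_messages)
)
after = OrderedDict(
    (storage_key_after(m["tool_call_id"]), m["content"]) for m in tool_messages
)

print(before.get(lookup_key(None)))  # lookup misses: stored under "tool_0"
print(after.get(lookup_key(None)))   # lookup hits: both sides use ""
```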

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: drop trailing tool-response echo after localization_finish

Real OpenHands sessions terminate immediately on finish, but the raw
rollouts include a tool-response message after localization_finish
that just repeats the submitted locations. Dropping that echo aligns
the standardized trajectory with real agent behavior and prevents
SFT generation from emitting a phantom observation turn after the
finish action.

The locations are already captured in the finish MessageAction
content, so no information is lost.

Updated README to document the design decision. Regenerated
sample_std.json and sample_sft.json; sample_std.json shrinks by one
TextObservation per trajectory that had a finish-echo pair (2 of 3
sample trajectories).
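
A minimal sketch of the echo-dropping rule, assuming trajectory items are plain dicts with hypothetical kind and name fields (the real converter works on schema objects):

```python
def drop_finish_echo(content: list[dict]) -> list[dict]:
    """Drop a trailing observation that directly follows localization_finish."""
    if (
        len(content) >= 2
        and content[-1]["kind"] == "observation"
        and content[-2]["kind"] == "action"
        and content[-2].get("name") == "localization_finish"
    ):
        return content[:-1]
    return content


with_echo = [
    {"kind": "action", "name": "bash"},
    {"kind": "observation"},
    {"kind": "action", "name": "localization_finish"},
    {"kind": "observation"},  # echo of the submitted locations
]
print(len(drop_finish_echo(with_echo)))
```

Trajectories that already end on the finish action, or whose last observation follows a different action, pass through unchanged.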

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: document orphaned-tool-message defensive branch

The 'tool' message branch in the top-level loop only fires for
malformed inputs where a tool message is not immediately preceded
by an assistant message with tool_calls. Well-formed CodeScout
rollouts pre-consume tool messages via tool_observations() in the
assistant branch and skip past them with index = next_index.

Added a comment explaining what cases this branch is meant to
cover so future maintainers don't think it's dead code.

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: warn on tool_call_id key collision

If multiple tool messages share the same tool_call_id (or all share
the None fallback key ""), the OrderedDict storage loses all but
the last and the lookup then incorrectly pairs the surviving
observation with every same-keyed tool_call. Not observed in either
CodeScout source, but the warning ensures the latent bug surfaces
loudly if it ever appears in the full 58.9K extraction.
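
A sketch of the collision warning, assuming dict-shaped tool messages and the standard warnings module (the actual extractor may use logging instead; index_observations is a hypothetical name):

```python
import warnings
from collections import OrderedDict


def index_observations(messages: list[dict]) -> "OrderedDict[str, str]":
    """Index tool messages by tool_call_id, warning when a key repeats."""
    by_key: "OrderedDict[str, str]" = OrderedDict()
    for message in messages:
        key = message.get("tool_call_id") or ""
        if key in by_key:
            # A later message silently replacing an earlier one would pair the
            # survivor with every same-keyed tool call, so surface it.
            warnings.warn(f"duplicate tool_call_id key {key!r}; overwriting")
        by_key[key] = message["content"]
    return by_key
```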

Co-authored-by: openhands <openhands@all-hands.dev>

* codescout: allow null "arguments" in tool_call.function

Some raw rows have `"arguments": null` instead of an empty string,
which previously raised a Pydantic ValidationError and silently
dropped those trajectories from the full extraction. The downstream
`parse_arguments()` already treats None as an empty arguments object,
so making the field Optional[str] = None is the minimal fix.
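
For illustration, a None-tolerant parse_arguments might look like the sketch below. The body is an assumption about its behavior based on the description above, written with the standard library only rather than the Pydantic model the extractor uses.

```python
import json
from typing import Any, Optional


def parse_arguments(raw: Optional[str]) -> dict[str, Any]:
    """Treat None or an empty string as an empty arguments object."""
    if raw is None or raw.strip() == "":
        return {}
    return json.loads(raw)


print(parse_arguments(None))
print(parse_arguments('{"path": "src/"}'))
```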

Co-authored-by: openhands <openhands@all-hands.dev>

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Labels

review-this Trigger the OpenHands PR review workflow

2 participants