Add ClawBench to GUI Agent benchmarks #1

Open: reacher-z wants to merge 1 commit into dataanswer:main from reacher-z:add-clawbench


@reacher-z
Adds ClawBench to the GUI Agent table.

ClawBench evaluates browser agents on live production websites (the real Uber Eats, Indeed, Craigslist, etc., not Docker mocks). Scoring has two stages: first, a deterministic check intercepts the agent's HTTP requests and matches them against a per-task URL/method schema; then an LLM judge evaluates the intercepted payload. An agent that hits the right endpoint but submits the wrong content therefore fails.
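To illustrate the two-stage scoring idea, here is a minimal sketch. It is not ClawBench's actual code: the names `InterceptedRequest`, `TaskSchema`, `matches_schema`, and `score_task` are hypothetical, the URL schema is assumed to be a regex, and the LLM judge is replaced by a plain callable stub so the example runs offline.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InterceptedRequest:
    # Hypothetical record of one HTTP request captured from the agent's browser.
    method: str
    url: str
    payload: dict = field(default_factory=dict)


@dataclass
class TaskSchema:
    # Hypothetical per-task schema: expected HTTP method plus a regex for the URL.
    method: str
    url_pattern: str


def matches_schema(req: InterceptedRequest, schema: TaskSchema) -> bool:
    # Stage 1: deterministic check on method and URL only.
    return (
        req.method.upper() == schema.method.upper()
        and re.fullmatch(schema.url_pattern, req.url) is not None
    )


def score_task(
    requests: list[InterceptedRequest],
    schema: TaskSchema,
    judge: Callable[[dict], bool],
) -> bool:
    # Keep only requests that hit the right endpoint with the right method.
    candidates = [r for r in requests if matches_schema(r, schema)]
    if not candidates:
        return False  # never reached the target endpoint
    # Stage 2: the judge (an LLM in the real setup, a stub here) checks the payload.
    return any(judge(r.payload) for r in candidates)
```

For example, with `schema = TaskSchema("POST", r"https://example\.com/api/orders")` and a stub judge `lambda p: p.get("item") == "burrito"`, an agent that POSTs `{"item": "taco"}` to that URL passes stage 1 but fails stage 2, which is the "right endpoint, wrong payload" case described above.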

It sits next to WebArena, Mind2Web, and WorkArena in the table, distinguished by live (rather than Docker-hosted) sites and by payload-correctness scoring (rather than element/action matching alone).

Affiliation disclosure: I'm one of the ClawBench maintainers. Happy to adjust the columns or copy, or to drop the entry entirely.