Add v0.1.0 Python client SDK #1
Set up the initial Python tooling for the xAgent SDK client:
- pyproject.toml: hatchling build, py>=3.11, httpx + pydantic deps,
ruff + mypy + pytest configuration, PEP 735 dev dependency group.
- .pre-commit-config.yaml: ruff-check / ruff-format / mypy / codespell
hooks pinned to the same versions used by the xagent backend repo.
- .github/workflows/ci.yml: pre-commit gate plus pytest matrix on
Python 3.11 and 3.14.
- src/xagent_sdk/{__init__,_version}.py: empty package exposing
__version__ = "0.1.0" so importers and tooling have an anchor.
- tests/__init__.py: package marker so future tests are discovered.
- README.md: minimal stub describing project status and dev workflow.
- .gitignore: ignore uv.lock and .claude/.
Introduce HTTPClient in xagent_sdk._http, the internal transport layer that future endpoint methods will share. Responsibilities are scoped narrowly: configure an httpx.Client with the Bearer header, a User-Agent identifying the SDK version, base_url normalization, 30s/10s timeouts, a 10-connection pool, and context-manager close. The class returns raw httpx.Response objects. Status-code-to-exception mapping and response parsing are deferred to subsequent commits so this layer stays free of v1 contract assumptions. A transport= parameter is exposed for test injection (httpx.MockTransport) without yet adding tests; test coverage lands in a later commit.
Define the SDK's exception layer mirroring the backend's stable error
codes from V1 envelope responses.
XAgentError is the base class with code/message/http_status. Six
subclasses map 1:1 to the backend's stable codes (invalid_api_key,
agent_not_found, task_not_found, task_busy, rate_limited,
internal_error). Three SDK-coined subclasses cover cases the server
does not model: InvalidInput (from FastAPI 422 {"detail": [...]}),
XAgentTransportError (network or timeout below HTTP), TaskTimeout
(reserved for wait/run local deadlines, used in a later commit).
errors.from_response() maps an httpx.Response to the right subclass.
Malformed or non-V1 bodies fall back to InternalError. HTTPClient is
extended to wrap httpx.HTTPError into XAgentTransportError so all
errors raised from this SDK descend from XAgentError.
All public types are re-exported from xagent_sdk so users can write
`from xagent_sdk import TaskBusy` etc.
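A minimal sketch of how such a hierarchy and mapper could fit together. The envelope shape `{"error": {"code": ..., "message": ...}}`, the `invalid_input` code string, and the `from_payload` name are assumptions for illustration (the V1 envelope layout is not spelled out in this commit message). The function takes a status code plus an already-decoded body instead of an httpx.Response so the sketch stays dependency-free, and only three of the six backend-mapped classes are shown.

```python
from __future__ import annotations

from typing import Any


class XAgentError(Exception):
    """Base class: every SDK error carries code, message, and http_status."""

    def __init__(self, code: str, message: str, http_status: int | None = None) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.http_status = http_status


class TaskBusy(XAgentError):
    """Backend code: task_busy."""


class TaskNotFound(XAgentError):
    """Backend code: task_not_found."""


class InternalError(XAgentError):
    """Backend code: internal_error; also the fallback for malformed bodies."""


class InvalidInput(XAgentError):
    """SDK-coined: FastAPI 422 validation failure."""


_BY_CODE: dict[str, type[XAgentError]] = {
    "task_busy": TaskBusy,
    "task_not_found": TaskNotFound,
    "internal_error": InternalError,
}


def from_payload(status: int, body: Any) -> XAgentError:
    """Map a status code and decoded JSON body to an exception instance."""
    # FastAPI validation errors use {"detail": [...]} rather than the V1 envelope.
    if status == 422 and isinstance(body, dict) and "detail" in body:
        return InvalidInput("invalid_input", str(body["detail"]), status)
    # Assumed envelope layout: {"error": {"code": "...", "message": "..."}}.
    error = body.get("error") if isinstance(body, dict) else None
    if isinstance(error, dict) and isinstance(error.get("code"), str):
        # Assumption: an unrecognized code also falls back to InternalError.
        cls = _BY_CODE.get(error["code"], InternalError)
        return cls(error["code"], str(error.get("message", "")), status)
    # Malformed or non-V1 bodies fall back to InternalError.
    return InternalError("internal_error", "malformed error body", status)
```

Because every class descends from XAgentError, a caller can catch the base class for a blanket handler or a specific subclass such as TaskBusy for retry logic.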
Materialize the v1 success-path response types as frozen dataclasses with module-level pydantic TypeAdapter parsers.
Two enums (TaskStatus, StepType) capture the closed status / type vocabularies the backend ships. Five dataclasses (MeResponse, CreateTaskResult, AppendResult, TaskInfo, Step) cover the five success responses; AppendResult carries accepted_at (not created_at) to match the backend wire shape, and Step.id is str (not int) carrying a "<type>:<seq>" prefix the backend exposes for client-side de-dupe.
TypeAdapter handles ISO datetime parsing, enum coercion, and Optional handling without bespoke conversion code. Pydantic is kept strictly internal: the public surface is plain @dataclass(frozen=True) so the SDK does not pin downstream apps to a particular pydantic version.
All seven new types are re-exported from xagent_sdk. The five parsers are module-private; endpoint methods in a later commit will call them directly.
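As an illustration of the client-side de-dupe that the string Step.id enables, here is a hedged stdlib-only sketch. The `content` field and the `merge_steps` helper are hypothetical and not part of the SDK; only the frozen-dataclass shape and the "<type>:<seq>" id format come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Step:
    id: str       # "<type>:<seq>", for example "tool_call:3"
    content: str  # hypothetical field, for illustration only


def merge_steps(seen: Iterable[Step], fresh: Iterable[Step]) -> list[Step]:
    """Append steps from `fresh` whose id has not been observed yet."""
    merged = list(seen)
    known = {step.id for step in merged}
    for step in fresh:
        if step.id not in known:
            known.add(step.id)
            merged.append(step)
    return merged
```

A caller that polls steps() repeatedly could pass its accumulated list as `seen` and each new response as `fresh`; because the dataclass is frozen, previously stored steps cannot be mutated in place between polls.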
Wire the v1 endpoints into the public SDK surface.
XAgentClient is the user-facing class. It resolves api_key and base_url
in the order: explicit kwarg -> environment variable
(XAGENT_API_KEY / XAGENT_BASE_URL) -> raise ValueError. A future
release will add a hardcoded production base_url default once the
xAgent team finalizes the prod endpoint; the resolution order is
designed to keep that addition backward-compatible.
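The resolution order could be sketched as follows. `resolve_setting` and its `env` parameter are hypothetical names used for illustration, and treating an empty environment variable as unset is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_setting(
    explicit: str | None,
    env_var: str,
    env: Mapping[str, str] | None = None,
) -> str:
    """Return the explicit kwarg, else the environment variable, else raise."""
    if explicit is not None:
        return explicit
    source = os.environ if env is None else env
    value = source.get(env_var)
    if value:  # assumption: an empty variable counts as unset
        return value
    raise ValueError(f"missing setting: pass it explicitly or set {env_var}")
```

A later hardcoded default would slot in as one more fallback just before the raise, which leaves the precedence of the kwarg and the environment variable unchanged.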
The client owns one HTTPClient (connection pool), exposes me() for the
identity probe, and dispatches a single _request helper that maps
4xx/5xx responses to the right XAgentError subclass before returning
to callers.
TasksAPI (mounted as client.tasks) provides four methods on top of
that helper: create, append, get, steps. Both write methods take
message: str and wrap it as {"role": "user", "content": ...}
internally; the v1 contract pins role to "user" so exposing the field
would only mislead.
Docstrings call out thread-safety, the fork() caveat, the
transport= power-user knob, and that me() does not cache.
XAgentClient is re-exported from xagent_sdk. Polling helpers (wait,
run) and tests land in subsequent commits.
Add the two helpers that close the loop on single-turn task usage:
- TasksAPI.wait(task_id, timeout, poll_interval) polls GET
/v1/chat/tasks/{id} until the task reaches a terminal status and
returns the final TaskInfo. Raises TaskTimeout when the wall-clock
deadline elapses. Other errors from the underlying get() propagate
unchanged -- retry semantics are the caller's business.
- TasksAPI.run(agent_id, message, timeout, poll_interval, metadata)
bundles create + wait + steps into one call and returns a
RunResult carrying the final snapshot plus the full step timeline.
Equivalent to the lower-level trio with a single deadline; use the
trio directly for multi-turn flows.
RunResult is a frozen dataclass with .output / .status property
shortcuts over the embedded TaskInfo, re-exported from xagent_sdk.
Terminal states mirror the backend's own definition
(v1/tasks.py:170): only COMPLETED and FAILED. PAUSED is non-terminal
because the backend allows append() onto a paused task, which flips
it back to RUNNING; a wait()ing observer should see that transition
rather than return early with PAUSED.
The sleep between polls is capped to the time remaining before the
deadline so an unusually long poll_interval cannot overshoot the
caller's requested wall-clock timeout.
The wait/run defaults (timeout=120s, poll_interval=1.0s) match the
hand-off document. TaskTimeout, previously reserved by the exception
hierarchy, is now raised by wait() when its deadline elapses.
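A rough sketch of the polling loop described above, with the clock and sleep injected so it can be exercised without real waiting. The lowercase status strings, the `wait_for_terminal` name, and the injected `clock` / `sleep` parameters are assumptions for illustration; the SDK's actual wait() works on TaskInfo objects.

```python
from __future__ import annotations

import time
from typing import Callable

# Only COMPLETED and FAILED are terminal; PAUSED is deliberately absent.
TERMINAL = frozenset({"completed", "failed"})


class TaskTimeout(Exception):
    """Raised when the local deadline passes before a terminal status."""


def wait_for_terminal(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()  # errors from the getter propagate unchanged
        if status in TERMINAL:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TaskTimeout(f"last observed status: {status}")
        # Cap the sleep so a long poll_interval cannot overshoot the deadline.
        sleep(min(poll_interval, remaining))
```

Checking the status before checking the deadline means timeout=0 still polls exactly once, and a PAUSED task keeps the loop running so a later resume is observed.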
Add tests/unit/ with full coverage of the v0.1.0 surface: error envelope parsing and exception classes (test_errors.py), pydantic dataclass parsing and frozen behavior (test_types.py), the HTTPClient wrapper including transport-error wrapping (test_http.py), XAgentClient construction, env-var fallback, and the me() probe (test_client.py), and the TasksAPI write+read endpoints plus the wait/run polling helpers (test_tasks.py). 68 unit tests total, hermetic, ~0.5s.
Tests use httpx.MockTransport rather than pytest-httpx so they exercise the SDK's own transport= injection point. A clean_xagent_env autouse fixture strips XAGENT_* environment variables between tests to prevent ambient-config bleed.
Polling-helper tests check terminal-status membership (only COMPLETED and FAILED, mirroring backend), the PAUSED non-terminal contract, last-observed-status reporting in TaskTimeout, propagation of underlying errors, and the sleep cap that prevents poll_interval from overshooting the caller's wall-clock timeout.
Add tests/e2e/ scaffolding with a single smoke test marked @pytest.mark.e2e. The fixture skips when XAGENT_API_KEY or XAGENT_BASE_URL is unset, so CI naturally skips e2e while local developers can run it with both env vars set. pyproject.toml registers the e2e marker and adds `-m 'not e2e'` to addopts so the default `pytest` invocation never tries to reach a real backend.
Remove pytest-httpx from the dev dependency group and the mypy pre-commit hook since the suite uses MockTransport directly; add it back if a future test needs declarative HTTP mocking.
Replace the bootstrap stub with a full README aimed at v0.1.0 users: install (with a git-tag pin and Python 3.11+ note), quick-start one-liner, a four-example walkthrough (identity probe, single-turn run with tool_call step inspection, explicit multi-turn append, error handling), an API reference table, configuration knobs, status semantics including the PAUSED non-terminal contract, the version policy (SemVer in 0.x means minor bumps may break; never install from @main), and a development section with the NO_PROXY=localhost gotcha that bites users running e2e through a system proxy. All code examples use prompts and outputs that have been verified end-to-end against a real xAgent backend so the docs do not drift from actual SDK behavior.
Code Review
This pull request implements the initial version of the xAgent Python SDK, providing a synchronous client for interacting with the xAgent v1 HTTP API. Key features include task lifecycle management via the TasksAPI, a robust exception hierarchy, and Pydantic-based data models. Review feedback identifies a contradiction in documentation regarding terminal task states, suggests adding validation for polling intervals, recommends improved deadline tracking in the run method to ensure strict timeout adherence, and points out an outdated comment in the error definitions.
    """Lifecycle states a task can hold.

    The full set the SDK may observe is fixed at 5 values; ``run()`` and
    ``wait()`` treat ``COMPLETED``, ``FAILED``, and ``PAUSED`` as terminal.
Fixed in 3145b06 — the TaskStatus docstring now matches _TERMINAL_STATUSES (only COMPLETED and FAILED) and explains why PAUSED stays non-terminal (backend allows append() onto a paused task, which transitions it back to RUNNING).
    Raises:
        TaskTimeout: when ``timeout`` elapses without a terminal state.
    """
    deadline = time.monotonic() + timeout
Fixed in 3df6de6 — wait() now rejects negative timeout and negative poll_interval up front with ValueError. Zero is still accepted for both (timeout=0 polls exactly once; poll_interval=0 tight-loops via time.sleep(0)). Two unit tests added (test_negative_timeout_rejected, test_negative_poll_interval_rejected).
    created = self.create(agent_id=agent_id, message=message, metadata=metadata)
    info = self.wait(created.task_id, timeout=timeout, poll_interval=poll_interval)
The run() method docstring mentions a "single deadline," but the implementation applies the timeout only to the wait() phase. The time consumed by the create() call is not subtracted from the timeout passed to wait(). To strictly adhere to a single deadline, the elapsed time should be tracked.
    - created = self.create(agent_id=agent_id, message=message, metadata=metadata)
    - info = self.wait(created.task_id, timeout=timeout, poll_interval=poll_interval)
    + start_time = time.monotonic()
    + created = self.create(agent_id=agent_id, message=message, metadata=metadata)
    + remaining = max(0.0, timeout - (time.monotonic() - start_time))
    + info = self.wait(created.task_id, timeout=remaining, poll_interval=poll_interval)
Fixed in 3df6de6 — run() now captures time.monotonic() before create(), then forwards max(0.0, timeout - elapsed) to wait() so the combined create + wait wall-clock is bounded by the caller's timeout. steps() still runs after the terminal state is observed; the docstring now calls it out separately as a small additional cheap GET. Test test_shared_deadline injects a create_delay=0.1 and asserts elapsed < timeout + create_delay/2, which strictly distinguishes the new bound from the old behavior.
    Not yet raised by any code path; reserved for the polling layer added
    in a later commit.
Fixed in 3145b06 — the TaskTimeout docstring no longer claims it is unraised. It now describes when it actually fires (TasksAPI.wait, and indirectly TasksAPI.run) and why http_status is None (the deadline is purely client-side, no HTTP exchange surfaces the failure).
…havior
Two docstring drifts surfaced in review:
- TaskStatus said wait() treats COMPLETED, FAILED, and PAUSED as terminal, but the polling helper added in 25c49f4 only treats COMPLETED and FAILED that way (PAUSED is non-terminal so callers observe append()-driven resume transitions). Update the enum docstring to match.
- TaskTimeout still carried "Not yet raised by any code path; reserved for the polling layer added in a later commit." That reservation lapsed once 25c49f4 wired wait() to raise it. Replace the comment with a description of when it actually fires and why http_status is None.
No behavior change; docstring-only fix.
…eate()
Two issues surfaced in review:
1. wait() did not validate timeout / poll_interval. A negative
poll_interval crashes inside time.sleep() with a ValueError
originating in stdlib, exposing the SDK's internals rather than
failing at the boundary with a clear message. Add explicit checks
that reject negative values up front; zero is still accepted for
both (timeout=0 polls exactly once; poll_interval=0 tight-loops
while yielding the GIL).
2. run()'s docstring promised a "single deadline" but the
implementation passed the full timeout to wait() without
subtracting the time spent in create(). For a slow create(),
the wall-clock could exceed the user's requested timeout. Now
run() captures monotonic time before create(), subtracts the
elapsed from timeout, and forwards the remaining budget to
wait(). steps() runs after the terminal state is observed and
is documented separately as additional cheap latency.
Tests:
- test_negative_timeout_rejected / test_negative_poll_interval_rejected
on TestWait
- test_shared_deadline on TestRun, injecting create_delay=0.1 and
asserting elapsed < timeout + create_delay/2 so the assertion
strictly distinguishes the new behavior from the old
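The two fixes could look roughly like this in isolation. `validate_poll_args` and `run_with_budget` are hypothetical names, and the create / wait callables stand in for the real TasksAPI methods; this is a sketch of the budgeting idea, not the SDK's code.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def validate_poll_args(timeout: float, poll_interval: float) -> None:
    """Fail at the SDK boundary instead of deep inside time.sleep()."""
    if timeout < 0:
        raise ValueError(f"timeout must be >= 0, got {timeout!r}")
    if poll_interval < 0:
        raise ValueError(f"poll_interval must be >= 0, got {poll_interval!r}")


def run_with_budget(
    create: Callable[[], T],
    wait: Callable[[T, float], R],
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> R:
    """Run create() then wait(), charging both against one timeout."""
    started = clock()
    created = create()
    # Whatever create() consumed is subtracted; never pass a negative budget.
    remaining = max(0.0, timeout - (clock() - started))
    return wait(created, remaining)
```

Clamping at 0.0 matters because a create() slower than the whole timeout would otherwise hand wait() a negative value, which the new validation rejects.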
/gemini review
Code Review
This pull request establishes the initial codebase for the xAgent Python SDK, featuring a synchronous XAgentClient for the v1 HTTP API. Key components include task lifecycle management, a robust error mapping system, and Pydantic models for type safety. The PR also sets up development tooling with pre-commit hooks and an extensive test suite. Reviewers recommended enhancing the readability of validation error messages and increasing the resilience of response parsing by adding explicit type checks for incoming data.
    if isinstance(detail, list):
        return "; ".join(str(item) for item in detail)
The current implementation of _format_422_detail for list inputs will produce string representations of dictionary objects (e.g., "{'msg': 'field required'}") which are not very user-friendly. Since FastAPI validation errors typically provide a msg field within each detail dictionary, it is better to extract that field specifically when available.
    - if isinstance(detail, list):
    -     return "; ".join(str(item) for item in detail)
    + if isinstance(detail, list):
    +     return "; ".join(str(i.get("msg", i) if isinstance(i, dict) else i) for i in detail)
Fixed in 396e334 — went one step beyond the suggestion: extract loc as a dotted path together with msg, since loc is the part that tells the caller which field failed. A standard FastAPI 422 entry now formats as body.message.content: field required instead of field required (or the raw dict repr). Falls back to msg alone when loc is missing, and to str(item) when neither is a string. Three tests cover the three branches.
    ``client.tasks.steps(task_id)`` so those wrapper fields are redundant
    and dropped here.
    """
    return _STEP_LIST_ADAPTER.validate_python(data.get("steps", []))
This line assumes that data is always a dictionary. However, resp.json() can return other types such as a list or None if the server response is unexpected. Accessing .get() on a non-dictionary object will raise an AttributeError. Adding a type check ensures the SDK handles malformed or unexpected successful responses gracefully.
    - return _STEP_LIST_ADAPTER.validate_python(data.get("steps", []))
    + steps_data = data.get("steps", []) if isinstance(data, dict) else []
    + return _STEP_LIST_ADAPTER.validate_python(steps_data)
Fixed in 396e334 — applied the suggestion plus widened the signature from dict[str, Any] to Any so the runtime check is no longer fighting the static type. Non-dict input (None, list, str, int, bool) now returns [] instead of crashing in dict.get. A parametrized test (test_non_dict_returns_empty) covers all five non-dict types.
Second-round review surfaced two more graceful-degradation gaps:
1. _format_422_detail joined str() of each list entry, producing the
raw dict repr (e.g. "{'loc': ['body', 'message', 'content'],
'msg': 'field required', 'type': 'missing'}") rather than the
human-readable loc.msg form FastAPI 422 entries support. Add a
_format_422_item helper that emits "body.message.content: field
required" when both fields are present, falls back to msg alone
when loc is missing, and to str(item) when neither field is a
string. Preserves more information than the suggested msg-only
fix because loc tells the caller which field failed, not just
what failed.
2. _parse_steps called .get("steps", []) on its argument, which
raises AttributeError when the server or an upstream proxy
returns a non-dict body (list, null, etc.). Widen the signature
to Any and short-circuit to [] when the input is not a dict, so
malformed responses degrade gracefully instead of leaking a
builtin exception that does not descend from XAgentError.
Tests:
- test_422_detail_list now asserts the formatted message
("field required" for msg-only input)
- test_422_detail_list_with_loc covers the new loc.msg path
- test_422_detail_list_of_strings covers raw-string entries
- test_non_dict_returns_empty (parameterized over None, list,
str, int, bool) verifies _parse_steps no longer crashes on
malformed responses
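A hedged sketch of the two behaviors, written without the leading underscores and without pydantic so it stands alone. The exact branch conditions (for example, requiring loc to be a non-empty list) are assumptions; the three output shapes follow the description above, and `extract_steps` only covers the non-dict guard, not the TypeAdapter validation that follows it in the SDK.

```python
from __future__ import annotations

from typing import Any


def format_422_item(item: Any) -> str:
    """Render one validation entry as 'loc.path: msg' where possible."""
    if isinstance(item, dict):
        msg = item.get("msg")
        loc = item.get("loc")
        if isinstance(msg, str):
            if isinstance(loc, (list, tuple)) and loc:
                return ".".join(str(part) for part in loc) + f": {msg}"
            return msg  # loc missing: fall back to msg alone
    return str(item)  # raw strings and anything unexpected


def format_422_detail(detail: Any) -> str:
    """Join formatted entries for a FastAPI {"detail": [...]} list."""
    if isinstance(detail, list):
        return "; ".join(format_422_item(item) for item in detail)
    return str(detail)


def extract_steps(data: Any) -> list[Any]:
    """Return the raw step list, or [] when the body is not a JSON object."""
    if not isinstance(data, dict):
        return []
    return data.get("steps", [])
```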
The upstream repository was renamed from xagent-sdk-python to xagent-sdk and will host clients for multiple languages, each in its own top-level directory. This commit moves the existing Python SDK into python/ and adds the monorepo-wide bits (top-level README, single pre-commit config, language-specific CI workflow) so other language clients can be added later without re-organizing.
Layout changes:
- All existing SDK files (src, tests, pyproject.toml, README.md) are now under python/.
- A new top-level README.md serves as the monorepo navigator and points readers at python/README.md for SDK usage.
- .pre-commit-config.yaml moved back to repo root with file filters scoped to ^python/. The mypy hook is converted to a local hook that cd's into python/ before invoking `uv run mypy --package xagent_sdk`, which lets python/pyproject.toml keep mypy_path = "src" and works both via pre-commit and via direct invocation from python/.
- .github/workflows/ci.yml renamed to python-ci.yml with paths filters limiting the workflow to python/ + shared/ + the config files, and working-directory: python on the dependency/test steps.
Install command in python/README.md updated to reference the new repo URL with `#subdirectory=python` so pip finds pyproject.toml in the right place.
No SDK behavior change. All 78 unit tests still pass; e2e tests are unaffected.
Set up shared/fixtures/v1/ as the single source of truth for the v1
HTTP wire contract across all language clients. Each JSON file holds
the raw body the server emits (no wrapping, no metadata) so a future
TypeScript / JavaScript client can drive its tests off the same files
that the Python client uses, and a wire-shape change can be made in
one place rather than once per language.
Fixtures included:
- responses/: me, create_task, append_task, task_info_completed,
steps_full (covers all four Step types in one body).
- errors/: the six stable V1 envelope codes plus validation_422
(FastAPI's {"detail": [...]} shape). Status codes are documented in
shared/README.md rather than embedded in the JSON, because they are
implicit per error code in the wire contract.
Python integration:
- python/tests/unit/_fixtures.py provides `response(name)` and
`error_envelope(name)` helpers that resolve paths relative to the
repo root via Path(__file__).resolve().parents[3].
- python/tests/unit/test_errors.py::TestFromResponseStableCodes now
loads each envelope from a fixture rather than inlining a JSON
literal. The parametrize names the fixture instead of the code,
which is the same string by construction.
The remaining Python tests (frozen-dataclass behavior, env-var
fallback, polling deadlines) stay inline because they exercise
Python-specific behavior, not the wire contract.
78 unit tests still pass, hermetic, <1s. No SDK behavior change.
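A sketch of how such a helper could resolve the shared fixtures, assuming each fixture is stored as `<name>.json` inside `responses/` or `errors/` (the file naming is an assumption). `fixtures_root` and `load_fixture` are hypothetical names; the helper file path is passed in explicitly here so the lookup can be demonstrated outside the repository.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def fixtures_root(helper_file: Path) -> Path:
    """Locate shared/fixtures/v1 starting from python/tests/unit/_fixtures.py."""
    # parents[0] = unit, parents[1] = tests, parents[2] = python, parents[3] = repo root
    return helper_file.resolve().parents[3] / "shared" / "fixtures" / "v1"


def load_fixture(helper_file: Path, kind: str, name: str) -> Any:
    """Load <root>/<kind>/<name>.json, where kind is 'responses' or 'errors'."""
    path = fixtures_root(helper_file) / kind / f"{name}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Inside the real helper module the first argument would be `Path(__file__)`, which is what makes the lookup independent of the directory pytest is invoked from.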
A long-standing backend bug (POST /v1/chat/tasks blocking until the LLM call completes) slipped past our earlier e2e tests because they only checked final output, not request timing. The bug surfaced when LLM latency crept past the SDK's 30s per-request HTTP timeout and the smoke test started failing with a generic "timed out" message that did not point at the contract violation.
This commit adds test_create_is_async to enforce the timing contract directly:
- A new patient_client fixture provides a 60s-timeout XAgentClient so the test observes POST's actual latency even when the backend is synchronous, rather than surfacing as a transport timeout.
- test_create_is_async measures monotonic time around tasks.create() and asserts (a) the response status is PENDING and (b) elapsed is under 5s. A correct backend implementation returns in well under a second; the 5s bound leaves slack for cold start and slow networks.
When this test fires, the failure message includes the measured elapsed seconds and a one-paragraph explanation of the async-polling contract it is checking, so the next person diagnosing the failure sees the contract violation rather than a generic transport timeout.
Verified against the currently-broken backend: the test fails with "POST took 37.26s; v1 contract requires async return (typically <=1s in practice)." That message is the failure surface we wanted.
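The shape of that timing check, reduced to a standalone sketch. `assert_create_is_async`, the lowercase "pending" string, and the injected `clock` are assumptions for illustration; the real test calls tasks.create() on the patient_client fixture.

```python
from __future__ import annotations

import time
from typing import Callable

MAX_CREATE_SECONDS = 5.0  # bound quoted in the commit message


def assert_create_is_async(
    create: Callable[[], str],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Call create(), check it returned PENDING quickly, return elapsed seconds."""
    started = clock()
    status = create()
    elapsed = clock() - started
    assert status == "pending", f"expected PENDING, got {status!r}"
    # Put the measured latency in the message so the failure names the contract.
    assert elapsed < MAX_CREATE_SECONDS, (
        f"POST took {elapsed:.2f}s; v1 contract requires async return "
        "(typically <=1s in practice)."
    )
    return elapsed
```

Pairing this check with a client whose HTTP timeout exceeds the bound is what lets a slow POST fail on the assertion instead of on a transport timeout.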
Summary
Phase 1 Python client SDK for the v1 HTTP API shipped in xagent PR #384.
Eight commits land an end-to-end usable 0.1.0 release: client class,
endpoint methods, polling helpers, full exception hierarchy, dataclass
response models, 68 unit tests, e2e test scaffolding, and a complete
README.
Public surface
- `XAgentClient` with env-var fallback (`XAGENT_API_KEY` / `XAGENT_BASE_URL`)
- `client.me()` identity probe
- `client.tasks.create / append / get / steps / wait / run`
- `RunResult` bundles the final `TaskInfo` plus the step timeline
- Six backend-mapped exceptions (`InvalidAPIKey`, `AgentNotFound`, `TaskNotFound`, `TaskBusy`, `RateLimited`, `InternalError`) plus three SDK-coined (`InvalidInput`, `XAgentTransportError`, `TaskTimeout`)
Design notes
- `base_url` is required (kwarg or env); no production default is hard-coded while the prod endpoint is being finalized. A future release will add it as a non-breaking default.
- `wait()` / `run()` terminal states mirror backend `v1/tasks.py:170`: only `COMPLETED` and `FAILED`. `PAUSED` stays non-terminal so multi-process workflows (one caller polls while another appends a resume turn) can observe the transition rather than returning early.
- The sleep between polls is capped to the time remaining before the deadline so an unusually long `poll_interval` cannot overshoot the caller's requested wall-clock timeout.
- Response parsing uses pydantic `TypeAdapter`, but the public surface stays `@dataclass(frozen=True)` so downstream apps are not pinned to a specific pydantic version.
- `HTTPClient` wraps `httpx.HTTPError` into `XAgentTransportError`, so every exception escaping the SDK descends from `XAgentError`.
Test plan
- `uv sync --group dev && uv run pre-commit install`
- `uv run pre-commit run --all-files`: ruff, mypy strict, codespell all pass
- `uv run pytest`: 68 unit tests, hermetic, sub-second
- `uv run pytest -m e2e` skips cleanly when `XAGENT_API_KEY` or `XAGENT_BASE_URL` is unset
- With both env vars set, `uv run pytest -m e2e` passes; `NO_PROXY=localhost,127.0.0.1` may be needed on macOS / corporate networks
- CI (`.github/workflows/ci.yml`) runs pre-commit and the pytest matrix on Python 3.11 and 3.14
Notes for maintainers
- The repository previously contained only `LICENSE` and `.gitignore`; this PR drops in the full SDK layout under `src/xagent_sdk/` plus tests, CI, and docs.
- `pyproject.toml` pins the version to `0.1.0`. Tagging a release as `v0.1.0` after merge will activate the README install command (`pip install "xagent-sdk @ git+https://github.com/xorbitsai/xagent-sdk-python@v0.1.0"`).
- Pinned tool versions (ruff `v0.12.3`, mypy `v1.19.0`, pre-commit hooks `v5.0.0`, codespell `v2.4.1`) match the xagent backend repo so cross-repo lint behavior stays consistent.