feat(channels): add extensible channel abstraction with generic webho…#327

Open
danielmillerp wants to merge 1 commit into main from dm/agentex-channels-webhook

Conversation

@danielmillerp danielmillerp commented Jun 19, 2026

Collaborator

…ok ingress

Introduce a channel layer that normalizes external surfaces into agent turns, so platforms beyond the chat UI can drive agents. The agent-driving core is channel-agnostic; a new channel (e.g. Slack) implements one interface without touching the router.

  • domain/channels/base.py: InboundMessage, Channel ABC (authenticate + to_inbound), ChannelBinding (route -> agent + inline params), timing-safe shared-secret and HMAC-SHA256 verification helpers.
  • domain/channels/router.py: ChannelRouter.dispatch() maps an InboundMessage to task/create (get-or-create on a per-conversation session key) + event/send via the in-process ACP use case.
  • domain/channels/webhook.py: WebhookChannel — generic HTTP ingress with a per-route secret. Accepts Authorization: Bearer / x-openclaw-webhook-secret, or an X-Hub-Signature-256 HMAC. Generic JSON normalization (no source-specific shaping).
  • api/routes/channels.py: POST /channels/webhook/{route_id}. Route bindings load from the CHANNELS_WEBHOOK_ROUTES env var for now (seam for a future config store).
  • Whitelist /channels/webhook in the auth middleware so it bypasses the agent API-key check and verifies its own per-route secret instead; register the router.
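
The two-method contract described in the list above could look roughly like the sketch below. The field names on `InboundMessage` and `ChannelBinding`, the `headers` mapping (used here in place of a framework request object), and the toy `EchoChannel` are all assumptions for illustration, not the PR's actual definitions.

```python
from __future__ import annotations

import hmac
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class InboundMessage:
    # Normalized agent turn. Field names are assumed, not taken from the PR.
    channel: str
    route_id: str
    session_key: str
    text: str


@dataclass(frozen=True)
class ChannelBinding:
    # Route -> agent plus inline params; `secret` is the per-route secret (assumed field).
    agent_name: str
    secret: str
    params: dict[str, Any] = field(default_factory=dict)


class Channel(ABC):
    """A new surface implements these two methods and nothing else."""

    @abstractmethod
    def authenticate(
        self, binding: ChannelBinding, headers: Mapping[str, str], raw_body: bytes
    ) -> bool: ...

    @abstractmethod
    def to_inbound(self, route_id: str, body: dict[str, Any]) -> InboundMessage: ...


class EchoChannel(Channel):
    """Hypothetical toy channel, present only to exercise the contract."""

    name = "echo"

    def authenticate(self, binding, headers, raw_body):
        presented = headers.get("x-demo-secret", "")
        # compare_digest keeps the comparison timing-safe.
        return hmac.compare_digest(presented.encode(), binding.secret.encode())

    def to_inbound(self, route_id, body):
        conversation = body.get("conversation_id", "default")
        return InboundMessage(
            channel=self.name,
            route_id=route_id,
            session_key=f"{self.name}:{route_id}:{conversation}",
            text=str(body.get("text", "")),
        )
```

Because the router only ever sees an `InboundMessage` and a `ChannelBinding`, adding a surface in this sketch means adding one subclass.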

Greptile Summary

This PR introduces a channel abstraction layer that normalises external HTTP surfaces (webhooks, and future platforms like Slack) into agent turns, keeping the agent-driving core channel-agnostic. A new channel implements two methods (authenticate + to_inbound) without touching the router or existing API paths.

  • domain/channels/: InboundMessage, Channel ABC, ChannelBinding, timing-safe HMAC-SHA256 and shared-secret helpers, and ChannelRouter (task get-or-create + sync/async dispatch + quiescence-based reply polling).
  • domain/channels/webhook.py: WebhookChannel accepting Authorization: Bearer, x-openclaw-webhook-secret, or X-Hub-Signature-256; generic JSON text normalization with an 8 KB fallback.
  • api/routes/channels.py: POST /channels/webhook/{route_id} with size/auth/content-type guards and an optional wait param for synchronous callers; route bindings loaded from CHANNELS_WEBHOOK_ROUTES env var.
  • middleware_utils.py: /channels/webhook whitelisted from agentex API-key auth so per-route secret verification takes over.
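
As a rough illustration of the three accepted credentials, the sketch below checks an `X-Hub-Signature-256` HMAC first and falls back to a bearer token or the `x-openclaw-webhook-secret` header. The function names and the assumption that header keys arrive lower-cased are mine; only the header names and the HMAC-before-bearer ordering come from the summary.

```python
import hashlib
import hmac
from typing import Mapping


def verify_shared_secret(presented: str, expected: str) -> bool:
    # Comparing bytes with compare_digest avoids leaking match length via timing.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def verify_hmac_sha256(secret: str, raw_body: bytes, signature_header: str) -> bool:
    # GitHub-style signature: "sha256=" followed by the hex digest of the raw body.
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.strip().encode("utf-8")
    )


def authenticate(secret: str, headers: Mapping[str, str], raw_body: bytes) -> bool:
    # Assumes header names were lower-cased by the caller.
    signature = headers.get("x-hub-signature-256")
    if signature is not None:
        # A presented signature is authoritative: no fallback to bearer if it fails.
        return verify_hmac_sha256(secret, raw_body, signature)
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return verify_shared_secret(auth[len("bearer "):].strip(), secret)
    token = headers.get("x-openclaw-webhook-secret")
    return token is not None and verify_shared_secret(token, secret)
```

The signature must be computed over the raw request bytes, before any JSON parsing, since re-serializing the body can change whitespace and key order.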

Confidence Score: 5/5

Safe to merge — auth helpers are timing-safe, the middleware whitelist correctly scopes only webhook sub-paths, and the sync/async dispatch paths are structurally sound.

No correctness defects were found in the changed code. The auth branching (HMAC-SHA256 preferred over bearer), the get-or-create session-key continuity, and the after_id cursor for reply isolation all work as intended. The two flagged items are quality improvements, not bugs that produce wrong behavior.

No files require special attention before merging, though router.py and routes/channels.py have two findings worth addressing in a follow-up.

Important Files Changed

| Filename | Overview |
|---|---|
| agentex/src/domain/channels/base.py | Introduces Channel ABC, InboundMessage, ChannelBinding dataclasses, timing-safe secret and HMAC-SHA256 helpers, and the deliver_reply dispatcher. Clean, well-structured; auth helpers use hmac.compare_digest correctly. |
| agentex/src/domain/channels/router.py | ChannelRouter dispatches InboundMessage to async/sync agent turns; await_reply polls for quiescence. The 8 s minimum latency on wait=True is undocumented and likely to exceed typical webhook platform timeouts. |
| agentex/src/domain/channels/webhook.py | WebhookChannel implements Channel with HMAC-SHA256 and bearer/shared-secret auth; _render_text normalization falls back to truncated JSON. Auth branching correctly prioritises X-Hub-Signature-256 over bearer. |
| agentex/src/api/routes/channels.py | POST /channels/webhook/{route_id} handler with size/auth/content-type guards. _webhook_binding re-parses the CHANNELS_WEBHOOK_ROUTES env var JSON on every request rather than caching the result at startup. |
| agentex/src/api/middleware_utils.py | Adds /channels/webhook to WHITELISTED_ROUTES; the boundary-aware prefix match correctly covers all /channels/webhook/{route_id} sub-paths. |
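
A boundary-aware prefix match of the kind the last row describes might be sketched as follows. The names `WHITELISTED_ROUTES` and `is_whitelisted_route` appear in this PR, but the body below is an assumption about their behavior, not the actual implementation.

```python
from typing import Iterable

# Only the entry this PR adds; the real tuple presumably holds other routes too.
WHITELISTED_ROUTES = ("/channels/webhook",)


def is_whitelisted_route(path: str, whitelist: Iterable[str] = WHITELISTED_ROUTES) -> bool:
    # Ignore a trailing slash so "/channels/webhook/" matches the bare prefix.
    normalized = path.rstrip("/") or "/"
    return any(
        # Exact match, or a match that ends on a path-segment boundary.
        normalized == prefix or normalized.startswith(prefix + "/")
        for prefix in whitelist
    )
```

Requiring the `/` boundary is what stops a sibling path such as `/channels/webhooks-admin` from slipping past the API-key check, which a plain `startswith(prefix)` would allow.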

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Caller as Webhook Caller
    participant MW as Auth Middleware
    participant Route as POST /channels/webhook/{route_id}
    participant WC as WebhookChannel.authenticate()
    participant CR as ChannelRouter.dispatch()
    participant ACP as AgentsACPUseCase
    participant MS as TaskMessageService

    Caller->>MW: "POST /channels/webhook/{route_id}"
    MW->>MW: is_whitelisted_route() skip API-key check
    MW->>Route: forward request
    Route->>Route: _webhook_binding(route_id) ChannelBinding
    Route->>WC: authenticate(binding, request, raw_body)
    WC-->>Route: True / False (HMAC or bearer)
    Route->>CR: dispatch(inbound, binding, acp_type)
    CR->>ACP: TASK_CREATE (get-or-create, session_key)
    ACP-->>CR: task_id
    alt "acp_type == SYNC"
        CR->>ACP: MESSAGE_SEND(task_id, content)
        ACP-->>CR: reply messages inline
        CR-->>Route: "DispatchResult(reply=text)"
    else "acp_type == ASYNC"
        CR->>MS: "get_messages(limit=1) after_id"
        CR->>ACP: EVENT_SEND(task_id, content)
        CR-->>Route: "DispatchResult(reply=None, after_id)"
    end
    alt "wait=True and reply is None"
        loop every 2s until stable 6s or 120s timeout
            Route->>MS: get_messages(after_id, asc)
            MS-->>Route: agent messages
        end
    end
    Route-->>Caller: ok, task_id, reply?
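
The dispatch branch in the diagram can be approximated with an in-memory stand-in. `FakeACP`, the `"sync"`/`"async"` strings, and the `last_message_id` parameter are hypothetical; the sketch only shows get-or-create on a session key and the sync/async split, not the real `AgentsACPUseCase` API.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DispatchResult:
    task_id: str
    reply: Optional[str] = None      # set for sync agents
    after_id: Optional[str] = None   # cursor for polling async replies


class FakeACP:
    """Hypothetical in-memory stand-in for the ACP use case."""

    def __init__(self) -> None:
        self.tasks: Dict[str, str] = {}
        self.events: List[Tuple[str, str]] = []

    async def task_create(self, session_key: str) -> str:
        # Get-or-create: the same session key always maps to the same task.
        return self.tasks.setdefault(session_key, f"task-{len(self.tasks) + 1}")

    async def message_send(self, task_id: str, text: str) -> str:
        return f"echo: {text}"

    async def event_send(self, task_id: str, text: str) -> None:
        self.events.append((task_id, text))


async def dispatch(acp, session_key, text, acp_type, last_message_id=None):
    task_id = await acp.task_create(session_key)
    if acp_type == "sync":
        # Sync agents return the reply inline.
        reply = await acp.message_send(task_id, text)
        return DispatchResult(task_id=task_id, reply=reply)
    # Async agents: keep the cursor captured before sending, so a later poll
    # only sees messages produced by this turn.
    await acp.event_send(task_id, text)
    return DispatchResult(task_id=task_id, after_id=last_message_id)
```

Two turns that share a session key land on the same task, which is what gives a conversation continuity across webhook deliveries.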

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
agentex/src/api/routes/channels.py:43-56
**Env-var re-parsed on every request**

`_webhook_binding` calls `os.environ.get` and `json.loads` inside the hot path of every incoming webhook. The env var is read and the JSON is re-parsed on each request. Under any meaningful load this is unnecessary GC/CPU pressure, and a JSON parse error on a malformed env var silently returns `None` → 404 for every route rather than a startup-time configuration failure. Consider parsing the routes dict once at module level (or on first call with a module-level cache) so misconfiguration is caught early and the hot path is a dict lookup.
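
One way to address this, sketched under the assumption that `CHANNELS_WEBHOOK_ROUTES` holds a JSON object keyed by route id (the exact shape is not shown in this PR), is to parse once behind `functools.lru_cache` and fail loudly on malformed input. The function names are hypothetical.

```python
import json
import os
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=1)
def load_webhook_routes() -> dict:
    """Parse CHANNELS_WEBHOOK_ROUTES once; later calls reuse the cached dict."""
    raw = os.environ.get("CHANNELS_WEBHOOK_ROUTES", "")
    if not raw.strip():
        return {}
    try:
        routes = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Surface misconfiguration instead of turning it into a 404.
        raise RuntimeError(f"CHANNELS_WEBHOOK_ROUTES is not valid JSON: {exc}") from exc
    if not isinstance(routes, dict):
        raise RuntimeError("CHANNELS_WEBHOOK_ROUTES must be a JSON object keyed by route_id")
    return routes


def webhook_binding(route_id: str) -> Optional[dict]:
    # Hot path: a dict lookup, no env read or JSON parse.
    return load_webhook_routes().get(route_id)
```

Calling `load_webhook_routes()` once during application startup would turn a malformed value into a boot failure. Note that `lru_cache` does not cache exceptions, so without that startup call a bad value would raise on every request instead.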

### Issue 2 of 2
agentex/src/domain/channels/router.py:121-153
**`wait=True` minimum latency of 8 s is undocumented and likely to time out webhook senders**

`await_reply` sleeps `interval_s` (2 s) before its first poll, then requires `quiescence_s` (6 s) of text stability before returning — a guaranteed floor of 8 seconds, not counting agent processing time. Most webhook platforms have hard timeouts well below this: GitHub's webhook delivery is 10 s and Zapier is 30 s. Any async agent doing real work will push the total past GitHub's limit. Neither the query-parameter description nor the OpenAPI spec mentions this latency floor, so a caller who enables `wait=true` expecting a quick response will silently time out on the platform side while agentex happily keeps polling. The `interval_s` and `quiescence_s` defaults should be documented prominently on the `wait` parameter, or the minimum quiescence should be shortened for the webhook use case.
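
The 8 s floor can be reproduced with a small simulation. This is a sketch of a quiescence poller using the 2 s, 6 s, and 120 s values quoted in the review; the signature, the injected `sleep`, and the helper coroutines are assumptions so that the example runs instantly, not the PR's `await_reply`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


async def await_reply(
    fetch_text: Callable[[], Awaitable[str]],
    *,
    interval_s: float = 2.0,
    quiescence_s: float = 6.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Tuple[Optional[str], float]:
    """Return (reply, simulated seconds elapsed) once the text stops changing."""
    elapsed = 0.0
    last_text = ""
    stable_for = 0.0
    while elapsed < timeout_s:
        await sleep(interval_s)          # sleeps before the first poll
        elapsed += interval_s
        text = await fetch_text()
        if text and text == last_text:
            stable_for += interval_s
            if stable_for >= quiescence_s:
                return text, elapsed
        else:
            # New or changed text resets the stability window.
            last_text, stable_for = text, 0.0
    return (last_text or None), elapsed


async def instant_sleep(_seconds: float) -> None:
    """Fake sleep: advances simulated time only."""


async def reply_already_ready() -> str:
    return "done"


async def no_reply() -> str:
    return ""


if __name__ == "__main__":
    print(asyncio.run(await_reply(reply_already_ready, sleep=instant_sleep)))
```

Even when the reply is complete before the first poll, this loop needs one interval to observe it and three more to confirm stability, so the simulated elapsed time is 2 + 6 = 8 s.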

Reviews (3): Last reviewed commit: "feat(channels): add extensible channel a..."

@danielmillerp danielmillerp requested a review from a team as a code owner June 19, 2026 18:52
Comment thread agentex/src/domain/channels/webhook.py
@danielmillerp danielmillerp force-pushed the dm/agentex-channels-webhook branch from 2e41d47 to 744749b Compare June 19, 2026 19:04
github-actions Bot commented Jun 19, 2026


✱ Stainless preview builds

This PR will update the agentex-sdk SDKs with the following commit messages.

openapi

feat(api): add webhook endpoint to channels

python

chore(internal): regenerate SDK with no functional changes

typescript

chore(internal): regenerate SDK with no functional changes

Edit this comment to update them. They will appear in their respective SDK's changelogs.

agentex-sdk-typescript

Your SDK build had at least one new note diagnostic, which is a regression from the base state.
generate ⚠️ → build ⏭️ (prev: build ✅) → lint ⏭️ (prev: lint ✅) → test ✅

New diagnostics (1 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /channels/webhook/{route_id}`

agentex-sdk-python

Your SDK build had at least one new note diagnostic, which is a regression from the base state.
generate ⚠️ → build ⏭️ (prev: build ✅) → lint ⏭️ (prev: lint ✅) → test ✅

New diagnostics (1 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /channels/webhook/{route_id}`

agentex-sdk-openapi

Your SDK build had at least one new note diagnostic, which is a regression from the base state.
generate ✅

New diagnostics (1 note)
💡 Endpoint/NotConfigured: Skipped endpoint because it's not in your Stainless config: `post /channels/webhook/{route_id}`

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-06-19 19:39:34 UTC

Comment thread agentex/src/api/routes/channels.py Outdated
Comment on lines +95 to +99
agent = await agents_use_case.get(name=binding.agent_name)

inbound = _WEBHOOK.to_inbound(route_id, body)
result = await ChannelRouter(agents_acp_use_case).dispatch(
inbound, binding, agent.acp_type

P1 Unhandled agent-not-found propagates to webhook callers

agents_use_case.get(name=binding.agent_name) is called without error handling. If the agent_name in CHANNELS_WEBHOOK_ROUTES doesn't match a registered agent, the exception thrown by the use case will propagate directly to the webhook caller — either as an AttributeError (if get() returns None and agent.acp_type is dereferenced) or as a misrouted 404 that looks identical to an unknown route error. A misconfigured env var would cause every request to that route to fail with a confusing 500/404, with no log message pointing at the root cause. Wrapping this in a try/except and returning a clear 500 or 503 would make misconfiguration much easier to diagnose.
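
A minimal sketch of the suggested guard follows. `AgentNotFound` and `StubAgentsUseCase` are hypothetical stand-ins, since the real exception type raised by `agents_use_case.get` is not shown here, and the function returns a plain `(status, payload)` tuple rather than a framework response so that it stays self-contained.

```python
import asyncio
import logging

logger = logging.getLogger("channels.webhook")


class AgentNotFound(Exception):
    """Hypothetical stand-in for whatever the use case raises on a miss."""


class StubAgentsUseCase:
    """Hypothetical in-memory replacement for the agents use case."""

    def __init__(self, agents):
        self._agents = agents

    async def get(self, name):
        try:
            return self._agents[name]
        except KeyError:
            raise AgentNotFound(name) from None


async def resolve_agent(agents_use_case, route_id, agent_name):
    """Return (status_code, agent) on success or (503, detail) on misconfiguration."""
    try:
        agent = await agents_use_case.get(name=agent_name)
    except AgentNotFound:
        agent = None
    if agent is None:
        # Log the root cause server-side; keep the caller-facing detail generic.
        logger.error(
            "webhook route %r is bound to unknown agent %r; check CHANNELS_WEBHOOK_ROUTES",
            route_id,
            agent_name,
        )
        return 503, f"route {route_id!r} is misconfigured"
    return 200, agent
```

Handling both the raised exception and a `None` return covers either behavior the comment speculates about, and a 503 keeps a bad binding distinguishable from an unknown route's 404.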


…ok ingress

Introduce a channel layer that normalizes external surfaces into agent turns and,
for channels that respond, delivers the agent's reply back. The agent-driving core
is channel-agnostic; a new channel (e.g. Slack) implements one interface without
touching the router.

- domain/channels/base.py: InboundMessage, ChannelBinding (route -> agent + an OPAQUE
  params dict forwarded verbatim to task/create; the platform does not interpret it),
  timing-safe shared-secret / HMAC-SHA256 verifiers, and the Channel ABC. Ingress:
  authenticate + to_inbound. Outbound (optional, OpenClaw-style): deliver + chunk
  (textChunkLimit), plus deliver_reply() — the buffered dispatcher that chunks a reply
  and delivers each block. A plain webhook leaves outbound unset (its reply is the
  HTTP response); push channels (Slack) set supports_outbound and implement deliver.
- domain/channels/router.py: ChannelRouter.dispatch() — task/create (get-or-create on
  a per-conversation session key) + the turn, branching on ACP type (sync: message/send
  returns the reply inline; async: event/send, reply lands on the stream). await_reply()
  retrieves an async agent's settled reply for responding channels.
- domain/channels/webhook.py: WebhookChannel — generic HTTP ingress with a per-route
  secret. Accepts Authorization: Bearer / x-openclaw-webhook-secret, or an
  X-Hub-Signature-256 HMAC (GitHub/Gitea). Generic JSON normalization.
- api/routes/channels.py: POST /channels/webhook/{route_id} (+ optional ?wait to return
  the reply for synchronous callers). Route bindings load from CHANNELS_WEBHOOK_ROUTES
  env for now (seam for a future config store).
- Whitelist /channels/webhook in the auth middleware so it bypasses the agent API-key
  check and verifies its own per-route secret instead; register the router.
- Regenerate openapi.yaml.
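
The buffered outbound path (chunk, then deliver each block) might be sketched like this. The splitting heuristic, the synchronous `deliver` callback, and the parameter name `text_chunk_limit` are assumptions; the commit message only states that `deliver_reply()` chunks a reply by `textChunkLimit` and delivers each block.

```python
from typing import Callable, List


def chunk(text: str, limit: int) -> List[str]:
    """Split text into blocks of at most `limit` characters, preferring
    newline boundaries, then spaces, then a hard cut."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    blocks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        cut = remaining.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no separator found: hard split
        block = remaining[:cut].rstrip()
        if block:
            blocks.append(block)
        remaining = remaining[cut:].lstrip()
    if remaining:
        blocks.append(remaining)
    return blocks


def deliver_reply(text: str, deliver: Callable[[str], None], text_chunk_limit: int) -> int:
    """Chunk the reply and hand each block to the channel's deliver callback."""
    blocks = chunk(text, text_chunk_limit)
    for block in blocks:
        deliver(block)
    return len(blocks)
```

For example, `deliver_reply("alpha beta gamma delta", sent.append, 11)` with `sent = []` delivers two blocks, `"alpha beta"` and `"gamma delta"`.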

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@danielmillerp danielmillerp force-pushed the dm/agentex-channels-webhook branch from 744749b to 239fb1a Compare June 19, 2026 19:30