Release: open-source-readiness pass + CVE clear + captcha primitive by djl11 · Pull Request #283 · unifyai/unity

djl11 · 2026-05-26T11:42:46Z

Promotes 6 commits from staging to main. Two themes plus one feature.

Open-source-readiness pass (2 commits)

aaabf3d46 chore(repo): tighten .gitignore for build artifacts and add AGENTS.md
- .gitignore now covers build/, dist/, *.egg-info/, Local/
- Removed ~12MB of build artifacts from the working tree
- New AGENTS.md distilled from .cursor/rules/ so Claude Code, Codex, Aider, Cline, etc. pick up the same conventions Cursor does
bfe44c46f chore(github): add CODEOWNERS, PR/issue templates, dependabot, OSV scanner
- CODEOWNERS — @unifyai/Engineers as catch-all + explicit ownership of security-sensitive paths
- PULL_REQUEST_TEMPLATE.md — references the .cursor/rules invariants
- ISSUE_TEMPLATE/{config,bug_report,feature_request}.yml — routes bugs by surface; steers "please add this skill" feature requests toward GuidanceManager/FunctionManager
- dependabot.yml — github-actions weekly (grouped) + agent-service/ npm weekly; deliberately skips scheduled pip per the editable-sibling install model
- workflows/osv-scanner.yml — Google's reusable workflow pinned by SHA, SARIF to Security tab

Dependabot CVE triage (1 commit + 5 dismissals)

351563a81 chore(deps): bump 9 packages to clear Dependabot CVE alerts
- urllib3 2.6.3 → 2.7.0 (CVE-2026-44431, CVE-2026-44432, both high)
- langchain-core 1.3.0 → 1.4.0 (CVE-2026-44843 high)
- python-multipart 0.0.26 → 0.0.29 (CVE-2026-42561 high)
- lxml 6.0.3 → 6.1.1 (CVE-2026-41066 high)
- langsmith 0.7.33 → 0.8.5 (CVE-2026-45134 high)
- authlib 1.7.0 → 1.7.2 (CVE-2026-44681 medium)
- idna 3.11 → 3.16 (CVE-2026-45409 medium)
- qs 6.15.0 → 6.15.2 (CVE-2026-8723 medium, npm)
- ws 8.18.3 → 8.21.0 (CVE-2026-45736 medium, npm)
Plus 5 alerts dismissed out-of-band as not_used: 4 LiteLLM proxy CVEs (#69/#70/#71/#74; proxy not deployed) and #67 python-dotenv (we only read .env, never call set_key()). The litellm bump to 1.83.10 was investigated and rejected: it forces openai 2.30 → 2.24 and langchain-openai 1.1.15 → 1.1.10 because litellm hard-pins openai. Recorded that decision via @dependabot ignore this version on unillm#54.

Net effect on the Security tab after this merge: 15 alerts (1 critical, 9 high, 5 medium) → 0.

Captcha primitive + docs (2 commits)

c9ba90982 feat(computer): add solve_captcha primitive for reCAPTCHA v2 via AntiCaptcha
39fe85099 docs(env): document ANTICAPTCHA_KEY placeholder in .env.example

Other in-flight work picked up incidentally

bd001c346 test(task_scheduler): pin Communication env-builder equivalence in shared contract tests — landed on staging before this session.

Test plan

The full test suite auto-runs on staging→main PRs (tests.yml line 130). No tags needed. Auto-merge on green.

…ared contract tests Adds 4 new tests in test_offline_runner_contract.py that prove field-for-field that Communication's NEW _build_offline_runner_env composition (shared Unity contract + hosted-only assistant-identity layer) produces dicts identical to the OLD monolithic Communication builder, across the scheduled, triggered, entrypoint-override, and sparse-assistant-data scenarios. The golden reference function is a verbatim copy of Communication's pre-refactor builder inlined into the test file. If anything in the shared contract drifts from the old behaviour, these tests fail loudly here, in Unity's test suite, before reaching Communication's deployment. Brings total contract-module test count to 35 (up from 31).

…Captcha
Exposes a deterministic, Python-callable primitive `WebSessionHandle.solve_captcha()` on every web session created via `cp.web.new_session(...)`. The primitive delegates the visible reCAPTCHA v2 challenge to the AntiCaptcha worker pool and injects the returned Google-signed token back into the live page so the page's own submit flow accepts the verification.
Layers wired:
- agent-service: new `POST /captcha/solve` handler (sitekey extraction + createTask/getTaskResult polling + page.evaluate injection). Reads `ANTICAPTCHA_KEY` only from `process.env`; token is never logged or echoed in the response.
- Python: `ComputerSession.solve_captcha` (+ matching mock-backend and `_MockSession` stubs) with rich docstring on `_LowLevelActionsMixin`. `ComputerSession._request` gains a keyword-only `timeout` parameter (default preserves existing behaviour).
- Runtime exposure: `"solve_captcha"` appended to `_COMPUTER_METHODS` and `ComputerPrimitives._LOW_LEVEL_METHODS`; excluded from `_DESKTOP_METHODS` (desktop sessions have no DOM target).
- Config: optional `ANTICAPTCHA_KEY` documented in `agent-service/README.md`; missing key surfaces as 503 `anticaptcha_key_missing`.
- Tests: mock-backend coverage in `test_computer_multimode.py` guarding the auto-wiring and the default/invisible variant paths.
Magnitude-core is intentionally untouched: the primitive is not in the LLM action vocabulary. Callers reach for it from their own orchestration code after a prior `observe()` has confirmed a CAPTCHA is on screen.
Out of scope: v3/Enterprise reCAPTCHA, hCaptcha, Turnstile, FunCaptcha, GeeTest, desktop-mode equivalents, and wiring into specific actor/extractor flows.

Clean up the open-source-ready repo surface:
- .gitignore now covers build/, dist/, *.egg-info/ (any name), and Local/ so setuptools/uv build output and personal workspace dirs stay out of git status. Deleted ~12MB of build/, dist/, unity.egg-info/, unify_agent.egg-info/, Local/, __pycache__/, .cache.ndjson from disk.
- AGENTS.md distilled from .cursor/rules/ so Claude Code, Codex, Aider, Cline, and other assistants pick up the same conventions Cursor does (testing philosophy, no-defensive-coding, explicit-path commits, state-manager design rules, repo map).
No code changes.

…anner
Brings .github/ in line with peer open-source AI-assistant repos (NousResearch/hermes-agent, openclaw/openclaw) so contributors land on a familiar surface and supply-chain hygiene is visible.
Added:
- CODEOWNERS: @unifyai/engineers as catch-all + explicit ownership of security-sensitive paths (CODEOWNERS itself, dependabot.yml, workflows/, SECURITY.md, AGENTS.md, ARCHITECTURE.md, secret_manager/).
- PULL_REQUEST_TEMPLATE.md: Summary / type / areas / test plan / migration / checklist. References the .cursor/rules invariants (no-defensive-coding, no-temporal-comments, zero-backcompat target).
- ISSUE_TEMPLATE/{config,bug_report,feature_request}.yml: bug template routes by surface (CLI / voice / installer / specific manager / ConversationManager / etc.) and asks for `unity doctor` output; feature template explicitly steers users toward GuidanceManager/FunctionManager for runtime-extension requests so the issue queue isn't drowned in "please add this skill" tickets.
- dependabot.yml: github-actions weekly (grouped minor/patch) + agent-service npm weekly. Deliberately skips scheduled pip updates per the editable-sibling install model (unify/unillm/orchestra-core); CVE-driven pip security updates remain enabled at the repo-settings level. Comment explains the rationale.
- workflows/osv-scanner.yml: Google's reusable workflow pinned by SHA. Scans uv.lock + agent-service/package-lock.json on lockfile changes, push to main/staging, and weekly. SARIF results land in the Security tab; fail-on-vuln disabled so pre-existing CVEs don't block merges.

Lockfile bumps only; no pyproject.toml / package.json changes. Triggered by the 15 open Dependabot alerts on the default branch (see https://github.com/unifyai/unity/security/dependabot).
uv.lock (7 bumps):
- urllib3 2.6.3 -> 2.7.0 CVE-2026-44431 (high) cross-origin header leak in proxied redirects
- urllib3 2.6.3 -> 2.7.0 CVE-2026-44432 (high) decompression-bomb bypass in streaming API
- langchain-core 1.3.0 -> 1.4.0 CVE-2026-44843 (high) unsafe deserialization via overly broad load() allowlists (pulls in new transitive langchain-protocol 0.0.15)
- python-multipart 0.0.26 -> 0.0.29 CVE-2026-42561 (high) DoS via unbounded multipart part headers
- lxml 6.0.3 -> 6.1.1 CVE-2026-41066 (high) XXE in default config of iterparse() and ETCompatXMLParser()
- langsmith 0.7.33 -> 0.8.5 CVE-2026-45134 (high) public prompt pull deserializes untrusted manifests
- authlib 1.7.0 -> 1.7.2 CVE-2026-44681 (medium) OIDC implicit/hybrid open redirect (not reachable, since we don't run an OIDC provider, but bumped for hygiene)
- idna 3.11 -> 3.16 CVE-2026-45409 (medium) IDNA encode() bypass of CVE-2024-3651 fix
agent-service/package-lock.json (2 bumps, via npm audit fix):
- qs 6.15.0 -> 6.15.2 CVE-2026-8723 (medium) qs.stringify DoS on null/undefined entries in comma-format arrays
- ws 8.18.3 -> 8.21.0 CVE-2026-45736 (medium) uninitialized memory disclosure
Not addressed in this commit (blocked on sibling repos):
- litellm 1.83.4 -> 1.83.10 (clears 4 alerts: 1 critical SQLi in proxy, 3 high: sandbox escape, RCE via MCP stdio, SSTI in /prompts/test). All four CVEs are in the LiteLLM *proxy server* surface, which Unity does not run; reachability is effectively zero, but the bump should land for defense in depth. BLOCKED: unillm pins litellm==1.83.4 exactly. The unillm Dependabot PR is already open at unifyai/unillm#54.
- python-dotenv 1.0.1 -> 1.2.2 (CVE-2026-28684, medium: symlink-following in set_key; Unity only reads .env so not reachable). BLOCKED: litellm 1.83.4 ships an unusual pin (python-dotenv>=1.0.1,<1.0.1+) that effectively freezes python-dotenv at 1.0.1. Will unblock once unillm#54 lands and `uv sync` brings litellm 1.83.10 in.

The agent-service /captcha/solve handler (added in c9ba909) reads process.env.ANTICAPTCHA_KEY at request time and returns 503 anticaptcha_key_missing if it's unset. Document the env var alongside the other optional integration keys so operators know where to put it without having to read the agent-service README. The actual key value lives in GCP Secret Manager under projects/responsive-city-458413-a2/secrets/ANTICAPTCHA_KEY, alongside the other runtime API keys (ANTHROPIC_API_KEY, DEEPGRAM_API_KEY, LIVEKIT_API_KEY, etc.). The companion unity-deploy commit adds ANTICAPTCHA_KEY to setup_k8s_config.py's required_secrets list so the unity-secrets K8s Secret picks it up automatically on cluster setup.

github-advanced-security · 2026-05-26T13:04:34Z

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

The script's `discover_all()` was only recursing into top-level tests/ sub-directories whose names start with `test` — but Unity's convention is to name per-manager test directories after the manager itself (contact_manager/, knowledge_manager/, actor/, task_scheduler/, conversation_manager/, etc.) without the `test_` prefix. Effect: the staging→main CI matrix was silently collapsing to just 2 entries (tests/test_integration_status/ and tests/test_session_details.py — the only top-level paths starting with `test`) instead of the ~67 leaf paths that actually exist. Every prior release went green on a hollow signal exercising none of the manager test suites. Fix: replace `item.name.startswith("test")` with `item.name not in EXCLUDE_DIRS`. Safe because `collect_paths()` is itself gated by `has_test_files`/`has_test_subdirs`, so recursing into a non-test directory is a no-op. EXCLUDE_DIRS already covers __pycache__, .pytest_cache, .venv, etc. Verified locally: `python3 .github/scripts/discover_test_paths.py | wc -l` returns 67 (was 2), and the output now includes tests/contact_manager, tests/task_scheduler, tests/actor/*, tests/conversation_manager/*, etc.
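A minimal sketch of the filter change described above, run against a throwaway directory tree. The names `discover_all`, `collect_paths`, `has_test_files`, and `EXCLUDE_DIRS` come from the commit message, but their bodies here are assumptions (the real `.github/scripts/discover_test_paths.py` is not shown), and the `legacy_filter` switch and `build_demo_tree` helper are hypothetical, added only to compare old and new behaviour. Top-level test files are ignored for brevity.

```python
import tempfile
from pathlib import Path

# Assumed subset; the commit only says EXCLUDE_DIRS covers these "etc.".
EXCLUDE_DIRS = {"__pycache__", ".pytest_cache", ".venv"}


def has_test_files(directory: Path) -> bool:
    """True if the directory directly contains test_*.py files."""
    return any(p.is_file() for p in directory.glob("test_*.py"))


def collect_paths(directory: Path) -> list[Path]:
    """Return directories under `directory` that hold test files (guessed body)."""
    found = [directory] if has_test_files(directory) else []
    for child in sorted(directory.iterdir()):
        if child.is_dir() and child.name not in EXCLUDE_DIRS:
            found.extend(collect_paths(child))
    return found


def discover_all(tests_root: Path, *, legacy_filter: bool = False) -> list[str]:
    """Walk top-level sub-directories with either the old or the new filter."""
    paths: list[Path] = []
    for item in sorted(tests_root.iterdir()):
        if not item.is_dir():
            continue
        if legacy_filter:
            keep = item.name.startswith("test")   # old behaviour
        else:
            keep = item.name not in EXCLUDE_DIRS  # new behaviour
        if keep:
            paths.extend(collect_paths(item))
    return [p.relative_to(tests_root).as_posix() for p in paths]


def build_demo_tree(root: Path) -> None:
    """Create manager-named test dirs plus one test-prefixed dir (hypothetical)."""
    for name in ("contact_manager", "task_scheduler", "test_integration_status"):
        (root / name).mkdir()
        (root / name / "test_basic.py").write_text("def test_ok(): pass\n")
    (root / "__pycache__").mkdir()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    OLD = discover_all(root, legacy_filter=True)
    NEW = discover_all(root)
```

In this toy tree the old filter sees only the directory whose name starts with `test`, while the new filter also picks up the manager-named directories; a directory without test files contributes nothing either way because `collect_paths` is gated on `has_test_files`.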

…same PR/branch
The matrix workflow was producing a confusing mix of real failures and infrastructure cancellations during today's fix-and-iterate session. Inspecting one cancelled job (simulated/data on run 26562352729) showed:
##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
##[error]The operation was canceled.
Root cause: pushing 10 commits in quick succession (typical of an iterative debugging session) spawned 10 concurrent matrix runs of ~67 jobs each = ~670 queued jobs against the org's finite GitHub Actions runner pool. GitHub started killing runners mid-test to free capacity, surfacing as fake "failures" on test clusters that had been passing.
Fix: add a workflow-level concurrency group that cancels older runs when a newer commit lands on the same PR/branch. The group key is `tests-{workflow_name}-{PR_number or ref}` so:
- Pushes to the SAME PR/branch supersede each other (the latest commit is what matters; older matrices are dead weight)
- Different PRs / branches still run in parallel
This is the standard pattern for monorepo CI workflows that allocate lots of jobs per run. The alternative (letting all runs queue and finish) wastes runner capacity and produces the runner-shutdown false-positives we're seeing.
Side effect: investigation matrices like the current backlog (c583ab2, fc49fe6, 01a9de9, 1604b16, 6cef32b, ab849ef, 2b07266, f9d2289, 6c04f36, 13670b4) will now collapse to the latest commit only when this fix lands: the older ones get cancelled, freeing runner capacity for the newest matrix to actually complete. Net-delta investigation becomes meaningful again.

Same family as d715242: fail-by-design fixtures yield non-zero exit, marker-filter empty now surfaces as exit 1 + descriptive stderr, session-name collision detection was removed (6892831), and the "socket:" output line was part of the deleted Observe section (65bd78f).
1. test_directory_discovers_all_test_files: accept exit_code in (0, 1) since fixtures dir has always-fail files.
2. test_symbolic_only_all_eval_tests: accept exit 1 + stderr shape check.
3. test_second_run_creates_new_sessions: drop session-name uniqueness, assert both runs succeed + each created ≥1.
4. test_output_includes_socket_name: accept socket name in stdout-path OR RunResult.socket.
All four were hidden from CI by the matrix-discovery bug.

…on same branch Previous concurrency key was: tests-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} For a push to staging: - push event: pull_request.number is null → falls back to github.ref = "refs/heads/staging" → group "tests-Tests-refs/heads/staging" - pull_request event (PR 283 staging→main is perpetually open): pull_request.number = 283 → group "tests-Tests-283" Different groups → both run, doubling credit usage on every commit during active development. Fix: use `github.head_ref || github.ref_name` instead. Both event types resolve this to the same source branch name ("staging"), collapsing the two simultaneous matrices into one. Direct pushes to branches without an open PR still work via the `ref_name` fallback. Different branches still parallelise (different group keys). Net effect on credit usage: ~50% reduction during active commit streams, since the duplicate push/pull_request matrix is now deduplicated.
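The difference between the two keys can be modelled in a few lines of Python. This is only a simulation of how the expressions resolve, not the workflow itself: the event dictionaries are hand-written stand-ins for the `github` context, the helper names are hypothetical, and keeping the `tests-{workflow}-` prefix on the new key is an assumption. Python's `or` plays the role of `||`, which also falls through on an empty or null value.

```python
def old_group(workflow: str, ctx: dict) -> str:
    """Previous key: PR number when present, else the full ref."""
    key = ctx["pull_request_number"] or ctx["ref"]
    return f"tests-{workflow}-{key}"


def new_group(workflow: str, ctx: dict) -> str:
    """New key: head_ref (only set on pull_request events), else ref_name."""
    key = ctx["head_ref"] or ctx["ref_name"]
    return f"tests-{workflow}-{key}"


# Stand-in for a push to staging: no PR number, head_ref is empty.
push_ctx = {
    "pull_request_number": None,
    "ref": "refs/heads/staging",
    "ref_name": "staging",
    "head_ref": "",
}

# Stand-in for the pull_request event of PR 283 (staging -> main).
pr_ctx = {
    "pull_request_number": 283,
    "ref": "refs/pull/283/merge",
    "ref_name": "283/merge",
    "head_ref": "staging",
}

OLD_KEYS = (old_group("Tests", push_ctx), old_group("Tests", pr_ctx))
NEW_KEYS = (new_group("Tests", push_ctx), new_group("Tests", pr_ctx))
```

With the old key the two events land in different groups, so both matrices run; with the new key both resolve to the source branch name and share one group.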

tests/task_scheduler is a leaf directory with 14 files containing 318 collected tests (heavily parametrized LLM-eval). At ~30s avg per test that's ~159min, past the 130min per-job timeout the matrix workflow enforces (--timeout flag passed to parallel_run.sh + GitHub Actions job-level timeout-minutes: 130). The whole task_scheduler cluster has been auto-cancelling at timeout in recent matrices, regardless of code state.
Fix: add SPLIT_DIRS config to discover_test_paths.py. Entries map {directory: list_of_file_groups}; each group becomes its own matrix entry (one space-separated parallel_run.sh argument). The default leaf-collapse algorithm is bypassed for split-config directories.
Split chosen for task_scheduler:
- test_execute.py alone (23 test funcs, the heaviest LLM-eval file, parametrized into many cases that exhaust runtime budget on their own)
- test_active_queue.py + test_active_task.py together (29 funcs combined, still LLM-eval-heavy but not as severe)
- The remaining 11 smaller files bundled (cumulative ~50 funcs, comfortable for one job)
Why explicit chunking instead of a generic "split if > N files": runtime correlates with LLM-eval call count, not file count. A naive file-count split could put 5 fast files in one bundle and 1 mega-heavy file with 4 small ones in another, with the same hung-job outcome. Manual split lets us isolate the known-heavy file.
Matrix entry count: 67 → 69 (+2). Each task_scheduler entry now has enough wall-clock headroom for its share of tests; the auto-cancellation at 130min should stop. The SPLIT_DIRS mechanism is reusable: future too-big clusters can be added here without further plumbing.
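A rough sketch of how a `SPLIT_DIRS` mapping could expand into matrix entries, following the description above. The `matrix_entries` function is hypothetical, and the two file names in the third group are placeholders, since the commit message does not name the 11 remaining files.

```python
# Directory -> list of file groups; each group becomes one matrix entry.
SPLIT_DIRS = {
    "tests/task_scheduler": [
        ["test_execute.py"],
        ["test_active_queue.py", "test_active_task.py"],
        # Placeholder names standing in for the remaining smaller files.
        ["test_small_a.py", "test_small_b.py"],
    ],
}


def matrix_entries(leaf_dirs: list[str], split_dirs: dict) -> list[str]:
    """Expand leaf directories into matrix entries (hypothetical helper).

    A directory with no split config stays a single entry. A directory
    listed in split_dirs yields one space-separated entry per file group.
    """
    entries = []
    for directory in leaf_dirs:
        groups = split_dirs.get(directory)
        if groups is None:
            entries.append(directory)
            continue
        for group in groups:
            entries.append(" ".join(f"{directory}/{name}" for name in group))
    return entries


LEAF_DIRS = ["tests/contact_manager", "tests/task_scheduler"]
ENTRIES = matrix_entries(LEAF_DIRS, SPLIT_DIRS)

# The runtime estimate quoted above: 318 tests at ~30s each, in minutes.
ESTIMATED_MINUTES = 318 * 30 / 60
```

One directory turning into three entries is where the +2 in the matrix count comes from.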

…k failures
Four E2E spending tests have been failing in CI:
- test_assistant_limit_check (DID NOT RAISE SpendingLimitExceededError)
- test_inflight_cancellation_on_limit_exceeded (timing wrong)
- test_limit_check_callback_allows_under_limit (allowed=False, cap=0.0, spend=$10)
- test_limit_exceeded_blocks_llm_call (DID NOT RAISE)
All four share a single root cause: state leaking through a SHARED "SpendingTest Assistant" record reused across every test in the file. The old e2e_config fixture did a "find-by-name then reuse, else create" lookup. Every test in TestE2ESpendingLimits got the same agent_id, so:
1. Cumulative spend (current_spend) is NEVER reset by Orchestra once an LLM call lands on it. Once any test makes a real LLM call, the assistant carries that spend for the rest of the session. test_limit_check_callback_allows_under_limit fails when it sees current_spend=$10 from earlier tests, even though it asserts the assistant "starts fresh".
2. The PATCH-based cap restore in test bodies (test_limit_exceeded_blocks_llm_call etc.) reads the *current* cap then restores it. If a previous test leaked cap=0, that becomes the "original" for the next test, making the leak permanent.
3. The fixture-level cap=None reset is best-effort with bare-except and silently fails on any Orchestra hiccup, leaving the cap unreset.
The previous "await the reset PATCH" fix (c583ab2) addressed fragility #3 but couldn't address #1 (spend accumulation) or #2 (test-body restore racing the reset).
Fix: each test gets its OWN freshly-created assistant with a unique surname (test-node-slug + 8-char UUID). The fixture:
- Always POSTs a new assistant at setup (no find-by-name reuse)
- Raises loudly on create failure (was: silently leaving test_agent_id=None then propagating to SESSION_DETAILS)
- DELETEs the assistant at teardown via /assistant/{id}
No state survives between tests:
- Fresh agent_id per test → spend starts at 0
- Fresh cap=25 per test → no cap-leak between tests
- Delete in teardown → no residual rows accumulate
The non-E2E tests in the file (TestAtomicUpsert, TestUpdateCumulativeSpend, …) don't use e2e_config; they mock SESSION_DETAILS and are unaffected.
Side effects:
- Each test creates + deletes an assistant: ~2 extra HTTP round-trips per test. Acceptable cost given the correctness win.
- Local DB rows accumulate transiently if a teardown DELETE fails (bare-except), but local.sh's docker-volume rebuild on restart clears them; CI runs are fresh per matrix job anyway.
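The create-per-test, delete-at-teardown shape can be illustrated without a live Orchestra. Everything below is a stand-in: `FakeOrchestra` is a hypothetical in-memory client, and a context manager is used instead of a pytest fixture so the sketch runs on its own; in pytest the same body would sit in a `yield` fixture.

```python
import uuid
from contextlib import contextmanager


class FakeOrchestra:
    """In-memory stand-in for the assistant API (hypothetical)."""

    def __init__(self):
        self.assistants = {}
        self._next_id = 1

    def create_assistant(self, surname: str, cap: float) -> int:
        agent_id = self._next_id
        self._next_id += 1
        self.assistants[agent_id] = {"surname": surname, "cap": cap, "spend": 0.0}
        return agent_id

    def delete_assistant(self, agent_id: int) -> None:
        del self.assistants[agent_id]


@contextmanager
def ephemeral_assistant(client: FakeOrchestra, test_slug: str):
    """Create a fresh assistant for one test and delete it afterwards."""
    # Unique surname: test slug plus an 8-character UUID fragment.
    surname = f"{test_slug}-{uuid.uuid4().hex[:8]}"
    # No try/except here: a failed create should fail the test loudly.
    agent_id = client.create_assistant(surname, cap=25)
    try:
        yield agent_id
    finally:
        try:
            client.delete_assistant(agent_id)
        except Exception:
            pass  # best-effort teardown, mirroring the described behaviour


client = FakeOrchestra()

# First "test" spends money on its own assistant.
with ephemeral_assistant(client, "test-limit-exceeded") as first_id:
    client.assistants[first_id]["spend"] += 10.0

# Second "test" gets a different assistant with untouched state.
with ephemeral_assistant(client, "test-under-limit") as second_id:
    SECOND_SPEND = client.assistants[second_id]["spend"]
    SECOND_CAP = client.assistants[second_id]["cap"]

LEFTOVER = len(client.assistants)
```

Because the second assistant is a new record, the $10 spent in the first block cannot show up in the second, which is the leak the shared record allowed.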

…revent leak The previous commit (31b5e27) ephemeralised the per-test assistant, eliminating assistant-cap and assistant-spend leaks across spending E2E tests. But the test USER is necessarily shared — multiple tests, fixtures, and helpers reference `test-user-001`, and creating ephemeral users would require billing-account + api_key cascades that aren't worth the churn. User-level state can still leak: test_user_limit_check sets user cap=0 mid-test then restores to its locally-captured `original_limit`. If the restore PATCH fails for any reason (network hiccup, expired auth, the test crashes before the finally block runs reliably, etc.), the user is left with cap=0 and the NEXT test that reads user spending state sees a stuck zero cap. Defense: in the fixture teardown — after assistant DELETE — also unconditionally PUT user/spending-limit cap=None. This guarantees every subsequent test starts with no user-level cap regardless of how the previous test handled its own local restore. Best-effort with bare-except so a flaky Orchestra doesn't fail teardown (same pattern as the assistant DELETE above). Doesn't change correctness of any individual test — only makes the sequence of tests robust to one test leaking user-cap state into the next.

…hon split
Empirical CI data from matrix 26582248627:
- task_scheduler: my initial 3-way split (test_execute alone, test_active_queue+test_active_task bundled, rest bundled) saw test_execute pass cleanly but the test_active_queue+test_active_task bundle still running at 88+ minutes: 9+20 funcs, both heavily parametrized into many LLM-eval cases, exceeding the per-job budget combined.
- function_manager/python: was a single 193-test cluster (15 files), running 88+ minutes and approaching the 130min job timeout. Likely will hit timeout on next matrix.
Adjustments:
1. tests/task_scheduler: split test_active_queue and test_active_task into their own matrix entries (was bundled). Each gets its own 130min budget.
2. tests/function_manager/python: new 3-way split grouped by execution-mode family (so related fixture-sharing tests stay together):
   - Group A: execution-env + venv-lifecycle (6 files, heaviest subprocess-per-test category)
   - Group B: in-process proxy + state-mode (5 files)
   - Group C: execute_function + multi-session + helpers (6 files)
Matrix entry count: 69 → 70 with the task_scheduler split (+1), then 70 → 72 with the function_manager/python split (+2), so 72 total.
Concurrency-directive deduplication of push + PR events keeps the queue bounded; older runs cancel as new commits land. If a specific group still hits timeout, further split that one in a follow-up commit. Pattern is reusable.

… walk, cwd, HOME, pid) The earlier diagnostic (6c04f36) confirmed venv_dir.exists()= False between prepare_venv() return and create_subprocess_exec() on Linux CI. Local repro on macOS doesn't reproduce the disappearance, and no codepath in unity/, unify/, or unillm/ rmtrees the .unity/venvs/ tree. To narrow which tree level is being wiped, expand the diagnostic to dump: - PID + cwd + HOME at the failure point (in case some sibling test changed cwd / HOME after the prepare_venv chdir) - Existence of every Path ancestor from python_path up to "/" (deepest first), so we can tell whether the wipe is leaf-only (just the venv_id dir) or full-tree (.unity/venvs/) - Grandparent directory listing (the safe_ctx-keyed dir holding all venv_id subdirs for this test context); if it exists with OTHER venv-id subdirs, suspect per-venv-id cleanup; if missing, suspect higher-level rmtree The structured error message lets the next matrix run reveal the actual rmtree scope without further code changes, narrowing the search to the right code path. No behavior change otherwise — the structured RuntimeError is still raised in place of the generic FileNotFoundError.
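A sketch of what such a diagnostic dump could look like. The function name and message format are hypothetical, and treating `venv_dir.parent` as the directory that holds all venv-id subdirectories is an assumption about the layout. The demo deletes one venv directory on purpose and leaves a sibling in place.

```python
import os
import shutil
import tempfile
from pathlib import Path


def describe_missing_venv(venv_dir: Path, python_path: Path) -> str:
    """Build a multi-line diagnostic for a vanished venv (hypothetical format)."""
    lines = [f"pid={os.getpid()} cwd={os.getcwd()} HOME={os.environ.get('HOME')}"]

    # Path.parents yields ancestors deepest-first and ends at the root, so
    # the first ancestor reported as existing marks the edge of the wipe.
    for ancestor in [python_path, *python_path.parents]:
        lines.append(f"exists={ancestor.exists()} {ancestor}")

    # Assumed layout: the parent of venv_dir holds every venv-id subdir.
    ctx_dir = venv_dir.parent
    if ctx_dir.exists():
        siblings = sorted(p.name for p in ctx_dir.iterdir())
        lines.append(f"siblings in {ctx_dir}: {siblings}")
    else:
        lines.append(f"{ctx_dir} is missing")
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    ctx = Path(tmp) / "ctx"
    venv_a = ctx / "venv_a"
    venv_b = ctx / "venv_b"
    (venv_a / "bin").mkdir(parents=True)
    venv_b.mkdir(parents=True)

    shutil.rmtree(venv_a)  # simulate a leaf-only wipe

    PYTHON_PATH = venv_a / "bin" / "python"
    REPORT = describe_missing_venv(venv_a, PYTHON_PATH)
    CTX_DIR = ctx
```

In this leaf-only case the report shows the interpreter path and `venv_a` as missing while the context directory still exists and lists `venv_b`; a higher-level wipe would instead report the context directory itself as missing.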

…introduced third-parties
Two related LLM-eval failures on the conv_mgr/voice fast-brain cluster:
1. test_redundant_checking_guidance_avoids_same_deferral_phrase:
   Scenario: assistant has already said "Let me check on that." then a `[notification]` confirms it's checking. On the next turn, the LLM said "Let me check on that." AGAIN, an exact verbatim repeat. The existing "Do NOT over-acknowledge or send multiple confirmations" line in Communication guidelines is too generic; the LLM took it to mean "don't send multiple replies", not "don't reuse the same phrase".
   Fix: add an explicit bullet right under the existing one: "Never repeat the same deferral / filler phrase verbatim across consecutive turns", with concrete varied alternatives ("Still looking…", "Almost there", "One moment more", or stay silent via `wait`). Naming the verbatim-repeat case directly avoids the LLM rationalising the repeat as a fresh ack.
2. test_demo_introduction_without_name_in_greeting:
   Scenario: boss says "I'm here with Maria — Maria, go ahead and ask Alex anything", then Maria (as the next user message) asks "can you pronounce my name?" The LLM replied "Sure — how do you spell it?", completely failing to carry "Maria" forward from the introduction. The voice prompt had no guidance about WHO is currently speaking when multiple parties share the line. The model sees both messages as role=user and didn't infer the speaker switch.
   Fix: add a "Tracking who's currently speaking" sub-section to the voice prompt's pacing block. Explicitly states that after a "here with X" / "I'll hand you over to Y" introduction, the next turn IS X/Y, and self-referential language ("my name", "I'm thinking…") refers to the introduced person. Names the exact failure case as the wrong-answer example: asking "how do you spell it?" after "this is Maria" sounds inattentive.
Both are prompt-engineering nudges; no code path changes. Caches will invalidate naturally since both nudges change the system prompt text.

…nterjected questions
test_interjection (in tests/transcript_manager/test_ask.py) was failing:
  1) tm.ask(Q1)             # "When did Dan last speak with Julia on the phone?"
  2) handle.interject(Q2)   # "Did Jimmy ever tell us when he's on holiday?"
  3) handle.result()        # expects answers to BOTH Q1 and Q2
The LLM-judge assertion (_llm_assert_correct with multiple_answers=True) checks the final reply contains both the Dan-Julia date AND the Jimmy holiday date. Production LLM behavior was to abandon Q1 entirely once Q2 came in and reply with just Jimmy's date: natural conversational behavior but not what the test (or a power-user querying transcripts) expects.
The TM ask prompt previously had no guidance about how to handle interjections. Add a global directive: "treat interjections as ADDITIVE — final reply must cover BOTH the original question AND the interjected one, not just the latest." This makes the test's expectation explicit in the prompt rather than relying on the LLM to infer the right behavior.
This is the right semantic for a transcript-query loop specifically: when someone interjects an extra question while you're querying transcripts, they typically want both questions answered, not the first abandoned. The same nudge wouldn't apply uniformly to all loops (e.g. ConversationManager actions where interjection often means "course correct the current action", not "do both") so it lives only in the TM ask prompt.

… when caller asks "what is this about?"
test_triggered_wake_explains_topic_naturally was failing:
  System prompt: voice agent (Alex calling Alice)
  Conversation:
    1) assistant [notification]: "this phone call from Alice may relate to the task 'Invoice follow-up'. ... Do not mention the task unless it naturally helps."
    2) user (Alice): "Sure, what is this about?"
  Expected response: mentions "invoice", "follow-up", or "alice"
  Actual response: "Hi, how can I help?"
The LLM was being overly conservative about the "do not mention unless it naturally helps" hedge in the wake notification. "What is this about?" is the canonical scenario where mentioning the topic naturally helps, but the fast brain was treating the hedge as "stay silent about it".
The existing "Notification authority" block only covered completion notifications. It said nothing about wake-context (why the assistant is awake / why a call is happening). Add a new sub-section "Wake-context notifications" that:
- Explicitly identifies the pattern: "Background context: this call may relate to X" / "<task X> is due now"
- Names "what's this about?" / "why did you call?" as the canonical "should mention the topic" trigger
- Reinterprets the hedge phrasing ("may relate", "still deciding", "do not mention unless it naturally helps") as "lead with the topic but stay open to redirection", NOT as "stay silent"
- Gives the exact failing example as the wrong-answer ("Hi, how can I help?", which ignores the context I was given)
- Gives concrete right-answer phrasings ("Wanted to follow up on the invoice — is now a good time?")
- Reiterates: never quote internal phrasing aloud (already enforced elsewhere, but adjacent to this block for clarity)
Prompt-engineering nudge only; no code paths changed.

…stant to skip CI-broken wake-up The per-test ephemeral-assistant fixture (added earlier in this session) was failing in CI with 500 Internal Server Error on every POST /v0/assistant call. Orchestra-side stack trace shows: views.py:788 create_assistant -> assistant_infra.py:1652 wake_up_assistant -> httpcore.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol. `wake_up_assistant` builds `_adapters_url_for(deploy_env) + "/assistant/wakeup"`. In CI there is no Communication / adapters service running, so `_adapters_url_for("")` returns "" and httpx rightly refuses to POST to a protocol-less URL. The orchestra view (Phase 3 in create_assistant) only calls wake-up when `not assistant_in.is_local`. The intent of `is_local=True` is "unity runs locally, no remote infra needed" — exactly what every CI test wants. The fixture already passes `create_infra=False` (skips pubsub/VM provisioning); adding `is_local=True` skips the adapters wakeup webhook too, completing the "no external services required" picture. Long-term, orchestra's `wake_up_assistant` should fail gracefully (or no-op) when `_adapters_url_for` returns empty, since a 500 here masks the real configuration issue. Cross-repo fix can land separately; the test-side flag is the correct caller behavior anyway (CI assistants are local fixtures by definition).

…s.agent_id
Two unrelated function_manager/python failures from the CI matrix that were blocking the whole cluster:
1. `uv sync` failing with "Distribution not found at: file:///tmp/.../<venv_dir>" on every test that creates a venv:
   The synthetic pyproject.toml generated for each user venv only declares `[project]` + `dependencies`. It is NOT an installable package. Without `--no-install-project`, uv tries to install the project itself (in editable mode), fails to find a build backend / sdist for the empty venv_dir, and surfaces "Distribution not found", which then masquerades as a venv-creation failure.
   Adding `--no-install-project` to the `uv sync` invocation tells uv "install dependencies into .venv, do NOT install the project itself". The project manifest is just a dependency declaration for us, never a real package.
   Verified the flag exists in current uv. The CI failure was reliably reproducible across every test in the `test_venv_*` family.
2. `AttributeError: AssistantDetails ... has no attribute 'id'` in test_remote_windows fixtures:
   `mock_session_details_windows` and `mock_session_details_ubuntu` were calling `monkeypatch.setattr(SESSION_DETAILS.assistant, "id", "test-assistant")`. Same legacy `.id` → `.agent_id` rename that bit contact_manager tests earlier. AssistantDetails is a Pydantic model with the new field name `agent_id` (int); setting an arbitrary attribute fails because the model rejects unknown fields.
   Switched the fixtures to set `agent_id` to a unique int per fixture (999_001 / 999_002) to keep the two windows / ubuntu variants distinguishable in any downstream test assertions.
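For the first item, a small sketch of a dependency-only manifest and the matching command line. The helper names, the `user-venv` project name, and the `0` version are hypothetical; the only thing taken from the commit is that `uv sync` is invoked with `--no-install-project`. The command is built but deliberately not executed, so the sketch runs without uv installed.

```python
import tempfile
from pathlib import Path


def write_synthetic_pyproject(venv_dir: Path, dependencies: list[str]) -> Path:
    """Write a manifest that declares dependencies but no build backend."""
    deps = ", ".join(f'"{d}"' for d in dependencies)
    text = (
        "[project]\n"
        'name = "user-venv"\n'   # hypothetical placeholder name
        'version = "0"\n'        # hypothetical placeholder version
        f"dependencies = [{deps}]\n"
    )
    path = venv_dir / "pyproject.toml"
    path.write_text(text)
    return path


def uv_sync_command() -> list[str]:
    """Command that installs dependencies only, skipping the project itself."""
    return ["uv", "sync", "--no-install-project"]


with tempfile.TemporaryDirectory() as tmp:
    manifest = write_synthetic_pyproject(Path(tmp), ["requests>=2"])
    MANIFEST_TEXT = manifest.read_text()

CMD = uv_sync_command()
# A caller would run this from the venv directory, for example with
# subprocess.run(CMD, cwd=venv_dir, check=True).
```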

…gger
The earlier fix (f9d2289) called caplog.set_level(INFO, logger="unity"), on the assumption that this would also subscribe caplog to the named logger. That assumption was wrong:
- pytest's caplog.handler is registered with the ROOT logger only, via the `propagate=True` chain.
- `caplog.set_level(level, logger=name)` sets the LEVEL on the named logger but does NOT attach caplog.handler to it.
- unity/logger.py sets `LOGGER.propagate = False` (since 5ed695f, 2026-02-20 "Consolidate logging into unity.logger as single authority"), so unity log records never reach the root handler.
Net effect: caplog.records stays empty for unity LOGGER output even after set_level(..., logger="unity"). Verified locally with a minimal test (`logging.getLogger("unity")` + `LOGGER.info(...)`): without the explicit `addHandler(caplog.handler)`, the captured records list is empty; with the explicit add, the message appears immediately.
Fix: explicitly `logging.getLogger("unity").addHandler(caplog.handler)` at the top of the test body, wrap the assertion block in try/finally, and `removeHandler(caplog.handler)` in finally so the handler doesn't leak across tests (each test gets a fresh caplog with a fresh handler, and a stale handler would write to a disposed buffer). The existing `set_level(..., logger="unity")` call is left as-is so the unity logger's effective level still includes INFO during the test (otherwise INFO records would be filtered before reaching any handler).
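The propagation behaviour can be reproduced with the standard library alone. In this sketch `ListHandler` is a hypothetical stand-in for `caplog.handler`, and the logger is named `unity_demo` rather than `unity` so the example does not touch a real logger's configuration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in a list; stands in for pytest's caplog.handler."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


capture = ListHandler()
root = logging.getLogger()
root.addHandler(capture)  # like caplog: attached to the root logger only

demo = logging.getLogger("unity_demo")
demo.setLevel(logging.INFO)  # the level is fine, as after caplog.set_level
demo.propagate = False       # records never travel up to the root handler

demo.info("not captured")
BEFORE = [r.getMessage() for r in capture.records]

# The fix: attach the handler to the named logger directly, and always
# detach it again so it does not leak into later tests.
demo.addHandler(capture)
try:
    demo.info("captured")
    AFTER = [r.getMessage() for r in capture.records]
finally:
    demo.removeHandler(capture)
    root.removeHandler(capture)
```

Setting the level alone changes nothing about where records are delivered, which is why the first message never reaches the handler.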

…NGS picks them up

The env vars in `pytest_configure(config)` (USER_DESKTOP_CONTROL_ENABLED, ASSISTANT_EMAIL, ASSISTANT_NUMBER, ASSISTANT_WHATSAPP_NUMBER) were landing too late. Order of operations:

1. pytest collects this conftest.py → runs `from tests.helpers import ...` at module top
2. tests/helpers.py transitively imports unity modules
3. unity.settings instantiates `SETTINGS = ProductionSettings()`; pydantic-settings reads env vars **once**, at this point.
4. pytest_configure() runs; by now SETTINGS is frozen, so the env overrides are silently ignored.

Symptom (caught in conv_mgr/flows + conv_mgr/voice + conv_mgr/core CI):

- test_can_you_use_my_computer: LLM answered "Not directly" instead of "Yes, install a quick remote-access tool from unify.ai", because SETTINGS.conversation.USER_DESKTOP_CONTROL_ENABLED was False at SETTINGS-instantiation time → the desktop_access_faq prompt branched the wrong way.
- test_reply_adds_re_prefix_to_subject: LLM didn't emit EmailSent because `send_email` wasn't surfaced as a tool (gated on non-empty assistant.email; SESSION_DETAILS.assistant.email was still "" because ASSISTANT_EMAIL env var hadn't been read at populate time).

Fix: hoist the env-var setdefaults to the very top of conftest.py, BEFORE the `from tests.helpers import ...` line. Add a header comment documenting the timing requirement so future hands don't move them back into pytest_configure() "for tidiness". The redundant copies in pytest_configure() stay as a defense-in-depth (in case a downstream test reimports SETTINGS) but the authoritative point of truth is now the module top.
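The timing problem above can be shown without pydantic-settings. In this sketch `DemoSettings` is a hypothetical stand-in that, like the settings object described in the commit, reads the environment once at construction; the variable name `DEMO_DESKTOP_CONTROL_ENABLED` is also made up for illustration.

```python
import os


class DemoSettings:
    """Hypothetical settings object: snapshots the environment when constructed."""

    def __init__(self):
        raw = os.environ.get("DEMO_DESKTOP_CONTROL_ENABLED", "false")
        self.desktop_control_enabled = raw == "true"


# Start from a clean slate so the demo is repeatable.
os.environ.pop("DEMO_DESKTOP_CONTROL_ENABLED", None)

# Too late: the settings object already exists when the override is applied.
late_settings = DemoSettings()
os.environ.setdefault("DEMO_DESKTOP_CONTROL_ENABLED", "true")
late_value = late_settings.desktop_control_enabled

# Hoisted: the override is in the environment before construction.
early_settings = DemoSettings()
early_value = early_settings.desktop_control_enabled
```

The first object never sees the override, which mirrors why setting the variables in `pytest_configure()` had no effect once the import chain had already built the settings.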

…exist, restore _sync_required_contacts

In commit 2b07266 I renamed the SyncContacts handler call from contact_manager._sync_required_contacts → _provision_system_overlays in both event_handlers.py and the matching test. The rename was wrong: _provision_system_overlays is NOT defined anywhere on ContactManager / BaseContactManager / SimulatedContactManager; only _sync_required_contacts is.

Symptoms:

- Production: every SyncContacts event raised AttributeError on cm.contact_manager._provision_system_overlays (caught and logged by the try/except in event_handlers.py:_sync_contacts, silently dropping the sync: no boom in logs, just no-op).
- Tests: test_queue_operation_waits_for_initialization patched the same wrong name; production handler now AttributeError'd inside the try/except, never reached the (non-existent) overlay call, mock_sync.called stayed False, assert_called_once() failed with "Expected '_sync_required_contacts' to have been called once" (mock display name inherited from the wraps= target, which itself failed to resolve).

Fix is purely the revert. Restore the actual production method name in BOTH the handler and the test. Verified locally (`pytest test_queue_operation_waits_for_initialization -xvs` → 1 passed).

Lesson for future renames: confirm the target method exists on every concrete implementation (BaseContactManager, ContactManager, SimulatedContactManager) before committing the rename. `git grep "def _provision_system_overlays"` would have shown zero matches and flagged this immediately.
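One way to automate the "confirm the method exists on every implementation" lesson is a small guard, sketched here. The `Demo*` classes are hypothetical stand-ins for the three contact manager classes named above; a real guard would import the actual classes.

```python
class DemoBaseContactManager:
    def _sync_required_contacts(self):
        return "synced"


class DemoContactManager(DemoBaseContactManager):
    pass


class DemoSimulatedContactManager(DemoBaseContactManager):
    pass


def classes_missing(method_name, classes):
    """Return the names of classes that do not expose a callable `method_name`."""
    return [
        cls.__name__
        for cls in classes
        if not callable(getattr(cls, method_name, None))
    ]


IMPLEMENTATIONS = (
    DemoBaseContactManager,
    DemoContactManager,
    DemoSimulatedContactManager,
)

missing_for_real_name = classes_missing("_sync_required_contacts", IMPLEMENTATIONS)
missing_for_wrong_name = classes_missing("_provision_system_overlays", IMPLEMENTATIONS)
```

An empty list means the name resolves everywhere; a non-empty list names the classes that would raise AttributeError at call time.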

…n't depend on cwd

test_screenshot_crop_via_act was failing with:

    FileNotFoundError: [Errno 2] No such file or directory: 'Screenshots'

`generate_screenshot_path(entry)` returns a relative path (`Screenshots/User/<ts>.jpg`). Production code calls it from inside a worker that runs in `cwd=local_root` (see `conversation_manager/main.py:os.chdir(_local_root)` early in startup). Test fixtures spin up CM in-process via CMStepDriver without that chdir, so when the test calls `write_screenshot_to_disk` directly, the relative path resolves against whatever cwd pytest is in (usually the repo root), and `p.parent.mkdir(parents=True)` raises because `./Screenshots/` isn't there.

Earlier in the test we already mkdir'd `local_root / "Screenshots" / "User"` (line 61) and the test later globs `local_root / "Screenshots" / "User"` for the written file (line 108), so the test already KNOWS where the screenshot should live; it just wasn't telling write_screenshot_to_disk explicitly.

Fix: absolutise the path before the disk write:

    screenshot_path = str(local_root / generate_screenshot_path(entry))

This keeps the production codepath (relative + cwd chdir) untouched and makes the test independent of cwd. No prod change.
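A small sketch of the absolutising step, using a temporary directory in place of the real `local_root`. `demo_screenshot_path` is a hypothetical stand-in for `generate_screenshot_path`, and the file name is made up.

```python
import tempfile
from pathlib import Path


def demo_screenshot_path() -> Path:
    """Hypothetical stand-in: returns a path relative to the local root."""
    return Path("Screenshots") / "User" / "example.jpg"


with tempfile.TemporaryDirectory() as tmp:
    local_root = Path(tmp) / "local_root"
    (local_root / "Screenshots" / "User").mkdir(parents=True)

    # Joining onto local_root makes the write independent of the process cwd.
    screenshot_path = local_root / demo_screenshot_path()
    screenshot_path.write_bytes(b"fake-jpeg-bytes")

    is_absolute = screenshot_path.is_absolute()
    written = sorted(
        p.name for p in (local_root / "Screenshots" / "User").glob("*.jpg")
    )
```

The write and the later glob now agree on the same directory regardless of where pytest was launched.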

…on exit (parallel race)

ROOT CAUSE found for the function_manager/python "venv python disappeared between prepare_venv() and create_subprocess_exec()" RuntimeError that has been failing reliably on every CI matrix run for weeks.

The pytest_unconfigure hook was calling `shutil.rmtree("/tmp/unity_test_home")` at the end of EVERY pytest session. parallel_run.sh launches one pytest session per test in parallel tmux panes, all of them sharing the same deterministic HOME (by design: LLM cache keys embed ~/Unity/Local and must stay stable). When session A finishes, its pytest_unconfigure wipes the entire tree, including the venvs that session B/C/D's FunctionManager just created under `$HOME/Unity/Local/.unity/venvs/<ctx>/<id>/`. The next `execute_in_venv` in B/C/D then sees:

    venv_dir=/tmp/unity_test_home/Unity/Local/.unity/venvs/<ctx>/0
    exists=False
    ancestor existence: False all the way up to /tmp/

The earlier diagnostic dump (commit 1e9a5f4) confirmed the wipe scope was full-tree, not leaf-only, ruling out per-venv-id cleanup.

Fix: drop the rmtree. The shared HOME stays, but:

- CI runners are ephemeral; the dir is reclaimed when the runner shuts down.
- Local dev: users can `rm -rf /tmp/unity_test_home` manually.
- Test isolation is already enforced by per-test Unify contexts (each FunctionManager venv lives under a context-keyed subdirectory), so leftover files from one test do not affect another.

This single-line change unblocks every test in function_manager/python and likely fixes a long tail of intermittent "vanishing fixture file" failures elsewhere too.
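The race can be replayed deterministically in a single process. This sketch uses a temporary directory instead of `/tmp/unity_test_home`, and the `ctx_b` directory name is hypothetical; it only illustrates why a whole-tree wipe by one session removes another session's venv.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    shared_home = Path(tmp) / "shared_home"

    # "Session B" creates its venv directory under the shared HOME.
    venv_b = shared_home / "Unity" / "Local" / ".unity" / "venvs" / "ctx_b" / "0"
    venv_b.mkdir(parents=True)

    # "Session A" finishes and its teardown wipes the whole shared tree.
    shutil.rmtree(shared_home)

    # Session B's next use of the venv now finds nothing at any level.
    venv_exists_after_wipe = venv_b.exists()
    home_exists_after_wipe = shared_home.exists()
```

Both checks come back false, matching the "False all the way up" diagnostic in the commit message.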

… doesn't time out

Five test_remote_windows tests were failing in CI with:

    RuntimeError: Managed VM did not become ready within 5 minutes

The production helper `_execute_python_function_on_remote_windows` waits on the `_vm_ready` threading.Event (defined at `unity/function_manager/primitives/runtime.py:56`). That event is set in production by either:

- ConversationManager startup (`unity/conversation_manager/main.py:278`)
- ComputerPrimitives mock path (`unity/function_manager/primitives/runtime.py:643`)

In a pure FunctionManager unit test neither codepath runs, so the wait blocks until its 300s timeout fires.

Fix: pre-set the event in both `mock_session_details_windows` and `mock_session_details_ubuntu` fixtures (the two fixtures any remote_windows test ever uses). Capture the prior state so we clear it again on teardown if WE flipped it on, leaving the global event in its previous state for any sibling test that happens to share the process.

This is the smallest surgical fix. Alternatives like patching the wait helper or refactoring `_execute_python_function_on_remote_windows` to accept an injectable readiness signal would have a much wider blast radius for no real benefit on the test side.
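The "pre-set, then restore only if we flipped it" pattern can be sketched as a context manager. The real fixtures are pytest fixtures acting on `_vm_ready`; here `vm_ready_preset` and the two demo events are hypothetical names used for illustration.

```python
import threading
from contextlib import contextmanager


@contextmanager
def vm_ready_preset(event: threading.Event):
    """Set `event` for the duration of the block, restoring its prior state after."""
    was_set = event.is_set()   # capture the prior state
    event.set()                # waiters return immediately instead of timing out
    try:
        yield event
    finally:
        if not was_set:        # only clear if this block flipped it on
            event.clear()


# Case 1: the event starts unset, so it is cleared again on exit.
fresh_event = threading.Event()
with vm_ready_preset(fresh_event):
    set_inside_fresh = fresh_event.wait(timeout=0)
set_after_fresh = fresh_event.is_set()

# Case 2: something else had already set it, so it stays set on exit.
already_set_event = threading.Event()
already_set_event.set()
with vm_ready_preset(already_set_event):
    pass
set_after_existing = already_set_event.is_set()
```

In a pytest fixture the same body would sit around the `yield`, so teardown runs even when the test fails.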

resolve_bot_token queried Orchestra with team_id and without include_token, so /slack/send could never fetch a token (400/503). Key on slack_team_id and request include_token. Add POST /slack/user-info (users.info) returning email + real/display name so the inbound pipeline can resolve an unknown Slack sender to a contact.

@app

…name On first inbound from an unmapped slack_user_id, look up the sender via the gateway /slack/user-info and match an existing contact by email then by real/display name (ambiguity-refuse), persisting slack_user_id so later messages match directly. Addressed (@app) senders with no match get a respondable contact; others keep the gated unknown-contact policy. Breaks the bootstrap deadlock that dropped every first Slack message.
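A rough sketch of the matching order described above: email first, then real/display name, refusing when more than one contact matches. The dict-based contact shape and the function name are assumptions, and applying the ambiguity refusal to email matches as well is a guess; the commit only states it for the name step.

```python
from typing import Optional


def resolve_slack_sender(
    contacts: list,
    email: Optional[str],
    names: list,
) -> Optional[dict]:
    """Return the single matching contact, or None when there is no safe match."""
    if email:
        by_email = [
            c for c in contacts
            if c.get("email") and c["email"].lower() == email.lower()
        ]
        if len(by_email) == 1:
            return by_email[0]
        if len(by_email) > 1:
            return None  # ambiguous: refuse rather than guess

    wanted = {n.strip().lower() for n in names if n and n.strip()}
    by_name = [
        c for c in contacts
        if (c.get("name") or "").strip().lower() in wanted
    ]
    return by_name[0] if len(by_name) == 1 else None


contacts = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "Sam Lee", "email": None},
    {"id": 3, "name": "Sam Lee", "email": None},
]

by_email = resolve_slack_sender(contacts, "ADA@example.com", ["Someone Else"])
ambiguous = resolve_slack_sender(contacts, None, ["Sam Lee"])
unknown = resolve_slack_sender(contacts, "nobody@example.com", ["Nobody"])

# On a successful match the caller would persist slack_user_id on the contact,
# so later messages match directly without another lookup.
```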

…dempotent A channel @mention is delivered twice (app_mention + message) with distinct event_ids but the same client_msg_id; dedup now keys on the stable message id so the pair collapses to one processed event. _create_slack_contact pre-checks and, on a lost unique-slack_user_id race, resolves to the existing contact instead of dropping the message.
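A minimal sketch of dedup keyed on the stable message id. The event dict shape is an assumption, as is falling back to `event_id` when `client_msg_id` is absent; the commit only says the key is the stable message id.

```python
def dedup_key(event: dict) -> str:
    """Prefer the stable message id; fall back to event_id (assumed fallback)."""
    return event.get("client_msg_id") or event["event_id"]


def process_once(events: list) -> list:
    """Return the events that survive dedup, in arrival order."""
    seen = set()
    processed = []
    for event in events:
        key = dedup_key(event)
        if key in seen:
            continue  # second delivery of the same message
        seen.add(key)
        processed.append(event)
    return processed


# One channel @mention delivered twice with distinct event_ids.
deliveries = [
    {"type": "app_mention", "event_id": "Ev001", "client_msg_id": "msg-abc"},
    {"type": "message", "event_id": "Ev002", "client_msg_id": "msg-abc"},
    {"type": "message", "event_id": "Ev003", "client_msg_id": "msg-def"},
]

kept_ids = [event["event_id"] for event in process_once(deliveries)]
```

Keying on `event_id` alone would have let both `Ev001` and `Ev002` through, which is the double-processing the commit fixes.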

…tact orphans Slack bot_user_id is workspace-scoped (on the install), so the per-assistant assistant_slack_bot_user_id is never set at bootstrap and the brain never gets the Slack send tools (always waits). Adopt the bot id from the inbound event when handling SlackMessageReceived/SlackChannelMessageReceived so the triggered brain run exposes send_slack_message/send_slack_channel_message. Also stop _create_slack_contact from minting a nameless, email-less contact that captures the slack_user_id and shadows the real contact; require a name or email and otherwise leave the sender for email/name resolution.

Surface the inbound message's event_ts as the effective thread_ts for Slack channel messages so a top-level @mention reply starts a thread instead of posting at the channel root. DMs keep prior behaviour and only thread when already threaded.
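The threading rule can be sketched as a small pure function. The function name and parameters are hypothetical, and reusing an existing `thread_ts` for channel messages that are already in a thread is an assumption the commit message does not spell out.

```python
from typing import Optional


def effective_thread_ts(
    is_channel_message: bool,
    event_ts: str,
    thread_ts: Optional[str],
) -> Optional[str]:
    """Pick the thread_ts a reply should use."""
    if thread_ts:
        return thread_ts  # already threaded: stay in that thread
    if is_channel_message:
        return event_ts   # top-level channel mention: start a thread on it
    return None           # top-level DM: reply unthreaded, as before


channel_top_level = effective_thread_ts(True, "1716900000.000100", None)
channel_in_thread = effective_thread_ts(True, "1716900005.000200", "1716900000.000100")
dm_top_level = effective_thread_ts(False, "1716900010.000300", None)
dm_in_thread = effective_thread_ts(False, "1716900015.000400", "1716900010.000300")
```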

djl11 added 7 commits May 26, 2026 00:37

ci: trigger workflows after Actions outage recovery (no code change)

485a09c

djl11 temporarily deployed to unity-testing May 26, 2026 13:04 — with GitHub Actions Inactive

djl11 had a problem deploying to unity-testing May 26, 2026 13:21 — with GitHub Actions Failure

djl11 temporarily deployed to unity-testing May 26, 2026 13:21 — with GitHub Actions Inactive

djl11 and others added 27 commits May 28, 2026 09:30

refactor(gateway): extract pubsub helpers into gateway/common

60ee166

feat(cm_types): add Slack medium and Slack routing metadata in SessionDetails

4dea34f

feat(contact_manager): support Slack workspace/user identities on contacts

d266b59

feat(comms): plumb Slack through ConversationManager (primitives, events, prompts, domains)

05329dc

feat(gateway/slack): add Slack channel adapter and register in gateway app

07e1862

github-advanced-security AI found potential problems May 29, 2026

View reviewed changes

Comment thread unity/gateway/channels/slack/views.py

+        )
+    logger.info(
+        "sent Slack message to %s on team %s (ts=%s)",
+        channel_id,

Comment thread unity/gateway/channels/slack/views.py

+    logger.info(
+        "sent Slack message to %s on team %s (ts=%s)",
+        channel_id,
+        team_id,

juliagsy added 2 commits May 29, 2026 16:43

djl11 wants to merge 117 commits into main from staging

djl11 commented May 26, 2026

github-advanced-security AI commented May 26, 2026
