async experiment #1
Draft
alex-remedios-aisi wants to merge 17 commits into main from
Conversation
- All harness markers (Queued / Running / Done / Error / Skipped / Timeout) now carry an `[nb mcp]` prefix and render on the stderr stream so they're visually distinct from the cell's own stdout.
- The Running banner stays pinned at the top of the cell outputs throughout streaming, so the agent/user can tell a cell is live even when its output cadence is slow.
- The Queued, Running, and Done/Error footers include a wall-clock timestamp with timezone (e.g. `10:18:27 UTC`), unambiguous across remote servers, plus duration on completion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Notebooks written outside the MCP (hand-edited, older nbformat) can have cells without an `id` field. `_flush_outputs_to_disk` is keyed on cell id and returned silently when it couldn't find the cell, so the Queued marker (written by index) would appear and then nothing else: no Running banner, no kernel output, no Done footer, even though the kernel was happily running.

`exec_cell_to_disk` now assigns a uuid-based id and persists it before the first flush. As a defense, `_flush_outputs_to_disk` now warns to stderr rather than silently dropping writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
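The id backfill amounts to something like the following sketch. `ensure_cell_id` is a hypothetical name; the commit only says `exec_cell_to_disk` assigns a uuid-based id before the first flush.

```python
import uuid

def ensure_cell_id(cell: dict) -> str:
    # Cells from older nbformat tooling may lack an `id`; assign a uuid-based
    # one so downstream flushes keyed on cell id can find the cell again.
    if not cell.get("id"):
        cell["id"] = uuid.uuid4().hex[:8]
    return cell["id"]
```

The important ordering detail from the commit is that the id is persisted to disk *before* the first flush, so every later id-keyed lookup succeeds.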
Writes to ./.nb_mcp.log in the MCP server's CWD. Covers:
- server start
- job submit + per-cell running/done/error
- kernel start/ready/stop + interrupt requests
- dropped-output warnings (replaces the stderr print in `_flush_outputs_to_disk`)
- unhandled exceptions in the worker thread (with traceback)

Level via NB_MCP_LOG_LEVEL, path via NB_MCP_LOG_PATH. Falls back to stderr if the log file can't be opened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`execute_code` was passing its timeout straight into `client.get_iopub_msg(timeout=...)`; that is the idle gap between messages, not the total runtime. A chatty cell that prints every second kept resetting the window, so a 10-minute budget could run forever. In the field this showed up as `timeout=600` still executing at 14 minutes.

The timeout is now a hard deadline anchored at the start. We poll iopub in 1s slices and re-check the deadline each time. When it fires we call an `interrupt_fn` (wired to `kernels.interrupt`) so the kernel actually stops; otherwise the busy kernel would block every subsequent cell in the job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
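The deadline-anchored polling loop can be sketched as below. The callables are stand-ins: `get_iopub_msg` mirrors jupyter_client's `BlockingKernelClient.get_iopub_msg`, which raises `queue.Empty` when its per-call timeout expires; `poll_until_deadline` and `is_idle_msg` are illustrative names, not the real API.

```python
import queue
import time

def poll_until_deadline(get_iopub_msg, timeout, interrupt_fn, is_idle_msg):
    # Hard deadline anchored once, at the start -- a chatty cell cannot
    # reset it the way a per-message idle timeout could.
    deadline = time.monotonic() + timeout
    msgs, timed_out = [], False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            interrupt_fn()   # stop the busy kernel so later cells aren't blocked
            timed_out = True
            break
        try:
            msg = get_iopub_msg(timeout=min(1.0, remaining))
        except queue.Empty:
            continue         # idle 1s slice; re-check the deadline, don't reset it
        msgs.append(msg)
        if is_idle_msg(msg):
            break            # execution finished normally
    return msgs, timed_out
```

The 1-second slice size trades a little polling overhead for prompt deadline enforcement: the loop never sleeps more than one second past the budget.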
Under heavy IOPub traffic (inspect_ai progress bars, per-step training
logs, display_data every ~20s) a single bad ZMQ frame desynchronises
the client's message parser — every subsequent read raises
`ValueError("'<IDS|MSG>' is not in list")`. Before this change, that
exception would bubble up out of the job worker thread, crashing the
in-flight exec and leaving the agent with no way to reattach. The
kernel itself is fine — GPU, training, file writes all still alive.
Fix:
- exec_runner.execute_code now catches unexpected iopub exceptions,
logs them, and invokes a recover_fn up to 3 times. recover_fn
rebuilds the client's ZMQ channels against the same KernelManager
(kernels.reset_client) — kernel process untouched. We re-subscribe,
keep filtering by the original msg_id, and continue.
- If recovery fails or the cap is hit, we append a clear
`[nb mcp] iopub desync` marker telling the agent to use
exec_status / read_cell and return cleanly instead of crashing.
- Wired recover_fn through exec_cell_to_disk, jobs, and run_scratch.
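The recovery loop described above can be sketched as follows. The function and callable names (`read_iopub_with_recovery`, `emit_desync_marker`) are illustrative; `recover_fn` stands in for `kernels.reset_client`, which rebuilds the ZMQ channels without touching the kernel process.

```python
def read_iopub_with_recovery(read_msg, recover_fn, emit_desync_marker,
                             max_recoveries=3):
    # Catch unexpected iopub exceptions, attempt channel recovery up to the
    # cap, then append a desync marker and return cleanly instead of letting
    # the exception crash the job worker thread.
    msgs, recoveries = [], 0
    while True:
        try:
            msg = read_msg()
        except StopIteration:
            return msgs, True            # stream ended normally
        except Exception:
            if recoveries >= max_recoveries:
                emit_desync_marker()
                return msgs, False       # give up, but don't crash
            recoveries += 1
            try:
                recover_fn()             # rebuild channels; kernel untouched
            except Exception:
                emit_desync_marker()
                return msgs, False
            continue
        msgs.append(msg)
```

Because the rebuilt channels keep filtering by the original msg_id, a successful recovery resumes the same in-flight execution rather than starting over.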
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The blocking `wait` MCP tool held up the whole tool-call slot for up
to its timeout, blocking the agent from interacting with the user or
doing any other work while a long cell ran. Claude Code's Monitor
tool streams stdout lines as event notifications while the agent
keeps working, which is the better UX for long-running cells.
Changes:
- New subcommand `nb watch --job <id> [--path <nb>]`. Tails
`.nb_mcp.log` (or `NB_MCP_LOG_PATH`), filters to the target job,
emits one formatted line per interesting event (submit, cell
start/done/error, kernel lifecycle, final complete/error), exits
when the job ends. Line-buffered stdout.
- Non-scratch exec tools now append a ready-to-use Monitor hint to
their response:
Monitor(command='uv run nb watch --job abc123 --path nb.ipynb')
- `wait` MCP tool removed, along with `jobs.wait_for_job` helper.
- New test `test_watch_cli.py` covers happy-path event filtering
and the startup-timeout path.
- CLAUDE.md updated with the new tool + CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
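The per-job filtering step of `nb watch` reduces to something like the sketch below. The log-line shape (`... job <id> ...`) and the terminal-event detection are assumptions for illustration; the commit only specifies that the command filters to the target job and exits when the job ends.

```python
def watch_filter(lines, job_id):
    # Keep only lines mentioning the target job; stop once a terminal
    # event (complete/error) for that job is seen.
    out = []
    for line in lines:
        if f"job {job_id}" not in line:
            continue
        out.append(line)
        if "complete" in line or "error" in line:
            break
    return out
```

Exiting on the terminal event is what lets Monitor treat `nb watch` as a bounded stream rather than an open-ended tail.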
Every cell going through a Monitor hand-off was overkill for quick work. Non-scratch exec tools now wait up to `block_for` seconds (default 10) for the background job to finish:
- completes in time → return the full status inline, no Monitor needed
- still running → return the Monitor-ready command as before

block_for=0 is fire-and-forget. All four exec tools take the new parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
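The inline-wait behaviour can be sketched as below. `maybe_block` and `poll_status` are hypothetical names; the commit specifies only the `block_for` parameter and its three outcomes.

```python
import time

def maybe_block(poll_status, block_for=10.0, poll_interval=0.25):
    # Poll the background job for up to block_for seconds. Returns the
    # finished status inline, or None when the job is still running (the
    # caller then emits the Monitor hint). block_for=0 never polls:
    # fire-and-forget.
    deadline = time.monotonic() + block_for
    while time.monotonic() < deadline:
        status = poll_status()
        if status is not None:
            return status
        time.sleep(poll_interval)
    return None
```

The default of 10 seconds means quick cells round-trip in a single tool call while long cells degrade gracefully to the Monitor path.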
Previously the log file only had lifecycle events (job submit, cell running, done/error). A chatty cell printing for ten minutes would produce zero notifications between "running" and "done"; Monitor users couldn't see the job was healthy without also `read_cell`-ing the notebook.

Now each cell has its own rate-limited progress emitter. When the cell produces new stream output, we wait at least NB_MCP_PROGRESS_INTERVAL_SEC (default 1.0s) before emitting one INFO line: `job X cell [N] out: <last line>` (200 char truncation). Matches the nb watch filter by job id, so Monitor delivers each line as an event. Set the env var to 0 to disable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
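The rate-limiting can be sketched as a small stateful emitter. The class name, method name, and injectable `clock` are illustrative; the line format, 200-char truncation, and interval-of-zero disable switch come from the commit.

```python
import time

class ProgressEmitter:
    # At most one INFO line per interval, carrying the last stream line
    # truncated to 200 chars. interval <= 0 disables emission, mirroring
    # NB_MCP_PROGRESS_INTERVAL_SEC=0.
    def __init__(self, log, interval=1.0, clock=None):
        self._log = log
        self._interval = interval
        self._clock = clock or time.monotonic
        self._last = float("-inf")

    def on_stream(self, job_id, cell_index, text):
        if self._interval <= 0 or not text.strip():
            return
        now = self._clock()
        if now - self._last < self._interval:
            return               # throttled: output arrived but isn't logged
        self._last = now
        last_line = text.rstrip("\n").splitlines()[-1][:200]
        self._log(f"job {job_id} cell [{cell_index}] out: {last_line}")
```

Injecting the clock keeps the throttling logic testable without real sleeps.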
One-per-second was too chatty for realistic workloads (training loops, evals). Ten seconds is a saner default: still plenty of mid-run notifications but 10x less noise in the log file and in Monitor. Override via NB_MCP_PROGRESS_INTERVAL_SEC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: a cell with zero stream output (time.sleep, GPU compute, blocking I/O) produced no log lines between the initial "cell running" and final "cell done" events. Monitor stayed silent for the whole run, indistinguishable from a hung kernel from the agent's perspective.

Now: a per-cell heartbeat thread logs `job X cell [N] still running (Ns elapsed)` at the progress interval, but only when the cell is genuinely silent. The heartbeat skips if:
- the progress emitter logged within the interval (chatty cell), or
- the kernel produced output within the interval (output arrived but was throttled from being logged)

So a chatty cell's real output always wins; the heartbeat only surfaces for actually-silent work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
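The two skip conditions reduce to a single predicate, sketched here with an illustrative name and explicit timestamps in place of the real shared state:

```python
def heartbeat_due(now, last_progress_log, last_output, interval):
    # Heartbeat only when the cell is genuinely silent: neither a progress
    # log line nor any kernel output within the last full interval.
    return (now - last_progress_log >= interval and
            now - last_output >= interval)
```

Checking `last_output` separately from `last_progress_log` is what prevents the heartbeat from firing when output arrived but was throttled out of the log.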
Per-notebook exec_status is fine when you already know which file you care about. But when debugging "is anything running?" or "do I have stale kernels?", the agent had no way to get a global view without checking each notebook.

New MCP tool: `status()` (no args). Lists every registered kernel (path, alive, pid) and every active/recent job. Backed by two new accessors, kernels.list_all and jobs.list_all_active / list_all_finished, which read the in-memory state of the running MCP server.

New CLI: `nb status`. Useful when the MCP is down or when debugging outside Claude Code. Reads .nb_mcp.log to reconstruct job history and shells out to pgrep for live ipykernel pids. Not as accurate as the MCP tool (log-based, not in-memory) but always available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 covers the whole session: the pivot from blocking wait to Monitor-driven `nb watch`, the marker/log/progress/heartbeat work that makes tailing worthwhile, and the hardening fixes that fell out (wall-clock timeout, iopub recovery, cell-id backfill).

16 frames the open question of reattaching to kernels across MCP restarts: what's required, what needs deciding, and a sketch of the shape without committing to a direction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Disk-quota exhaustion and other write-path errors can cause cells to fail without the agent seeing a clear error; propagation is there but the logging is weak. Captured in journal 16 as a related hardening pass rather than creating a separate entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a training run's kernel gets OOM-killed mid-execution, the client's iopub reads fail. The existing recovery code tried to reset channels, which also fails (no kernel to talk to), and emitted a generic "iopub desync — channel recovery failed" marker. The job then continued optimistically against the dead kernel, with subsequent cells producing more opaque failures.

Now:
- kernels.reset_client raises a new KernelDeadError when the underlying km.is_alive() is False, or when wait_for_ready times out after the rebuild.
- exec_runner.execute_code catches it distinctly. Emits log.error "kernel died during cell execution: …" and an nbformat error output (ename=NbMcpKernelDied) on the cell, so the job is marked ERROR and subsequent cells are skipped.
- The inline cell marker now says "kernel died mid-execution" with operator guidance (run status(), expect fresh kernel next exec) instead of the misleading "iopub desync".

Agents monitoring the log via `nb watch` now see a clear root cause instead of a vague channel message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
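The dead-kernel detection can be sketched as below. The three callables stand in for the real KernelManager/client plumbing (`km.is_alive()`, channel rebuild, `wait_for_ready`); only the `KernelDeadError` name and the two raise conditions come from the commit.

```python
class KernelDeadError(RuntimeError):
    """The kernel process is gone; channel recovery cannot help."""

def reset_client(km_is_alive, rebuild_channels, wait_for_ready):
    # Detect a dead kernel before (and after) rebuilding channels, so the
    # caller gets a distinct error instead of a vague desync marker.
    if not km_is_alive():
        raise KernelDeadError("kernel process is not alive")
    rebuild_channels()
    try:
        wait_for_ready()
    except TimeoutError as exc:
        raise KernelDeadError("kernel unresponsive after channel rebuild") from exc
```

Raising a dedicated exception type lets `execute_code` distinguish "channel desync, kernel fine" from "kernel dead" and mark the job ERROR instead of continuing optimistically.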