Merged
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -12,3 +12,8 @@ repos:
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        args: [ --config-file=pyproject.toml ]
30 changes: 26 additions & 4 deletions README.md
@@ -393,6 +393,24 @@ It writes a hashed packet with `policy-diff.json` and
`promotion-decision.json`, then exits `0` for `PROMOTE`, `1` for `HOLD`, and
`4` for `ROLLBACK`.

For the operator-facing release decision, use the composed assurance command.
It consumes an existing proof packet, optionally adds realtime/action-execution
and shadow evidence, and writes one `release-assurance.json` / Markdown report:

```bash
tether release assure /tmp/tether-deploy-proof \
  --profile warehouse-safe \
  --control-hz 20 \
  --execution-cert \
  --shadow-trace ./traces/shadow.jsonl.gz \
  --output-dir /tmp/tether-release-assurance
```

The top-level decision is `PROMOTE`, `HOLD`, or `ROLLBACK`. The report includes
component decisions; risk signals such as policy action delta, latency
regression, stale action window, chunk-boundary delta, and velocity
discontinuity; and open evidence gaps for production rollout.

For serving-specific deployment confidence, turn the same proof packet into a
realtime certificate. This answers whether the measured `/act` path fits the
robot control loop on the target hardware/cell:
@@ -532,14 +550,14 @@ Filter dimensions: `--since` (`7d` / `24h` / `30m`), `--task` (case-insensitive
| Ada Lovelace (RTX 40-series, L4) | sm_8.9 | ✅ Supported | |
| Hopper (H100, H200) | sm_9.0 | ✅ Supported | |
| Jetson Orin (Orin Nano / NX / AGX) | sm_8.7 | ✅ Supported | JetPack 6.x |
| Jetson Thor | sm_10.x | ⚠️ Untested | Should work — same Blackwell silicon as desktop, but ORT-bundled CUDA EP needs Blackwell support (see below) |
| **Blackwell desktop (RTX 5090, RTX PRO 6000, B200, GB200)** | **sm_10.0** | **❌ Not yet supported** | ORT's bundled cuBLAS/cuDNN don't ship sm_100 kernels. Server segfaults at `InferenceSession` init. **Workaround:** use `tether chat` (no GPU needed), or `/act` testing on Modal cloud or non-Blackwell GPU until ORT updates ship. Tracking: [microsoft/onnxruntime#blackwell](https://github.com/microsoft/onnxruntime/issues) |
| Jetson Thor | sm_10.x | ⚠️ Untested | Same Blackwell silicon as desktop; ORT ≥1.25.1 ships those kernels. Untested only for lack of hardware. |
| **Blackwell desktop (RTX 5090, RTX PRO 6000, B200, GB200)** | **sm_10.0 / 12.0** | **⚠️ Supported (smoke-validate)** | The pinned `onnxruntime-gpu>=1.25.1` ships Blackwell sm_120 kernels, so the earlier `InferenceSession`-init segfault is resolved. Smoke-validation recommended before declaring fully production-ready (open ORT threading issue #27621). On ORT < 1.25.1 the server still segfaults — `tether doctor` and the `tether go` Blackwell guard detect this and print the upgrade path. |
| Older NVIDIA (Turing RTX 20, GTX 16) | sm_7.5 | ⚠️ Best-effort | Should work but not in CI matrix |
| Pre-Tensor-Core (Maxwell Jetson Nano 4GB, GTX 9-series) | sm_5.x | ❌ Not supported | NVIDIA EOL'd this hardware at JetPack 4.6 (Python 3.6) — too old for modern ML stacks regardless. The bootstrap installer auto-detects and bails fast with redirect instructions. |

**For Blackwell users right now:** the bootstrap installer accepts your hardware and the package installs cleanly, but `tether go` will segfault at server startup. The real fix requires ORT to ship Blackwell-aware bundled binaries (no published timeline). Workarounds: chat-only mode (no GPU needed), `tether doctor`, `tether models list` all work fine. `/act` and TRT-engine inference need a non-Blackwell GPU temporarily.
**For Blackwell users:** the default install pins `onnxruntime-gpu>=1.25.1`, which ships Blackwell sm_120 kernels — so `tether go` serves on RTX 50-series / B200 / GB200 hardware. The earlier `InferenceSession`-init segfault only occurs on ORT < 1.25.1; `tether doctor` and the `tether go` Blackwell guard detect that and print the upgrade path. Smoke-validate your model on-device before production (open ORT threading issue #27621).

A Blackwell-specific runtime path via TensorRT-LLM (which supports sm_100) is tracked upstream.
A native TensorRT-LLM path (sm_100 / sm_120) is tracked upstream as an additional Blackwell runtime.

## Composable runtime wedges

@@ -590,6 +608,10 @@ Full ledger: [reflex_context/measured_numbers.md](reflex_context/measured_number

**Latency numbers are intentionally not in the README yet** — earlier TRT FP16 tables were measured on a now-abandoned decomposed-ONNX path. `tether bench <export_dir>` reproduces on any hardware.

<!-- BEGIN:jetson-latency-table -->
<!-- Auto-populated by scripts/publish_jetson_latency.py from `tether bench realtime` certificates. Empty until certs are published. -->
<!-- END:jetson-latency-table -->

Reproduce on your own GPU with one command:

```bash
29 changes: 29 additions & 0 deletions docs/cli_reference.md
@@ -176,6 +176,35 @@ the selected `--fail-on` gate tripped.

---

## `tether release`

Customer-facing release assurance workflow. It consumes an existing proof packet
and returns the decision an operator needs before a robot policy update reaches a
fleet: `PROMOTE`, `HOLD`, or `ROLLBACK`.

```bash
tether release assure ./tether-deploy-proof \
  --profile warehouse-safe \
  --control-hz 20 \
  --execution-cert \
  --shadow-trace ./traces/shadow.jsonl.gz \
  --output-dir ./release-assurance \
  --json
```

`release assure` composes lower-level evidence instead of replacing it:
deployment proof, promotion gates, optional realtime serving certificate,
optional action-execution certificate, and optional shadow rollout gate. The
JSON/Markdown report includes component decisions, blocking checks, risk signals
such as stale action windows and chunk-boundary jumps, and open evidence gaps.

Artifacts written with `--output-dir`: `release-assurance.json`,
`release-assurance.md`, and `MANIFEST.json`. Exit codes: `0` means `PROMOTE`,
`1` means `HOLD`, `4` means `ROLLBACK`, and `2` means the packet or arguments
could not be loaded.
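
For CI scripts, the exit codes above can be mapped back to decisions. The following is a minimal sketch, assuming `tether` is on `PATH`; the helper names `decision_from_exit_code` and `assure` are hypothetical and not part of the tether package:

```python
import subprocess

# Exit-code contract as documented above: 0 PROMOTE, 1 HOLD, 4 ROLLBACK.
EXIT_CODE_DECISIONS = {0: "PROMOTE", 1: "HOLD", 4: "ROLLBACK"}


def decision_from_exit_code(code: int) -> str:
    """Translate a `tether release assure` exit code into a decision string."""
    if code == 2:
        # Documented meaning of 2: the packet or arguments could not be loaded.
        raise ValueError("packet or arguments could not be loaded")
    if code not in EXIT_CODE_DECISIONS:
        raise RuntimeError(f"unexpected exit code {code}")
    return EXIT_CODE_DECISIONS[code]


def assure(packet: str, *extra_args: str) -> str:
    """Hypothetical wrapper: run the CLI and return the decision."""
    proc = subprocess.run(
        ["tether", "release", "assure", packet, *extra_args],
        check=False,  # non-zero exit codes carry decisions, not failures
    )
    return decision_from_exit_code(proc.returncode)
```

Passing `check=False` matters here because `HOLD` and `ROLLBACK` are reported through non-zero exit codes, so they should not be treated as process failures.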

---

## `tether rollout`

Self-serve rollout decision workflow for candidate policies that were mirrored
21 changes: 21 additions & 0 deletions pyproject.toml
@@ -324,6 +324,27 @@ artifacts = ["*.cpp", "*.cu", "*.txt", "*.json"]
target-version = "py310"
line-length = 100

[tool.mypy]
python_version = "3.10"
mypy_path = "src"
ignore_missing_imports = true
follow_imports = "silent"
check_untyped_defs = false
warn_unused_configs = true
exclude = [
    "src/tether/models/third_party/",
]

[[tool.mypy.overrides]]
module = "tether.exporters.monolithic"
disable_error_code = [
    "assignment",
    "attr-defined",
    "method-assign",
    "misc",
    "no-redef",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
# pytest-asyncio v1.x requires explicit mode declaration. Tests use
89 changes: 89 additions & 0 deletions scripts/publish_jetson_latency.py
@@ -0,0 +1,89 @@
#!/usr/bin/env python3
"""Publish realtime-serving certificates into a latency table (standalone CLI).

Thin wrapper over :func:`tether.realtime_cert_publish.publish` so the publish
flow is runnable standalone (e.g. in CI) without importing the full tether CLI.
Equivalent to the ``tether publish-latency`` subcommand.

Usage:
    python scripts/publish_jetson_latency.py /tmp/orin-smolvla-cert [more-certs...]
    python scripts/publish_jetson_latency.py certs/*.json --no-readme
    python scripts/publish_jetson_latency.py CERT_DIR --out path/to/results.md

Each positional arg may be a cert JSON file or a directory containing
``realtime-serving-cert.json``. Pure stdlib + ``tether.realtime_cert``; no GPU
needed — runs anywhere the package is importable.
"""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

from tether.realtime_cert_publish import (
    DEFAULT_RESULTS_DOC,
    README_TABLE_BEGIN,
    README_TABLE_END,
    CertificateLoadError,
    publish,
)


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "certs",
        nargs="+",
        help="cert JSON files, or dirs containing realtime-serving-cert.json",
    )
    parser.add_argument(
        "--out",
        type=Path,
        default=DEFAULT_RESULTS_DOC,
        help=f"results doc path (default: {DEFAULT_RESULTS_DOC})",
    )
    parser.add_argument(
        "--readme",
        type=Path,
        default=Path("README.md"),
        help="README to inject the table into (default: README.md)",
    )
    parser.add_argument(
        "--no-readme", action="store_true", help="don't touch the README"
    )
    parser.add_argument(
        "--title", default="Realtime serving latency", help="table heading"
    )
    args = parser.parse_args(argv)

    try:
        result = publish(
            args.certs,
            out=args.out,
            readme=None if args.no_readme else args.readme,
            title=args.title,
        )
    except CertificateLoadError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2

    print(result["table"])
    print(f"wrote {result['out']} ({result['count']} certificate(s))")
    if not args.no_readme:
        if result["readme_updated"]:
            print(f"injected table into {args.readme}")
        else:
            print(
                f"note: markers not found in {args.readme}; skipped injection "
                f"(add {README_TABLE_BEGIN} / {README_TABLE_END})",
                file=sys.stderr,
            )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
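
The README injection itself lives in `tether.realtime_cert_publish`, which this diff does not show. As an illustration of the marker-based approach the script's output messages describe, here is a minimal sketch. The function name `inject_table` and its return shape are assumptions, not the module's actual API; only the two marker strings are taken from the README hunk above.

```python
# Marker strings as they appear in the README hunk of this change.
BEGIN = "<!-- BEGIN:jetson-latency-table -->"
END = "<!-- END:jetson-latency-table -->"


def inject_table(
    readme_text: str, table: str, begin: str = BEGIN, end: str = END
) -> tuple[str, bool]:
    """Hypothetical helper: replace whatever sits between the markers.

    Returns (new_text, updated). When either marker is missing the text is
    returned unchanged with updated=False, mirroring the "markers not found"
    branch in the script above.
    """
    start = readme_text.find(begin)
    if start == -1:
        return readme_text, False
    stop = readme_text.find(end, start + len(begin))
    if stop == -1:
        return readme_text, False
    head = readme_text[: start + len(begin)]  # keep the BEGIN marker
    tail = readme_text[stop:]                 # keep the END marker
    return f"{head}\n{table.strip()}\n{tail}", True
```

Keeping both markers in place makes the operation idempotent, so re-running the publisher with the same certificates would leave the README unchanged.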
18 changes: 18 additions & 0 deletions src/tether/chat/executor.py
@@ -116,6 +116,23 @@ def _build_promote(p: dict[str, Any]) -> list[str]:
    return args


def _build_release_assurance(p: dict[str, Any]) -> list[str]:
    args = ["release", "assure", str(p["packet"])]
    _flag(args, "profile", p.get("profile"))
    if p.get("candidate_active") is True:
        args.append("--candidate-active")
    _flag(args, "control-hz", p.get("control_hz"))
    _flag(args, "target", p.get("target"))
    if p.get("execution_cert") is True:
        args.append("--execution-cert")
    _flag(args, "shadow-trace", p.get("shadow_trace"))
    _flag(args, "min-compared", p.get("min_compared"))
    _flag(args, "output-dir", p.get("output_dir"))
    if p.get("json") is True:
        args.append("--json")
    return args
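
To make the mapping from tool parameters to CLI arguments concrete, the sketch below reproduces the builder with a stand-in for `_flag`. The real `_flag` helper is not shown in this hunk; the version here assumes it appends `--name value` only when the value is not `None`, which is a guess at its behavior rather than the actual implementation.

```python
from typing import Any


def _flag(args: list[str], name: str, value: Any) -> None:
    """Assumed stand-in for the executor's helper (not shown in the diff)."""
    if value is not None:
        args.extend([f"--{name}", str(value)])


def build_release_assurance(p: dict[str, Any]) -> list[str]:
    """Copy of the builder above, using the stand-in `_flag`."""
    args = ["release", "assure", str(p["packet"])]
    _flag(args, "profile", p.get("profile"))
    if p.get("candidate_active") is True:
        args.append("--candidate-active")
    _flag(args, "control-hz", p.get("control_hz"))
    _flag(args, "target", p.get("target"))
    if p.get("execution_cert") is True:
        args.append("--execution-cert")
    _flag(args, "shadow-trace", p.get("shadow_trace"))
    _flag(args, "min-compared", p.get("min_compared"))
    _flag(args, "output-dir", p.get("output_dir"))
    if p.get("json") is True:
        args.append("--json")
    return args


# Parameters matching the documented CLI example.
example_params = {
    "packet": "./tether-deploy-proof",
    "profile": "warehouse-safe",
    "control_hz": 20,
    "execution_cert": True,
    "shadow_trace": "./traces/shadow.jsonl.gz",
    "output_dir": "./release-assurance",
    "json": True,
}
example_argv = build_release_assurance(example_params)
```

Under that assumption, `example_argv` lines up with the `tether release assure` invocation in `docs/cli_reference.md`, and boolean parameters only add a flag when they are exactly `True`.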


def _build_realtime_cert(p: dict[str, Any]) -> list[str]:
    args = ["bench", "realtime", str(p["proof"])]
    _flag(args, "target", p.get("target"))
@@ -255,6 +272,7 @@ def _build_replay(p: dict[str, Any]) -> list[str]:
    "prove_deployment": _build_prove,
    "diff_policies": _build_policy_diff,
    "decide_promotion": _build_promote,
    "assure_release": _build_release_assurance,
    "certify_realtime_serving": _build_realtime_cert,
    "show_promotion_profile": _build_show_profile,
    "benchmark": _build_bench,
2 changes: 1 addition & 1 deletion src/tether/chat/loop.py
@@ -16,7 +16,7 @@
# needs explicit instruction to copy specific values rather than summarize them.
SYSTEM_PROMPT = """You are the Tether assistant. Tether is a deployment-confidence CLI for vision-language-action (VLA) robot policies. The main product question is whether a policy has enough evidence to promote, block, or roll back.

You have tools that wrap the `tether` CLI. Use them to act on the user's behalf instead of describing commands. Pick the smallest tool that answers the question. Don't ask for confirmation before read-only tools (list_models, doctor, list_traces, list_promotion_profiles, show_promotion_profile, decide_promotion, certify_realtime_serving). Use list_promotion_profiles or show_promotion_profile when the user asks which promotion profile to use or what a profile checks. Use prove_deployment when the user asks whether an export is safe, ready, deployable, production-ready, suitable for a robot, or needs a proof packet; include policy_diff_* parameters when the user provides candidate/shadow traces for rollout evidence. It is an offline/local proof path and does not actuate hardware. Use prove_realtime_deployment when the user gives an export path and asks whether it can meet a realtime, 20 Hz, 50 Hz, p95, jitter, deadline, or control-loop budget; include control_hz when the user names a control rate, and include execution_cert when the user asks about stale chunks, adaptive action chunking, chunk-boundary smoothness, execution horizon, or action continuity. Use certify_realtime_serving only when the user gives an existing proof packet and asks whether that proof can meet a realtime/control-loop budget. Use decide_promotion when the user asks whether an existing proof packet should promote, block, or roll back. Use diff_policies when the user asks for only a standalone candidate/shadow policy diff or whether a policy is safe to promote. For destructive, hardware-actuating, or long-running tools (export_model, serve_model against a real robot transport, distill, finetune, evaluate), confirm intent first if the user's request is ambiguous about scope.
You have tools that wrap the `tether` CLI. Use them to act on the user's behalf instead of describing commands. Pick the smallest tool that answers the question. Don't ask for confirmation before read-only tools (list_models, doctor, list_traces, list_promotion_profiles, show_promotion_profile, decide_promotion, certify_realtime_serving, assure_release). Use list_promotion_profiles or show_promotion_profile when the user asks which promotion profile to use or what a profile checks. Use prove_deployment when the user asks whether an export is safe, ready, deployable, production-ready, suitable for a robot, or needs a proof packet; include policy_diff_* parameters when the user provides candidate/shadow traces for rollout evidence. It is an offline/local proof path and does not actuate hardware. Use prove_realtime_deployment when the user gives an export path and asks whether it can meet a realtime, 20 Hz, 50 Hz, p95, jitter, deadline, or control-loop budget; include control_hz when the user names a control rate, and include execution_cert when the user asks about stale chunks, adaptive action chunking, chunk-boundary smoothness, execution horizon, or action continuity. Use assure_release when the user gives an existing proof packet and asks whether a robot policy update/release should promote, hold, or roll back, especially if they also mention realtime, shadow rollout, action chunk continuity, or fleet release readiness. Use certify_realtime_serving only when the user asks specifically for a realtime/control-loop certificate from an existing proof packet. Use decide_promotion when the user asks only for the lower-level proof-packet promotion gate. Use diff_policies when the user asks for only a standalone candidate/shadow policy diff or whether a policy is safe to promote. 
For destructive, hardware-actuating, or long-running tools (export_model, serve_model against a real robot transport, distill, finetune, evaluate), confirm intent first if the user's request is ambiguous about scope.

When a tool returns a non-zero exit code, read its stderr, explain what went wrong in one sentence, and suggest a concrete next action. Don't fabricate tool output.

19 changes: 19 additions & 0 deletions src/tether/chat/schema.py
@@ -126,6 +126,25 @@ def _tool(name: str, description: str, parameters: dict[str, Any]) -> dict[str,
            "required": ["packet"],
        },
    ),
    _tool(
        "assure_release",
        "Build one release assurance report from an existing proof packet. Use this when the user asks whether a robot policy update/release should promote, hold, or roll back and may also care about realtime serving, action chunk continuity, or shadow rollout evidence.",
        {
            "properties": {
                "packet": {"type": "string", "description": "Deployment proof packet directory, or deployment-proof.json path."},
                "profile": {"type": "string", "description": "Optional built-in promotion profile name or JSON/YAML path."},
                "candidate_active": {"type": "boolean", "description": "Return ROLLBACK instead of HOLD when gates fail for an active rollout."},
                "control_hz": {"type": "number", "description": "Robot control rate. Setting this includes realtime evidence."},
                "target": {"type": "string", "description": "Hardware/cell label, e.g. agx-orin-cell-a."},
                "execution_cert": {"type": "boolean", "description": "Also certify stale action windows, chunk-boundary continuity, velocity discontinuity, and runtime attribution."},
                "shadow_trace": {"type": "string", "description": "Optional shadow trace from `tether serve --shadow-policy --record`."},
                "min_compared": {"type": "integer", "description": "Minimum compared shadow requests required before promotion. Default 1."},
                "output_dir": {"type": "string", "description": "Directory for release-assurance artifacts."},
                "json": {"type": "boolean", "description": "Emit JSON instead of human output."},
            },
            "required": ["packet"],
        },
    ),
    _tool(
        "prove_realtime_deployment",
        "Run a deterministic realtime deployment proof chain for an export: `tether prove` into a known proof directory, then `tether bench realtime` against that same packet. Use this when the user gives an export path and asks whether it can run at 20 Hz, 50 Hz, realtime, or inside a robot control-loop budget.",
1 change: 1 addition & 0 deletions src/tether/chat/welcome.py
@@ -109,6 +109,7 @@ def tools_listing() -> str:
        "certify_realtime_serving": "Deploy",
        "diff_policies": "Deploy",
        "decide_promotion": "Deploy",
        "assure_release": "Deploy",
        "list_promotion_profiles": "Deploy",
        "show_promotion_profile": "Deploy",
        "list_models": "Models",