Khurram Khalil and Khaza Anuarul Hoque
Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
Corresponding author: hoquek@missouri.edu
VERMITHOR is a runtime orchestration framework for edge DNN inference under the coupled constraints of thermal safety, latency deadlines, and network volatility. It combines three components:
- Mesh-Exit routing: a backbone-agnostic dynamic-exit DNN interface exposing local early-exit, bottleneck offload, and full-backbone execution paths.
- Robust Conformal Prediction (RCP): online f-divergence-inflated conformal intervals for preserving network-feasibility coverage under distribution shift.
- Past-time Signal Temporal Logic (ptSTL) monitoring: an O(1) amortized runtime monitor that enforces thermal, network, and dwell-time specifications during inference.
The accompanying manuscript evaluates VERMITHOR on NVIDIA Jetson Thor across seven stress scenarios, seven orchestration baselines, and classification plus object-detection tasks. The public repository contains the core implementation and supplementary result tables used to document the paper's extended evaluations. It is not a full experiment-reproduction package for every benchmark, baseline, and hardware run.
VERMITHOR/
|-- mesh_exit/
|   |-- super_node.py             SuperNode, LocalExitHead, BottleneckEncoder
|   `-- resnet_backbone.py        MeshExitResNet factory functions
|-- conformal/
|   |-- divergence_estimator.py   Likelihood-ratio and f-divergence estimation
|   `-- conformal_predictor.py    Standard CP and Robust CP
|-- stl_monitor/
|   |-- robustness.py             Thermal and network robustness predicates
|   |-- hybrid_dynamics.py        Hybrid dynamics and coverage utilities
|   |-- stl_monitor.py            Monotonic queue and STL monitor
|   |-- online_divergence.py      Online divergence and inflation controller
|   `-- runtime_controller.py     Integrated five-mode controller
`-- data/
    |-- cifar10.py                CIFAR-10 loader notes
    `-- cifar100.py               CIFAR-100 loader notes
results/
`-- Vermithor supplementary results.pdf
The source modules depend on PyTorch and NumPy. A minimal environment can be created with:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
A quick import smoke test is:
PYTHONPATH=. python -c "from VERMITHOR.mesh_exit import mesh_exit_resnet18; from VERMITHOR.conformal import RobustConformalPredictor; from VERMITHOR.stl_monitor import IntegratedRuntimeController; print('VERMITHOR imports OK')"
The SuperNode is injected after configurable ResNet layers and exposes three propagation paths:
| Path | Module | Purpose |
|---|---|---|
| Local exit | LocalExitHead | On-device early prediction |
| Bottleneck | BottleneckEncoder | Compressed feature transmission for split execution |
| Continuation | identity path | Full-backbone local execution |
RobustConformalPredictor extends standard conformal prediction by inflating the calibrated interval with an online f-divergence estimate:
lambda(D_f) = min(lambda_max, lambda_base * (1 + beta * D_f))
At runtime the inflation factor is capped at lambda_max. When the cap binds, the paper reports the resulting graceful-degradation behavior rather than claiming unconditional coverage.
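The capped inflation rule can be illustrated with a few lines of Python. This is a minimal sketch rather than the repository's implementation: the function names are hypothetical, the defaults for lambda_base and beta are placeholders (the README does not state them), and scaling the interval's half-width around its midpoint is an assumed way of applying the factor.

```python
def inflation_factor(d_f, lambda_base=1.0, beta=1.0, lambda_max=3.0):
    """Return (lambda, cap_binds) for a non-negative divergence estimate d_f.

    lambda_base and beta are placeholder defaults, not values from the paper;
    lambda_max = 3.0 is the cap reported for the experiments.
    """
    if d_f < 0:
        raise ValueError("an f-divergence estimate should be non-negative")
    uncapped = lambda_base * (1.0 + beta * d_f)
    # The second element tells the caller whether the cap is binding.
    return min(lambda_max, uncapped), uncapped > lambda_max


def inflate_interval(lower, upper, lam):
    """Scale an interval's half-width by lam around its midpoint (assumed scheme)."""
    mid = 0.5 * (lower + upper)
    half = 0.5 * (upper - lower)
    return mid - lam * half, mid + lam * half


if __name__ == "__main__":
    for d_f in (0.0, 0.5, 5.0):
        lam, capped = inflation_factor(d_f)
        print(d_f, lam, capped, inflate_interval(4.0, 8.0, lam))
```

Returning the cap-binding flag alongside the factor lets a caller log or react to the regime in which coverage is no longer guaranteed by the inflation alone.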
The controller monitors thermal, network, and dwell-time predicates over the hybrid state. Physical thermal dynamics are handled deterministically through Lipschitz-bounded robustness, while network predicates are evaluated against RCP intervals.
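A windowed past-time check can reach O(1) amortized cost per sample with a monotonic queue, the structure that stl_monitor.py is listed as providing. The sketch below shows the general technique only: `SlidingWindowMin` and `thermal_margin` are hypothetical names, and the window semantics (the most recent W samples, including the current one) are an assumption. The robustness of "the predicate held throughout the window" is the minimum margin over that window.

```python
from collections import deque


class SlidingWindowMin:
    """Minimum of the last `window` samples in O(1) amortized time per update."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self._queue = deque()  # (step, value) pairs, values increasing front to back
        self._step = -1

    def update(self, value):
        self._step += 1
        # Older samples that are not smaller than the new one can never be the minimum again.
        while self._queue and self._queue[-1][1] >= value:
            self._queue.pop()
        self._queue.append((self._step, value))
        # Drop samples that have slid out of the window.
        while self._queue[0][0] <= self._step - self.window:
            self._queue.popleft()
        return self._queue[0][1]


def thermal_margin(temp_c, t_lim_c=85.0):
    """Signed margin of the predicate temp <= T_lim; negative means violation."""
    return t_lim_c - temp_c


if __name__ == "__main__":
    monitor = SlidingWindowMin(window=3)
    for temp in (70.0, 80.0, 86.0, 75.0, 74.0, 73.0):
        print(temp, monitor.update(thermal_margin(temp)))
```

Each sample is appended once and removed at most once, which is where the amortized constant cost comes from.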
The integrated controller exposes five modes:
| Mode | Meaning |
|---|---|
| OFFLOAD | Execute a bottleneck/offload path when network margins are safe |
| FULL_LOCAL | Execute the full model locally |
| LOCAL_EXIT | Use the best feasible early exit |
| THROTTLE | Reduce local thermal pressure under warning conditions |
| EMERGENCY | Reuse the last valid prediction under imminent violation |
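One way to picture how a dwell-time specification interacts with these modes is a small guard that holds each mode for a minimum number of control steps. This is a hypothetical sketch, not the logic in runtime_controller.py: the `Mode` enum and `DwellGuard` class are illustrative names, the initial mode is arbitrary, and letting EMERGENCY bypass the dwell requirement is an assumption.

```python
from enum import Enum


class Mode(Enum):
    OFFLOAD = "offload"
    FULL_LOCAL = "full_local"
    LOCAL_EXIT = "local_exit"
    THROTTLE = "throttle"
    EMERGENCY = "emergency"


class DwellGuard:
    """Hold the current mode for at least tau_min control steps before switching."""

    def __init__(self, tau_min=20, initial=Mode.FULL_LOCAL):
        self.tau_min = tau_min
        self.mode = initial
        self.steps_in_mode = 0

    def step(self, requested):
        """Advance one control step and return the mode actually applied."""
        # Assumption: an emergency request overrides the dwell requirement.
        urgent = requested is Mode.EMERGENCY
        if requested is not self.mode and (urgent or self.steps_in_mode >= self.tau_min):
            self.mode = requested
            self.steps_in_mode = 0
        self.steps_in_mode += 1
        return self.mode


if __name__ == "__main__":
    guard = DwellGuard(tau_min=3)
    for request in (Mode.OFFLOAD,) * 4 + (Mode.EMERGENCY,):
        print(request.name, "->", guard.step(request).name)
```

With tau_min = 3, the first three OFFLOAD requests are deferred while FULL_LOCAL completes its dwell, the fourth is granted, and the EMERGENCY request takes effect immediately.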
The reported experiments use BW_min = 5 Mbps, RTT_max = 100 ms, T_lim = 85 °C, tau_min = 20 control steps at 10 Hz (2 s), W = 50 steps, and lambda_max = 3.0.
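These settings can be gathered into a single configuration object, and the network thresholds checked conservatively against interval bounds. The sketch below is illustrative only: `SpecParams`, `network_margins`, and `offload_feasible` are hypothetical names, and testing the lower bandwidth bound and upper RTT bound of each interval is an assumed reading of how network predicates are evaluated against RCP intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecParams:
    bw_min_mbps: float = 5.0        # BW_min
    rtt_max_ms: float = 100.0       # RTT_max
    t_lim_c: float = 85.0           # T_lim
    tau_min_steps: int = 20         # tau_min, in control steps
    w_steps: int = 50               # W (the README does not say which window this is)
    lambda_max: float = 3.0         # cap on the conformal inflation factor
    control_rate_hz: float = 10.0   # control-loop rate

    @property
    def tau_min_seconds(self):
        return self.tau_min_steps / self.control_rate_hz


def network_margins(bw_interval_mbps, rtt_interval_ms, params):
    """Worst-case margins: lower bandwidth bound and upper RTT bound vs. thresholds."""
    bw_lo, _ = bw_interval_mbps
    _, rtt_hi = rtt_interval_ms
    return bw_lo - params.bw_min_mbps, params.rtt_max_ms - rtt_hi


def offload_feasible(bw_interval_mbps, rtt_interval_ms, params):
    """True when both worst-case margins are non-negative."""
    return all(m >= 0 for m in network_margins(bw_interval_mbps, rtt_interval_ms, params))


if __name__ == "__main__":
    params = SpecParams()
    print(params.tau_min_seconds)
    print(network_margins((6.0, 9.0), (40.0, 80.0), params))
    print(offload_feasible((4.0, 9.0), (40.0, 80.0), params))
```

Checking the pessimistic end of each interval means a wider (more inflated) interval makes offloading less likely to be judged feasible, which matches the conservative intent of inflating under distribution shift.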
The current supplementary PDF in results/ is synchronized with the manuscript and includes:
- RQ1 safety violations including the DVFS Governor baseline.
- RQ2 hardware efficiency with the same end-to-end mJ/img accounting as the paper.
- RQ3 coverage baselines: VERMITHOR-RCP, Vanilla CP, Weighted CP, MC-Dropout, and No-CP quantile.
- RQ4 dwell/mode-stability metrics.
- RQ5 component ablations and mode-occupancy/cap-binding analysis.
Scope and limitations of the public repository:
- The repository provides core implementation code and supplementary result tables. It does not include all training scripts, baseline reimplementations, VOC/SSD object-detection code, or hardware log files.
- CIFAR loader files are lightweight notes/placeholders for the dataset configuration used in experiments.
- The object-detection transfer experiment is reported in the paper; the public repo does not package the SSD-300 training pipeline.