dependable-cps/VERMITHOR

VERMITHOR: Formally Safe Runtime Orchestration for Edge DNN Inference

Khurram Khalil and Khaza Anuarul Hoque

Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA

Corresponding author: hoquek@missouri.edu


Overview

VERMITHOR is a runtime orchestration framework for edge DNN inference under the coupled constraints of thermal safety, latency deadlines, and network volatility. It combines three components:

  • Mesh-Exit routing: a backbone-agnostic dynamic-exit DNN interface exposing local early-exit, bottleneck offload, and full-backbone execution paths.
  • Robust Conformal Prediction (RCP): online f-divergence-inflated conformal intervals for preserving network-feasibility coverage under distribution shift.
  • Past-time Signal Temporal Logic (ptSTL) monitoring: an O(1) amortized runtime monitor that enforces thermal, network, and dwell-time specifications during inference.

The accompanying manuscript evaluates VERMITHOR on NVIDIA Jetson Thor across seven stress scenarios, seven orchestration baselines, and classification plus object-detection tasks. The public repository contains the core implementation and supplementary result tables used to document the paper's extended evaluations. It is not a full experiment-reproduction package for every benchmark, baseline, and hardware run.

Repository Contents

VERMITHOR/
|-- mesh_exit/
|   |-- super_node.py           SuperNode, LocalExitHead, BottleneckEncoder
|   `-- resnet_backbone.py      MeshExitResNet factory functions
|-- conformal/
|   |-- divergence_estimator.py Likelihood-ratio and f-divergence estimation
|   `-- conformal_predictor.py  Standard CP and Robust CP
|-- stl_monitor/
|   |-- robustness.py           Thermal and network robustness predicates
|   |-- hybrid_dynamics.py      Hybrid dynamics and coverage utilities
|   |-- stl_monitor.py          Monotonic queue and STL monitor
|   |-- online_divergence.py    Online divergence and inflation controller
|   `-- runtime_controller.py   Integrated five-mode controller
`-- data/
    |-- cifar10.py              CIFAR-10 loader notes
    `-- cifar100.py             CIFAR-100 loader notes

results/
`-- Vermithor supplementary results.pdf

Installation

The source modules depend on PyTorch and NumPy. A minimal environment can be created with:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

A quick import smoke test is:

PYTHONPATH=. python -c "from VERMITHOR.mesh_exit import mesh_exit_resnet18; from VERMITHOR.conformal import RobustConformalPredictor; from VERMITHOR.stl_monitor import IntegratedRuntimeController; print('VERMITHOR imports OK')"

Component Details

Mesh-Exit Architecture

The SuperNode is injected after configurable ResNet layers and exposes three propagation paths:

Path           Module               Purpose
Local exit     LocalExitHead        On-device early prediction
Bottleneck     BottleneckEncoder    Compressed feature transmission for split execution
Continuation   identity path        Full-backbone local execution

Robust Conformal Prediction

RobustConformalPredictor extends standard conformal prediction by inflating the calibrated interval with an online f-divergence estimate:

lambda(D_f) = min(lambda_max, lambda_base * (1 + beta * D_f))

The runtime use is bounded by lambda_max; when the cap binds, the paper reports the corresponding graceful-degradation behavior rather than claiming unconditional coverage.
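
As an illustration of the inflation rule above, the following sketch computes lambda(D_f) and applies it to a split-conformal half-width. It is not the repository's RobustConformalPredictor: the function names, the multiplicative widening of the half-width, and the values lambda_base = 1.0 and beta = 2.0 are assumptions made for the example. Only the formula and lambda_max = 3.0 come from this README.

```python
import math


def conformal_quantile(scores, alpha):
    """Standard split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def inflation_factor(d_f, lambda_base=1.0, beta=2.0, lambda_max=3.0):
    """lambda(D_f) = min(lambda_max, lambda_base * (1 + beta * D_f)).

    lambda_base and beta are illustrative values, not taken from the paper.
    Returns (lambda, cap_binds).
    """
    raw = lambda_base * (1.0 + beta * max(d_f, 0.0))
    return min(lambda_max, raw), raw > lambda_max


def robust_interval(prediction, scores, alpha, d_f):
    """Widen the calibrated half-width by lambda(D_f) (assumed multiplicative)."""
    half_width = inflation_factor(d_f)[0] * conformal_quantile(scores, alpha)
    cap_binds = inflation_factor(d_f)[1]
    return prediction - half_width, prediction + half_width, cap_binds
```

With a small divergence estimate the interval widens proportionally. With a large one the factor saturates at lambda_max and the returned flag signals that the cap is binding, which is the regime the paper treats as graceful degradation.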

STL Runtime Monitor

The controller monitors thermal, network, and dwell-time predicates over the hybrid state. Physical thermal dynamics are handled deterministically through Lipschitz-bounded robustness, while network predicates are evaluated against RCP intervals.
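
A common way to get O(1) amortized cost for a windowed past-time operator is a monotonic queue that tracks the minimum robustness over the last W samples. The sketch below shows that technique for a thermal margin T_lim - T. The class name and interface are hypothetical and are not the API of stl_monitor.py.

```python
from collections import deque


class SlidingMinMonitor:
    """Minimum of the last `window` robustness values, O(1) amortized per update."""

    def __init__(self, window):
        self.window = window
        self._queue = deque()  # (step, value) pairs, values increasing front to back
        self._step = -1

    def update(self, value):
        self._step += 1
        # Older entries that are no smaller than the new value can never be the minimum again.
        while self._queue and self._queue[-1][1] >= value:
            self._queue.pop()
        self._queue.append((self._step, value))
        # Drop entries that have fallen out of the window [step - window + 1, step].
        while self._queue[0][0] <= self._step - self.window:
            self._queue.popleft()
        return self._queue[0][1]


def thermal_margins(temperatures_c, t_lim_c=85.0, window=50):
    """Windowed robustness of 'temperature stayed below t_lim_c' at each step."""
    monitor = SlidingMinMonitor(window)
    return [monitor.update(t_lim_c - t) for t in temperatures_c]
```

Each sample is pushed and popped at most once, so the per-step cost stays constant on average regardless of W. A negative return value means the limit was exceeded at some point inside the window.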

The integrated controller exposes five modes:

Mode         Meaning
OFFLOAD      execute a bottleneck/offload path when network margins are safe
FULL_LOCAL   execute the full model locally
LOCAL_EXIT   use the best feasible early exit
THROTTLE     reduce local thermal pressure under warning conditions
EMERGENCY    reuse the last valid prediction under imminent violation
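
One way a dwell-time requirement can interact with these modes is sketched below: a guard holds the current mode for at least tau_min control steps before accepting a different proposal. The DwellGuard class, its interface, and the rule that EMERGENCY bypasses the dwell constraint are assumptions for illustration. They are not taken from runtime_controller.py.

```python
from enum import Enum


class Mode(Enum):
    OFFLOAD = "OFFLOAD"
    FULL_LOCAL = "FULL_LOCAL"
    LOCAL_EXIT = "LOCAL_EXIT"
    THROTTLE = "THROTTLE"
    EMERGENCY = "EMERGENCY"


class DwellGuard:
    """Hold each mode for at least `tau_min` steps (hypothetical helper)."""

    def __init__(self, tau_min=20, initial=Mode.FULL_LOCAL):
        self.tau_min = tau_min
        self.mode = initial
        self.steps_in_mode = 0  # steps the current mode has been active

    def step(self, proposed):
        # Assumption: an EMERGENCY proposal is never delayed by the dwell constraint.
        may_switch = self.steps_in_mode >= self.tau_min or proposed is Mode.EMERGENCY
        if proposed is not self.mode and may_switch:
            self.mode = proposed
            self.steps_in_mode = 0
        self.steps_in_mode += 1
        return self.mode
```

For example, with tau_min = 3 and an initial FULL_LOCAL mode, repeated OFFLOAD proposals are only accepted on the fourth step, which suppresses rapid mode oscillation when margins hover near a threshold.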

The reported experiments use BW_min = 5 Mbps, RTT_max = 100 ms, T_lim = 85 °C, tau_min = 20 control steps at 10 Hz (2 s), W = 50 steps, and lambda_max = 3.0.
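
To make these thresholds concrete, the sketch below evaluates a network-feasibility margin against the worst-case ends of prediction intervals for bandwidth and round-trip time. The function name, the normalization by each threshold, and the use of min to combine the two margins are assumptions for illustration. Only BW_min = 5 Mbps and RTT_max = 100 ms come from this README.

```python
def network_robustness(bw_interval_mbps, rtt_interval_ms, bw_min=5.0, rtt_max=100.0):
    """Worst-case normalized margin over prediction intervals (hypothetical helper).

    Positive: both bounds are satisfied even at the pessimistic interval ends.
    Negative: at least one bound may be violated.
    """
    bw_low, _ = bw_interval_mbps    # pessimistic bandwidth is the lower end
    _, rtt_high = rtt_interval_ms   # pessimistic RTT is the upper end
    bw_margin = (bw_low - bw_min) / bw_min
    rtt_margin = (rtt_max - rtt_high) / rtt_max
    return min(bw_margin, rtt_margin)
```

A bandwidth interval of (8, 12) Mbps with an RTT interval of (40, 60) ms gives a positive margin, while a bandwidth interval whose lower end drops to 4 Mbps gives a negative one even if the point estimate is above 5 Mbps.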

Supplementary Results

The current supplementary PDF in results/ is synchronized with the manuscript and includes:

  • RQ1 safety violations including the DVFS Governor baseline.
  • RQ2 hardware efficiency with the same end-to-end mJ/img accounting as the paper.
  • RQ3 coverage baselines: VERMITHOR-RCP, Vanilla CP, Weighted CP, MC-Dropout, and No-CP quantile.
  • RQ4 dwell/mode-stability metrics.
  • RQ5 component ablations and mode-occupancy/cap-binding analysis.

Scope Notes

  • The repository provides core implementation code and supplementary result tables. It does not include all training scripts, baseline reimplementations, VOC/SSD object-detection code, or hardware log files.
  • CIFAR loader files are lightweight notes/placeholders for the dataset configuration used in experiments.
  • The object-detection transfer experiment is reported in the paper; the public repo does not package the SSD-300 training pipeline.
