| title | MultiObjectTracking RAG App |
|---|---|
| emoji | 🎥 |
| colorFrom | blue |
| colorTo | indigo |
| sdk | gradio |
| sdk_version | 5.49.1 |
| python_version | 3.10 |
| app_file | app.py |
| pinned | false |
Upload a video, run RT-DETR + OC-SORT tracking, generate tracking artifacts, and ask natural-language questions about the video.
This repository is a modular video analytics project that combines:
- object detection
- multi-object tracking
- MOTChallenge-format export and evaluation
- structured track/event summarization
- retrieval-augmented querying over tracking results
- a Gradio app for upload, processing, and question answering
The codebase started with a YOLO/DeepSORT-style pipeline and now also contains a newer RT-DETR + OC-SORT + RAG workflow. In the current workspace, the RT-DETR + OC-SORT path is the most complete and safest path to document as the primary quick start.
At a high level, the project processes a video in stages:
- Read frames from an input video.
- Detect objects in each frame with an Ultralytics model.
- Associate detections over time with a tracker.
- Draw tracked bounding boxes, IDs, and trails on an output video.
- Export frame-level tracking results to:
  - an annotated `.mp4`
  - a frame-level `.tracks.json`
  - a MOT-format `.mot.txt`
- Consolidate frame snapshots into per-track histories.
- Extract structured events and per-track facts.
- Convert those facts into retrieval chunks.
- Build a FAISS index over the chunks.
- Answer grounded natural-language questions using precomputed video facts plus retrieval evidence.
- Modular detector and tracker factories under `src/detectors/` and `src/trackers/`
- End-to-end pipeline orchestration in `src/pipeline/orchestrator.py`
- Annotated video generation with per-track labels and motion trails
- MOT-format export for benchmark-style evaluation
- MOT evaluation script using `motmetrics`
- Track consolidation and event extraction for downstream analytics
- FAISS-based retrieval over tracking-derived text chunks
- Gradio interface for uploaded-video processing and QA
The tracking path is driven by:
- `scripts/run_demo.py`
- `src/config.py`
- `src/detectors/factory.py`
- `src/trackers/factory.py`
- `src/pipeline/orchestrator.py`
- `src/io/video_reader.py`
- `src/io/video_writer.py`
- `src/io/mot_exporter.py`
- `src/annotator.py`
The orchestrator loads the detector, resets the tracker, reads frames, runs detection and tracking, annotates frames, writes the output video, and optionally saves:
- `*.tracks.json`
- `*.mot.txt`
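
A minimal sketch of what one line of the MOT-format export looks like, assuming each tracked box is available as pixel corner coordinates `(x1, y1, x2, y2)`. The function name `to_mot_line` and the input layout are illustrative assumptions, not the API of `src/io/mot_exporter.py`. The column order follows the MOTChallenge 2D convention (`frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z`), with the three world coordinates set to `-1`.

```python
def to_mot_line(frame: int, track_id: int,
                box_xyxy: tuple[float, float, float, float],
                conf: float) -> str:
    """Format one tracked box as a MOTChallenge-style text line.

    `frame` is expected to be 1-based, as in MOTChallenge files.
    The corner-format input box is an assumption made for this sketch.
    """
    x1, y1, x2, y2 = box_xyxy
    width, height = x2 - x1, y2 - y1
    # The last three fields (x, y, z) are unused for 2D tracking.
    return (f"{frame},{track_id},{x1:.2f},{y1:.2f},"
            f"{width:.2f},{height:.2f},{conf:.4f},-1,-1,-1")


line = to_mot_line(1, 7, (10, 20, 50, 100), 0.9)
print(line)
```

Converting from corners to top-left plus width/height at export time keeps the file directly comparable with a `gt/gt.txt` in the same layout.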
The structured analytics path is driven by:
- `scripts/build_tracks.py`
- `scripts/extract_appearance.py`
- `scripts/extract_events.py`
- `scripts/build_chunks.py`
- `scripts/build_video_facts.py`
- `scripts/build_index.py`
- `scripts/query_rag.py`
Those scripts use the modules in src/rag/ to transform tracking output into:
- built track histories
- extracted events
- track facts
- retrieval chunks
- video-level summary facts
- a FAISS index and metadata file
The core processing pipeline runs in this order:
1. `scripts/run_demo.py`
   - runs the tracking pipeline, writes `*.tracks.json` and `*.mot.txt`
2. `scripts/build_tracks.py`
   - consolidates frame-level tracks into built track histories
3. `scripts/extract_appearance.py`
   - extracts clothing/appearance metadata for tracked subjects
4. `scripts/extract_events.py`
   - extracts structured events and per-track facts
5. `scripts/build_video_facts.py` (optional here: omit it to take the chunks-first path)
   - builds global video facts from tracks and events (chunks are optional for richer statistics)
6. `scripts/build_chunks.py`
   - converts facts and events into retrieval chunks
7. `scripts/build_index.py`
   - creates a FAISS index and metadata for retrieval
8. `scripts/query_rag.py`
   - runs question-answering queries against the built artifacts
Note: Steps 5 and 6 are interchangeable. You can:
- Build video facts first (minimal), then chunks, or
- Build chunks first, then pass chunks to video facts for enriched statistics (current example order)
If you want appearance-aware color queries, run scripts/extract_appearance.py before scripts/build_chunks.py, then include the appearance-enhanced tracks file when chunking.
- `app.py` is the Gradio front end for upload-based processing, artifact downloads, and QA.
- `api.py` provides the same underlying flow as a FastAPI server with:
  - `POST /run-pipeline`
  - `POST /query`
- Both use the same backend logic to run the pipeline and answer queries.
.
├── api.py
├── app.py
├── configs/
│ ├── default.yaml
│ ├── rtdetr_ocsort.yaml
│ ├── rtdetr_ocsort_fast.yaml
│ ├── rtdetr_ocsort_x.yaml
│ └── ultralytics_deepsort.yaml
├── evaluation/
│ └── evaluate_mot.py
├── scripts/
│ ├── run_demo.py
│ ├── download_models.py
│ ├── build_tracks.py
│ ├── extract_appearance.py
│ ├── extract_events.py
│ ├── build_chunks.py
│ ├── build_video_facts.py
│ ├── build_index.py
│ └── query_rag.py
├── src/
│ ├── detectors/
│ ├── trackers/
│ ├── rag/
│ ├── io/
│ ├── captioning/
│ ├── pipeline/
│ ├── utils/
│ ├── annotator.py
│ ├── config.py
│ └── schemas.py
├── tests/
├── demo/
├── MOT17-04-FRCNN/
├── MOT17-09-FRCNN/
├── MOT17-11-FRCNN/
├── Makefile
├── pyproject.toml
└── requirements.txt
- `configs/rtdetr_ocsort.yaml`: Primary config for RT-DETR + OC-SORT.
- `configs/rtdetr_ocsort_fast.yaml`: Faster, shorter run variant.
- `configs/rtdetr_ocsort_x.yaml`: Higher-capacity RT-DETR-X variant.
- `configs/default.yaml`: Older config intended for a DeepSORT-based path.
- `configs/ultralytics_deepsort.yaml`: Another DeepSORT-oriented config.

- `demo/sample_videos/`: Local input videos for command-line runs.
- `demo/sample_outputs/`: Generated videos and JSON/TXT artifacts.
- `demo/hf_runs/`: Per-run Gradio app output bundles from the earlier UI flow.
- `demo/api_runs/`: Per-run backend artifacts for the FastAPI/Gradio pipeline.
The workspace includes local MOT-style sequence folders such as:
- `MOT17-09-FRCNN/`
- `MOT17-11-FRCNN/`
- `MOT17-04-FRCNN/`
These are treated as local data, not portable source assets. The .gitignore excludes model weights, datasets, videos, generated outputs, and evaluation results, so anyone cloning this repo should expect to add those assets locally.
Depending on the workflow, the project can generate:
- `output.mp4`: Annotated tracking video
- `output.tracks.json`: Frame-level track snapshots
- `output.mot.txt`: MOTChallenge-style prediction file
- `output.built_tracks.json`: Consolidated track histories
- `output.events.json`: Extracted event records
- `output.track_facts.json`: Per-track analytics facts
- `output.chunks.json`: Retrieval chunks
- `output.video_facts.json`: Global summary facts
- `output.faiss.index`: Vector index for retrieval
- `output.index_meta.json`: Chunk metadata used during retrieval
The detector factory currently supports:
- `ultralytics`
- `yolov9` placeholder/stub
In practice, the working configs use the Ultralytics integration with RT-DETR weights such as rtdetr-l.pt and rtdetr-x.pt.
The tracker factory currently supports:
- `ocsort`
- `deepsort`
In this workspace, the ocsort path is the reliable documented path. The current DeepSORT configs do not include tracker.max_iou_distance, which the factory expects, so the RT-DETR + OC-SORT configs are the recommended choice for running the project as-is.
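
A small sketch of how a missing key such as `tracker.max_iou_distance` could be detected before a run, assuming the YAML config has already been loaded into a nested dictionary. The helper `missing_keys` and the example key `tracker.name` are hypothetical and are not part of `src/config.py` or `src/trackers/factory.py`.

```python
def missing_keys(config: dict, required: list[str]) -> list[str]:
    """Return the dotted keys from `required` that are absent in `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            # Stop descending as soon as a level is absent or not a mapping.
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Hypothetical DeepSORT-style config that lacks the key the factory expects.
deepsort_config = {"tracker": {"name": "deepsort"}}
print(missing_keys(deepsort_config, ["tracker.name", "tracker.max_iou_distance"]))
```

Running a check like this up front turns a late failure inside the tracker factory into an early, readable message.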
After tracking finishes, the RAG pipeline adds structure in several stages:
scripts/build_tracks.py groups frame-level snapshots by track_id and computes:
- first/last frame
- duration
- average confidence
- displacement
- path length
- direction
- estimated entry/exit side
- short-lived and fragmented flags
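
The per-track quantities listed above can be sketched as follows, assuming each track is a frame-sorted list of `(frame, cx, cy)` box centers in pixels. The function name, the inclusive frame-count convention for duration, and the four-way direction labels are assumptions made for illustration; the actual logic in `scripts/build_tracks.py` may differ.

```python
import math


def summarize_track(points: list[tuple[int, float, float]], fps: float = 30.0) -> dict:
    """Summarize one track given frame-sorted (frame, cx, cy) center points."""
    first_frame, last_frame = points[0][0], points[-1][0]
    dx = points[-1][1] - points[0][1]
    dy = points[-1][2] - points[0][2]
    # Path length sums every step; displacement only compares the endpoints.
    path_length = sum(
        math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in zip(points, points[1:])
    )
    if dx == 0 and dy == 0:
        direction = "stationary"
    elif abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        # Image coordinates: y grows downward.
        direction = "down" if dy > 0 else "up"
    return {
        "first_frame": first_frame,
        "last_frame": last_frame,
        "duration_s": (last_frame - first_frame + 1) / fps,
        "displacement": math.hypot(dx, dy),
        "path_length": path_length,
        "direction": direction,
    }


print(summarize_track([(1, 0.0, 0.0), (2, 3.0, 4.0), (3, 6.0, 8.0)], fps=1.0))
```

A path length much larger than the displacement is one simple signal that a subject wandered or that the track jittered.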
scripts/extract_events.py emits events such as:
- `enter`
- `exit`
- `direction_motion`
- `long_presence`
- `fragmented_track`
- `crowded_window`
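
As an illustration of one of these event types, the sketch below flags `crowded_window` events by counting how many tracks overlap each fixed-size frame window. The window size, the threshold, and the event record fields are assumptions; `scripts/extract_events.py` may define crowding differently.

```python
def crowded_windows(tracks: dict[int, tuple[int, int]],
                    window_frames: int, min_tracks: int) -> list[dict]:
    """Emit one event per window in which at least `min_tracks` tracks are active.

    `tracks` maps track_id -> (first_frame, last_frame), frames 1-based.
    """
    if not tracks:
        return []
    last = max(end for _, end in tracks.values())
    events = []
    for start in range(1, last + 1, window_frames):
        end = start + window_frames - 1
        # A track is active in the window if its frame span overlaps it.
        active = sorted(
            tid for tid, (first, final) in tracks.items()
            if first <= end and final >= start
        )
        if len(active) >= min_tracks:
            events.append({
                "type": "crowded_window",
                "start_frame": start,
                "end_frame": end,
                "track_ids": active,
            })
    return events


print(crowded_windows({1: (1, 10), 2: (5, 25), 3: (8, 12)}, window_frames=10, min_tracks=3))
```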
scripts/build_chunks.py creates retrieval chunks of three types:
- `track`
- `event`
- `time_window`
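
To make the idea of a retrieval chunk concrete, here is a hedged sketch that renders one per-track fact record as a `track` chunk. The field names (`track_id`, `first_frame`, `last_frame`, `duration_s`, `direction`) and the sentence template are assumptions, not the schema used by `scripts/build_chunks.py`.

```python
def track_chunk(fact: dict) -> dict:
    """Turn one (hypothetical) track-fact record into a text chunk for retrieval."""
    text = (
        f"Track {fact['track_id']} was visible from frame {fact['first_frame']} "
        f"to frame {fact['last_frame']} ({fact['duration_s']:.1f} s), "
        f"moving {fact['direction']}."
    )
    # Keep structured fields next to the text so answers can cite the track.
    return {"chunk_type": "track", "track_id": fact["track_id"], "text": text}


chunk = track_chunk({
    "track_id": 4, "first_frame": 10, "last_frame": 84,
    "duration_s": 2.5, "direction": "left",
})
print(chunk["text"])
```

Writing each fact as a short natural-language sentence is what lets a sentence-embedding model match it against a free-form question.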
scripts/build_video_facts.py precomputes summary facts such as:
- total unique tracks
- longest track
- shortest track
- most crowded window
- entry/exit counts by side
- direction counts
- average track duration
Note: This script requires track_facts and events, but chunks is optional. If chunks are provided, they enrich the statistics; otherwise, facts are computed from tracks and events alone.
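
A minimal sketch of the kind of precomputation described above, assuming each track fact carries a `track_id`, a `duration_s`, and a `direction` field. These field names and the function `summarize_video` are illustrative assumptions; the real `scripts/build_video_facts.py` also consumes events and, optionally, chunks.

```python
from collections import Counter


def summarize_video(track_facts: list[dict]) -> dict:
    """Compute a few global facts from (hypothetical) per-track fact records."""
    durations = {f["track_id"]: f["duration_s"] for f in track_facts}
    return {
        "total_unique_tracks": len(durations),
        "longest_track": max(durations, key=durations.get),
        "shortest_track": min(durations, key=durations.get),
        "average_track_duration_s": sum(durations.values()) / len(durations),
        "direction_counts": dict(Counter(f["direction"] for f in track_facts)),
    }


facts = [
    {"track_id": 1, "duration_s": 2.0, "direction": "left"},
    {"track_id": 2, "duration_s": 6.0, "direction": "right"},
    {"track_id": 3, "duration_s": 4.0, "direction": "left"},
]
print(summarize_video(facts))
```

Precomputing these values means a question such as "Which track stayed the longest?" can be answered exactly instead of being inferred from retrieved text.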
scripts/build_index.py builds a FAISS index over chunk text using a sentence-transformer embedding model.
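
The sketch below shows the search an exact (flat) vector index performs, written in plain Python so it runs without FAISS or a sentence-transformer installed. It is a stand-in, not the project's code: the toy 2-D vectors replace real embeddings, and cosine similarity via normalized inner product is an assumed metric, since the source does not say which FAISS index type `scripts/build_index.py` uses.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(index_vectors: list[list[float]], query: list[float],
           top_k: int) -> list[tuple[int, float]]:
    """Brute-force inner-product search over unit-length vectors."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(vec, q)), i)
        for i, vec in enumerate(index_vectors)
    ]
    # Highest score first; ties broken by the lower chunk position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Toy "embeddings" for three chunks; real ones would come from the embedding model.
index = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
print([i for i, _ in search(index, [1.0, 0.1], top_k=2)])
```

The returned positions play the role of row numbers in the index, which is why a separate metadata file is needed to map them back to chunk text.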
scripts/query_rag.py and app.py then:
- retrieve the most relevant chunks
- prefer exact answers from global video facts when possible
- fall back to retrieval evidence when needed
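
The facts-first, retrieval-fallback behavior described above can be sketched like this. The keyword table, the `video_facts` keys, and the `retrieve` callback are all assumptions for illustration; the source does not describe how `scripts/query_rag.py` decides that a question matches a precomputed fact.

```python
from typing import Callable

# Hypothetical mapping from question keywords to precomputed fact keys.
FACT_KEYWORDS = {
    "longest": "longest_track",
    "shortest": "shortest_track",
    "how many": "total_unique_tracks",
}


def answer_query(query: str, video_facts: dict,
                 retrieve: Callable[[str], list]) -> dict:
    """Prefer an exact precomputed fact; otherwise return retrieval evidence only."""
    evidence = retrieve(query)  # the most relevant chunks are always collected
    lowered = query.lower()
    for keyword, key in FACT_KEYWORDS.items():
        if keyword in lowered and key in video_facts:
            return {"source": "video_facts", "answer": video_facts[key],
                    "evidence": evidence}
    return {"source": "retrieval", "answer": None, "evidence": evidence}


facts = {"longest_track": 7}
result = answer_query("Which track stayed the longest?", facts, lambda q: ["chunk-a"])
print(result["source"], result["answer"])
```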
evaluation/evaluate_mot.py compares a generated MOT prediction file against a MOT-style sequence directory containing:
- `gt/gt.txt`
- `seqinfo.ini`
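
A small preflight sketch for checking that a sequence directory contains both required files before running the evaluation. The helper name `missing_sequence_files` is hypothetical and is not part of `evaluation/evaluate_mot.py`.

```python
from pathlib import Path

# Files the evaluation expects, relative to the sequence directory.
REQUIRED_FILES = ("gt/gt.txt", "seqinfo.ini")


def missing_sequence_files(sequence_dir: str) -> list[str]:
    """Return the required MOT sequence files that are not present."""
    root = Path(sequence_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


missing = missing_sequence_files("MOT17-09-FRCNN")
if missing:
    print("Sequence directory is incomplete, missing:", ", ".join(missing))
```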
The script computes metrics such as:
- MOTA
- MOTP
- IDF1
- ID precision / recall
- ID switches
- false positives
- misses
- mostly tracked / partially tracked / mostly lost
Results are written into evaluation/results/.
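
For orientation, the headline metric can be written out directly. The sketch below uses the standard CLEAR MOT definition, MOTA = 1 - (misses + false positives + ID switches) / ground-truth boxes. In the repository the numbers come from `motmetrics`, so this function is illustrative only.

```python
def mota(misses: int, false_positives: int, id_switches: int,
         num_gt_boxes: int) -> float:
    """Multi-Object Tracking Accuracy from error counts summed over all frames."""
    if num_gt_boxes <= 0:
        raise ValueError("num_gt_boxes must be positive")
    return 1.0 - (misses + false_positives + id_switches) / num_gt_boxes


print(round(mota(misses=10, false_positives=5, id_switches=5, num_gt_boxes=100), 3))
```

Because the three error counts are unbounded, MOTA can be negative when a tracker makes more errors than there are ground-truth boxes.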
- Python 3.10+ is declared in `pyproject.toml`

The repository uses packages from requirements.txt, including: `torch`, `torchvision`, `ultralytics`, `boxmot`, `deep-sort-realtime`, `opencv-python`, `motmetrics`, `faiss-cpu`, `sentence-transformers`, `gradio`, `fastapi`, `pytest`, and `ruff`.

    git clone <your-repo-url>
    cd MultiObjectTracking-RTDETR-OCSORT-RAG

    python3 -m venv venv
    source venv/bin/activate

    pip install --upgrade pip
    pip install -r requirements.txt

    mkdir -p demo/sample_videos demo/sample_outputs demo/hf_runs evaluation/results models

If you prefer using the Makefile:

    make install
    make dirs

The repository ignores model weights, so they need to exist locally.
To download the default RT-DETR-L weight used by the current download script:

    PYTHONPATH=. venv/bin/python scripts/download_models.py

That script saves the model into:

- `models/rtdetr-l.pt`
The RT-DETR configs in this workspace currently reference:
- `rtdetr-l.pt`
- `rtdetr-x.pt`
If your chosen config points to a different location, either place the file there or update the config.
The execution order for the full workflow is:
1. `scripts/run_demo.py`
2. `scripts/build_tracks.py`
3. `scripts/extract_appearance.py`
4. `scripts/extract_events.py`
5. `scripts/build_video_facts.py` (optional at this point; it can also run after step 6)
6. `scripts/build_chunks.py`
7. `scripts/build_index.py`
8. `scripts/query_rag.py`
This section is ordered from the simplest working path to the full analytics and QA workflow.
Recommended command:

    PYTHONPATH=. venv/bin/python scripts/run_demo.py \
      --config configs/rtdetr_ocsort.yaml \
      --input demo/sample_videos/mot17_09_frcnn.mp4 \
      --output demo/sample_outputs/mot17-09-ocsort.mp4

This produces:

- `demo/sample_outputs/mot17-09-ocsort.mp4`
- `demo/sample_outputs/mot17-09-ocsort.tracks.json`
- `demo/sample_outputs/mot17-09-ocsort.mot.txt`
You can also run the faster variant:

    PYTHONPATH=. venv/bin/python scripts/run_demo.py \
      --config configs/rtdetr_ocsort_fast.yaml \
      --input demo/sample_videos/mot17_09_frcnn.mp4 \
      --output demo/sample_outputs/mot17-09-ocsort-fast.mp4

And the RT-DETR-X variant:

    PYTHONPATH=. venv/bin/python scripts/run_demo.py \
      --config configs/rtdetr_ocsort_x.yaml \
      --input demo/sample_videos/mot17_09_frcnn.mp4 \
      --output demo/sample_outputs/mot17-09-ocsort-x.mp4

Step 2, consolidate frame-level tracks into built track histories:

    PYTHONPATH=. venv/bin/python scripts/build_tracks.py \
      --input demo/sample_outputs/mot17-09-ocsort.tracks.json \
      --output demo/sample_outputs/mot17-09-ocsort.built_tracks.json \
      --fps 30 \
      --frame-width 1920 \
      --frame-height 1080

Step 3, extract appearance metadata for tracked subjects:

    PYTHONPATH=. venv/bin/python scripts/extract_appearance.py \
      --config configs/rtdetr_ocsort.yaml \
      --tracks demo/sample_outputs/mot17-09-ocsort.built_tracks.json \
      --video demo/sample_videos/mot17_09_frcnn.mp4 \
      --output demo/sample_outputs/mot17-09-ocsort.built_tracks.appearance.json

Step 4, extract structured events and per-track facts:

    PYTHONPATH=. venv/bin/python scripts/extract_events.py \
      --input demo/sample_outputs/mot17-09-ocsort.built_tracks.appearance.json \
      --events-output demo/sample_outputs/mot17-09-ocsort.events.json \
      --facts-output demo/sample_outputs/mot17-09-ocsort.track_facts.json \
      --fps 30

This step preserves any appearance metadata on tracks so downstream video facts and chunk generation can use it.
If you prefer to keep the original built tracks file, you can instead pass the appearance-enhanced tracks file separately:

    PYTHONPATH=. venv/bin/python scripts/extract_events.py \
      --input demo/sample_outputs/mot17-09-ocsort.built_tracks.json \
      --appearance-tracks demo/sample_outputs/mot17-09-ocsort.built_tracks.appearance.json \
      --events-output demo/sample_outputs/mot17-09-ocsort.events.json \
      --facts-output demo/sample_outputs/mot17-09-ocsort.track_facts.json \
      --fps 30

Step 5, minimal approach (video facts from track facts and events only):

    PYTHONPATH=. venv/bin/python scripts/build_video_facts.py \
      --track-facts demo/sample_outputs/mot17-09-ocsort.track_facts.json \
      --events demo/sample_outputs/mot17-09-ocsort.events.json \
      --fps 30 \
      --output demo/sample_outputs/mot17-09-ocsort.video_facts.json

Step 6, convert facts and events into retrieval chunks:

    PYTHONPATH=. venv/bin/python scripts/build_chunks.py \
      --config configs/rtdetr_ocsort.yaml \
      --track_facts demo/sample_outputs/mot17-09-ocsort.track_facts.json \
      --events demo/sample_outputs/mot17-09-ocsort.events.json \
      --output demo/sample_outputs/mot17-09-ocsort.chunks.json

If you have run scripts/extract_appearance.py and then scripts/extract_events.py on the appearance-enhanced tracks, your generated track_facts.json already carries the appearance field and can be chunked directly.
If you instead want to merge appearance from a separate appearance-enhanced tracks file, use:

    PYTHONPATH=. venv/bin/python scripts/build_chunks.py \
      --config configs/rtdetr_ocsort.yaml \
      --track_facts demo/sample_outputs/mot17-09-ocsort.track_facts.json \
      --events demo/sample_outputs/mot17-09-ocsort.events.json \
      --tracks_with_appearance demo/sample_outputs/mot17-09-ocsort.built_tracks.appearance.json \
      --output demo/sample_outputs/mot17-09-ocsort.chunks.json

If you skipped step 5, or want richer statistics after building chunks:

    PYTHONPATH=. venv/bin/python scripts/build_video_facts.py \
      --track-facts demo/sample_outputs/mot17-09-ocsort.track_facts.json \
      --events demo/sample_outputs/mot17-09-ocsort.events.json \
      --chunks demo/sample_outputs/mot17-09-ocsort.chunks.json \
      --fps 30 \
      --output demo/sample_outputs/mot17-09-ocsort.video_facts.json

Step 7, build the FAISS index and metadata:

    PYTHONPATH=. venv/bin/python scripts/build_index.py \
      --chunks demo/sample_outputs/mot17-09-ocsort.chunks.json \
      --index-output demo/sample_outputs/mot17-09-ocsort.faiss.index \
      --metadata-output demo/sample_outputs/mot17-09-ocsort.index_meta.json \
      --model sentence-transformers/all-MiniLM-L6-v2

Step 8, query the built artifacts:

    PYTHONPATH=. venv/bin/python scripts/query_rag.py \
      --index demo/sample_outputs/mot17-09-ocsort.faiss.index \
      --metadata demo/sample_outputs/mot17-09-ocsort.index_meta.json \
      --video-facts demo/sample_outputs/mot17-09-ocsort.video_facts.json \
      --query "Which track stayed the longest?" \
      --top-k 5 \
      --model sentence-transformers/all-MiniLM-L6-v2

To launch the Gradio app:

    PYTHONPATH=. venv/bin/python app.py

Then open the local Gradio URL shown in the terminal, upload a video, run the full pipeline, and ask questions in the second tab.
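
The command-line steps in this section can also be chained from a small Python driver. The sketch below only assembles the argument lists, using the script names and flags shown in this section; the helper names `pipeline_commands` and `run_pipeline` are hypothetical, the optional appearance step is left out, and the chunks-first order is assumed.

```python
import os
import subprocess
import sys


def pipeline_commands(video: str, stem: str,
                      config: str = "configs/rtdetr_ocsort.yaml",
                      fps: int = 30, width: int = 1920, height: int = 1080,
                      model: str = "sentence-transformers/all-MiniLM-L6-v2") -> list[list[str]]:
    """Build argv lists for tracking through indexing; `stem` is the output path prefix."""
    py = [sys.executable]
    facts, events = f"{stem}.track_facts.json", f"{stem}.events.json"
    chunks = f"{stem}.chunks.json"
    return [
        py + ["scripts/run_demo.py", "--config", config,
              "--input", video, "--output", f"{stem}.mp4"],
        py + ["scripts/build_tracks.py", "--input", f"{stem}.tracks.json",
              "--output", f"{stem}.built_tracks.json", "--fps", str(fps),
              "--frame-width", str(width), "--frame-height", str(height)],
        py + ["scripts/extract_events.py", "--input", f"{stem}.built_tracks.json",
              "--events-output", events, "--facts-output", facts, "--fps", str(fps)],
        py + ["scripts/build_chunks.py", "--config", config,
              "--track_facts", facts, "--events", events, "--output", chunks],
        py + ["scripts/build_video_facts.py", "--track-facts", facts,
              "--events", events, "--chunks", chunks, "--fps", str(fps),
              "--output", f"{stem}.video_facts.json"],
        py + ["scripts/build_index.py", "--chunks", chunks,
              "--index-output", f"{stem}.faiss.index",
              "--metadata-output", f"{stem}.index_meta.json", "--model", model],
    ]


def run_pipeline(video: str, stem: str, **kwargs) -> None:
    """Run every step in order from the repository root, stopping at the first failure."""
    env = {**os.environ, "PYTHONPATH": "."}
    for cmd in pipeline_commands(video, stem, **kwargs):
        subprocess.run(cmd, check=True, env=env)
```

Calling `run_pipeline("demo/sample_videos/mot17_09_frcnn.mp4", "demo/sample_outputs/mot17-09-ocsort")` from the repository root would execute the same sequence as the individual commands, provided the virtual environment's interpreter is the one running the driver.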
If you want the shortest end-to-end sequence:

    git clone <your-repo-url>
    cd MultiObjectTracking-RTDETR-OCSORT-RAG
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt
    mkdir -p demo/sample_videos demo/sample_outputs demo/hf_runs evaluation/results models
    PYTHONPATH=. venv/bin/python scripts/download_models.py
    PYTHONPATH=. venv/bin/python scripts/run_demo.py \
      --config configs/rtdetr_ocsort.yaml \
      --input demo/sample_videos/mot17_09_frcnn.mp4 \
      --output demo/sample_outputs/mot17-09-ocsort.mp4

Evaluate a generated MOT file against a local MOT sequence directory:

    PYTHONPATH=. venv/bin/python evaluation/evaluate_mot.py \
      --pred demo/sample_outputs/mot17-09-ocsort.mot.txt \
      --sequence-dir MOT17-09-FRCNN

Optional example with a duration cap:

    PYTHONPATH=. venv/bin/python evaluation/evaluate_mot.py \
      --pred demo/sample_outputs/mot17-09-ocsort.mot.txt \
      --sequence-dir MOT17-09-FRCNN \
      --sample-seconds 10

Results are saved under evaluation/results/.
Run the test suite:

    PYTHONPATH=. venv/bin/python -m pytest tests

Run tests with coverage through the Makefile:

    make test

Run linting:

    make lint

The repository includes these convenience targets:

- `make install`
- `make dirs`
- `make run`
- `make test`
- `make lint`
- `make metrics`
- `make download-models`
Important note: make run currently defaults to configs/default.yaml, but the present code path is most reliable with configs/rtdetr_ocsort.yaml. If you want to use make run, override the config explicitly:

    make run \
      CONFIG=configs/rtdetr_ocsort.yaml \
      INPUT=demo/sample_videos/mot17_09_frcnn.mp4 \
      OUTPUT=demo/sample_outputs/mot17-09-ocsort.mp4

- Model weights are ignored by Git.
- `demo/sample_videos/` is ignored by Git.
- `demo/sample_outputs/` is ignored by Git.
- `demo/hf_runs/` is ignored by Git.
- `MOT17*` folders are ignored by Git.
- `evaluation/results/` is ignored by Git.
That means a fresh clone gives you the code, but not the large runtime assets.
- The DeepSORT config path is currently incomplete because `src/trackers/factory.py` expects `tracker.max_iou_distance`, but that key is missing from `configs/default.yaml` and `configs/ultralytics_deepsort.yaml`.
- `scripts/download_models.py` currently downloads `rtdetr-l.pt`; if you use the `rtdetr_ocsort_x.yaml` config, you need `rtdetr-x.pt` locally as well.
- The sample commands assume your local videos already exist in `demo/sample_videos/`.
- MOT evaluation requires a valid local MOT sequence directory with `gt/gt.txt` and `seqinfo.ini`.
- The unit test suite passes with `venv/bin/python -m pytest tests`.
- The RT-DETR + OC-SORT config path can be instantiated successfully.
- The older DeepSORT configs are not the best default for the current codebase without config updates.