Part of the Detecture monorepo — this is the data-pipeline component that mines texture-rich scenes and produces the training / test datasets consumed by the model and scored by the benchmark.
DetectureMiner transforms generic semantic datasets (ADE20K + TextureSAM's texture-augmented variant) into the high-quality multi-texture training set Detecture is trained on. It also produces the held-out ADE20K_Detecture in-domain evaluation split.
Source dataset (ADE20K or TextureSAM Textured-ADE20K)
           │
           ▼
┌─────────────────────┐
│ 1. Filter           │  geometry-first scoring (0–100)
│    (rwtd_miner/)    │  (region count, boundary quality, texture coverage)
└──────────┬──────────┘
           │  images scoring ≥ 65
           ▼
┌─────────────────────┐
│ 2. Extract regions  │  merge ADE20K class-masks into 2–5 geometric
│                     │  texture regions per image (≥ 1% area each)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 3. VLM annotate     │  Claude 3.5 Sonnet describes each region
│    (training_data/) │  in the "Texture of <name>, <features>,
│                     │  <spatial>" template (10–15 words)
└──────────┬──────────┘
           │
           ▼
  training set (~14k samples, 3.2 regions avg)
  + ADE20K_Detecture test split (212 samples)
The pipeline is described in detail in the paper's Data Ecosystem section and Appendix B.
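
The thresholds in the diagram can be read as a simple selection rule. The following is a minimal sketch under stated assumptions, not the miner's actual code: `Region`, `usable_regions`, and `keep_image` are hypothetical names, the geometry score and per-region area fractions are assumed to be computed elsewhere, and dropping undersized regions before counting is a guess at how the 1% rule is applied.

```python
from dataclasses import dataclass

# Thresholds taken from the pipeline diagram.
MIN_SCORE = 65             # images scoring >= 65 pass the filter
MIN_AREA_FRACTION = 0.01   # each region must cover >= 1% of the image
MIN_REGIONS, MAX_REGIONS = 2, 5


@dataclass
class Region:
    """Hypothetical container for one merged texture region."""
    name: str
    area_fraction: float  # region area / image area, in [0, 1]


def usable_regions(regions):
    """Drop regions below the minimum area fraction (assumed behaviour)."""
    return [r for r in regions if r.area_fraction >= MIN_AREA_FRACTION]


def keep_image(score, regions):
    """Return True if the image passes the score and region-count rules."""
    if score < MIN_SCORE:
        return False
    return MIN_REGIONS <= len(usable_regions(regions)) <= MAX_REGIONS
```

For example, an image scoring 70 with two regions covering 50% and 30% of the frame would be kept, while the same image scoring 64 would be rejected at the filter stage.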
DetectureMiner/
├── rwtd_miner/ # filtering + extraction engine (the core miner)
│ ├── cli.py # primary entry point
│ ├── stages/ # filter → extract → score stage modules
│ ├── dataset_adapters/ # ADE20K / COCO / TextureSAM loaders
│ └── utils/ # geometry scoring, mask merging
├── training_data/
│ ├── scripts/ # VLM-annotation drivers
│ ├── adapters/ # Anthropic / OpenAI client wrappers
│ ├── configs/ # per-source dataset configs
│ └── utils.py # path normalisation, metadata schema
├── configs/
│ ├── profiles/ # reusable hardware + scoring presets
│ └── generated/ # materialised merges (`<profile>_merged.yaml`)
├── data/ # raw datasets — NOT in git (see root README)
├── docs/ # design notes (review/ is gitignored, auto/ too)
└── config.yaml # top-level defaults
# 1. Place or symlink the raw datasets under data/raw/
# (ADE20K → data/raw/ade20k/ADEChallengeData2016, etc.)
# See root README for Hugging Face download.
# 2. Run the miner with the matching profile
python -m rwtd_miner.cli \
--config configs/generated/coco_ultra_strict_merged.yaml \
--stage all
# 3. Run the VLM annotation pass (requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
python training_data/scripts/annotate.py \
--config training_data/configs/ade20k.yaml

The annotated training set is written as a metadata.json per dataset
in the unified schema consumed by
Qwen2SAM_Detecture and
Qwen2SAM_Detecture_Benchmark.
Every config path uses Path.home() / DETECTURE_DATASETS_ROOT
expansion — no absolute /home/... anywhere. Override the default
dataset root:
export DETECTURE_DATASETS_ROOT=/mnt/fast_storage/datasets

Per-source configs live in training_data/configs/:
- ade20k.yaml: vanilla ADE20K
- ade20k_texturesam.yaml: TextureSAM's η<1 variants
- ADE20K_textured_images.yaml: TextureSAM's η=1 pure-texture variant
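
The dataset-root expansion described above could look roughly like the sketch below. It is illustrative only: `datasets_root` and `resolve` are hypothetical helper names, and the fallback to `~/datasets` is an assumption based on the `~/datasets/...` prefix in the example record paths, not a confirmed default.

```python
import os
from pathlib import Path


def datasets_root(env=None):
    """Resolve the dataset root directory.

    Assumption: when DETECTURE_DATASETS_ROOT is unset, fall back to
    ~/datasets; the real default may differ.
    """
    env = os.environ if env is None else env
    override = env.get("DETECTURE_DATASETS_ROOT")
    if override:
        return Path(override).expanduser()
    return Path.home() / "datasets"


def resolve(relative, env=None):
    """Join a dataset-relative path onto the resolved root."""
    return datasets_root(env) / relative
```

Passing the environment as an argument keeps the helper easy to test without mutating `os.environ`.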
Each mined image becomes one JSON record:
{
"image_path": "~/datasets/ADE20k_Detecture/images/xxx.jpg",
"id": "training_ADE_train_00000001",
"textures": [
{
"description": "Texture of weathered brown concrete, coarse aggregate pattern, lower foreground",
"mask_path": "~/datasets/ADE20k_Detecture/masks/xxx_region0.png"
},
...
]
}

The per-image region count is whatever the filtering + extraction pipeline produced (typically 2–5).
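
A consumer of this schema might sanity-check records before training. The sketch below is a hedged example rather than part of the repository: `check_record` and `load_records` are hypothetical names, and it assumes metadata.json holds a JSON list of records shaped like the one above.

```python
import json
from pathlib import Path


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for key in ("image_path", "id", "textures"):
        if key not in record:
            problems.append(f"missing key: {key}")
    for i, tex in enumerate(record.get("textures", [])):
        # Descriptions follow the "Texture of <name>, ..." template.
        if not tex.get("description", "").startswith("Texture of "):
            problems.append(f"textures[{i}]: description does not follow the template")
        if "mask_path" not in tex:
            problems.append(f"textures[{i}]: missing mask_path")
    return problems


def load_records(metadata_file):
    """Load metadata.json, assuming it holds a JSON list of records."""
    with open(Path(metadata_file).expanduser(), encoding="utf-8") as fh:
        return json.load(fh)
```

Running `check_record` over every entry returned by `load_records` gives a quick report of malformed records without touching the image or mask files themselves.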
Every step is deterministic given the same --seed and the same
source dataset. The mined training set used for the paper's Detecture
checkpoint is ~14k samples; the exact metadata.json is shipped via
the Hugging Face dataset archive (see root README).