🤔 Your LLM-Wiki Needs to Be Refined
DeepRefine is a general LLM-based reasoning model for agent-compiled knowledge refinement. It refines any pre-constructed knowledge base against user queries, making it better suited to downstream tasks.
- [2026/5/10] Static quants of DeepRefine-v1-8B have been released at 🤗 mradermacher/DeepRefine-v1-8B-GGUF. Thanks to the community!
We collect the raw HotpotQA training data from https://hotpotqa.github.io/ and then construct the data samples for RL training with the following script:
```bash
bash scripts/autograph-r1/data_prepare/hotpotqa_cons.sh
```

Alternatively, you can access the prepared training data under the `data/` folder.
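The construction script consumes the raw HotpotQA release, which is a JSON list of records with `_id`, `question`, `answer`, `context`, and `supporting_facts` fields. As a sanity-check sketch of that schema (the record below is illustrative, not real dataset content):

```python
# One record in the raw HotpotQA schema (e.g. hotpot_train_v1.1.json is a
# JSON list of such records). This example is made up for illustration.
record = {
    "_id": "example-id",
    "question": "Which country is the Eiffel Tower in?",
    "answer": "France",
    "type": "bridge",
    "level": "easy",
    # supporting_facts: [paragraph title, sentence index] pairs
    "supporting_facts": [["Eiffel Tower", 0]],
    # context: [paragraph title, list of sentences] pairs
    "context": [["Eiffel Tower",
                 ["The Eiffel Tower is in Paris, France.",
                  "It was completed in 1889."]]],
}

def supporting_sentences(rec):
    """Resolve supporting_facts against context to recover gold sentences."""
    paragraphs = {title: sents for title, sents in rec["context"]}
    return [paragraphs[title][idx] for title, idx in rec["supporting_facts"]]

print(supporting_sentences(record))
# ['The Eiffel Tower is in Paris, France.']
```

How `hotpotqa_cons.sh` maps these fields into RL training samples is defined by the script itself.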
⚠️ Configuration Reminder: Please make sure to replace all path configurations in the following scripts with your own paths.
Update your config in verl/third_party/autograph_r1/config.ini.
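The actual section and key names in `config.ini` are defined by the repo; purely as an illustration of the kind of path entries to update (the `[paths]` section and keys below are hypothetical, not the repo's real schema), such a file can be read with Python's standard `configparser`:

```python
import configparser

# Hypothetical config.ini fragment -- the real section/key names are
# whatever verl/third_party/autograph_r1/config.ini actually defines.
EXAMPLE = """
[paths]
data_dir = /home/you/DeepRefine/data
model_dir = /home/you/models/Qwen3-8B
output_dir = /home/you/DeepRefine/outputs
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE)
print(cfg["paths"]["data_dir"])  # -> /home/you/DeepRefine/data
```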
Train Qwen3-4B:

```bash
bash scripts/train/run_qwen3-4b_graph_refiner.sh
```

Train Qwen3-8B:

```bash
bash scripts/train/run_qwen3-8b_graph_refiner.sh
```

We have also provided our trained models on HuggingFace.
⚠️ Configuration Reminder: Please make sure to replace all path configurations in the following scripts with your own paths.
There are six evaluation modes:
- Graph Retriever, no refinement:

  ```bash
  bash scripts/eval/gr_refine_bench_no_refine.sh
  ```

- Graph Retriever, naive refinement (without training):

  ```bash
  bash scripts/eval/gr_refine_bench_wo_rl.sh
  ```

- Graph Retriever, DeepRefine:

  ```bash
  bash scripts/eval/gr_refine_bench_rl.sh
  ```

- Text Retriever, no refinement:

  ```bash
  bash scripts/eval/tr_refine_bench_no_refine.sh
  ```

- Text Retriever, naive refinement (without training):

  ```bash
  bash scripts/eval/tr_refine_bench_wo_rl.sh
  ```

- Text Retriever, DeepRefine:

  ```bash
  bash scripts/eval/tr_refine_bench_rl.sh
  ```

If you find our work helpful, please cite:

```bibtex
@article{huang2026deeprefine,
  title={DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning},
  author={Huang, Haoyu and Bai, Jiaxin and Liu, Shujie and Wei, Yang and Tsang, Hong Ting and Gao, Yisen and Xie, Zhongwei and Li, Yufei and Song, Yangqiu},
  journal={arXiv preprint arXiv:2605.10488},
  year={2026}
}
```
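For reference, HotpotQA answers are conventionally scored with exact match and token-level F1 after SQuAD-style normalization; the eval scripts above presumably report similar metrics. A minimal sketch of those two metrics (not the repository's evaluation code):

```python
import re
import string
from collections import Counter

def normalize(s):
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(f1("in Paris France", "Paris"))                   # 0.5
```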