zai-org/RPC-Bench

RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension

🌐 Project Page • 📖 Paper • 🤗 Hugging Face • 🧭 ModelScope

Official code and data for the paper RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (ACL 2026).


RPC-Bench is a large-scale, fine-grained question-answering benchmark constructed from review-rebuttal exchanges of high-quality academic papers. Each paper is available in two input formats, pure text and rendered page images, so the benchmark can evaluate both large language models (LLMs) and visual language models (VLMs).

🚀 Quick Start

Dependencies

First, create a conda environment and install the Python dependencies listed in requirements.txt.

conda create -n rpc python==3.11.13
conda activate rpc

pip install -r requirements.txt

QA Construction

The pipeline/ directory provides an example workflow for constructing benchmark QA annotations from crawled OpenReview review-rebuttal data through LLM-based decomposition, rewriting, and filtering. See pipeline/README.md for details.

Data Processing

For this benchmark, each academic paper can be processed into either structured text or page-rendered images, enabling evaluation across both LLMs and VLMs. Choose the parsing mode that best fits your experimental objectives.

  • File Download: Download paper PDFs based on metadata from JSON files located under the benchmark/ directory.
python download.py
  • Text Parsing: Parse PDF content into text using MinerU.
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"

mineru-models-download
mineru -p "./benchmark/pdf/test" -o "./benchmark/parse/test" --source local
  • Image Parsing: Convert PDF pages into image format for further processing.
python pdf2image.py
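
For readers who want to see what the image-parsing step amounts to, here is a minimal sketch of rendering PDF pages to PNG files with PyMuPDF. It is not the repository's pdf2image.py: the function names and the `page_001.png` naming scheme are assumptions made for illustration. The 200 DPI default matches the resolution stated for the processed vlm/ data.

```python
from pathlib import Path


def dpi_to_zoom(dpi: int) -> float:
    """PDF user space is 72 points per inch, so the scale factor is dpi / 72."""
    return dpi / 72.0


def page_image_name(page_index: int) -> str:
    """Hypothetical naming scheme: 1-based page number, zero-padded to 3 digits."""
    return f"page_{page_index + 1:03d}.png"


def render_pdf(pdf_path: Path, out_dir: Path, dpi: int = 200) -> list[Path]:
    """Render every page of one PDF to a PNG file and return the written paths."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    out_dir.mkdir(parents=True, exist_ok=True)
    zoom = dpi_to_zoom(dpi)
    written: list[Path] = []
    with fitz.open(str(pdf_path)) as doc:
        for index, page in enumerate(doc):
            # Scale both axes equally so the page keeps its aspect ratio.
            pixmap = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            target = out_dir / page_image_name(index)
            pixmap.save(str(target))
            written.append(target)
    return written
```

Calling `render_pdf(Path("paper.pdf"), Path("out/paper"))` would write one PNG per page into `out/paper`; both paths are placeholders, and the directory layout the repository scripts expect may differ.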

Processed Data Download

You may also download our processed data directly from Google Drive, Hugging Face, or ModelScope. The processed data includes:

  • pdf/: original paper PDFs.
  • md/: Markdown files parsed from each paper by MinerU, used as text input for LLM-oriented evaluation.
  • parse/: full MinerU parsing outputs, including structured layout and content artifacts.
  • vlm/: page images rendered from PDFs with PyMuPDF at 200 DPI, used for VLM-oriented evaluation.

The examples below show how to download only md/ and vlm/, which are sufficient for running the default LLM and VLM inference scripts.

Option 1: Download from Hugging Face

pip install -U huggingface_hub
hf download zai-org/RPC-Bench md/test/ vlm/test/ --repo-type dataset --local-dir ./benchmark

Option 2: Download from ModelScope

pip install -U modelscope
modelscope download --dataset ZhipuAI/RPC-Bench --include "md/test/**" "vlm/test/**" --local_dir ./benchmark
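
After either download, a quick sanity check can confirm that both directories arrived. The sketch below only counts files by extension under `./benchmark/md/test` and `./benchmark/vlm/test`, the paths implied by the commands above. The helper names are hypothetical, and the script assumes nothing about how files are organized inside those directories.

```python
from pathlib import Path


def count_files(root: Path) -> dict[str, int]:
    """Count files under root (recursively), grouped by lowercase extension."""
    counts: dict[str, int] = {}
    if not root.is_dir():
        return counts  # a missing directory is reported as empty
    for path in root.rglob("*"):
        if path.is_file():
            suffix = path.suffix.lower() or "(none)"
            counts[suffix] = counts.get(suffix, 0) + 1
    return counts


def summarize(benchmark_dir: Path) -> dict[str, dict[str, int]]:
    """Summarize the two subdirectories used by the default inference scripts."""
    return {sub: count_files(benchmark_dir / sub) for sub in ("md/test", "vlm/test")}


if __name__ == "__main__":
    for subdir, counts in summarize(Path("./benchmark")).items():
        print(subdir, counts or "missing or empty")
```

An empty result for either subdirectory suggests the include patterns did not match or the download was interrupted.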

🧩 Consistency Evaluation

The consistency/ directory provides a self-contained example for measuring consistency between LLM judge outputs and human pairwise preferences. See consistency/README.md for details.
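
To illustrate what judge-human consistency can mean in practice, here is a minimal sketch that compares pairwise preference labels. It is not the code in consistency/: the label format (for example "A", "B", "tie") and the choice of raw agreement plus Cohen's kappa are assumptions, and the repository may use a different metric.

```python
from collections import Counter
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of comparisons where the judge and the human pick the same label."""
    if len(judge) != len(human):
        raise ValueError("judge and human label lists must have equal length")
    if not judge:
        raise ValueError("at least one comparison is required")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


def cohen_kappa(judge: Sequence[str], human: Sequence[str]) -> float:
    """Agreement corrected for chance, based on each rater's label frequencies."""
    observed = agreement_rate(judge, human)
    n = len(judge)
    judge_counts, human_counts = Counter(judge), Counter(human)
    labels = set(judge_counts) | set(human_counts)
    expected = sum(judge_counts[label] * human_counts[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels: which of two answers each rater preferred.
    judge_labels = ["A", "B", "A", "A", "tie", "B"]
    human_labels = ["A", "B", "B", "A", "tie", "A"]
    print(f"agreement: {agreement_rate(judge_labels, human_labels):.3f}")
    print(f"kappa:     {cohen_kappa(judge_labels, human_labels):.3f}")
```

Raw agreement is easy to read but can look high when one label dominates; kappa discounts the agreement expected by chance, which is why the sketch reports both.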

✈️ Inference

GPT-5 is used as the example model; you may replace it with any other LLM or VLM supported in your environment.

  • LLM Inference:
python llm.py
  • VLM Inference:
python vlm.py

🛜 Evaluation

After inference, evaluate predictions against benchmark references using:

python eval.py
