RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension

🌐 Project Page • 📖 Paper • 🤗 Hugging Face • 🧭 ModelScope

Official code and data of the paper RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension (ACL 2026).

RPC-Bench, a large-scale fine-grained question answering benchmark constructed from review-rebuttal exchanges of high-quality academic papers, with each paper available in two input formats (pure text and rendered page images) enabling evaluation of both large language models (LLMs) and visual language models (VLMs).

🚀 Quick Start

Dependencies

First, create a conda environment and install all pip package requirements.

conda create -n rpc python==3.11.13
conda activate rpc

pip install -r requirements.txt

QA Construction

The pipeline/ directory provides an example workflow for constructing benchmark QA annotations from crawled OpenReview review-rebuttal data through LLM-based decomposition, rewriting, and filtering. See pipeline/README.md for details.

Data processing

For this benchmark, each academic paper can be processed into either structured text or page-rendered images, enabling evaluation across both LLMs and VLMs. Choose the parsing mode that best fits your experimental objectives.

File Download: Download paper PDFs based on metadata from JSON files located under the benchmark/ directory.

python download.py

Text Parsing: Parse PDF content into text using MinerU.

pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"

mineru-models-download
mineru -p "./benchmark/pdf/test" -o "./benchmark/parse/test" --source local

Image Parsing: Convert PDF pages into image format for further processing.

python pdf2image.py

Processed Data Download

You may also download our processed data directly from Google Drive, Hugging Face, or ModelScope. The processed data includes:

pdf/: original paper PDFs.
md/: Markdown files parsed from each paper by MinerU, used as text input for LLM-oriented evaluation.
parse/: full MinerU parsing outputs, including structured layout and content artifacts.
vlm/: page images rendered from PDFs with PyMuPDF at 200 DPI, used for VLM-oriented evaluation.

The examples below show how to download only md/ and vlm/, which are sufficient for running the default LLM and VLM inference scripts.

Option 1: Download from Hugging Face

pip install -U huggingface_hub
hf download zai-org/RPC-Bench md/test/ vlm/test/ --repo-type dataset --local-dir ./benchmark

Option 2: Download from ModelScope

pip install -U modelscope
modelscope download --dataset ZhipuAI/RPC-Bench --include "md/test/**" "vlm/test/**" --local_dir ./benchmark

🧩 Consistency Evaluation

The consistency/ directory provides a self-contained example for measuring consistency between LLM judge outputs and human pairwise preferences. See consistency/README.md for details.

✈️ Inference

GPT-5 is given as an example below, but you may replace this with any other LLM or VLM supported in your environment.

LLM Inference:

python llm.py

VLM Inference:

python vlm.py

🛜 Evaluation

After inference, evaluate predictions against benchmark references using:

python eval.py

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github		.github
assets		assets
benchmark		benchmark
consistency		consistency
pipeline		pipeline
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
RPC-Bench_PPT.pdf		RPC-Bench_PPT.pdf
RPC-Bench_Poster.pdf		RPC-Bench_Poster.pdf
download.py		download.py
eval.py		eval.py
llm.py		llm.py
pdf2image.py		pdf2image.py
prompt.py		prompt.py
requirements.txt		requirements.txt
vlm.py		vlm.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension

🚀 Quick Start

Dependencies

QA Construction

Data processing

Processed Data Download

🧩 Consistency Evaluation

✈️ Inference

🛜 Evaluation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension

🚀 Quick Start

Dependencies

QA Construction

Data processing

Processed Data Download

🧩 Consistency Evaluation

✈️ Inference

🛜 Evaluation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages