Support extract use CPU and optimize some codes. by Xreki · Pull Request #707 · PaddlePaddle/GraphNet

Xreki · 2026-05-14T15:24:56Z

PR Category

Other

Description

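This PR enhances GraphNet's parallel extraction pipeline with CPU-mode support, configurable timeouts, fine-grained failure-status tracking, and an optional LLM retry. The changes are as follows: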

1. CPU mode support

parallel_extract.py automatically detects a GPU or CPU environment via torch.cuda.is_available().
Adds a --num-workers argument for CPU mode (previously only GPU was supported).
Worker log prefixes are updated to [Worker-{id} GPU:{id}] / [Worker-{id} CPU].
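
A minimal sketch of how the device mode and the log prefix described above might be derived. The helper names `detect_mode` and `worker_prefix` are hypothetical and not taken from the PR; only the `torch.cuda.is_available()` check and the two prefix formats come from the description.

```python
def detect_mode():
    """Return "gpu" when CUDA is usable, otherwise "cpu" (hypothetical helper)."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


def worker_prefix(worker_id, gpu_id=None):
    """Build the log prefix; gpu_id=None is assumed to mean a CPU worker."""
    if gpu_id is None:
        return f"[Worker-{worker_id} CPU]"
    return f"[Worker-{worker_id} GPU:{gpu_id}]"


print(detect_mode(), worker_prefix(0), worker_prefix(1, gpu_id=3))
```

Treating a missing PyTorch install as CPU mode is a choice made for this sketch only; the PR does not say how that case is handled.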

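2. Configurable timeouts (device-specific defaults)

Adds the --extract-timeout and --verify-timeout command-line arguments.
Defaults depend on the device in use:
- GPU: extract=1000s, verify=300s
- CPU: extract=2000s, verify=600s
GraphNetAgent, SubprocessGraphExtractor, and ForwardVerifier all accept an optional timeout parameter.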
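
The default-selection rule above can be sketched as follows. The table `DEFAULT_TIMEOUTS` and the function `resolve_timeouts` are hypothetical names for illustration; the numeric defaults are the ones listed in this description, and treating an unset CLI value as `None` is an assumption.

```python
# Per-device defaults in seconds, as listed in the PR description.
DEFAULT_TIMEOUTS = {
    "gpu": {"extract": 1000, "verify": 300},
    "cpu": {"extract": 2000, "verify": 600},
}


def resolve_timeouts(mode, extract_timeout=None, verify_timeout=None):
    """Return (extract, verify) timeouts, preferring explicit values.

    `mode` is "gpu" or "cpu"; None means "not given on the command line".
    """
    defaults = DEFAULT_TIMEOUTS[mode]
    extract = defaults["extract"] if extract_timeout is None else extract_timeout
    verify = defaults["verify"] if verify_timeout is None else verify_timeout
    return extract, verify


print(resolve_timeouts("cpu"))
print(resolve_timeouts("gpu", extract_timeout=1500))
```

Comparing against `None` rather than relying on truthiness keeps an explicit value from being silently replaced by the default.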

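3. Fine-grained extraction status (ExtractionStatus enum)

Introduces ExtractionStatus(str, Enum) with four states:
- OK: extraction and verification both passed
- VERIFY_FAILED: graph extraction succeeded, but forward verification failed
- EXTRACT_FAILED: script generation or graph extraction failed
- ERROR: unexpected runtime error
The return type of GraphNetAgent.extract_sample() changes from bool to ExtractionStatus.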
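
A sketch of what such an enum could look like. The member names come from this description, but the string values and the helper `graph_was_extracted` are assumptions made for illustration. Mixing in `str` lets members compare equal to plain strings and serialize with `json` without a custom encoder.

```python
import json
from enum import Enum


class ExtractionStatus(str, Enum):
    # Member names follow the PR description; the values are assumed.
    OK = "ok"
    VERIFY_FAILED = "verify_failed"
    EXTRACT_FAILED = "extract_failed"
    ERROR = "error"


def graph_was_extracted(status):
    """True when a graph exists, whether or not forward verification passed."""
    return status in (ExtractionStatus.OK, ExtractionStatus.VERIFY_FAILED)


status = ExtractionStatus.VERIFY_FAILED
print(graph_was_extracted(status), json.dumps({"status": status}))
```

Callers that previously wrote `if agent.extract_sample(...):` would need updating under this design, because every member of a `str`-based enum with a non-empty value is truthy.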

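4. Split success-rate metrics

[PROGRESS] log lines now report two success rates:
- success=: overall success rate (extraction and verification both passed)
- extract=: extraction success rate (includes models whose extraction succeeded but verification failed)
The Summary and the Per-GPU/Per-Worker statistics show both rates as well.
The result JSON gains the extract_success and extract_success_rate fields.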
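
To make the difference between the two rates concrete, here is a small sketch that computes both from a list of status names. The function `summarize` is hypothetical, statuses are plain strings here rather than ExtractionStatus members to keep the example self-contained, and expressing rates as fractions is an assumption. Only the `extract_success` and `extract_success_rate` keys are named in this description; the other keys are illustrative.

```python
def summarize(statuses):
    """Compute overall and extraction-only success counts and rates."""
    total = len(statuses)
    # Overall success: extraction and verification both passed.
    success = sum(1 for s in statuses if s == "OK")
    # Extraction success also counts graphs that failed verification.
    extract_success = sum(1 for s in statuses if s in ("OK", "VERIFY_FAILED"))

    def rate(count):
        return count / total if total else 0.0

    return {
        "total": total,
        "success": success,
        "success_rate": rate(success),
        "extract_success": extract_success,
        "extract_success_rate": rate(extract_success),
    }


print(summarize(["OK", "VERIFY_FAILED", "EXTRACT_FAILED", "ERROR"]))
```

With one sample in each state, the overall rate is 0.25 while the extraction rate is 0.5, since the VERIFY_FAILED sample still produced a graph.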

5. Optional LLM retry

Adds the --use-llm flag (store_true, default False), which enables an LLM script-repair retry when extraction fails.
Previously, the worker hard-coded llm_retry=False.
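
A minimal argparse sketch of the flags introduced in this PR. The flag names and the `store_true` behavior of `--use-llm` come from this description; the `build_parser` function, the `int` types, the `None` defaults, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the CLI flags named in the PR."""
    parser = argparse.ArgumentParser(description="Parallel extraction (sketch)")
    # store_true: the flag is False unless passed on the command line.
    parser.add_argument("--use-llm", action="store_true",
                        help="retry failed extractions with LLM script repair")
    parser.add_argument("--num-workers", type=int, default=None,
                        help="worker count for CPU mode (type assumed)")
    parser.add_argument("--extract-timeout", type=int, default=None,
                        help="seconds; None falls back to the device default")
    parser.add_argument("--verify-timeout", type=int, default=None,
                        help="seconds; None falls back to the device default")
    return parser


args = build_parser().parse_args(["--use-llm", "--extract-timeout", "1500"])
print(args.use_llm, args.extract_timeout, args.verify_timeout)
```

In this sketch the parsed `args.use_llm` value would be passed on as the `llm_retry` argument to each worker, replacing the hard-coded `False`.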

6. Code refactoring

The logic previously in main() is split into _parse_args(), _resolve_config(), and _load_model_ids().
_worker() is renamed to worker_fn() and gains explicit worker_id and llm_retry parameters.

paddle-bot · 2026-05-14T15:25:02Z

Thanks for your contribution!

- GraphNetAgent: add extract_timeout and verify_timeout parameters
- parallel_extract: add --extract-timeout and --verify-timeout CLI args
- Default timeouts differ by device:
  GPU: extract=1000s, verify=300s
  CPU: extract=2000s, verify=600s
- Fix typo: os.envion -> os.environ

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…cess rates.

- Introduce ExtractionStatus(str, Enum): OK, VERIFY_FAILED, EXTRACT_FAILED, ERROR
- GraphNetAgent.extract_sample() now returns ExtractionStatus instead of bool
- parallel_extract tracks and prints both overall and extraction-only success rates
- Per-GPU/Worker summary also shows both rates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Support extract use CPU and optimize some codes.

31299c0

Xreki and others added 3 commits May 14, 2026 23:50

Fix workspace path.

aad09c2

Add --use-llm flag to parallel_extract for configurable LLM retry.

0bdf25b

- New CLI arg --use-llm (default: true) controls llm_retry in GraphNetAgent
- Pass llm_retry through worker_fn and _resolve_config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Xreki force-pushed the fix_extract_agent branch from d66a007 to 2df4e9e Compare May 15, 2026 02:24

Xreki force-pushed the fix_extract_agent branch from 2df4e9e to 437a689 Compare May 15, 2026 02:55

luotao1 approved these changes May 15, 2026

Xreki merged commit 7c444e8 into PaddlePaddle:develop May 15, 2026
3 checks passed

Xreki deleted the fix_extract_agent branch May 15, 2026 03:36

Xreki merged 5 commits into PaddlePaddle:develop from Xreki:fix_extract_agent