feat(agent): support MoE, multimodal, audio, seq2seq, diffusion models (#710)
Merged
Extend GraphNet Agent to correctly identify and extract computation graphs
for a wider range of model architectures beyond basic text/vision models.
Key changes:
- ModelMetadata: add architecture_type field ("text"/"vision"/"seq2seq"/
"audio"/"multimodal"/"diffusion"/"moe")
- ConfigMetadataAnalyzer: use AutoConfig.from_pretrained() for rich config
introspection; classify architecture via transformers' own task mapping
tables (MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES, etc.) — no hardcoded
lists; add per-arch input shape builders covering whisper decoder_input_ids,
CLIP text_config seq_len, diffusion sample/timestep/encoder_hidden_states,
MoE/seq2seq specific inputs; add field-based model_type inference fallback
for configs missing model_type (e.g. prajjwal1/bert-tiny)
- TemplateCodeGenerator: branch model loader by arch (AutoModelForSeq2SeqLM
for seq2seq, UNet2DConditionModel for diffusion); add diffusion-specific
script generation using positional args; inject inferred model_type into
config when absent
- LLMCodeFixer: extend _SYSTEM_PROMPT with MoE and diffusion input specs and
error patterns; add MoE routing / UNet / seq2seq / GQA / audio fields to
_extract_key_fields
- GraphNetAgent: add _resolve_model_dir() to detect diffusers pipelines
(model_index.json) and automatically redirect to unet/ subdir
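As a rough illustration of the mapping-table classification described above, here is a minimal sketch. The tables below are tiny hand-written excerpts standing in for transformers' MODEL_FOR_*_MAPPING_NAMES OrderedDicts (the real code imports them from transformers); `classify_architecture` and the MoE/multimodal sets are hypothetical names, not the actual implementation:

```python
# Stand-in excerpts for transformers' task mapping tables
# (e.g. MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES); the real analyzer
# imports the full OrderedDicts rather than hardcoding lists.
SPEECH_SEQ_2_SEQ = {"whisper": "WhisperForConditionalGeneration"}
SEQ_2_SEQ_LM = {"t5": "T5ForConditionalGeneration"}
IMAGE_CLASSIFICATION = {"convnextv2": "ConvNextV2ForImageClassification"}
MOE_TYPES = {"mixtral"}          # assumed MoE model_type set
MULTIMODAL_TYPES = {"clip"}      # assumed multimodal model_type set

def classify_architecture(model_type: str) -> str:
    """Map a config's model_type to a coarse architecture_type label."""
    if model_type in SPEECH_SEQ_2_SEQ:
        return "audio"
    if model_type in SEQ_2_SEQ_LM:
        return "seq2seq"
    if model_type in MOE_TYPES:
        return "moe"
    if model_type in MULTIMODAL_TYPES:
        return "multimodal"
    if model_type in IMAGE_CLASSIFICATION:
        return "vision"
    return "text"  # default: plain causal/masked LM
```

Checking membership in the library's own mapping tables keeps the classifier in sync with transformers as new model types are added, which is the motivation for avoiding hardcoded lists.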
Tested on: bert-tiny (text), convnextv2 (vision), t5-small (seq2seq),
whisper-tiny (audio), clip-vit-base-patch32 (multimodal),
tiny-random-MixtralForCausalLM (moe), tiny-stable-diffusion-pipe (diffusion)
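The per-architecture input shape builders mentioned in the key changes might look roughly like the sketch below. Shapes, field names, and the helper name `build_input_shapes` are illustrative assumptions (e.g. 80 mel bins / 3000 frames for Whisper-style audio, 4x64x64 latents for a UNet), not the PR's actual code:

```python
def build_input_shapes(architecture_type: str, batch: int = 1,
                       seq_len: int = 8) -> dict:
    """Return dummy-input name -> shape per architecture (illustrative)."""
    if architecture_type == "seq2seq":
        # encoder-decoder models need decoder_input_ids alongside input_ids
        return {"input_ids": (batch, seq_len),
                "decoder_input_ids": (batch, seq_len)}
    if architecture_type == "audio":
        # whisper-style: log-mel input_features plus decoder_input_ids
        return {"input_features": (batch, 80, 3000),
                "decoder_input_ids": (batch, seq_len)}
    if architecture_type == "diffusion":
        # UNet2DConditionModel-style positional args:
        # sample, timestep (scalar), encoder_hidden_states
        return {"sample": (batch, 4, 64, 64),
                "timestep": (),
                "encoder_hidden_states": (batch, seq_len, 768)}
    # text / moe causal LMs take plain input_ids
    return {"input_ids": (batch, seq_len)}
```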
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for your contribution!
PR Category
Feature Enhancement
Description
The graph-extraction agent now supports MoE, multimodal, audio, seq2seq, diffusion, and similar model architectures. Small-batch testing passes; large-batch testing has not been run yet.
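The diffusers-pipeline redirection described in the key changes (detect model_index.json, then use the unet/ subdirectory) can be sketched as a standalone helper. The function name and exact checks below are assumptions, not the PR's `GraphNetAgent._resolve_model_dir` itself:

```python
import os

def resolve_model_dir(model_dir: str) -> str:
    """If model_dir holds a diffusers pipeline (marked by model_index.json),
    redirect to its unet/ subdirectory; otherwise return it unchanged."""
    index = os.path.join(model_dir, "model_index.json")
    if os.path.isfile(index):
        unet_dir = os.path.join(model_dir, "unet")
        if os.path.isdir(unet_dir):
            return unet_dir
    return model_dir
```

Keying on model_index.json works because diffusers writes that file only at the pipeline root, while each component (unet, vae, text_encoder) lives in its own subdirectory with a regular config.json.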