feat(agent): support MoE, multimodal, audio, seq2seq, diffusion models (#710)
Merged
Extend GraphNet Agent to correctly identify and extract computation graphs
for a wider range of model architectures beyond basic text/vision models.
Key changes:
- ModelMetadata: add architecture_type field ("text"/"vision"/"seq2seq"/
"audio"/"multimodal"/"diffusion"/"moe")
- ConfigMetadataAnalyzer: use AutoConfig.from_pretrained() for rich config
introspection; classify architecture via transformers' own task mapping
tables (MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES, etc.) — no hardcoded
lists; add per-arch input shape builders covering whisper decoder_input_ids,
CLIP text_config seq_len, diffusion sample/timestep/encoder_hidden_states,
MoE/seq2seq specific inputs; add field-based model_type inference fallback
for configs missing model_type (e.g. prajjwal1/bert-tiny)
- TemplateCodeGenerator: branch model loader by arch (AutoModelForSeq2SeqLM
for seq2seq, UNet2DConditionModel for diffusion); add diffusion-specific
script generation using positional args; inject inferred model_type into
config when absent
- LLMCodeFixer: extend _SYSTEM_PROMPT with MoE and diffusion input specs and
error patterns; add MoE routing / UNet / seq2seq / GQA / audio fields to
_extract_key_fields
- GraphNetAgent: add _resolve_model_dir() to detect diffusers pipelines
(model_index.json) and automatically redirect to unet/ subdir
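As a rough illustration of the mapping-table classification described above, here is a minimal sketch. The tables below are tiny hand-written excerpts standing in for transformers' MODEL_FOR_*_MAPPING_NAMES OrderedDicts (the real code imports them from transformers); `classify_architecture` and the MoE/multimodal sets are hypothetical names, not the actual implementation:

```python
# Stand-in excerpts for transformers' task mapping tables
# (e.g. MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES); the real analyzer
# imports the full OrderedDicts rather than hardcoding lists.
SPEECH_SEQ_2_SEQ = {"whisper": "WhisperForConditionalGeneration"}
SEQ_2_SEQ_LM = {"t5": "T5ForConditionalGeneration"}
IMAGE_CLASSIFICATION = {"convnextv2": "ConvNextV2ForImageClassification"}
MOE_TYPES = {"mixtral"}          # assumed MoE model_type set
MULTIMODAL_TYPES = {"clip"}      # assumed multimodal model_type set

def classify_architecture(model_type: str) -> str:
    """Map a config's model_type to a coarse architecture_type label."""
    if model_type in SPEECH_SEQ_2_SEQ:
        return "audio"
    if model_type in SEQ_2_SEQ_LM:
        return "seq2seq"
    if model_type in MOE_TYPES:
        return "moe"
    if model_type in MULTIMODAL_TYPES:
        return "multimodal"
    if model_type in IMAGE_CLASSIFICATION:
        return "vision"
    return "text"  # default: plain causal/masked LM
```

Checking membership in the library's own mapping tables keeps the classifier in sync with transformers as new model types are added, which is the motivation for avoiding hardcoded lists.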
Tested on: bert-tiny (text), convnextv2 (vision), t5-small (seq2seq),
whisper-tiny (audio), clip-vit-base-patch32 (multimodal),
tiny-random-MixtralForCausalLM (moe), tiny-stable-diffusion-pipe (diffusion)
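The per-architecture input shape builders mentioned in the key changes might look roughly like the sketch below. Shapes, field names, and the helper name `build_input_shapes` are illustrative assumptions (e.g. 80 mel bins / 3000 frames for Whisper-style audio, 4x64x64 latents for a UNet), not the PR's actual code:

```python
def build_input_shapes(architecture_type: str, batch: int = 1,
                       seq_len: int = 8) -> dict:
    """Return dummy-input name -> shape per architecture (illustrative)."""
    if architecture_type == "seq2seq":
        # encoder-decoder models need decoder_input_ids alongside input_ids
        return {"input_ids": (batch, seq_len),
                "decoder_input_ids": (batch, seq_len)}
    if architecture_type == "audio":
        # whisper-style: log-mel input_features plus decoder_input_ids
        return {"input_features": (batch, 80, 3000),
                "decoder_input_ids": (batch, seq_len)}
    if architecture_type == "diffusion":
        # UNet2DConditionModel-style positional args:
        # sample, timestep (scalar), encoder_hidden_states
        return {"sample": (batch, 4, 64, 64),
                "timestep": (),
                "encoder_hidden_states": (batch, seq_len, 768)}
    # text / moe causal LMs take plain input_ids
    return {"input_ids": (batch, seq_len)}
```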
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for your contribution!
PR Category
Feature Enhancement
Description
The graph-extraction agent now supports MoE, multimodal, audio, seq2seq, diffusion, and similar model architectures. Small-batch testing passes; large-batch testing has not been run yet.
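The diffusers-pipeline redirection described in the key changes (detect model_index.json, then use the unet/ subdirectory) can be sketched as a standalone helper. The function name and exact checks below are assumptions, not the PR's `GraphNetAgent._resolve_model_dir` itself:

```python
import os

def resolve_model_dir(model_dir: str) -> str:
    """If model_dir holds a diffusers pipeline (marked by model_index.json),
    redirect to its unet/ subdirectory; otherwise return it unchanged."""
    index = os.path.join(model_dir, "model_index.json")
    if os.path.isfile(index):
        unet_dir = os.path.join(model_dir, "unet")
        if os.path.isdir(unet_dir):
            return unet_dir
    return model_dir
```

Keying on model_index.json works because diffusers writes that file only at the pipeline root, while each component (unet, vae, text_encoder) lives in its own subdirectory with a regular config.json.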