Shybert-AI

👋 Hi, I'm Shybert | Multimodal Algorithm Engineer

🏠 Xi'an, China | 📧 854197093@qq.com | QQ Group: 1029629549

I'm a practitioner with 6 years of algorithm development experience, focusing on front-end speech processing, multimodal model training, and engineering deployment. I enjoy sharing practical insights through technical blogs and have accumulated extensive optimization experience across multiple international AI competitions.

🔭 Currently working on: AEC (Acoustic Echo Cancellation), Diffusion model acceleration, multimodal content generation
🌱 Exploring: On-device deployment of large models, performance tuning of video generation models (e.g., Wan2.2)
👯 Open to collaboration: Speech recognition, Sound Event Detection, AI competition solution reproduction & optimization
📝 Writing regularly: 110+ original technical articles on CSDN Blog
⚡ Recent highlights: 📄 arXiv Paper: Sparse Mixture-of-Experts Routing in Visual Diffusion Transformers (2025.05) | 2025 IKCEST Top 10, 2025 Baidu Commercial Competition Top 5, 2024 IKCEST Top 8, GitHub Project OpenManus-WebUI 226 Stars ⭐

🏆 Competition Awards & Honors

My competition experience spans multiple cutting-edge fields such as speech, video, and multimodal misinformation detection. Below are some representative awards:

Year	Competition	Topic/Direction	Ranking
2025	IKCEST International Big Data Competition	Photo-based Problem Solving with LLMs	Global Top 10
2025	Baidu Commercial AI Technology Innovation Competition	Video Ad Generation Inference Optimization (Digital Human)	National Top 5
2024	IKCEST International Big Data Competition	AI Sports Commentary	Global Top 8
2024	2nd World Scientific Intelligence Competition	Life Science & Materials Science Tracks	14th & 15th
2023	IKCEST International Big Data Competition	Multimodal Misinformation Detection in Social Networks	Global Top 11
2022	vloong Energy AI Challenge	New Energy Battery Anomaly Detection	3rd Place

🚀 Core Projects & Highlights

Here are some representative projects across speech and multimodal domains, covering the full pipeline from model training to on-device deployment.

📌 Multimodal Generation & Video Editing

【Diagnostic Research】UniGen-MOE: Diagnosing MoE Routing Failures in Video Diffusion (2025.05) A Token-Choice MoE conversion experiment based on the Wan2.2-TI2V-5B DiT backbone and Qwen2.5-VL-3B-Instruct encoder. Discovered two novel failure modes: "Selective Deadlock" and "U-shaped Deadlock Distribution", proposed the "Functional Redundancy Hypothesis", and documented the Three Laws of Dense-to-MoE conversion and a complete solution to the bfloat16 precision trap. The associated paper has been released on arXiv. (GitHub Repo)
【Unified Framework】UniGen-LingXi: 9-in-1 Multimodal Generation & Editing (2025.04) A resource-efficient, "editing-first" 9-in-1 multimodal unified framework covering core tasks such as text-to-image, text-to-video, image editing, and video editing. Served as the experimental base for UniGen-MOE. (GitHub Repo)
【Competition Solution】Video Ad Generation Inference Optimization (2025.09) Integrated FlashAttention, TeaCache, custom attention block computation, and other techniques to cut single-video inference from 10 minutes to 1 minute, a 10x speedup, while maintaining generation quality (similarity > 0.97). (Solution Blog)
【Short Video Generation】OpenShortVideo: AI Short Video Intelligent Production Platform (2025.03) Integrated scriptwriting, character management, scene generation, and shot production workflows to help creators rapidly generate high-quality short video content. (GitHub Repo) (44 ⭐)
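
As a rough illustration of the token-choice routing mentioned in the UniGen-MOE entry above, here is a minimal pure-Python sketch. It is not taken from the UniGen-MOE repo: the function names `route_tokens` and `expert_load`, the router logits, and `top_k=2` are all assumptions made up for this example, and a real DiT would do this with batched tensor ops.

```python
import math

def route_tokens(router_logits, top_k=2):
    """Token-choice routing: every token independently picks its top_k experts."""
    assignments = []
    for logits in router_logits:
        # Softmax over experts (shifted by the max for numerical stability).
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Keep the top_k experts and renormalise their gate weights to sum to 1.
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in chosen)
        assignments.append([(i, probs[i] / norm) for i in chosen])
    return assignments

def expert_load(assignments, num_experts):
    """Count how many tokens each expert receives."""
    load = [0] * num_experts
    for token in assignments:
        for expert, _gate in token:
            load[expert] += 1
    return load

# Hypothetical router logits: 4 tokens (rows) x 4 experts (columns).
logits = [
    [2.0, 1.0, 0.1, -1.0],
    [1.5, 2.5, 0.0, -2.0],
    [3.0, 0.5, 0.2, -0.5],
    [0.9, 1.1, 0.3, -1.5],
]
assignments = route_tokens(logits, top_k=2)
load = expert_load(assignments, num_experts=4)
print(load)  # [4, 4, 0, 0]
```

With these made-up logits every token picks experts 0 and 1, so experts 2 and 3 receive no tokens. Tracking per-expert load like this is one simple way to spot routing imbalance after a dense-to-MoE conversion.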

📌 Speech Signal Processing

【Acoustic Echo Cancellation】Two-Stage Acoustic Echo Cancellation System. Combined traditional TDC-wRLS linear filtering with U-Net deep learning to build a two-stage AEC system that effectively eliminates both linear and nonlinear echoes, significantly improving speech communication quality. (GitHub Repo)
【Speech Enhancement】DeepComplexCRN_streaming: Deep Complex Recurrent Network for Speech Enhancement. Supports both full-utterance and streaming inference, suitable for real-time speech processing scenarios. (GitHub Repo)
【Sound Event Detection】AudioClassificationModelZoo-Pytorch. Open-sourced 20+ audio classification models based on PyTorch, with streaming test support, providing a convenient toolkit for sound event detection research. (GitHub Repo)

📌 LLM Applications & Engineering Tools

【On-Device Deployment】Edge-side-LLMChat: Local LLM Chat APK Based on MNN (2025.05) An Android local LLM chat application based on the MNN inference framework, supporting model downloads, image input, and multi-model management. (GitHub Repo)
【Competition Solution】Intelligent Photo Problem-Solving Assistant (2025.12) A multi-model intelligent learning tool that supports photo upload, multi-model problem solving, and automatic failover, and provides step-by-step analysis, voice explanations, and mistake collection. Implemented as the 2025 IKCEST Top 10 solution. (GitHub Repo) (Live Demo)
【WebUI Application】OpenManus-WebUI (2025.04) Built a front-end interface using Flask to invoke OpenManus, with file preview support for generated outputs, earning 226 Stars. (GitHub Repo)
【Competition Solution】AI_SECS_Agent: AI Sports Commentary System (2024.12) A multimodal agent system integrating object tracking, pose recognition, OCR, goal detection, and other models to automatically generate AI commentary from football match video URLs. (GitHub Repo)
【Competition Solution】MMF-RIM: Multimodal Misinformation Detection Model (2023.11) A 600M-parameter multimodal fusion model combining ERNIE, ResNet101, CLIP-ViT, and OCR text features to detect multimodal rumors in social networks. (GitHub Repo)
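
The "automatic failover" in the Intelligent Photo Problem-Solving Assistant entry above can be pictured with a small sketch like the one below. This is an assumption about the general pattern, not the project's actual code: `solve_with_failover`, `flaky_model`, and `backup_model` are hypothetical names, and the two "models" are stand-in functions rather than real API calls.

```python
def solve_with_failover(question, solvers):
    """Try each (name, solver) pair in order; return the first non-empty answer."""
    errors = []
    for name, solve in solvers:
        try:
            answer = solve(question)
        except Exception as exc:
            # Record the failure and fall through to the next model.
            errors.append(f"{name}: {exc}")
            continue
        if answer:
            return name, answer
        errors.append(f"{name}: empty answer")
    raise RuntimeError("all solvers failed: " + "; ".join(errors))

def flaky_model(question):
    # Stand-in for a model endpoint that is currently unavailable.
    raise TimeoutError("request timed out")

def backup_model(question):
    # Stand-in for a second model that answers successfully.
    return f"Step-by-step answer to: {question}"

used, answer = solve_with_failover(
    "2x + 3 = 7",
    [("primary", flaky_model), ("backup", backup_model)],
)
print(used)  # backup
```

Keeping the solvers as an ordered list of plain callables makes it easy to add or reorder models without touching the failover logic.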

🛠️ Tech Stack & Expertise

Core Speech Algorithms

Multimodal & Generative Models

Competition & Optimization

Development & Deployment Frameworks
