Shybert-AI/README.md

中文 (Chinese version)

👋 Hi, I'm Shybert | Multimodal Algorithm Engineer

🏠 Xi'an, China | 📧 854197093@qq.com | QQ Group: 1029629549

A practitioner with 6 years of algorithm development experience, focusing on front-end speech processing, multimodal model training, and engineering deployment. I share practical insights through technical blogs and have accumulated extensive optimization experience across multiple international AI competitions.

  • 🔭 Currently working on: AEC (Acoustic Echo Cancellation), Diffusion model acceleration, multimodal content generation
  • 🌱 Exploring: On-device deployment of large models, performance tuning of video generation models (e.g., Wan2.2)
  • 👯 Open to collaboration: Speech recognition, Sound Event Detection, AI competition solution reproduction & optimization
  • 📝 Writing regularly: 110+ original technical articles on CSDN Blog
  • Recent highlights: 📄 arXiv Paper: Sparse Mixture-of-Experts Routing in Visual Diffusion Transformers (2025.05) | 2025 IKCEST Top 10, 2025 Baidu Commercial Competition Top 5, 2024 IKCEST Top 8, GitHub Project OpenManus-WebUI 226 Stars ⭐

🏆 Competition Awards & Honors

My competition experience spans multiple cutting-edge fields such as speech, video, and multimodal misinformation detection. Below are some representative awards:

| Year | Competition | Topic/Direction | Ranking |
|------|-------------|-----------------|---------|
| 2025 | IKCEST International Big Data Competition | Photo-based Problem Solving with LLMs | Global Top 10 |
| 2025 | Baidu Commercial AI Technology Innovation Competition | Video Ad Generation Inference Optimization (Digital Human) | National Top 5 |
| 2024 | IKCEST International Big Data Competition | AI Sports Commentary | Global Top 8 |
| 2024 | 2nd World Scientific Intelligence Competition | Life Science & Materials Science Tracks | 14th & 15th |
| 2023 | IKCEST International Big Data Competition | Multimodal Misinformation Detection in Social Networks | Global Top 11 |
| 2022 | vloong Energy AI Challenge | New Energy Battery Anomaly Detection | 3rd Place |

🚀 Core Projects & Highlights

Here are some representative projects across speech and multimodal domains, covering the full pipeline from model training to on-device deployment.

📌 Multimodal Generation & Video Editing

  • 【Diagnostic Research】UniGen-MOE: Diagnosing MoE Routing Failures in Video Diffusion (2025.05) A Token-Choice MoE conversion experiment based on the Wan2.2-TI2V-5B DiT backbone and Qwen2.5-VL-3B-Instruct encoder. Discovered two novel failure modes: "Selective Deadlock" and "U-shaped Deadlock Distribution", proposed the "Functional Redundancy Hypothesis", and documented the Three Laws of Dense-to-MoE conversion and a complete solution to the bfloat16 precision trap. The associated paper has been released on arXiv. (GitHub Repo)

  • 【Unified Framework】UniGen-LingXi: 9-in-1 Multimodal Generation & Editing (2025.04) A resource-efficient, "editing-first" 9-in-1 multimodal unified framework covering core tasks such as text-to-image, text-to-video, image editing, and video editing. Served as the experimental base for UniGen-MOE. (GitHub Repo)

  • 【Competition Solution】Video Ad Generation Inference Optimization (2025.09) Integrated FlashAttention, TeaCache, custom attention block computation, and other techniques to cut single-video inference from 10 minutes to 1 minute, a 10x speedup, while maintaining generation quality (similarity > 0.97). (Solution Blog)

  • 【Short Video Generation】OpenShortVideo: AI Short Video Intelligent Production Platform (2025.03) Integrated scriptwriting, character management, scene generation, and shot production workflows to help creators rapidly generate high-quality short video content. (GitHub Repo) (44 ⭐)
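
For readers unfamiliar with the token-choice routing mentioned in the UniGen-MOE entry, the following is a minimal, dependency-free sketch of the general idea: each token scores every expert and keeps its own top-k. It is not the UniGen-MOE implementation. The function names (`token_choice_route`, `expert_load`), the tensor sizes, and the random router weights are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_choice_route(hidden, router_weight, top_k=2):
    """Token-choice routing: every token picks its own top_k experts.

    hidden:        list of token vectors, each of length d
    router_weight: list of expert vectors, each of length d (hypothetical router)
    Returns, per token, a list of (expert_id, gate) pairs with gates summing to 1.
    """
    routes = []
    for token in hidden:
        # One router logit per expert: dot product of token and expert vector.
        logits = [sum(t * w for t, w in zip(token, expert)) for expert in router_weight]
        probs = softmax(logits)
        # Keep the top_k most probable experts for this token.
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in top)
        routes.append([(i, probs[i] / norm) for i in top])
    return routes


def expert_load(routes, num_experts):
    """Count how many token assignments each expert received."""
    load = [0] * num_experts
    for token_routes in routes:
        for expert_id, _gate in token_routes:
            load[expert_id] += 1
    return load


# Illustrative sizes only: 16 tokens, hidden size 8, 4 experts.
random.seed(0)
hidden = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
router_weight = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
routes = token_choice_route(hidden, router_weight, top_k=2)
print(expert_load(routes, num_experts=4))
```

Inspecting the per-expert load is one simple way to see whether some experts receive few or no tokens under a given router.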

📌 Speech Signal Processing

  • 【Acoustic Echo Cancellation】Two-Stage Acoustic Echo Cancellation System: combines traditional TDC-wRLS linear filtering with a U-Net deep learning model in a two-stage AEC system that removes both linear and nonlinear echoes, significantly improving speech communication quality. (GitHub Repo)

  • 【Speech Enhancement】DeepComplexCRN_streaming: Deep Complex Recurrent Network for Speech Enhancement. Supports both full-utterance and streaming inference, making it suitable for real-time speech processing. (GitHub Repo)

  • 【Sound Event Detection】AudioClassificationModelZoo-Pytorch: 20+ open-source audio classification models built on PyTorch, with streaming test support, providing a convenient toolkit for sound event detection research. (GitHub Repo)
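
To illustrate what the linear first stage of a two-stage AEC system does, here is a minimal sketch using an NLMS adaptive filter. NLMS is swapped in for simplicity and is not the TDC-wRLS filter used in the project. The neural second stage is not shown. The function name `nlms_echo_cancel`, the toy echo path, and all parameter values are assumptions, and the microphone signal in the demo contains only echo (no near-end speech).

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=16, step=0.5, eps=1e-8):
    """Estimate the linear echo of `far_end` in `mic` and subtract it.

    Returns (residual, weights), where residual[n] = mic[n] - estimated echo.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # history[k] holds far_end[n - k]
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_est = sum(w * h for w, h in zip(weights, history))
        err = d - echo_est
        residual.append(err)
        # Normalized LMS update: step size scaled by the input energy.
        norm = sum(h * h for h in history) + eps
        scale = step * err / norm
        weights = [w + scale * h for w, h in zip(weights, history)]
    return residual, weights


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


# Toy setup: white-noise far-end signal and a short, made-up echo path.
random.seed(0)
echo_path = [0.6, -0.3, 0.1]
far_end = [random.gauss(0, 1) for _ in range(2000)]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
residual, weights = nlms_echo_cancel(far_end, mic)
print("echo power:", power(mic[-500:]), "residual power:", power(residual[-500:]))
```

A purely linear filter like this can only remove the linear part of the echo, which is why a system such as the one above pairs it with a learned second stage for the nonlinear remainder.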

📌 LLM Applications & Engineering Tools

  • 【On-Device Deployment】Edge-side-LLMChat: Local LLM Chat APK Based on MNN (2025.05) An Android local LLM chat application based on the MNN inference framework, supporting model downloads, image input, and multi-model management. (GitHub Repo)

  • 【Competition Solution】Intelligent Photo Problem-Solving Assistant (2025.12) A multi-model intelligent learning tool that supports photo upload, multi-model problem solving, and automatic failover, and provides step-by-step analysis, voice explanations, and mistake collection. Implemented as the 2025 IKCEST Top 10 solution. (GitHub Repo) (Live Demo)

  • 【WebUI Application】OpenManus-WebUI (2025.04) Built a front-end interface using Flask to invoke OpenManus, with file preview support for generated outputs, earning 226 Stars. (GitHub Repo)

  • 【Competition Solution】AI_SECS_Agent: AI Sports Commentary System (2024.12) A multimodal agent system integrating object tracking, pose recognition, OCR, goal detection, and other models to automatically generate AI commentary from football match video URLs. (GitHub Repo)

  • 【Competition Solution】MMF-RIM: Multimodal Misinformation Detection Model (2023.11) A 600M-parameter multimodal fusion model combining ERNIE, ResNet101, CLIP-ViT, and OCR text features to detect multimodal rumors in social networks. (GitHub Repo)


🛠️ Tech Stack & Expertise

Core Speech Algorithms: AEC, Voice Wakeup, SED, Speech Enhancement

Multimodal & Generative Models: Multimodal Fusion, Video Generation, Diffusion Models, CLIP

Competition & Optimization: Inference Acceleration, Solution Reproduction

Development & Deployment Frameworks: Python, PyTorch, Kaldi, PaddlePaddle, C++/Shell, Docker, Flask, Android


📈 My GitHub Activity

Pinned

  1. openshortvideo (Public)

    OpenShortVideo is an AI-based intelligent short video production platform that integrates a complete workflow of scriptwriting, character management, scene generation, and shot production, helping creators quickly generate high-quality short video content.

    HTML · 44 stars · 11 forks

  2. OpenManus-WebUI (Public)

    Builds a front-end page that invokes OpenManus through the Flask framework.

    JavaScript · 224 stars · 52 forks

  3. Prediction-of-stock-price-based-on-BP-neural-network (Public)

    Stock price prediction based on a BP neural network.

    Python · 25 stars · 4 forks

  4. Energy_Anomaly_Detection_TOP3 (Public)

    3rd-place solution for the anomaly detection track of the Energy AI Challenge.

    Jupyter Notebook · 15 stars · 2 forks

  5. MuMuAINovel-sqlite (Public)

    An AI-based intelligent novel-writing assistant.

    Python · 10 stars · 1 fork

  6. claude-code-deepseek (Public)

    Runnable Claude Code source code, driven by dual-end models.

    TypeScript · 16 stars · 12 forks