SubAsL AI is a dual-engine artificial intelligence system designed to bridge the communication gap between the Deaf community and the hearing world. It works from skeletal landmarks extracted via computer vision rather than raw video, and performs real-time American Sign Language (ASL) recognition entirely on the local machine, which keeps user video private and avoids any dependency on cloud inference.
Here is the system successfully performing real-time inference, generating raw MediaPipe skeletons and predicting dynamic ASL vocabulary with high confidence:
SubAsL AI is powered by two distinct neural network architectures, each specialized for a different aspect of communication: fingerspelling (the alphabet) and dynamic signing (words).
The dynamic signing engine is designed to recognize complex, multi-frame dynamic gestures across a vocabulary of 250 distinct ASL words (e.g., "Hello," "Water," "Hungry").
- Architecture: A deep 1D-CNN Residual Network (ResNet) with 4 residual blocks, Global Average Pooling, and a 512-unit dense classification head.
- Feature Engineering: Extracts 75 skeletal landmarks (21 per hand plus 33 pose landmarks) via MediaPipe. We apply Nose-Centric Spatial Normalization (making predictions position-invariant) and compute the Temporal Velocity of joints to capture the speed and direction of the sign.
- Performance:
  - Training Accuracy: 85%
  - Validation Accuracy: 71% (across 250 classes; random guessing would be 0.4%)
- Production Deployment: Model weights are quantized and converted to TensorFlow Lite (`.tflite`), allowing inference at 60 FPS on a standard CPU without requiring cloud GPUs.
- Dataset Used: Google - Isolated Sign Language Recognition (94K+ videos)
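
The feature engineering step above can be illustrated with a minimal NumPy sketch. It is not the project's actual code: the array layout `(T, 75, 3)`, the assumption that the pose landmarks come first (so the nose is index 0, as in MediaPipe Pose), the use of a simple frame-to-frame difference as the velocity, and the function names are all assumptions made for illustration.

```python
import numpy as np

NUM_LANDMARKS = 75  # 33 pose + 21 left hand + 21 right hand (ordering assumed)
NOSE_INDEX = 0      # nose is landmark 0 in MediaPipe Pose; assumes pose comes first


def nose_centric_normalize(frames: np.ndarray) -> np.ndarray:
    """Shift every frame so that the nose sits at the origin.

    frames has shape (T, 75, 3): T frames, 75 landmarks, (x, y, z).
    """
    nose = frames[:, NOSE_INDEX:NOSE_INDEX + 1, :]  # shape (T, 1, 3)
    return frames - nose


def temporal_velocity(frames: np.ndarray) -> np.ndarray:
    """Per-frame displacement of each landmark; the first frame gets zeros."""
    velocity = np.zeros_like(frames)
    velocity[1:] = frames[1:] - frames[:-1]
    return velocity


def build_features(frames: np.ndarray) -> np.ndarray:
    """Concatenate normalized positions and velocities into shape (T, 450)."""
    positions = nose_centric_normalize(frames)
    velocities = temporal_velocity(positions)
    stacked = np.concatenate([positions, velocities], axis=-1)  # (T, 75, 6)
    return stacked.reshape(frames.shape[0], -1)
```

Because every coordinate is expressed relative to the nose, shifting the whole skeleton within the frame leaves the features unchanged. Note that this sketch removes translation only; it does not rescale the skeleton.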
The fingerspelling engine is designed for fast, static, frame-by-frame fingerspelling detection (A-Z). It is used for spelling out names or words outside the 250-word dictionary.
- Architecture: A highly optimized PyTorch feed-forward neural network.
- Feature Engineering: Analyzes the 21 3D-landmarks of a single hand, focusing on the relative geometry of the fingers.
- Production Deployment: PyTorch weights saved as `best_model.pth`.
- Performance:
  - Accuracy: 97%
  - F1-Score: 97%
  - Precision/Recall: Balanced high performance across all 26 alphabet classes (Avg. 0.96+).
- Dataset Used: Sign Language Landmarks Dataset (Kaggle)
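
One plausible way to express the "relative geometry of the fingers" is to describe each landmark relative to the wrist and divide by the hand's size. The sketch below illustrates that idea only; the wrist-relative encoding, the scale normalization, and the `hand_features` name are assumptions, not the project's confirmed preprocessing.

```python
import numpy as np

WRIST_INDEX = 0  # wrist is landmark 0 in MediaPipe Hands


def hand_features(landmarks: np.ndarray) -> np.ndarray:
    """Turn (21, 3) hand landmarks into a 63-value relative-geometry vector."""
    if landmarks.shape != (21, 3):
        raise ValueError(f"expected shape (21, 3), got {landmarks.shape}")
    # Express every landmark relative to the wrist (removes hand position).
    relative = landmarks - landmarks[WRIST_INDEX]
    # Divide by the largest wrist-to-landmark distance (removes hand size).
    scale = np.linalg.norm(relative, axis=1).max()
    if scale > 0:
        relative = relative / scale
    return relative.reshape(-1)
```

A vector built this way stays the same when the hand moves across the frame or appears larger or smaller, so a feed-forward classifier only has to learn the finger configuration.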
Ensure you have Python 3.9+ installed. Install the core dependencies:

    pip install opencv-python mediapipe tensorflow torch numpy

To launch the 250-word dynamic ASL recognizer:

    python inference_resnet.py

Controls:
- SPACE: Add a space to your sentence.
- D / Backspace: Delete the last recorded word.
- ESC: Exit the application.
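
As a rough sketch of how these controls could map onto key codes, the function below applies one keypress to a sentence stored as a list of tokens. The token-list representation and the `handle_key` name are hypothetical. In an OpenCV loop the key would typically come from `cv2.waitKey(1) & 0xFF`, and the Backspace code (8 here) can differ between platforms.

```python
ESC, BACKSPACE, SPACE = 27, 8, 32  # ASCII key codes (Backspace may vary by platform)
DELETE_KEYS = (ord("d"), ord("D"), BACKSPACE)


def handle_key(key: int, words: list[str]) -> bool:
    """Apply one keypress to the sentence; return False when the app should exit."""
    if key == ESC:
        return False
    if key == SPACE:
        words.append(" ")
    elif key in DELETE_KEYS:
        # Skip trailing spaces so that the last recorded word is what gets removed.
        while words and words[-1] == " ":
            words.pop()
        if words:
            words.pop()
    return True
```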
To launch the A-Z fingerspelling recognizer:

    python inference.py

- Skeleton Extraction: As the user signs, MediaPipe Holistic tracks their body and hands, producing a virtual skeleton of 75 key points (left hand, right hand, and body pose, which covers the face and torso points).
- Mathematical Transformation: Instead of processing heavy pixel data, SubAsL AI processes only landmark coordinates. Coordinates are normalized against the user's nose, so predictions do not depend on where the user appears in the frame, and the system is designed to keep working whether the user is 1 foot or 10 feet away from the camera.
- Temporal Processing: For word detection, a buffer collects 60 frames of data. The ResNet model evaluates the flow of time, analyzing how the landmarks move, accelerate, and stop to determine the exact sign.
- Real-Time Output: The system predicts the sign and writes it to the UI within milliseconds.
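
The 60-frame buffer in the temporal processing step can be sketched with a fixed-length `collections.deque`. The `FrameBuffer` class and the `predict_fn` callable are hypothetical stand-ins: `predict_fn` represents whatever wraps the model, and predicting on every frame once the window is full is an assumption about the sliding behaviour.

```python
from collections import deque
from typing import Callable, Optional, Sequence

WINDOW_SIZE = 60  # frames per prediction, as described above


class FrameBuffer:
    """Keep the most recent frames and classify once the window is full."""

    def __init__(self, predict_fn: Callable[[list], str], size: int = WINDOW_SIZE):
        self._frames = deque(maxlen=size)  # oldest frame drops out automatically
        self._predict_fn = predict_fn

    def push(self, frame_features: Sequence[float]) -> Optional[str]:
        """Add one frame; return a label once enough frames are buffered, else None."""
        self._frames.append(frame_features)
        if len(self._frames) < self._frames.maxlen:
            return None
        return self._predict_fn(list(self._frames))
```

In the capture loop, each frame's feature vector would be passed to `push`, and any non-`None` result would be shown in the UI.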
The lightweight nature of our TFLite and PyTorch models paves the way for direct browser integration.
Next Steps: SubAsL AI can be packaged into a JavaScript-based Chrome Extension using TensorFlow.js and MediaPipe.js. This would allow the AI to run inside platforms like Google Meet or Zoom and provide live ASL-to-text closed captioning, with all inference performed on the user's device so that no video leaves the machine.
Built with ❤️ for a more accessible world.



