US20250218223
2025-07-03
Physics
G06V40/28
A novel system is presented for translating sign language into other languages through a modular architecture. This approach leverages multiple classifiers to generate intermediate representations or final translations of sign languages. The classifiers are trained on various sign types to enhance translation accuracy while minimizing training complexity and time.
Sign languages, used predominantly by deaf and hard-of-hearing communities, rely on manual gestures, facial expressions, and body postures. Each sign language has its own grammar and syntax, distinct from those of spoken languages. Glosses are written approximations of signs used for transcription and analysis, but they do not capture every nuance. Neural networks, loosely inspired by biological neural networks, are employed in machine learning and trained to optimize performance on specific tasks. Various architectures, such as Feedforward Neural Networks (FNNs), Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformers, are selected according to the data type and task.
Translating sign languages from video to text is complex due to the dynamic nature of signs. American Sign Language (ASL), for example, includes single signs, fingerspelled signs, and regular signs. Traditional methods, such as manual annotation or single-classifier models, have limitations in accuracy and adaptability. Rule-based systems and static image-based systems likewise fall short in capturing the motion and variation essential to sign languages.
A modular sequential classifier approach is proposed, using multiple classifiers tailored for different sign types to improve translation accuracy. This structure allows for effective segregation and processing of single signs, fingerspelled signs, and regular signs. The system includes processing circuitry configured to enhance video segments for better feature extraction through techniques like brightness adjustment and frame removal.
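The enhancement step described above can be sketched in code. The function names, the brightness gain, and the variance-based frame-removal heuristic below are illustrative assumptions, not the patent's actual implementation; the sketch only shows how brightness adjustment and frame removal might be chained before feature extraction.

```python
import numpy as np

def enhance_segment(frames, brightness_gain=1.2, variance_threshold=5.0):
    """Illustrative preprocessing for a video segment (hypothetical sketch).

    - Brightness adjustment: scale pixel intensities and clip to [0, 255].
    - Frame removal: drop frames whose intensity variance falls below a
      threshold, a simple stand-in for low-information frame detection.
    """
    enhanced = []
    for frame in frames:
        # Scale brightness in float space, then clip back to valid range.
        bright = np.clip(frame.astype(np.float32) * brightness_gain, 0, 255)
        # Keep only frames with enough intensity variation to be informative.
        if bright.var() >= variance_threshold:
            enhanced.append(bright.astype(np.uint8))
    return enhanced
```

A real system would likely operate on decoded video tensors and use learned quality metrics, but the control flow (per-frame adjustment followed by filtering) would be similar.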
The system employs a first classifier to determine if a video segment contains a single sign or a sequence of signs. If a sequence is detected, additional classifiers process the segment to produce symbols for each sign type. These classifiers are trained on specific sign types, such as regular continuous signs or fingerspelled signs. Training involves both traditional methods with labeled data and synthetic data generation to expand training sets.
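The routing logic of this two-stage design can be sketched as follows. The classifier stubs (`is_single_sign`, `classify_regular`, `classify_fingerspelled`) and the length-based heuristics are hypothetical placeholders for the trained neural classifiers the text describes; the sketch shows only the modular dispatch structure, not any actual model.

```python
def is_single_sign(segment):
    # First classifier: single sign vs. sequence of signs (stub heuristic).
    return len(segment) == 1

def classify_regular(sign):
    # Stand-in for a classifier trained on regular continuous signs.
    return f"GLOSS[{sign}]"

def classify_fingerspelled(sign):
    # Stand-in for a classifier trained on fingerspelled signs.
    return sign.upper()

def translate_segment(segment):
    """Route a video segment (here, a list of sign tokens) through the
    modular pipeline: a first classifier gates single vs. sequence, and
    type-specific classifiers then emit a symbol per sign."""
    if is_single_sign(segment):
        return [classify_regular(segment[0])]
    symbols = []
    for sign in segment:
        # A learned sign-type detector would choose the classifier here;
        # a naive single-letter check stands in for it.
        if len(sign) == 1 and sign.isalpha():
            symbols.append(classify_fingerspelled(sign))
        else:
            symbols.append(classify_regular(sign))
    return symbols
```

The design choice this illustrates is that each downstream classifier only ever sees the sign type it was trained on, which is what lets the described system improve accuracy while keeping per-classifier training sets small.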