A Lip-Reading Model for Tagalog Using Multimodal Deep Learning Approach

Nikie Jo E. Deocampo; Mia V. Villarica; Albert A. Vinluan

Nikie Jo E. Deocampo College of Computer Studies, AMA University, Philippines
Mia V. Villarica College of Computer Studies, Laguna State Polytechnic University, Philippines
Albert A. Vinluan College of Computer Studies, Information Communication Technology, Isabela State University, Philippines

Abstract

Purpose – The main purpose of this research is to develop a Tagalog-specific lip-reading model using a multimodal deep learning approach that focuses on visual and textual information. The research addresses the underrepresentation of languages such as Tagalog in lip-reading research. It aims to improve communication between native and non-native Tagalog speakers who are deaf or hard of hearing, paving the way toward linguistically inclusive AI and lip-reading systems.

Method – The research employs a hybrid multimodal model that combines a convolutional neural network (CNN) with long short-term memory (LSTM). The model is inspired by the LipNet architecture and integrates facial landmarks with contextual language information.
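To make the described pipeline concrete, here is a minimal NumPy sketch of one way facial-landmark features and a text-context vector could be fused and fed through a convolution-then-LSTM stack. The abstract does not give the actual layer sizes, fusion method, or input format, so everything here is an assumption: mouth landmarks are taken as indices 48 to 67 of the common 68-point face layout, a single temporal convolution stands in for the CNN front end, fusion is plain concatenation, and the names `normalize_mouth`, `temporal_conv`, and `TinyMultimodalLSTM`, along with all dimensions, are hypothetical. Weights are random and untrained.

```python
import numpy as np

# Hypothetical sizes for illustration only (not taken from the paper).
TEXT_DIM = 16   # size of an assumed contextual-language embedding
HIDDEN = 32     # LSTM hidden size
VOCAB = 28      # hypothetical: 26 letters + space + a blank symbol

def normalize_mouth(landmarks):
    """Keep mouth points 48-67 of a (68, 2) landmark frame, then center and scale them."""
    mouth = landmarks[48:68]                       # (20, 2)
    mouth = mouth - mouth.mean(axis=0)             # remove head translation
    scale = np.linalg.norm(mouth, axis=1).max()    # largest distance from the centroid
    return (mouth / (scale + 1e-8)).ravel()        # flatten to (40,)

def temporal_conv(x, kernels):
    """'Valid' 1-D convolution over time with ReLU (cross-correlation, as in most DL libraries).
    x: (T, C_in), kernels: (K, C_in, C_out) -> (T - K + 1, C_out)."""
    K = kernels.shape[0]
    steps = x.shape[0] - K + 1
    out = np.stack([np.einsum("kc,kcd->d", x[t:t + K], kernels) for t in range(steps)])
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMultimodalLSTM:
    """Single-layer LSTM over concatenated [visual, text] features (random, untrained weights)."""

    def __init__(self, in_dim, hidden, vocab, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        # One weight matrix for the input, forget, output, and candidate gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden, in_dim + hidden))
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0.0, 0.1, (vocab, hidden))

    def forward(self, visual_seq, text_vec):
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        log_probs = []
        for v in visual_seq:
            x = np.concatenate([v, text_vec, h])   # fuse both modalities with the previous state
            i, f, o, g = np.split(self.W @ x + self.b, 4)
            i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
            c = f * c + i * g                      # update the cell state
            h = o * np.tanh(c)                     # new hidden state
            logits = self.W_out @ h
            logits = logits - logits.max()         # numerically stable log-softmax
            log_probs.append(logits - np.log(np.exp(logits).sum()))
        return np.stack(log_probs)                 # (T, vocab) per-frame log-probabilities

# Usage with synthetic data: 75 frames of fake 68-point landmarks.
rng = np.random.default_rng(1)
frames = rng.normal(size=(75, 68, 2))
visual = np.stack([normalize_mouth(f) for f in frames])         # (75, 40)
visual = temporal_conv(visual, rng.normal(0.0, 0.1, (3, 40, 24)))  # (73, 24)
text_vec = rng.normal(size=TEXT_DIM)
model = TinyMultimodalLSTM(in_dim=24 + TEXT_DIM, hidden=HIDDEN, vocab=VOCAB)
log_probs = model.forward(visual, text_vec)
print(log_probs.shape)  # (73, 28)
```

Concatenating the text vector at every time step is only the simplest fusion choice. Attention-based or late fusion are common alternatives, and the abstract does not say which one the authors used.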

Results – The proposed Tagalog lip-reading model increased processing speed by at least 25% in both the training and evaluation phases without compromising accuracy. After 80 training epochs, the model reached a validation accuracy of 89.5%.

Conclusion – The research demonstrated the efficacy of the multimodal approach, showing the advantages of integrating visual and textual information for lip-reading in Tagalog. By tailoring the model architecture to the distinctive phonetic features of Tagalog, the research achieved strong performance.

Recommendations – Future research can explore how well the proposed model generalizes to other underexplored languages, including its adaptability to different speaking styles, accents, and noise levels.

Research Implications – The successful development of a lip-reading model for Tagalog highlights the value of linguistically diverse datasets, combined with a multimodal approach, for broad applications in human-computer interaction.

Author Biographies

Nikie Jo E. Deocampo, College of Computer Studies, AMA University, Philippines

Mr. Nikie Jo Deocampo is a professional with a background in Information Systems. He holds a Master's degree in Information Technology and is currently pursuing a Doctorate in Information Technology. With expertise in Web Technologies, Cloud Computing, and Machine Learning, Mr. Deocampo is a budding researcher committed to making a positive impact on the community through innovative research that aims to improve quality of life. As a dedicated educator currently teaching at a university, Mr. Deocampo is passionate about imparting knowledge and fostering learning. Eager to collaborate with top researchers in IT and Computer Science, Mr. Deocampo aims to contribute significantly to the advancement of these fields.

Mia V. Villarica, College of Computer Studies, Laguna State Polytechnic University, Philippines

Dr. Mia V. Villarica, a professor at Laguna State Polytechnic University, specializes in Data Mining, Information and Communications Technology, and eLearning. She has contributed significantly to academic research, co-authoring papers such as "Serbigo Serbisyo on the Go!: Online job order mobile application for non-professional workers" and "Classification of Coffee Variety using Electronic Nose," both published in 2022. Her work also includes "Correlation Analysis between Sensors for Sensing Coffee Variations," presented at the 2022 IEEE 18th International Colloquium on Signal Processing & Applications. Dr. Villarica's research interests extend to developing innovative solutions, as evidenced by her involvement in various projects.

Albert A. Vinluan, College of Computer Studies, Information Communication Technology, Isabela State University, Philippines

Dr. Albert A. Vinluan is a professor from Isabela State University. With expertise in Machine Learning, Supervised Learning, Data Mining, Applied Artificial Intelligence, and more, Dr. Vinluan has made significant research contributions. His publications include "Emotional analysis and prediction based on online book user comments" and "Research on Emotional Analysis of Online Book Reviews Based on Word2Vec Method," both in 2023, as well as "Barrier-free routes in a geographic information system for mobility impaired people" in 2022.

Other Journal by STEP Academic