International Conference on Machine Learning, Artificial Intelligence and Data Science

Sunakshi Mehra

Biography

Her publications focus on integrating advanced architectures, such as BiLSTMs, CNNs, and Transformer-based models, for tasks involving speech processing, human-computer interaction, and decision-level fusion strategies. In addition to her research and teaching responsibilities, Dr. Mehra actively contributes to the academic community as a reviewer for reputed international journals and conferences. Her scholarly engagement reflects her commitment to fostering high-quality research and mentoring within the AI community. Her broader vision is to bridge the gap between AI research and real-world applications, particularly in healthcare, assistive technologies, and human-centric computing. Through her work, Dr. Mehra strives to advance intelligent systems that not only achieve technical excellence but also create meaningful social impact.

Research Interest

Her research lies in the fields of Speech Recognition, Natural Language Processing (NLP), and Computer Vision, with a particular emphasis on multimodal deep learning. By combining audio, linguistic, and visual modalities, her work addresses critical challenges such as speech impairments, accented speech, and low-resource language processing.

Abstract

Multimodal Fusion Strategies in Deep Learning: Advancing Speech Recognition with Audio, Visual, and Linguistic Features

This talk will explore the role of multimodal deep learning frameworks in enhancing automatic speech recognition (ASR) systems. By leveraging diverse feature streams, including spectrogram-based acoustic cues, phonetic embeddings, and visual signals, multimodal fusion strategies can address challenges such as accented speech, speech impairments, and low-resource languages. The presentation will highlight experimental findings with late, early, and hybrid fusion techniques, discuss their performance across varied datasets, and outline potential applications in healthcare, human-computer interaction, and real-world speech technologies.
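
For readers unfamiliar with the fusion terminology in the abstract, the sketch below contrasts early (feature-level) and late (decision-level) fusion for two modalities in PyTorch. It is a minimal illustration only; the module names, feature dimensions, and equal-weight averaging are assumptions for clarity and do not represent the architectures or experiments discussed in the talk.

    # Illustrative sketch of early vs. late fusion for audio and visual features.
    # Dimensions and weighting are assumed for demonstration purposes.
    import torch
    import torch.nn as nn

    class EarlyFusionHead(nn.Module):
        """Concatenate modality features before a shared classifier (feature-level fusion)."""
        def __init__(self, audio_dim=128, visual_dim=64, num_classes=40):
            super().__init__()
            self.classifier = nn.Sequential(
                nn.Linear(audio_dim + visual_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_classes),
            )

        def forward(self, audio_feat, visual_feat):
            fused = torch.cat([audio_feat, visual_feat], dim=-1)  # join features early
            return self.classifier(fused)

    class LateFusionHead(nn.Module):
        """Classify each modality separately, then average the logits (decision-level fusion)."""
        def __init__(self, audio_dim=128, visual_dim=64, num_classes=40):
            super().__init__()
            self.audio_head = nn.Linear(audio_dim, num_classes)
            self.visual_head = nn.Linear(visual_dim, num_classes)

        def forward(self, audio_feat, visual_feat):
            return 0.5 * (self.audio_head(audio_feat) + self.visual_head(visual_feat))

    # Example usage with random features for a batch of 8 frames.
    audio = torch.randn(8, 128)
    visual = torch.randn(8, 64)
    print(EarlyFusionHead()(audio, visual).shape)  # torch.Size([8, 40])
    print(LateFusionHead()(audio, visual).shape)   # torch.Size([8, 40])

A hybrid strategy, as mentioned in the abstract, would combine both ideas, for example by fusing intermediate representations while also aggregating per-modality decisions.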