Enhancing speech recognition through diverse shared features accent classification
Amina Salifu, Henry Nunoo-Mensah, Eric Tutu Tchao, Francisca Adoma Acheampong, Andrew Selasi Agbemenu & Jerry John Kponyo
Accurately recognising and processing accented speech remains a significant hurdle for automated systems due to the vast diversity of native and non-native languages. This work presents a novel approach to accent classification that tackles this challenge and paves the way for enhanced speech recognition systems. The paper introduces two key innovations: novel preprocessing algorithms and a unique architecture that identifies shared acoustic and prosodic features among speakers with the same native language. We propose a new framework integrating accent classification with speech recognition technology, specifically targeting virtual assistants. This framework aims to significantly improve the accuracy of voice command interpretation across diverse accents, promoting more inclusive and effective human-computer interaction. Our experiments demonstrate the effectiveness of this approach. Utilising the Speech Accent Archive and AccentDB datasets, we establish a baseline accuracy of 78% with our convolutional neural network. Incorporating intra-native shared features raises the accuracy to 81%, and integrating cross-native shared features substantially enhances performance, reaching an accuracy of 99%. These results highlight our approach's potential to improve automated systems' ability to understand and interact with speakers from diverse backgrounds, offering a promising direction for future developments in accent classification and speech recognition technology.
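To make the kind of pipeline the abstract describes concrete, the sketch below extracts standard acoustic features (MFCCs) with librosa and feeds them to a small convolutional classifier in PyTorch. It is a minimal illustration only: the feature choice, layer sizes, and the NUM_ACCENTS constant are assumptions for demonstration, not the authors' published preprocessing or architecture, and it omits the prosodic and shared-feature components central to the proposed method.

# Illustrative sketch: hand-crafted acoustic features + a small CNN accent
# classifier. NUM_ACCENTS, feature settings, and layer sizes are assumed
# values, not the configuration reported in the paper.
import librosa
import numpy as np
import torch
import torch.nn as nn

NUM_ACCENTS = 9          # assumed number of accent classes
SAMPLE_RATE = 16_000
N_MFCC = 40
MAX_FRAMES = 300         # pad/trim every clip to a fixed number of frames


def extract_features(wav_path: str) -> np.ndarray:
    """Return a fixed-size (N_MFCC, MAX_FRAMES) MFCC matrix for one clip."""
    signal, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=signal, sr=SAMPLE_RATE, n_mfcc=N_MFCC)
    # Pad or trim along the time axis so every example has the same shape.
    if mfcc.shape[1] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    return mfcc[:, :MAX_FRAMES]


class AccentCNN(nn.Module):
    """A small 2-D CNN over the MFCC matrix (1 channel, N_MFCC x frames)."""

    def __init__(self, num_classes: int = NUM_ACCENTS):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(x))


if __name__ == "__main__":
    # Smoke test with random tensors standing in for real MFCC features.
    model = AccentCNN()
    dummy = torch.randn(4, 1, N_MFCC, MAX_FRAMES)
    print(model(dummy).shape)  # torch.Size([4, NUM_ACCENTS])

In such a setup, the baseline classifier would be trained on per-speaker features alone; the intra-native and cross-native shared features reported in the abstract would enter as additional inputs or auxiliary representations on top of a backbone of this kind.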