Speech recognition software for English pronunciation development: A systematic review

DOI:

https://doi.org/10.36825/RITI.13.29.014

Keywords:

Speech Recognition, Software, Pronunciation Quality, Pronunciation Assessment, Voice Analysis

Abstract

This article presents a systematic review of the literature on software tools that use speech recognition to improve the pronunciation of English vocabulary. The main objective is to identify and analyze the features, challenges, and effectiveness of these tools. The review compares existing solutions, evaluating their impact on pronunciation improvement, their specific functionalities for English vocabulary learning, and their ability to adapt to different teaching contexts. The study aims to provide a comprehensive overview of the state of the art in speech recognition tools for this field, highlighting areas for improvement and possible directions for future research and development.

Published

2025-06-29

How to Cite

García Pazos, E. A., Escalante Vega, J. E., Ramírez, O. A., & Castañeda Sánchez, F. (2025). Speech recognition software for English pronunciation development: A systematic review. Revista De Investigación En Tecnologías De La Información, 13(29), 166–179. https://doi.org/10.36825/RITI.13.29.014

Section

Articles
