Speech recognition software for developing English pronunciation: A systematic review
DOI:
https://doi.org/10.36825/RITI.13.29.014
Keywords:
Speech Recognition, Software, Pronunciation Quality, Pronunciation Assessment, Voice Analysis
Abstract
This article presents a systematic literature review of software tools designed to improve the pronunciation of English vocabulary through speech recognition. The main objective is to identify and analyze the features, challenges, and effectiveness of these tools. A comparative analysis of existing solutions was carried out, evaluating their impact on pronunciation improvement, their specific functionality for English vocabulary learning, and their ability to adapt to different teaching contexts. The study aims to provide a comprehensive view of the current state of speech recognition tools in this field, highlighting areas for improvement and possible future directions for research and development.
License
Copyright 2025 Revista de Investigación en Tecnologías de la Información

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.
This journal provides open access to its content, on the principle that making research freely available to the public supports a greater global exchange of knowledge.
Text published in the Revista de Investigación en Tecnologías de la Información (RITI) is distributed under the Creative Commons license (CC BY-NC), which allows third parties to use the published material provided they credit the authors and RITI, but not for commercial purposes.