Speech recognition software for the development of English pronunciation: A systematic review

Eliud Alberto García Pazos; Juana Elisa Escalante Vega; Oscar Alonso Ramírez; Fredy Castañeda Sánchez

doi:10.36825/RITI.13.29.014

Authors

Eliud Alberto García Pazos Universidad Veracruzana https://orcid.org/0009-0002-5724-3030
Juana Elisa Escalante Vega Universidad Veracruzana https://orcid.org/0000-0001-8192-6267
Oscar Alonso Ramírez Universidad Veracruzana https://orcid.org/0009-0007-6476-5781
Fredy Castañeda Sánchez Universidad Veracruzana https://orcid.org/0000-0003-1675-6438


Keywords:

Speech Recognition, Software, Pronunciation Quality, Pronunciation Assessment, Voice Analysis

Abstract

This article presents a systematic literature review of software tools designed to improve the pronunciation of English vocabulary through speech recognition. The main objective is to identify and analyze the features, challenges, and effectiveness of these tools. A comparative analysis of existing solutions was conducted, evaluating their impact on pronunciation improvement, their specific functionality for English vocabulary learning, and their adaptability to different teaching contexts. The study aims to provide a comprehensive view of the current state of speech recognition tools in this field, highlighting areas for improvement and possible future directions for research and development.

