Identificación del acento en hablantes de español mediante el análisis de atributos MFCC y aprendizaje supervisado

Carmen Victoria Robles Contreras; Maya Carrillo Ruiz; José Luis Hernández Ameca; Francisco Javier Robles Mendoza

doi:10.36825/RITI.12.26.002

Authors

Carmen Victoria Robles Contreras, Benemérita Universidad Autónoma de Puebla, https://orcid.org/0000-0001-7964-7883
Maya Carrillo Ruiz, Benemérita Universidad Autónoma de Puebla, https://orcid.org/0000-0001-6152-456X
José Luis Hernández Ameca, Benemérita Universidad Autónoma de Puebla, https://orcid.org/0000-0002-7672-5409
Francisco Javier Robles Mendoza, Benemérita Universidad Autónoma de Puebla, https://orcid.org/0009-0007-3176-5005

DOI:

https://doi.org/10.36825/RITI.12.26.002

Keywords:

Accent Recognition, MFCC, Machine Learning Algorithms, Supervised Learning

Abstract

Speaker recognition has many real-world applications. This study examines whether samples of human speech from Spanish speakers can be classified by their distinctive accents. Mel-Frequency Cepstral Coefficients (MFCCs) were combined with machine learning techniques to identify the nationality of Spanish-speaking individuals from voice recordings in the Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech corpus. During preprocessing, 50 MFCCs were extracted from each recording to form the experimental dataset. Experiments were run on different subsets of the data; the best results were obtained with speakers from four Latin American countries, including both men and women. Neural networks were used for the classification stage and achieved an accuracy of 99.84%.
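The preprocessing step summarized in the abstract, turning each recording into a vector of 50 MFCCs, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not say how frame-level coefficients are reduced to one vector per recording, so averaging over frames is an assumption here, as are the frame length (256 samples), hop (128 samples), number of mel filters (64), and all function names. The naive DFT is slow and serves only to keep the example self-contained; in practice a library routine such as `librosa.feature.mfcc` would normally be used.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive DFT (slow, illustration only)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in freqs])
    return bank


def mfcc_vector(signal, sr, n_mfcc=50, n_mels=64, n_fft=256, hop=128):
    """Mean of the first n_mfcc cepstral coefficients over all frames.

    Averaging over frames is an assumption; the source only states that
    50 MFCCs were extracted from each recording.
    """
    bank = mel_filterbank(n_mels, n_fft, sr)
    totals, n_frames = [0.0] * n_mfcc, 0
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        # Small constant avoids log(0) for filters that cover no DFT bin.
        log_mel = [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10)
                   for filt in bank]
        for c in range(n_mfcc):  # unnormalized DCT-II of the log mel energies
            totals[c] += sum(e * math.cos(math.pi * c * (j + 0.5) / n_mels)
                             for j, e in enumerate(log_mel))
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [t / n_frames for t in totals]
```

Calling `mfcc_vector(samples, 16000)` on a list of float samples returns a list of 50 numbers. Under the assumptions above, one such list per recording, paired with the speaker's country as the label, would form one row of the dataset passed to the classifier.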

Accent identification in Spanish speakers through MFCC attribute analysis and supervised learning
