Improving effort estimation in software projects through oversampling and machine learning methods

Beatriz Bedolla Martínez; Raúl Cruz-Barbosa; Iván Antonio García Pacheco

doi:10.36825/RITI.13.31.008

Authors

Beatriz Bedolla Martínez, Universidad Tecnológica de la Mixteca, Huajuapan de León, Mexico https://orcid.org/0009-0002-3474-3227
Raúl Cruz-Barbosa, Universidad Tecnológica de la Mixteca, Huajuapan de León, Mexico https://orcid.org/0000-0002-5494-7027
Iván Antonio García Pacheco, Universidad Tecnológica de la Mixteca, Huajuapan de León, Mexico https://orcid.org/0000-0002-7594-6410

DOI:

https://doi.org/10.36825/RITI.13.31.008

Keywords:

Effort Estimation, Software Projects, Machine Learning, Prediction, Oversampling, Regression

Abstract

Software effort estimation predicts how long a software product will take to develop, or which resources will be needed to finish it within the established time. A current alternative for producing these estimates is to use machine learning methods; however, publicly available datasets generally contain few samples, which limits how effective those methods can be. It is therefore necessary to increase the number of samples through oversampling methods. Accordingly, this article mainly uses ensemble methods combined with oversampling and undersampling to analyze how these strategies affect the performance of the regressors on small and medium-sized datasets, evaluating their effectiveness in improving software effort estimation with measures such as MMRE, MAE, RMSE, and Pred. The results, mainly for MMRE and Pred, show that applying these strategies reduces prediction errors. Consequently, a suitable ensemble model used together with oversampling and undersampling strategies improves effort prediction, especially on small datasets such as COCOMO, Maxwell, and Desharnais, whose sample distributions are highly imbalanced.
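The abstract names MMRE, MAE, RMSE, and Pred without defining them. As a minimal sketch, assuming the definitions commonly used in the effort-estimation literature (MMRE as the mean of |actual - predicted| / actual, and Pred(l) as the fraction of projects whose relative error is at most l), they could be computed as shown here. The function name `effort_metrics` and the default level of 0.25 are illustrative assumptions and are not taken from the article.

```python
import math


def effort_metrics(actual, predicted, pred_level=0.25):
    """Return MMRE, MAE, RMSE and Pred(pred_level) for paired effort values.

    Assumes every actual effort is strictly positive, since the relative
    error divides by it. `pred_level` is a hypothetical default.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual effort values must be positive")

    n = len(actual)
    # Absolute error per project.
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    # Magnitude of relative error (MRE) per project.
    mre = [e / a for e, a in zip(abs_errors, actual)]

    return {
        "MMRE": sum(mre) / n,
        "MAE": sum(abs_errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in abs_errors) / n),
        # Share of projects whose MRE does not exceed pred_level.
        "Pred": sum(1 for r in mre if r <= pred_level) / n,
    }


if __name__ == "__main__":
    # Made-up effort values, used only to exercise the function.
    print(effort_metrics([100.0, 200.0, 400.0], [110.0, 120.0, 400.0]))
```

Lower MMRE, MAE, and RMSE indicate smaller errors, whereas a higher Pred indicates that more projects fall within the tolerated relative error, so the two groups of measures move in opposite directions when a model improves.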

