Prediction in the diagnosis of breast cancer tumors using classification methods

Authors

del Castillo Collazo, N.

DOI:

https://doi.org/10.36825/RITI.08.15.009

Keywords:

Breast Cancer, Random Forests, Neural Networks, Support Vector Machine, Classification Methods

Abstract

The present work applies data science to predict whether a breast cancer tumor is benign or malignant, using three classification methods: neural networks, random forests and support vector machines. A breast cancer dataset from the University of Wisconsin Hospital was used. Confusion matrices are used to obtain the performance measures of the forecasting models, and the ROC (Receiver Operating Characteristic) curve is used to assess their discriminative ability through the AUC (Area Under the Curve). All of the proposed models reach accuracy values high enough to support very accurate predictions, but the support vector machine model is the most convenient to use, since its forecast accuracy exceeds 99%. These techniques are recommended for hospitals and laboratories that carry out the detection of this disease, where they can serve as a support tool in the diagnosis of breast cancer.
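The evaluation described above rests on two standard ideas: the confusion matrix, from which accuracy, sensitivity and specificity are read, and the AUC of the ROC curve. The following is a minimal Python sketch of both, not the code used in the article. It assumes binary labels with 1 = malignant (treated as the positive class) and 0 = benign. The names `ConfusionMatrix`, `confusion_matrix` and `roc_auc` are hypothetical helpers, and the toy labels and scores are made up. The AUC is computed with the pairwise (Mann-Whitney) formulation, which gives the same value as the area under the empirical ROC curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 confusion matrix; assumes 1 = malignant (positive), 0 = benign."""
    tp: int  # malignant tumors predicted malignant
    fn: int  # malignant tumors predicted benign
    fp: int  # benign tumors predicted malignant
    tn: int  # benign tumors predicted benign

    @property
    def accuracy(self) -> float:
        # share of all cases classified correctly
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def sensitivity(self) -> float:
        # true-positive rate: share of malignant tumors that are detected
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # true-negative rate: share of benign tumors identified as benign
        return self.tn / (self.tn + self.fp)


def confusion_matrix(y_true, y_pred) -> ConfusionMatrix:
    """Count the four outcomes for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return ConfusionMatrix(tp, fn, fp, tn)


def roc_auc(y_true, scores) -> float:
    """AUC as the probability that a random malignant case gets a higher
    score than a random benign case (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not results from the article.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off
    cm = confusion_matrix(y_true, y_pred)
    print(cm, f"accuracy={cm.accuracy:.3f}", f"AUC={roc_auc(y_true, scores):.3f}")
```

With this toy data the 0.5 cut-off gives 3 true positives, 1 false negative, 1 false positive and 3 true negatives (accuracy 0.75), while the AUC is 0.9375. This illustrates that the confusion matrix describes performance at a single threshold, whereas the AUC summarizes how well the scores rank malignant above benign cases across all thresholds. The pairwise loop is O(n·m), which is adequate for datasets of a few hundred cases; a rank-based computation scales better.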

Published

2020-05-06

How to Cite

del Castillo Collazo, N. (2020). Prediction in the diagnosis of breast cancer tumors using classification methods. Revista De Investigación En Tecnologías De La Información, 8(15), 96–104. https://doi.org/10.36825/RITI.08.15.009

Issue

Vol. 8 No. 15 (2020)

Section

Articles