Prediction in the diagnosis of breast cancer tumors using classification methods
DOI: https://doi.org/10.36825/RITI.08.15.009
Keywords: Breast Cancer, Random Forests, Neural Networks, Support Vector Machine, Classification Methods
Abstract
The present work applies data science to predict whether a breast cancer tumor is benign or malignant, using three classification methods: neural networks, random forests, and support vector machines. A breast cancer dataset from the University of Wisconsin Hospital was used. Confusion matrices are used to obtain the performance measures of the predictive models, and the ROC (Receiver Operating Characteristic) curve is used to assess their discriminative ability based on the AUC (Area Under the Curve). The proposed models reach accuracy values indicating that very accurate predictions can be made with them, although the support vector machine model is the most suitable, since its prediction accuracy exceeds 99%. The use of these techniques is recommended in hospitals and laboratories that perform breast cancer detection, as they can serve as a support tool in its diagnosis.
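The abstract names the evaluation tools (confusion matrix, ROC curve, AUC) without showing how they are computed. The following is a minimal Python sketch, not the authors' code, of how these measures follow from a classifier's outputs. The function names, the encoding of malignant as the positive class `1`, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions, not data or results from the study.

```python
"""Confusion-matrix measures and ROC/AUC for a binary classifier (illustrative sketch)."""
from typing import Dict, List, Sequence, Tuple

POSITIVE = 1  # assumption: 1 = malignant (positive class), 0 = benign


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN), treating POSITIVE as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == POSITIVE:
            if t == POSITIVE:
                tp += 1
            else:
                fp += 1
        elif t == POSITIVE:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Common measures derived from the four confusion-matrix cells."""
    def ratio(a: int, b: int) -> float:
        return a / b if b else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),  # share of malignant cases detected
        "specificity": ratio(tn, tn + fp),  # share of benign cases correctly cleared
        "precision": ratio(tp, tp + fp),
    }


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    n_pos = sum(1 for t in y_true if t == POSITIVE)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == POSITIVE)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t != POSITIVE)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == POSITIVE]
    neg = [s for t, s in zip(y_true, scores) if t != POSITIVE]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): true labels and predicted P(malignant).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

if __name__ == "__main__":
    cm = confusion_matrix(y_true, y_pred)
    print(cm)                                         # (3, 1, 1, 3)
    print(metrics(*cm)["accuracy"])                   # 0.75
    print(roc_auc(y_true, scores))                    # 0.875
    print(trapezoid_auc(roc_points(y_true, scores)))  # 0.875
```

In practice, `scores` would be each model's predicted probability of malignancy on a held-out test set, so the neural network, random forest, and support vector machine could be compared both by the accuracy read off their confusion matrices and by their AUC. The pairwise and trapezoidal AUC agree because the ROC curve is built with one threshold per distinct score.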
References
American Cancer Society. (2019). Surveillance, Epidemiology, and End Results (SEER). National Cancer Institute. Retrieved from https://cancerstatisticscenter.cancer.org/?_ga=2.151790777.1241982100.1584820087-1304861891.1584820087#!/
MLData. (2018). Breast Cancer: Predict if tumor is benign or malignant. Retrieved from https://www.mldata.io/dataset-details/breast_cancer/
Aldás, J., & Uriel, E. (2017). Análisis Multivariante aplicado con R (2nd ed.). Madrid, Spain: Ediciones Paraninfo.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1999). Análisis Multivariante. Madrid: Prentice Hall.
Hodnett, M., & Wiley, J. F. (2018). R Deep Learning Essentials (2nd ed.). UK: Packt Publishing Ltd.
Cirillo, A. (2017). R Data Mining: Implement data mining techniques through practical use cases and real-world datasets. Birmingham, Mumbai: Packt Publishing Ltd.
Villalba Bergado, F. (2017). Aprendizaje supervisado en R. Retrieved from https://fervilber.github.io/Aprendizaje-supervisado-en-R/bosques.html
del Castillo Collazo, N. (2020). Incidencias en el pronóstico al aplicar reducción de variables. Un ejemplo práctico. Revista de Investigación en Tecnologías de la Información (RITI), 8(15), 50-69. https://doi.org/10.36825/RITI.08.15.006
License
Copyright (c) 2020 Revista de Investigación en Tecnologías de la Información
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This journal provides open access to its content, based on the principle that making research freely available to the public supports a greater global exchange of knowledge.
The text published in the Revista de Investigación en Tecnologías de la Información (RITI) is distributed under the Creative Commons license (CC BY-NC), which allows third parties to use the published material provided they cite the authors of the work and RITI, but not for commercial purposes.