Study of preferences for white and red wine using binary classification methods

Authors

DOI:

https://doi.org/10.36825/RITI.08.16.003

Keywords:

Data Mining, Classification Methods, Neural Networks, Discriminant Analysis, Logistic Regression

Abstract

Data mining methods make it possible to detect patterns in data that are not apparent on simple inspection. In this study, we apply several techniques to predict taste preferences for wine from a set of physicochemical characteristics of its composition, for both red and white wine, drinks that have long been enjoyed by many people around the world. The data set used in this work comes from green wine (vinho verde) from the north of Portugal. It contains a group of variables that allowed classification methods to be applied to predict the taste rating of a wine based on criteria given by customers. Three methods were used for this purpose: discriminant analysis, logistic regression, and neural networks. For both data sets, the three methods produced very similar results. The discriminant capacity of the models makes it possible to clearly separate the two groups used for classification.
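
As an illustration of the kind of binary classification described in the abstract, the following is a minimal sketch of logistic regression fitted by gradient descent. It is not the authors' code. The data are synthetic, and the two feature names (alcohol, volatile acidity), the class means, the "preferred / not preferred" label, and all function names are assumptions made only for this example.

```python
import math
import random


def make_synthetic_wines(n, seed=0):
    """Generate hypothetical wines: (alcohol, volatile_acidity) and a 0/1 label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = "preferred" (illustrative only)
        # Assumed class means and spreads; not taken from the paper.
        alcohol = rng.gauss(11.5 if label else 9.5, 0.6)
        volatile_acidity = rng.gauss(0.35 if label else 0.60, 0.08)
        X.append((alcohol, volatile_acidity))
        y.append(label)
    return X, y


def fit_scaler(X):
    """Column means and standard deviations, computed on training data only."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds


def scale(X, means, stds):
    """Standardize each feature with the given means and deviations."""
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Share of samples whose predicted class (threshold 0.5) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hits += int((p >= 0.5) == bool(yi))
    return hits / len(y)


X, y = make_synthetic_wines(400)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
means, stds = fit_scaler(X_train)
w, b = fit_logistic(scale(X_train, means, stds), y_train)
test_acc = accuracy(w, b, scale(X_test, means, stds), y_test)
print(f"held-out accuracy: {test_acc:.2f}")
```

The signs of the fitted weights show which direction each standardized variable pushes the predicted class. Discriminant analysis or a neural network could be compared on the same train/test split.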

Published

2020-08-01

How to Cite

del Castillo Collazo, N., Alvarado Pegueros, L. F., Flores Rodríguez, V., & Rodríguez, N. A. (2020). Study of preferences for white and red wine using binary classification methods. Revista De Investigación En Tecnologías De La Información, 8(16), 23–32. https://doi.org/10.36825/RITI.08.16.003