Hybrid EfficientNet and Vision Transformer architecture for cross-platform agricultural pathology detection
DOI:
https://doi.org/10.36825/RITI.14.33.009
Keywords:
Deep Learning, Vision Transformers, EfficientNet, Foliar Diseases, Precision Agriculture, Reality Gap
Abstract
Early diagnosis of foliar diseases is crucial for food security. However, an applicability gap (the "reality gap") limits the operational viability of Deep Learning models in real agricultural environments, because these models are computationally expensive and sensitive to visual noise. The objective of this study is to design, validate, and deploy a lightweight hybrid architecture (AgroScan) capable of operating as a real-time decision support tool for 49 diseases distributed across 10 high-impact crops. The methodology proposes a two-stage inference process: a binary "gatekeeper" model that filters inputs to mitigate visual background noise, followed by a classifier that fuses the parametric efficiency of EfficientNetB0 with the spatial correlation capability of a Transformer module (Multi-Head Self-Attention). The classifier was trained and evaluated on 98,000 images. The hybrid architecture achieved an overall accuracy of 94.69% (95% CI: [94.28%, 95.11%]) and an F1-Score of 94.68% (95% CI: [94.22%, 95.08%]). Confidence intervals were calculated using non-parametric resampling on the test set. The main contribution of the study is the empirical demonstration that global attention increases accuracy in disease diagnosis while keeping inference latency low (9.45 ms per image). Finally, the architecture was deployed on a client-server platform accessible through web and mobile applications, which demonstrated its operational robustness as a viable agricultural support tool.
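The abstract states that the confidence intervals come from non-parametric resampling on the test set, without giving further detail. As a minimal sketch of one common way to do this, the code below computes a percentile-bootstrap interval for accuracy. The percentile method, the function name `bootstrap_accuracy_ci`, the number of resamples, and the fixed seed are illustrative assumptions, not details taken from the study.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for classification accuracy.

    Hypothetical helper: the study only says "non-parametric resampling",
    so the percentile method and all defaults here are assumptions.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    rng = random.Random(seed)
    n = len(y_true)

    # 1 where the prediction matches the label, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    point = sum(correct) / n

    # Resample the test set with replacement and recompute accuracy each time.
    stats = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)
        stats.append(sum(sample) / n)
    stats.sort()

    # Read the interval bounds off the sorted bootstrap distribution.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return point, stats[lo_idx], stats[hi_idx]
```

Calling `bootstrap_accuracy_ci(labels, predictions)` on the held-out labels and the model's predicted classes returns the point estimate together with the lower and upper bounds. The same resampling loop could wrap an F1 computation instead of accuracy.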
Copyright (c) 2026 Revista de Investigación en Tecnologías de la Información

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This journal provides open access to its content, based on the principle that offering the public free access to research supports a greater global exchange of knowledge.
The text published in the Revista de Investigación en Tecnologías de la Información (RITI) is distributed under the Creative Commons license (CC BY-NC), which allows third parties to use the published material provided they credit the authors of the work and RITI, but not to use the material for commercial purposes.
