Hybrid EfficientNet and Vision Transformer architecture for cross-platform agricultural pathology detection

Authors

Rodríguez Ramírez, E.; Carrillo Ruiz, M.; Carrillo Ruiz, H.

DOI:

https://doi.org/10.36825/RITI.14.33.009

Keywords:

Deep Learning, Vision Transformers, EfficientNet, Foliar Diseases, Precision Agriculture, Reality Gap

Abstract

The early diagnosis of foliar diseases is crucial for food security. However, an applicability gap (the "reality gap") limits the operational viability of Deep Learning models in real agricultural environments, because such models are computationally expensive and sensitive to visual noise. The objective of this study is to design, validate, and deploy a lightweight hybrid architecture (AgroScan) that can operate as a real-time decision support tool for 49 diseases distributed across 10 high-impact crops. The methodology proposes a two-stage inference process: a binary "gatekeeper" model first filters out images dominated by visual background noise, and a classifier, trained and evaluated on a dataset of 98,000 images, then fuses the parametric efficiency of EfficientNetB0 with the spatial correlation capability of a Transformer module (Multi-Head Self-Attention). The hybrid architecture achieved an overall accuracy of 94.69% (95% CI: [94.28%, 95.11%]) and an F1-Score of 94.68% (95% CI: [94.22%, 95.08%]). Confidence intervals were calculated using non-parametric resampling on the test set. The main contribution of the study is the empirical demonstration that global attention increases accuracy in disease diagnosis while keeping inference latency at 9.45 ms per image. Finally, deploying the architecture through a client-server platform, accessible via web and mobile applications, demonstrated its operational robustness as a viable agricultural support tool.
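The two procedural ideas in the abstract, gated two-stage inference and a resampled confidence interval, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 0.5 acceptance threshold, the number of resamples, and the choice of a percentile bootstrap are all assumptions made for illustration; the abstract only states that a binary gatekeeper precedes the classifier and that the intervals come from non-parametric resampling on the test set.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def two_stage_predict(
    image: object,
    gatekeeper: Callable[[object], float],   # hypothetical: returns P(image shows a leaf)
    classifier: Callable[[object], str],     # hypothetical: returns a disease label
    leaf_threshold: float = 0.5,             # assumed cut-off, not from the paper
) -> Optional[str]:
    """Run the disease classifier only when the gatekeeper accepts the image."""
    if gatekeeper(image) < leaf_threshold:
        return None  # rejected as background noise; the classifier is never called
    return classifier(image)


def bootstrap_accuracy_ci(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    n_resamples: int = 1000,  # assumed; the abstract does not give this number
    alpha: float = 0.05,      # 0.05 gives a 95% interval
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a percentile bootstrap."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    rng = random.Random(seed)
    # 1 where the prediction is correct, 0 otherwise
    hits = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(hits)
    point = sum(hits) / n
    # Resample the test set with replacement and recompute accuracy each time
    stats = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper
```

In use, `gatekeeper` and `classifier` would wrap the two trained models, and `bootstrap_accuracy_ci` would receive the test-set labels and predictions. The same resampling loop applies to the F1-Score if the per-resample statistic is swapped accordingly.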

Published

2026-04-08

How to Cite

Rodríguez Ramírez, E., Carrillo Ruiz, M., & Carrillo Ruiz, H. (2026). Hybrid EfficientNet and Vision Transformer architecture for cross-platform agricultural pathology detection. Revista De Investigación En Tecnologías De La Información, 14(33), 111–125. https://doi.org/10.36825/RITI.14.33.009

Section

Articles