Property-based software testing automation through prompt engineering and artificial intelligence

Authors

  • Ricardo Rafael Quintero Meza Instituto Tecnológico de Culiacán, Culiacán, México
  • Erven Germán Gil García Instituto Tecnológico de Culiacán, Culiacán, México

DOI:

https://doi.org/10.36825/RITI.13.31.005

Keywords:

Prompt, Properties, Test, AI, LLM

Abstract

The increasing complexity of modern software development demands more efficient, comprehensive, and adaptive testing methodologies that ensure the reliability, robustness, and quality of applications while optimizing the cost and time of system maintenance and evolution. Traditional testing approaches remain widely used, but they are limited in coverage, scalability, and adaptability, especially for dynamic, constantly evolving systems. In this context, integrating large language models (LLMs) through prompt engineering techniques emerges as a promising alternative for automating, enhancing, and expanding software testing processes. This work presents a tool that combines the generative capabilities of LLMs, accessed through specialized APIs, with the rigor of property-based testing (PBT). This synergy enables the automatic generation of test properties and the intelligent validation of code, facilitating early error detection and contributing to more robust and reliable software from the early stages of the development lifecycle. Through prompt engineering, the tool guides the precise formulation of test properties and orchestrates the generation of diverse and relevant test data. The approach aims to overcome the limitations of traditional methodologies by improving test coverage, reducing manual effort, and increasing scalability, yielding a more streamlined verification process that promotes higher standards of software quality and reliability. This proposal represents a step forward in the intelligent automation of testing, integrating artificial intelligence with formal validation methodologies and opening new possibilities for their application in software engineering.
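The abstract pairs LLM-generated properties with a property-based testing engine. As an illustration of the PBT half only (the paper's tool and prompts are not shown here, and all names below are hypothetical), a minimal property check in plain Python, in the spirit of the QuickCheck and Hypothesis libraries cited in the references:

```python
import random

def sort_list(xs):
    # System under test: a simple sorting routine.
    return sorted(xs)

def prop_idempotent(xs):
    # Property: sorting twice gives the same result as sorting once.
    return sort_list(sort_list(xs)) == sort_list(xs)

def prop_preserves_elements(xs):
    # Property: the output is a permutation of the input.
    return sorted(sort_list(xs)) == sorted(xs)

def check_property(prop, trials=200, seed=0):
    """Run `prop` against randomly generated integer lists;
    return the first counterexample found, or None if all trials pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if not prop(xs):
            return xs  # counterexample
    return None

# Both properties hold for the built-in sort, so no counterexample is found.
assert check_property(prop_idempotent) is None
assert check_property(prop_preserves_elements) is None
```

Production engines such as Hypothesis add counterexample shrinking and richer data-generation strategies on top of this basic generate-and-check loop; in the article's approach, the LLM is prompted to formulate the property functions themselves.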

References

Baresi, L., Pezzè, M. (2006). An introduction to software testing. Electronic Notes in Theoretical Computer Science, 148 (1), 89–111. https://doi.org/10.1016/j.entcs.2005.12.014

Fink, G., Bishop, M. (1997). Property-based testing: A new approach to testing for assurance. ACM SIGSOFT Software Engineering Notes, 22 (4), 74–80. https://doi.org/10.1145/263244.263267

Kaner, C., Bach, J., Pettichord, B. (2002). Testing computer software. Wiley.

Beizer, B. (1990). Software testing techniques. Van Nostrand Reinhold Company. https://dl.acm.org/doi/10.5555/79060

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ..., Amodei, D. (2020). Language models are few-shot learners. 34th International Conference on Neural Information Processing Systems. Vancouver, BC, Canada. https://dl.acm.org/doi/abs/10.5555/3495724.3495883

Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55 (9), 1–35. https://doi.org/10.1145/3560815

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint. https://arxiv.org/abs/2302.11382

Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., Chi, E. (2022). Least-to-most prompting enables complex reasoning in large language models. arXiv preprint. https://doi.org/10.48550/arXiv.2205.10625

Díaz Benito, G. (2024). Análisis sobre la utilización de transformers y modelos generativos para generación de anuncios [Analysis of the use of transformers and generative models for ad generation] (Undergraduate thesis). Universidad Politécnica de Madrid, Spain. https://oa.upm.es/82703/

Jason ZK. (2024). Ingeniería de prompts y prompts de IA: Conceptos, diseño y optimización [Prompt engineering and AI prompts: Concepts, design, and optimization]. https://es.blog.jasonzk.com/ai/aipromptengineering/

Kippel01. (2024). La importancia del “prompt engineering” en la calidad de respuestas de herramientas de inteligencia artificial [The importance of prompt engineering in the quality of responses from artificial intelligence tools]. https://www.kippel01.com/tecnologia/importancia-prompt-engineering-calidad-respuestas-herramientas-inteligencia-artificial

MacIver, D. R., Hatfield-Dodds, Z. (2019). Hypothesis: A new approach to property-based testing. Journal of Open Source Software, 4 (43), 1891. https://doi.org/10.21105/joss.01891

Goldstein, H., Cutler, J. W., Dickstein, D., Pierce, B. C., Head, A. (2024). Property-based testing in practice. IEEE/ACM 46th International Conference on Software Engineering (ICSE). Lisbon, Portugal. https://doi.org/10.1145/3597503.3639581

Claessen, K., Hughes, J. (2000). QuickCheck: A lightweight tool for random testing of Haskell programs. Fifth ACM SIGPLAN international conference on Functional programming. Singapore. https://doi.org/10.1145/351240.351266

Papadakis, M., Sagonas, K. (2011). A PropEr integration of types and function specifications with property-based testing. 10th ACM SIGPLAN workshop on Erlang. Tokyo, Japan. https://doi.org/10.1145/2034654.2034663

Keploy. (2024). Property-based testing: A comprehensive guide. https://dev.to/keploy/property-based-testing-a-comprehensive-guide-lc2

Classen, A., Heymans, P., Schobbens, P.-Y., Legay, A., Raskin, J.-F. (2010). Model checking lots of systems: Efficient verification of temporal properties in software product lines. 32nd ACM/IEEE International Conference on Software Engineering - Volume 1. Cape Town, South Africa. https://doi.org/10.1145/1806799.1806850

Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., Sutton, C. (2021). Program synthesis with large language models. arXiv preprint. https://doi.org/10.48550/arXiv.2108.07732

Chen, M., Tworek, J., Jun, H., ... (2021). Evaluating large language models trained on code. arXiv preprint. https://doi.org/10.48550/arXiv.2107.03374

Kazemitabaar, M., Williams, J., Drosos, I., Grossman, T., Henley, A. Z., Negreanu, C., Sarkar, A. (2024). Improving steering and verification in AI-assisted data analysis with interactive task decomposition. 37th Annual ACM Symposium on User Interface Software and Technology (UIST). Pittsburgh, PA, USA. https://doi.org/10.1145/3654777.3676345

Higginbotham, G. Z., Matthews, N. S. (2024). Prompting and in-context learning: Optimizing prompts for Mistral Large. https://doi.org/10.21203/rs.3.rs-4430993/v1

Mistral AI Team. (2024). Au large. https://mistral.ai/news/mistral-large

Zheng, Q., Guo, Y., ... (2023). CodeGeeX: A pre-trained model for code generation with multilingual benchmarking on HumanEval-X. arXiv preprint. https://doi.org/10.48550/arXiv.2303.17568

Yu, Z., Wang, Z., ... (2024). HumanEval Pro and MBPP Pro: Evaluating large language models on self-invoking code generation. arXiv preprint. https://doi.org/10.48550/arXiv.2412.21199

Dohmke, T. (2024). Introducing GitHub models: A new generation of AI engineers building on GitHub. https://github.blog/news-insights/product-news/introducing-github-models/

Pérula, R., Calleja, T., Hernández de la Cruz, J. M. (2024). Versus: OpenAI GPT-4 vs. Google Gemini Pro vs. Mistral AI Large. https://www.paradigmadigital.com/dev/versus-openai-gpt4-google-gemini-pro-mistral-ai-large/

Visual Studio Code. (2025). Your first extension. https://code.visualstudio.com/api/get-started/your-first-extension

Hou, X., Zhao, Y., Wang, S., Wang, H. (2025). Model Context Protocol (MCP): Landscape, security threats, and future research directions. arXiv preprint. https://doi.org/10.48550/arXiv.2503.23278

Published

2025-10-26

How to Cite

Quintero Meza, R. R., & Gil García, E. G. (2025). Property-based software testing automation through prompt engineering and artificial intelligence. Revista De Investigación En Tecnologías De La Información, 13(31 Especial), 39–51. https://doi.org/10.36825/RITI.13.31.005