Innovations in Computer Vision: Evaluation of ChatGPT, Gemini, and Copilot for Image Analysis

DOI:

https://doi.org/10.37431/conectividad.v6i2.284

Keywords:

ChatGPT, Gemini, Copilot, AI, Natural Language Processing

Abstract

In recent years, large language models (LLMs) have grown and evolved rapidly, from simple tools conceived to understand text into multimodal systems capable of generating creative and complex content. This progress has been driven by major advances in neural network architectures and by the availability of large datasets. The main objective of this study is to compare three of the most widely used LLMs, ChatGPT, Gemini, and Copilot, on the task of converting images to text (I2T). Each model's capacity to describe different types of images in a detailed and precise way was evaluated, using artistic paintings, urban scenes, and images containing instructions. The results show that all three models perform at a high level, with Gemini standing out for its ability to integrate visual and textual information more efficiently. The results also indicate that LLMs continue to evolve, so even more significant advances in their ability to understand and generate natural language can be expected. This evolution is also expected to allow these models to be applied more widely in people's daily lives, automating processes and helping to improve virtual assistants.

Published

2025-05-16

How to Cite

Minango Negrete, P. D., Zambrano Vizuete, Óscar M., Minango Negrete, J. C., Minaya Andino, C. A., & León Galeas, C. J. (2025). Innovations in Computer Vision: Evaluation of ChatGPT, Gemini, and Copilot for Image Analysis. CONECTIVIDAD, 6(2), 251–262. https://doi.org/10.37431/conectividad.v6i2.284
