Innovations in Computer Vision: Evaluation of ChatGPT, Gemini, and Copilot for Image Analysis
DOI:
https://doi.org/10.37431/conectividad.v6i2.284
Keywords:
ChatGPT, Gemini, Copilot, AI, Natural Language Processing
Abstract
In recent years, large language models (LLMs) have evolved rapidly, from simple tools conceived to understand text into multimodal systems capable of generating creative and complex content. This innovation has been driven by major advances in neural network architectures and by the availability of large datasets. The main objective of this study is to compare three of the most widely used LLMs, ChatGPT, Gemini, and Copilot, on the task of converting images to text (I2T). Each model's capacity to describe different types of images in a detailed and precise way was evaluated, using artistic paintings, urban scenes, and images containing instructions. The results show that all three models perform at a high level, with Gemini standing out for its ability to integrate visual and textual information more efficiently. The results also indicate that LLMs continue to evolve, so even more significant advances in their ability to understand and generate natural language can be expected. This evolution is also expected to allow these models to be applied more widely in people's daily lives, automating processes and helping to improve the development of virtual assistants.
License
Copyright (c) 2025 Instituto Superior Tecnológico Universitario Rumiñahui

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The originals published in this journal's printed and electronic editions are the property of the Instituto Superior Tecnológico Universitario Rumiñahui. Therefore, citing the source in any partial or total reproduction is necessary. All the contents of the electronic journal are distributed under a Creative Commons Attribution-Noncommercial 4.0 International (CC-BY-NC 4.0) license.