The merger between natural language processing and computer vision
On October 15, ChatGPT and GPT-Vision were launched, fulfilling the promise of a merger between natural language processing and computer vision. This initiative marks major progress in the field of artificial intelligence. The following examples illustrate the diversity of possible applications, thus opening new perspectives for exploration and innovation. Discover how these technologies are transforming our interaction with visual and textual data.
Exploring Applications
The synergy between ChatGPT and GPT-Vision offers new features. Here are some captivating examples that demonstrate the diversity of possible applications.
- Modeling from an image
A simple image can be transformed into an impressive 3D model. For example, ChatGPT Vision can generate Gcode from technical drawings.
- Personalized strength training program according to your equipment
ChatGPT Vision can guide you in developing a personalized strength training program based on the equipment you have available.
- Analysis and decoding of blurred documents
Thanks to GPT-Vision, it is possible to analyze blurred documents and reveal their hidden content.
- Converting photos to text for a complex letter
GPT-Vision can turn a letter image into editable text, making it easier to write complex letters.
- Retrieving complex objects in an image
GPT-Vision technology makes it possible to identify and recover complex objects present in an image.
- Detection of images from Google Street View or satellites
GPT-Vision can accurately detect images from Google Street View or satellites.
- Detailed analysis of an x-ray
GPT-Vision can analyze an x-ray in detail and provide answers within seconds.
- Complex image analysis
Dive into the analysis of a highly complex image with GPT-Vision.
- Creation of scenarios from the analysis of several images
GPT-Vision can create a coherent scenario from the analysis of four separate images.
- Analysis of a car engine
GPT-Vision can perform a careful analysis of a car engine and offer recommendations for repairs.
- Code optimization
GPT-Vision can optimize code by offering improvements in performance, efficiency and conciseness.
Notable Limitations
Despite the progress made, certain limitations persist. For example, reading QR Codes and sharing conversations are not yet supported. You may not see these new features, but a simple page refresh or logout/login may resolve the issue. If the problem persists, try clearing the cache related to openai.com.
Here is a screenshot showing a user interface of these new features:
GPT-Vision video
I would like to credit Emile Dev’s YouTube channel, which inspired this article. Here is the presentation video: