Google (GOOG, GOOGL) on Wednesday debuted its new Gemini generative AI model. The platform is Google’s answer to Microsoft-backed (MSFT) OpenAI’s GPT-4, and according to Google DeepMind CEO Demis Hassabis, it’s the company’s “most capable and general model” yet.
Gemini is a natively multimodal model, meaning it can analyze text, audio, video, images, and code. While other multimodal offerings exist, Google says Gemini stands apart because it was designed from the outset to handle all of those modalities together.
Other platforms, the company said, train separate models to handle text, video, and photos and then stitch them together into a single system.
This difference, according to Hassabis, means Gemini understands multimodal data more thoroughly and produces stronger results on everything from handwritten content to images and video.
As part of the announcement, Google released a series of videos demonstrating Gemini’s capabilities. In one video, a presenter showed a program running Gemini both a drawing of a blue duck and a blue rubber duck, and the AI was able to identify each.