
Google DeepMind has created a new artificial intelligence (AI) model that competes with ChatGPT. Called Gemini, it can understand several types of media at once, including images, video, audio and text, and respond accordingly.
Most AI systems can understand or create only one type of content. ChatGPT from OpenAI, for example, takes written text as input and produces text in response. Midjourney, for its part, creates images from written instructions.
But Gemini is different. It can understand various types of content in addition to text. That is what Google claimed in a blog post published on December 6, 2023.
Initially, Google released Gemini 1.0 in three versions. Gemini Ultra handles the largest and most complex tasks. Gemini Pro is connected to Google’s digital services. And Gemini Nano is designed to run on smartphones.
According to Google DeepMind’s technical report, Gemini Ultra beat other AI models, including GPT-4, on 30 of the 32 benchmarks used to compare models in AI research and development. These cover subjects ranging from college-level exams to ethics, science, technology and law.
In particular, Google’s model came out ahead on 9 image analysis benchmarks, 6 video comprehension benchmarks, 5 audio and translation benchmarks, and 10 text and reasoning benchmarks. However, Gemini Ultra lost to GPT-4 on two text and reasoning tests.
Creating a model that can analyze multiple types of content is quite difficult. The model has to be trained on many different types of data, and the amount of data is huge, so training becomes less efficient. Correcting errors across so many kinds of content is also hard, and the model may stop improving. At that point it can show signs of ‘overfitting’: it gives good results on the data it was trained on, but when it is given new types of data or instructions, it can no longer complete those tasks.
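
The overfitting problem described above can be illustrated with a small sketch. This is not how Gemini was trained or evaluated. It is a toy example in plain Python, and the data, the noise level and the function names (`make_data`, `fit_line`, `fit_memorizer`, `mse`) are all assumptions made for illustration. One model simply memorizes its training data, while the other fits a simple straight line.

```python
import random


def make_data(n, seed):
    """Toy data (an assumption for illustration): y = 2x + 1 plus random noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 2.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Overfitting model: answers with the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data and separate, previously unseen test data.
train_x, train_y = make_data(200, seed=0)
test_x, test_y = make_data(200, seed=1)

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(
        f"{name:10s} train error: {mse(model, train_x, train_y):.2f}  "
        f"test error: {mse(model, test_x, test_y):.2f}"
    )
```

When run, the memorizing model should show zero error on its own training data, because it has also memorized the noise, and a noticeably larger error than the straight line on the new data. That gap between training and test performance is what ‘overfitting’ refers to.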
Another difficulty is that in multimodal training, a model is usually trained on different types of content at different times, and the separately trained parts are then combined into a complete model. Gemini was not built this way. The different types of content were provided together in the training dataset. Google DeepMind scientists collected this data from web documents, books and code. However, the training was carried out under human supervision. That is, it followed a supervised learning approach, in which humans show the AI model where it is making mistakes and how to correct them.
Google put considerable resources into this training. For this large-scale work, spread across multiple data centers, it used its well-known Tensor Processing Units (TPUs). Thousands of these TPUs, which many also call AI accelerator chips, were used to train the Gemini model. The name ‘accelerator’ itself suggests what the chip does: it speeds up the work of artificial intelligence. Google designed the chip primarily to speed up the training of AI models, the company said. Not only that, the TPUs were grouped into clusters of 4,096 chips, called ‘SuperPods’, to train Gemini. As a result, training took much less time than it otherwise would have.
DeepMind scientists have designed the Gemini AI model so that it can help at the moment it is needed. For example, suppose you are cooking. You show Gemini a picture of what you are doing and ask it what to do next. Gemini will be able to respond immediately.
However, it is not yet a completely error-free AI. It still sometimes gives wrong information with 100% confidence. That is, it presents this wrong information as if it were correct. This is called ‘hallucination’. The name is apt, to say the least! It is also Gemini’s biggest flaw. It happens because of bias or other limitations in the training data, and such errors are difficult to correct.
Nevertheless, Gemini has become one of the leading AI models of today. Because it is connected to Google’s services, it can help users in many ways. Although it has beaten ChatGPT on various benchmarks, it cannot be said to have completely surpassed it. However, you don’t need to be a rocket scientist to understand that Gemini will bring new innovations to the world of artificial intelligence in the future.