Meta is Developing New Multimodal AI Model Chameleon to Rival OpenAI's GPT-4

As OpenAI and Google push ahead in the AI race, Meta's AI researchers are preparing to enter the fray with a multimodal model of their own.

Multimodal AI models are advanced versions of large language models, capable of processing multiple forms of media such as text, images, audio, and video.

For instance, OpenAI’s latest GPT-4o model can describe your surroundings when you open your camera and ask it to.

Chameleon: Meta’s Early-Fusion Approach to Multimodal AI

Meta, the parent company of Facebook, is aiming to launch a similar tool with its multimodal model named Chameleon.


The Chameleon team describes the model as a series of ‘early-fusion token-based mixed-modal models’ that can understand and generate images and text in any sequence.


Unlike earlier late-fusion designs, which encode each modality separately and only combine the results late in the pipeline, Chameleon’s early-fusion architecture processes all modalities together from the start.

According to TechXplore, the team has developed a system that integrates different types of data—such as images, text, and code—by converting them into a common set of tokens.

This method, similar to how large language models process words, allows for advanced computing techniques to be applied to mixed input data.


With a unified vocabulary, the system can efficiently handle and transform various data types, enhancing the processing and understanding of complex information.
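The idea of a unified vocabulary can be sketched in a few lines of Python. This is an illustrative toy, not Meta's actual tokenizer: the vocabulary sizes, offsets, and function names are invented for the example. The key point is that text tokens and image-patch codes are mapped into disjoint ranges of one shared id space, so a single transformer can consume the interleaved stream.

```python
# Toy sketch of early-fusion tokenization into a shared vocabulary.
# All sizes and names here are assumptions for illustration only.

TEXT_VOCAB_SIZE = 256        # toy byte-level text tokens: ids 0..255
IMAGE_CODEBOOK_SIZE = 1024   # toy image-patch codebook: ids 256..1279
IMAGE_OFFSET = TEXT_VOCAB_SIZE

def tokenize_text(text: str) -> list[int]:
    """Map each UTF-8 byte to a token id in the text range of the vocabulary."""
    return list(text.encode("utf-8"))

def tokenize_image(patch_codes: list[int]) -> list[int]:
    """Shift image codebook indices into their own slice of the shared
    vocabulary so they never collide with text token ids."""
    return [IMAGE_OFFSET + code for code in patch_codes]

def build_sequence(segments: list[tuple[str, object]]) -> list[int]:
    """Interleave text and image segments into one flat token stream,
    the way an early-fusion model consumes mixed-modal input."""
    tokens: list[int] = []
    for kind, payload in segments:
        if kind == "text":
            tokens += tokenize_text(payload)
        else:
            tokens += tokenize_image(payload)
    return tokens

# One interleaved document: a caption followed by image-patch codes.
seq = build_sequence([("text", "A cat"), ("image", [3, 512, 1023])])
print(seq)  # text byte ids first, then image ids shifted by 256
```

Once every modality lives in the same id space, the downstream model needs no modality-specific branches: it simply predicts the next token, whatever kind it happens to be.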

Meta’s Chameleon Outshines Larger Models in Multimodal AI Tasks

Unlike Google’s Gemini, which hands image generation off to a separate decoder, Chameleon is an end-to-end model that handles the entire process directly.

The researchers introduced novel training techniques, including a two-stage learning process and a massive dataset of approximately 4.4 trillion tokens spanning text, image-text pairs, and interleaved text-image data.

Two versions of the system were trained, with 7 billion and 34 billion parameters, over more than 5 million hours on high-speed GPUs. For comparison, OpenAI’s GPT-4 reportedly has around 1 trillion parameters.


In a paper posted on the arXiv preprint server, the team shared promising results from testing.

The outcome is a multimodal model with impressive versatility, achieving state-of-the-art performance in image captioning tasks.

The researchers claim this model surpasses Llama-2 in text-only tasks and competes with models like Mixtral 8x7B and Gemini-Pro.

It also performs sophisticated image generation within a single, unified framework. The team asserts that Chameleon matches or even outperforms larger models such as Gemini Pro and GPT-4 based on certain tests.

