Meta Llama 4 Maverick, Llama 4 Scout Now Available On IBM’s Watsonx AI Platform

With Llama 4, different data modalities, including text, images, and video, can be integrated.


IBM has announced that Llama 4, the most recent generation of open models from Meta, has been added to its watsonx.ai AI developer platform. Llama 4 Scout and Llama 4 Maverick, Meta's first mixture of experts (MoE) models, offer long context lengths, fast inference, low cost, and strong multimodal performance.

Meta released its latest set of AI models last week; they power the Meta AI assistant across the web, Instagram Direct, WhatsApp, and Messenger. With Llama 4, different data modalities, including text, images, and video, can be integrated, and both models support a range of text-in, text-out and image-in, text-out use cases.

According to a company blog, IBM now supports a total of 13 Meta models in watsonx.ai. 

Llama 4 Architecture

The smaller model, Llama 4 Scout, has 16 experts and 109 billion total parameters. Because only 17 billion of those parameters are active at inference, it can serve more users concurrently. Trained on about 40 trillion tokens of data, Llama 4 Scout maintains low latency and cost while delivering performance comparable to, or better than, models with far higher active parameter counts. Despite its modest compute requirements, it outperforms similar models on benchmarks for coding, reasoning, long context, and image understanding.

The larger Llama 4 Maverick draws on 400 billion total parameters split across 128 experts, while keeping the same 17 billion active parameters as Scout. According to Meta, Maverick outperforms Google’s Gemini 2.0 Flash and OpenAI’s GPT-4o on a variety of multimodal benchmarks and competes with the considerably larger DeepSeek-V3 in reasoning and coding.
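To make the distinction between total and active parameters concrete, here is a minimal, illustrative PyTorch sketch of top-1 expert routing. It is not Meta's implementation: Llama 4's router, shared expert, and expert sizes differ, and all dimensions below are made up.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router sends each token to one
    expert, so only that expert's weights are 'active' for the token
    even though the layer stores num_experts full feed-forward blocks."""
    def __init__(self, d_model=64, d_ff=256, num_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                    # x: (tokens, d_model)
        scores = self.router(x)              # (tokens, num_experts)
        top = scores.argmax(dim=-1)          # top-1 routing per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask])  # only routed tokens run here
        return out

layer = MoELayer()
print(layer(torch.randn(8, 64)).shape)       # torch.Size([8, 64])
```

Because each token passes through only the expert it is routed to, per-token compute tracks the active parameter count rather than the total held in memory.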

Llama 4 Context Length

Llama 4 Scout provides an industry-leading context length of 10 million tokens while maintaining accuracy on long-context benchmarks such as needle-in-a-haystack retrieval. This opens up prospects for multi-document summarisation, reasoning over large codebases, and personalisation based on a rich history of user activity.

According to Meta, this context length comes from two innovations: interleaved attention layers without positional embeddings, and inference-time temperature scaling of the Llama 4 models’ attention mechanism.
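Meta has not published the exact schedule here, but the idea of inference-time temperature scaling can be sketched as rescaling attention logits by a factor that grows with token position, keeping the softmax usefully sharp at extreme context lengths. The formula and the beta knob below are illustrative assumptions, not Llama 4's actual values.

```python
import math
import torch

def scaled_attention(q, k, v, positions, beta=0.1):
    """Illustrative inference-time attention temperature scaling.

    Logits are rescaled by a factor that grows logarithmically with the
    query's position (beta is a made-up knob, not a Llama 4 value), so
    attention over a very long context does not become too diffuse."""
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / math.sqrt(d)      # standard scaled dot-product
    temp = 1.0 + beta * torch.log1p(positions.float())   # per-query scaling factor
    logits = logits * temp.unsqueeze(-1)                 # sharpen long-range queries
    return torch.softmax(logits, dim=-1) @ v

q = k = v = torch.randn(16, 32)
print(scaled_attention(q, k, v, torch.arange(16)).shape)  # torch.Size([16, 32])
```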

Native Multimodality

Llama 4 models are built with “native multimodality,” in contrast to large language models that are pre-trained only on text and later adapted to other modalities (such as images). Meta pre-trained the models jointly on unlabelled text, image, and video data, enriching them with knowledge from all of these sources.
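Meta has described this design as “early fusion”: image (and video-frame) embeddings sit in the same token sequence as text embeddings, so a single backbone is pre-trained across modalities. A schematic sketch, with toy encoders and made-up dimensions:

```python
import torch
import torch.nn as nn

d_model = 64
text_embed = nn.Embedding(32000, d_model)     # toy vocabulary
patch_proj = nn.Linear(16 * 16 * 3, d_model)  # toy image-patch encoder

text_ids = torch.randint(0, 32000, (1, 12))   # 12 text tokens
patches = torch.randn(1, 9, 16 * 16 * 3)      # 9 flattened image patches

# Early fusion: image-patch embeddings are concatenated into the same
# sequence as text embeddings, so one transformer backbone attends
# across both modalities from the first layer onward.
sequence = torch.cat([patch_proj(patches), text_embed(text_ids)], dim=1)
print(sequence.shape)                          # torch.Size([1, 21, 64])
```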

Llama 4 Models On IBM Watsonx

IBM said that developers and businesses can choose their Llama 4 model in watsonx.ai and fine-tune, distil, and deploy it in cloud, on-premises, or edge environments.
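As a usage sketch, the snippet below assumes the ibm-watsonx-ai Python SDK; the model ID, endpoint URL, and credentials are placeholders to be checked against the watsonx.ai model catalogue.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

# Placeholder endpoint and credentials -- replace with your own values.
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

# The model ID is an assumption; confirm the exact Llama 4 identifiers
# exposed in your region via the watsonx.ai model catalogue.
model = ModelInference(
    model_id="meta-llama/llama-4-scout-17b-16e-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

print(model.generate_text(
    prompt="Summarise the Llama 4 release in two sentences."
))
```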
