IBM has announced that Llama 4, the most recent generation of open models from Meta, has been added to its watsonx.ai AI developer platform. The first mixture of experts (MoE) models from Meta, Llama 4 Scout and Llama 4 Maverick offer long context windows, fast inference, low cost, and multimodal capability.
Meta released its latest set of AI models last week; they power the Meta AI assistant across the web, Instagram Direct, WhatsApp and Messenger. Llama 4 can integrate multiple data modalities, including text, image and video, and both Scout and Maverick support a range of text-in, text-out and image-in, text-out use cases.
According to a company blog, IBM now supports a total of 13 Meta models in watsonx.ai.
Llama 4 Architecture
The smaller model, Llama 4 Scout, has 16 experts and 109 billion total parameters. Because only 17 billion parameters are active at inference, it can serve more users concurrently. Llama 4 Scout, trained on roughly 40 trillion tokens of data, maintains low latency and cost while providing performance comparable to or better than models with much higher active parameter counts. Despite its low compute requirements, it outperforms similar models on benchmarks for coding, reasoning, long context, and image understanding.
The larger Llama 4 Maverick draws on 400 billion total parameters split across 128 experts, while keeping the same 17 billion active parameters as Scout. According to Meta, Maverick outperforms Google’s Gemini 2.0 Flash and OpenAI’s GPT-4o on a variety of multimodal benchmarks and competes with the considerably larger DeepSeek-V3 in reasoning and coding.
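To make the active-versus-total parameter distinction concrete, the sketch below shows a toy mixture-of-experts layer in which a router sends each token to a single expert, so only that expert's weights are exercised per token. This is an illustrative simplification, not Meta's implementation; Llama 4's actual routing (experts per layer, shared experts, top-k selection) is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to one expert,
    so only a fraction of the layer's total parameters is active per token."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick the single best expert per token (top-1 routing).
        gate = F.softmax(self.router(x), dim=-1)   # (tokens, n_experts)
        weight, idx = gate.max(dim=-1)             # chosen expert and its gate weight
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Only this expert's parameters are exercised for these tokens.
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Tiny demo: 8 tokens routed across 16 experts, mirroring Scout's expert count in miniature.
layer = SimpleMoELayer(d_model=32, d_hidden=64, n_experts=16)
print(layer(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```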
Llama 4 Context Length
Llama 4 Scout maintains accuracy on long-context benchmarks like Needle-in-a-haystack while providing an industry-leading context length of 10 million tokens. This opens up prospects for multi-document summarisation, reasoning over large codebases, and personalisation through a rich history of user activity.
As per Meta, this context length comes from two innovations: use of interleaved attention layers without positional embeddings and inference-time temperature scaling of the Llama 4 models’ attention mechanism.
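Meta has not published the exact scaling rule, but the general idea of inference-time attention temperature scaling can be sketched as follows: once the context grows beyond the training length, the attention logits are rescaled by a length-dependent factor so the softmax stays usefully sharp. The function below is a hypothetical illustration of that heuristic, not Llama 4's actual mechanism.

```python
import math
import numpy as np

def scaled_attention_weights(q, k, position: int, train_len: int = 8192, alpha: float = 0.1):
    """Illustrative inference-time temperature scaling of attention.

    q: (d,) query vector, k: (n, d) key matrix.
    Beyond the training context length, the logits are rescaled by a factor that
    grows logarithmically with position, a common length-generalisation heuristic
    (the exact rule Llama 4 uses is not public).
    """
    d = q.shape[-1]
    logits = k @ q / math.sqrt(d)            # standard scaled dot-product logits
    if position > train_len:
        temperature = 1.0 + alpha * math.log(position / train_len)
        logits = logits * temperature        # sharpen logits for very long contexts
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

# Compare attention weights at a position inside vs far beyond the training length.
rng = np.random.default_rng(0)
q, k = rng.normal(size=16), rng.normal(size=(32, 16))
print(scaled_attention_weights(q, k, position=4096)[:3])
print(scaled_attention_weights(q, k, position=1_000_000)[:3])
```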
Native Multimodality
Llama 4 models are built with “native multimodality,” in contrast to large language models that are pre-trained only on text and later adapted to other modalities (such as images). Meta has pre-trained the models jointly on unlabelled text, image, and video data, thereby grounding them in knowledge from all three sources.
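Meta describes this approach as early fusion: image and video-frame tokens are folded into the same token sequence as the text and processed by a single backbone. The sketch below is a rough, hypothetical illustration of that idea, with made-up dimensions and simple stand-in encoders.

```python
import torch
import torch.nn as nn

d_model = 64  # illustrative embedding width

# Hypothetical per-modality encoders that map raw inputs to shared-width token embeddings.
text_embed = nn.Embedding(num_embeddings=1000, embedding_dim=d_model)
patch_embed = nn.Linear(16 * 16 * 3, d_model)   # flattens 16x16 RGB image patches

text_ids = torch.randint(0, 1000, (12,))        # 12 text tokens
patches = torch.randn(9, 16 * 16 * 3)           # 9 image patches from one image

# Early fusion: one interleaved sequence of text and image tokens for a single backbone.
sequence = torch.cat([text_embed(text_ids), patch_embed(patches)], dim=0)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4), num_layers=2
)
output = backbone(sequence.unsqueeze(1))        # (seq_len, batch=1, d_model)
print(output.shape)                             # torch.Size([21, 1, 64])
```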
Llama 4 Models On IBM Watsonx
IBM said that developers and businesses can choose their Llama 4 model from watsonx.ai and fine-tune, distil, and deploy it in cloud, on-premises, or edge environments.
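For illustration only, invoking a hosted model with the ibm-watsonx-ai Python SDK typically looks like the sketch below. The model ID, endpoint URL, and generation parameters shown are placeholders rather than confirmed identifiers for the Llama 4 listings; the watsonx.ai catalogue should be checked for the exact IDs.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

# Placeholder credentials and project; substitute your own watsonx.ai values.
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="meta-llama/llama-4-scout-17b-16e-instruct",  # hypothetical ID; check the catalogue
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={"max_new_tokens": 200},
)

print(model.generate_text(prompt="Summarise the key differences between Llama 4 Scout and Maverick."))
```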