Apple Offers Glimpse Into Its AI Model Training With New Technical Report

Apple has published a technical document outlining the inner workings of its next-generation AI models, introduced during WWDC 2025. Titled “Apple Intelligence Foundation Language Models – Tech Report 2025,” it delves into how the models were constructed, trained, and evaluated, marking a rare level of transparency from the company on its artificial intelligence efforts.
The report sheds light on a wide range of technical components, including model architecture, training data, efficiency upgrades, and multilingual capabilities.
Faster On-Device AI That Uses Less Memory
Apple’s revamped on-device model features around three billion parameters and has been split into two blocks to improve performance and memory efficiency.
As noted in the document, Block 1 contains 62.5% of the total transformer layers, while Block 2 contains the remaining 37.5% but has the key and value projections removed, meaning those layers do not keep their own cache entries. This cuts the memory needed for key-value caching by 37.5% and speeds up generation of the first token of a response. Apple claims this optimisation was achieved without compromising the model’s accuracy or quality, as reported by 9to5Mac.
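To see where the 37.5% figure comes from, here is a minimal back-of-the-envelope sketch in Python. The layer count, head dimensions, context length, and fp16 cache format below are illustrative assumptions, not figures from Apple's report; only the 62.5/37.5 split is from the document.

```python
# Illustrative KV-cache arithmetic; all sizes below are assumptions.
BYTES_PER_VALUE = 2          # assume fp16 cache entries
N_LAYERS = 32                # hypothetical total transformer layers
N_KV_HEADS = 8               # hypothetical key/value heads
HEAD_DIM = 128               # hypothetical per-head dimension
SEQ_LEN = 4096               # hypothetical context length

def kv_cache_bytes(n_layers_with_kv: int) -> int:
    """Bytes needed to cache keys and values for the given layers."""
    per_layer = 2 * SEQ_LEN * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
    return n_layers_with_kv * per_layer

baseline = kv_cache_bytes(N_LAYERS)                  # every layer caches K/V
block1_only = kv_cache_bytes(int(N_LAYERS * 0.625))  # Block 2 keeps no K/V of its own

print(f"baseline cache: {baseline / 2**20:.1f} MiB")
print(f"split cache:    {block1_only / 2**20:.1f} MiB")
print(f"reduction:      {1 - block1_only / baseline:.1%}")  # -> 37.5%
```

Whatever the real dimensions are, the reduction falls out of the layer split alone: if 37.5% of layers stop caching keys and values, the cache shrinks by exactly that fraction.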
Cloud Model Built For Speed
Apple’s server-side model, built for complex tasks handled through its Private Cloud Compute system, uses an architecture called Parallel-Track Mixture-of-Experts (PT-MoE). Unlike a traditional dense model, it activates only the expert units needed for a specific task. For example, a prompt about cooking triggers only cooking-related experts, according to 9to5Mac.
Instead of a single processing path, the model runs across multiple parallel tracks. Each track contains both standard layers and special “expert” layers that switch on only when required. This makes the model faster and easier to scale.
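To illustrate the general idea, the sketch below implements generic top-k expert routing in Python. The expert count, dimensions, and simple softmax router are stand-in assumptions and do not reflect the specifics of Apple's PT-MoE design, which the report describes at a higher level.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_EXPERTS, TOP_K = 64, 8, 2   # illustrative sizes, not Apple's

router_w = rng.standard_normal((D_MODEL, N_EXPERTS))                    # router weights
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector to its top-k experts only."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]                          # chosen experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only the selected experts run; the rest stay idle, which is
    # where the compute savings of sparse activation come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (64,)
```

The key property is that per-token compute scales with TOP_K, not with the total number of experts, so capacity can grow without a matching growth in inference cost.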
To further improve performance, Apple interleaves global and local attention layers, a design that helps the model track both fine-grained detail and big-picture context. According to 9to5Mac, this improves efficiency while keeping reasoning strong.
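One common way to express such interleaving is through alternating attention masks, as in the sketch below. The window size and the one-global-layer-in-four pattern are illustrative assumptions, not the ratio Apple uses.

```python
import numpy as np

SEQ_LEN, WINDOW = 8, 3  # illustrative sequence length and local window

def local_mask(n: int, window: int) -> np.ndarray:
    """Causal mask where each token sees only the last `window` tokens."""
    i, j = np.indices((n, n))
    return (j <= i) & (i - j < window)

def global_mask(n: int) -> np.ndarray:
    """Standard causal mask: each token sees all earlier tokens."""
    i, j = np.indices((n, n))
    return j <= i

# Interleave: here, every fourth layer attends globally, the rest locally.
layer_masks = [
    global_mask(SEQ_LEN) if layer % 4 == 3 else local_mask(SEQ_LEN, WINDOW)
    for layer in range(8)
]

print(local_mask(SEQ_LEN, WINDOW).astype(int))
```

Local layers keep attention short and cheap, while the occasional global layer lets distant context flow through the stack, which is the efficiency-versus-context trade the article describes.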
Stronger Support For Global Languages
Apple has increased the share of multilingual data in training from 8% to 30% and expanded the tokeniser's vocabulary from 100k to 150k tokens, which led to markedly better performance in non-English languages. To make responses more natural, Apple evaluated the model with prompts written by native speakers rather than translations. These changes should help features like Apple's Writing Tools work better in more languages.
Ethical Data Collection
The report also details how Apple collected training data, with a focus on privacy. Most of it came from publicly available web pages crawled by Applebot, which respects robots.txt rules so that sites can block access. Some content was also licensed, with reports suggesting deals with publishers such as Condé Nast, NBC News, and IAC.
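The robots.txt convention the report leans on is straightforward to demonstrate with Python's standard library. The URL below is a placeholder, and Apple's actual crawler logic is not public; this only shows the opt-out check a compliant crawler performs.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder site
rp.read()  # fetch and parse the site's robots.txt

# A site that lists "User-agent: Applebot" with "Disallow: /" would make
# this check return False, and a compliant crawler would skip the page.
allowed = rp.can_fetch("Applebot", "https://example.com/article.html")
print("fetch allowed:", allowed)
```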
For visual data, Apple gathered over 10 billion image-caption pairs, including photos, screenshots with text, and handwritten notes, and improved the captions using its own AI tools.