Apple’s server-side model, which handles more complex tasks through its Private Cloud Compute system, uses a novel design called Parallel-Track Mixture-of-Experts (PT-MoE). Unlike a traditional dense model, it activates only the expert units relevant to a given request; a prompt about cooking, for example, triggers only cooking-related experts, according to 9to5Mac.
Instead of pushing every token through a single processing path, the model splits its computation across multiple parallel tracks. Each track combines standard transformer layers with “expert” layers that switch on only when required, which lowers latency and makes the model easier to scale.
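Apple has not published reference code for PT-MoE, so the following PyTorch snippet is only a toy sketch of the general idea, not Apple’s implementation: a small router scores each token, only the top-scoring experts actually run, and independent tracks process the sequence side by side. All names and sizes here (ToyExpert, ToyMoELayer, two tracks, top-1 routing) are assumptions made for illustration.

```python
# Toy sketch of sparse expert routing across parallel tracks (illustrative only;
# not Apple's actual PT-MoE implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyExpert(nn.Module):
    """A small feed-forward 'expert' that only runs when the router selects it."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return self.ff(x)


class ToyMoELayer(nn.Module):
    """Routes each token to its top-k experts; unselected experts do no work at all."""
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)          # scores every expert per token
        self.experts = nn.ModuleList([ToyExpert(dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # keep only the best-scoring experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(dim=-1)                  # tokens routed to expert e
            if mask.any():                                 # expert is skipped entirely otherwise
                w = weights[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out


class ToyParallelTrackBlock(nn.Module):
    """Independent tracks, each mixing a dense layer with a sparse expert layer."""
    def __init__(self, dim: int, num_tracks: int = 2):
        super().__init__()
        self.tracks = nn.ModuleList([
            nn.ModuleDict({"dense": nn.Linear(dim, dim), "moe": ToyMoELayer(dim)})
            for _ in range(num_tracks)
        ])

    def forward(self, x):
        # Tracks have no cross-dependencies inside the block, so they could run
        # in parallel; their outputs are averaged (one simple way to merge them).
        outs = [track["moe"](torch.relu(track["dense"](x))) for track in self.tracks]
        return torch.stack(outs).mean(dim=0)


tokens = torch.randn(8, 64)                                # 8 tokens, 64-dim embeddings
print(ToyParallelTrackBlock(dim=64)(tokens).shape)         # torch.Size([8, 64])
```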
To further improve performance, Apple interleaves global and local attention layers, which helps the model track both fine-grained detail and big-picture context. According to 9to5Mac, this design improves efficiency while keeping reasoning strong.
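The interleaving idea can be sketched the same way. In the hypothetical stack below, most layers use a cheap sliding-window mask so each token only attends to nearby neighbors, while every third layer attends across the whole sequence; the ratio, window size, and all names are assumptions for illustration, since Apple has not detailed the exact configuration.

```python
# Toy sketch of interleaved local/global attention layers (illustrative only;
# the real layer pattern and window size in Apple's model are not public).
import torch
import torch.nn as nn


def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True means 'may NOT attend': each token only sees
    neighbors within `window` positions on either side."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window


class ToyInterleavedStack(nn.Module):
    """Alternates cheap local-attention layers with periodic global-attention layers."""
    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 6,
                 global_every: int = 3, window: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(layers)]
        )
        self.global_every = global_every   # every Nth layer attends globally (assumed ratio)
        self.window = window               # local layers see +/- `window` tokens (assumed size)

    def forward(self, x):                  # x: (batch, seq_len, dim)
        seq_len = x.size(1)
        local_mask = local_attention_mask(seq_len, self.window).to(x.device)
        for i, attn in enumerate(self.layers):
            is_global = (i + 1) % self.global_every == 0
            mask = None if is_global else local_mask   # None => full (global) attention
            out, _ = attn(x, x, x, attn_mask=mask)
            x = x + out                                 # residual connection
        return x


x = torch.randn(1, 32, 64)                 # batch of 1, 32 tokens, 64-dim embeddings
print(ToyInterleavedStack()(x).shape)      # torch.Size([1, 32, 64])
```

The trade-off the sketch illustrates is that local layers keep attention cost roughly linear in sequence length, while the occasional global layer preserves the long-range context needed for reasoning.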