LLaMA 4 Scout, equipped with 17 billion active parameters and 16 experts, runs efficiently on a single NVIDIA H100 GPU and introduces an industry-first 10-million-token context window. It surpasses competitors such as Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 in performance. LLaMA 4 Maverick, by contrast, pairs the same 17 billion active parameters with 128 experts, for a total of 400 billion parameters. It outperforms GPT-4o and Gemini 2.0 Flash in benchmark tests, all while remaining cost-effective.
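
The gap between "active" and "total" parameters comes from the mixture-of-experts design: many expert networks exist, but each token is routed through only a few of them. The sketch below is a toy illustration of top-k expert routing in PyTorch, not Meta's actual implementation; the `TopKMoE` class, the layer dimensions, and the one-expert-per-token routing are illustrative assumptions chosen only to show why the per-token compute stays small even as the total parameter count grows with the number of experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer (illustrative, not LLaMA 4's real design).

    All experts contribute to the *total* parameter count, but each token is
    routed to only k of them, so the *active* parameters per token stay small.
    """
    def __init__(self, dim: int, num_experts: int, k: int = 1):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)    # keep only k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

# 16 experts with 1 active per token: total params scale with 16,
# but only one expert's weights run for any given token.
layer = TopKMoE(dim=64, num_experts=16, k=1)
total = sum(p.numel() for p in layer.experts.parameters())
print(f"total expert params: {total:,}  active per token: {total // 16:,}")
```

Scaling this idea up is how Maverick reaches 400 billion total parameters while keeping the same 17 billion active parameters as Scout: adding experts (16 versus 128) grows memory footprint and capacity, not per-token compute.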