Nvidia has broken through prior barriers with its B200 GPUs. We have conducted independent benchmarking and are seeing >1,000 output tokens/s on Llama 4 Maverick, >10x the speed of some other providers. This is the fastest Maverick endpoint we have benchmarked so far. Exciting times ahead for developers when B200-based APIs are publicly available.

May 23, 2025 · 12:04 AM UTC
