ByteDance announces Doubao-1.5-pro
- Includes a "Deep Thinking" mode that surpasses o1-preview and o1 on the AIME benchmark.
- Outperforms DeepSeek-V3, GPT-4o, and Llama-3.1-405B on popular benchmarks.
- Built on an MoE architecture, with far fewer activated parameters than the models above.
- Achieves 7x MoE performance leverage: dense-model performance with just 1/7 of the activated parameters (e.g., 20B activated params = 140B dense performance).
- Engineering-wise, features a heterogeneous system design for prefill-decode and attn-FFN, maximizing throughput under low-latency requirements.
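
The 7x leverage figure above is simple arithmetic, sketched here. The function names are hypothetical and the linear scaling is an assumption taken from the post's own example, not a published formula:

```python
def dense_equivalent_params(activated_params_b: float, leverage: float = 7.0) -> float:
    """Dense-model size (in billions) that an MoE model is claimed to match.

    Assumes the claimed leverage applies as a plain multiplier on
    activated parameters, as in the post's 20B -> 140B example.
    """
    return activated_params_b * leverage


def activated_fraction(leverage: float = 7.0) -> float:
    """Share of the dense-equivalent parameters that are actually activated."""
    return 1.0 / leverage


if __name__ == "__main__":
    # The post's example: 20B activated parameters.
    print(dense_equivalent_params(20.0))
    print(round(activated_fraction(), 3))
```

With the post's numbers, 20B activated parameters correspond to a 140B dense equivalent, i.e. roughly 14% of the parameters are active per token.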
Jan 24, 2025 · 9:32 PM UTC

