ByteDance announces Doubao-1.5-pro - Includes a "Deep Thinking" mode, surpassing O1-preview and O1 models on the AIME benchmark. - Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks. - Built on a MoE architecture, with activated parameters far fewer than those in the above models. - Achieves a 7x MoE performance leverage—delivering dense model performance with just 1/7 of the activated parameters (e.g., 20B activated params = 140B dense performance). - Engineering-wise, features heterogeneous system design for prefill-decode and attn-fffn, maximizing throughput under low-latency requirements.

Jan 24, 2025 · 9:32 PM UTC

52
267
1,831