🚀 Exciting News! 🚀
In a joint effort between IBM Research, Princeton, CMU, and UIUC, we are thrilled to announce the release of Bamba-9B, our high-performing hybrid Mamba2 model! The model is trained entirely on open datasets, and we're releasing both intermediate and final checkpoints to enable community experimentation.
🔗 Read more:
huggingface.co/blog/bamba
Key Takeaways
⚡ Inference Efficiency
The Bamba-9B model delivers significant improvements in throughput and latency, which benefits real-time applications. Benchmarking with vLLM against Llama 3.1 8B on long contexts shows:
🔹 2.5x throughput improvement
🔹 2x lower latency
And this is just the beginning – further optimizations are on the way!
🏆 Competitive Benchmarks
Bamba-9B performs competitively with state-of-the-art transformer models like Meta Llama 3.1 8B. It matches Llama 3.1 8B's average benchmark performance (excluding math and MMLU tasks), and there are clear opportunities to close the remaining gaps through extended training and math-focused datasets.
🤝 Open Collaboration
Developed entirely with open data, this effort emphasizes transparency and reproducibility, strengthening the foundations of the open-source AI community.
📂 For details, access to the model, and resources, check out the Bamba GitHub repository:
github.com/foundation-model-…
Let’s collaborate, experiment, and innovate together! 🔍✨
@tri_dao @_albertgu @MinjiaZhang: it has been a great collaboration, and we look forward to continuing to work with you.