Excited to introduce MN-Minitron-8B-Instruct 📗! We've built an instruct model that is even stronger than its parent, Mistral-NeMo-12B, and that shows significant improvements over LLaMa3.1-8B-Instruct as well!
Weights on HF:
huggingface.co/nvidia/Mistra…
Demo:
build.nvidia.com/nvidia/mist…
Our new model outperforms LLaMa3.1-8B-Instruct on key benchmarks, including:
🧮 Math reasoning
🔧 Function calling
🧑🏫 Instruction following
Additionally, our model improves on 7 out of 8 metrics compared with its 12B parent.
This model is the result of pruning and distillation, which compress the original Mistral-NeMo-12B-Base model into an efficient 8B model, followed by alignment with NeMo-Aligner.
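For readers new to the technique, here is a minimal, generic sketch of the two ingredients. It is not the exact Minitron recipe. It pairs a hypothetical width-pruning step that ranks channels by mean absolute activation on a small calibration batch with a logit-distillation loss computed as the forward KL divergence from the teacher's token distribution to the student's. The function names and toy numbers are illustrative assumptions; see the Minitron paper for the actual method.

```python
import math
from typing import List

# Illustrative sketch only; not NVIDIA's implementation.


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 1.0) -> float:
    """Forward KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Terms with zero teacher probability contribute nothing to the KL sum.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def prune_width(activations: List[List[float]], keep: int) -> List[int]:
    """Hypothetical width-pruning step: score each channel by its mean absolute
    activation over a calibration batch and return the indices of the `keep`
    most important channels, in their original order."""
    n_channels = len(activations[0])
    importance = [
        sum(abs(row[c]) for row in activations) / len(activations)
        for c in range(n_channels)
    ]
    ranked = sorted(range(n_channels), key=lambda c: importance[c], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss(teacher, teacher))  # identical distributions -> 0.0
    acts = [[0.1, -3.0, 0.5, 2.0], [0.2, 2.5, -0.4, -1.5]]
    print(prune_width(acts, keep=2))  # channels 1 and 3 have the largest scores -> [1, 3]
```

In a real pipeline, the pruned student would then be trained to minimize this loss against the unpruned teacher's logits over a training corpus before alignment.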
Thanks to the community for the support that encourages us to release more models!
💡 Useful links:
NeMo-Aligner:
github.com/NVIDIA/NeMo-Align…
Minitron paper:
arxiv.org/abs/2408.11796