Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:20 PM UTC

@GuillaumeLample

Very excited to release our second model, Mixtral 8x7B, an open-weight mixture-of-experts model. Mixtral matches or outperforms Llama 2 70B and GPT-3.5 on most benchmarks, and has the inference speed of a 12B dense model. It supports a context length of 32k tokens. (1/n)

Mistral AI

@MistralAI

8 Dec 2023

magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%https://nitter.app/t.co/g0m9cEUz0T%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at research.facebook.com/public… 1/n

Guillaume Lample @ NeurIPS 2024 · Feb 26, 2024 · 2:52 PM UTC

Today, we are releasing Mistral Large, our latest model. Mistral Large is vastly superior to Mistral Medium, handles 32k tokens of context, and is natively fluent in English, French, Spanish, German, and Italian. We have also updated Mistral Small on our API to a model that is significantly better (and faster) than Mixtral 8x7B. Lastly, we are introducing Le Chat (chat.mistral.ai/), a chat interface (currently in beta) on top of our models.

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2020 · 1:18 PM UTC

Unsupervised Translation of Programming Languages. Feed a model with Python, C++, and Java source code from GitHub, and it automatically learns to translate between the 3 languages in a fully unsupervised way. arxiv.org/pdf/2006.03511.pdf with @MaLachaux @b_roziere @LowikChanussot

Guillaume Lample @ NeurIPS 2024 · Sep 27, 2023 · 3:25 PM UTC

Mistral 7B is out. It outperforms Llama 2 13B on every benchmark we tried. It is also superior to LLaMA 1 34B in code, math, and reasoning, and is released under the Apache 2.0 licence. mistral.ai/news/announcing-m…

Mistral AI

@MistralAI

27 Sep 2023

magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=https%3A%2F%https://nitter.app/t.co/HAadNvH1t0%3A443%2Fannounce RELEASE ab979f50d7d406ab8d0b07d09806c72c

Guillaume Lample @ NeurIPS 2024 · Jul 24, 2024 · 3:38 PM UTC

Today, we release Mistral Large 2, the new version of our largest model. Mistral Large 2 is a 123B-parameter model with a 128k context window. On many benchmarks (notably in code generation and math), it is superior to or on par with Llama 3.1 405B. Like Mistral NeMo, it was trained on a very large amount of source code and multilingual data. (1/N)

Mistral AI

@MistralAI

24 Jul 2024

mistral.ai/news/mistral-larg…

Guillaume Lample @ NeurIPS 2024 · Dec 4, 2019 · 10:52 AM UTC

Our new paper, Deep Learning for Symbolic Mathematics, is now on arXiv arxiv.org/abs/1912.01412 We added *a lot* of new results compared to the original submission. With @f_charton (1/7)
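
The paper feeds expressions to a sequence-to-sequence transformer by writing their trees in prefix (Polish) notation, so that, for example, 2 + 3 × (5 + 2) becomes the token sequence `+ 2 × 3 + 5 2`. Below is a minimal sketch of that serialization and its inverse, assuming a hypothetical tuple-based tree format. `to_prefix`, `from_prefix`, and the operator names are illustrative and not taken from the SymbolicMathematics repository; integers are kept as single tokens for simplicity.

```python
from typing import Union

# Hypothetical tree format: a leaf is a str (variable or integer);
# an internal node is a tuple (operator, child, child, ...).
Expr = Union[str, tuple]


def to_prefix(expr: Expr) -> list:
    """Serialize an expression tree into prefix-notation tokens."""
    if isinstance(expr, str):
        return [expr]
    op, *children = expr
    tokens = [op]
    for child in children:
        tokens.extend(to_prefix(child))
    return tokens


def from_prefix(tokens: list, arity: dict) -> Expr:
    """Rebuild the tree from prefix tokens, given each operator's arity."""

    def parse(i: int):
        tok = tokens[i]
        n_children = arity.get(tok, 0)  # leaves have arity 0
        i += 1
        children = []
        for _ in range(n_children):
            child, i = parse(i)
            children.append(child)
        node = (tok, *children) if n_children else tok
        return node, i

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return tree


ARITY = {"add": 2, "mul": 2, "pow": 2, "cos": 1}
expr = ("add", ("mul", "3", ("pow", "x", "2")), ("cos", ("mul", "2", "x")))  # 3*x^2 + cos(2*x)
tokens = to_prefix(expr)
print(tokens)  # ['add', 'mul', '3', 'pow', 'x', '2', 'cos', 'mul', '2', 'x']
```

Prefix notation needs no parentheses: as long as every operator's arity is known, a token sequence decodes back to exactly one tree, which makes it convenient as a model output.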

Guillaume Lample @ NeurIPS 2024 · Jun 14, 2023 · 9:43 AM UTC

Life update: I recently left Meta, and we are starting Mistral.AI, a new AI company with @arthurmensch and @tlacroix6

Frontier AI LLMs, assistants, agents, services | Mistral

The most powerful AI platform for enterprises. Customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI with open models.

mistral.ai

Guillaume Lample @ NeurIPS 2024 · May 29, 2024 · 2:13 PM UTC

Today we are releasing Codestral-22B, our first code model! Codestral is trained on more than 80 programming languages and outperforms previous code models, including the largest ones. It is available on our API platform, through instruct and fill-in-the-middle endpoints, and can be easily integrated into VS Code plugins. You can also use it for free on Le Chat: chat.mistral.ai

Guillaume Lample @ NeurIPS 2024 · Jul 29, 2020 · 11:06 AM UTC

Code is now available online with pretrained models! github.com/facebookresearch/…

GitHub - facebookresearch/TransCoder: Public release of the TransCoder research project https://a...

Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf - facebookresearch/TransCoder

github.com

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2020

Guillaume Lample @ NeurIPS 2024 · May 24, 2022 · 2:29 PM UTC

Excited to release our latest work: arxiv.org/abs/2205.11491 We present a new algorithm, HyperTree Proof Search (HTPS) inspired by the recent success of AlphaZero. Our model is able to prove mathematical theorems in a fully automated way and significantly outperforms the SOTA. 1/n

Guillaume Lample @ NeurIPS 2024 · Feb 16, 2021 · 1:09 PM UTC

New paper on code de-obfuscation: arxiv.org/abs/2102.07492 We show that if you obfuscate the names of identifiers in source code, a model can retrieve the original names with very high accuracy. It even works when you remove the name of every variable and function! 1/3
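
The obfuscation side of this setup can be illustrated with Python's standard `ast` module: rename every user-defined function to a placeholder and every argument or variable to another, and keep the placeholder-to-original mapping as the target a model would learn to recover. This is a minimal sketch under our own assumptions (the `FUNC_i` / `VAR_i` scheme and the names `Obfuscator` and `obfuscate` are chosen for illustration), not the paper's preprocessing pipeline. It ignores classes, attributes, imports, and names used before they are defined, and needs Python 3.9+ for `ast.unparse`.

```python
import ast


class Obfuscator(ast.NodeTransformer):
    """Replace user-defined function names with FUNC_i and arguments/variables with VAR_i."""

    def __init__(self) -> None:
        self.mapping = {}                 # original name -> placeholder
        self.counts = {"FUNC": 0, "VAR": 0}

    def _rename(self, name: str, kind: str) -> str:
        # Assign a fresh placeholder the first time a name is seen, then reuse it.
        if name not in self.mapping:
            self.mapping[name] = f"{kind}_{self.counts[kind]}"
            self.counts[kind] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._rename(node.name, "FUNC")
        self.generic_visit(node)          # then visit arguments and body
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._rename(node.arg, "VAR")
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if isinstance(node.ctx, ast.Store):
            node.id = self._rename(node.id, "VAR")
        elif node.id in self.mapping:
            # Loads of known names are rewritten; builtins such as `len` stay untouched.
            node.id = self.mapping[node.id]
        return node


def obfuscate(source: str) -> tuple:
    """Return the obfuscated source and the placeholder -> original-name mapping."""
    obf = Obfuscator()
    tree = obf.visit(ast.parse(source))
    return ast.unparse(tree), {v: k for k, v in obf.mapping.items()}


SOURCE = '''
def sum_of_squares(values):
    total = 0
    for v in values:
        total += v * v
    return total
'''

code, targets = obfuscate(SOURCE)
print(code)
print(targets)
```

A de-obfuscation model would see `code` as input and be trained to output `targets`.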

Guillaume Lample @ NeurIPS 2024 · Jun 21, 2019 · 9:06 PM UTC

If you want to train BERT from scratch in @PyTorch, you can check out our XLM repository! Our English model outperforms the original BERT on all GLUE tasks, although it's trained on the same data and without the next sentence prediction task github.com/facebookresearch/… @alex_conneau

GitHub - facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model...

PyTorch original implementation of Cross-lingual Language Model Pretraining. - facebookresearch/XLM

github.com

Guillaume Lample @ NeurIPS 2024 · Jul 12, 2019 · 11:49 AM UTC

Our new paper: Large Memory Layers with Product Keys arxiv.org/abs/1907.05242 We created a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster! 1/2
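
What keeps the memory cheap is that the keys are the Cartesian product of two small sets of sub-keys: with n sub-keys per half there are n² keys, yet the exact top-k over all of them can be found by taking the top-k of each half and then searching only the k × k combined candidates (key (i, j) scores s1[i] + s2[j], so no better pair can hide outside those candidates). Below is a single-head NumPy sketch; it omits the query network, batch normalization, and multiple heads, and `pkm_lookup` is a hypothetical name rather than the released implementation.

```python
import numpy as np


def pkm_lookup(q, sub_keys1, sub_keys2, values, k):
    """Read from a product-key memory.

    q: (d,) query; sub_keys1, sub_keys2: (n, d // 2); values: (n * n, d_v).
    Returns the weighted value vector and the indices of the k selected slots.
    """
    half = q.shape[0] // 2
    s1 = sub_keys1 @ q[:half]          # (n,) scores against the first sub-key set
    s2 = sub_keys2 @ q[half:]          # (n,) scores against the second sub-key set
    top1 = np.argsort(s1)[::-1][:k]    # best k sub-keys in each half
    top2 = np.argsort(s2)[::-1][:k]
    # Key (i, j) scores s1[i] + s2[j]; only the k * k candidate pairs need checking.
    cand = s1[top1][:, None] + s2[top2][None, :]
    best = np.argsort(cand, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(best, cand.shape)
    n = sub_keys1.shape[0]
    idx = top1[rows] * n + top2[cols]  # flat index of key (i, j) in the n * n table
    scores = cand[rows, cols]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()           # softmax over the k selected slots
    return weights @ values[idx], idx


rng = np.random.default_rng(0)
n, d, d_v, k = 16, 8, 5, 4             # 16 * 16 = 256 memory slots
K1 = rng.normal(size=(n, d // 2))
K2 = rng.normal(size=(n, d // 2))
V = rng.normal(size=(n * n, d_v))
q = rng.normal(size=d)

out, idx = pkm_lookup(q, K1, K2, V, k)
print(out.shape, sorted(idx.tolist()))
```

Scoring costs 2n dot products plus k² additions instead of n² dot products, which is why the memory can grow to a very large number of slots.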

Guillaume Lample @ NeurIPS 2024 · Jan 14, 2022 · 4:19 PM UTC

Deep Symbolic Regression for Recurrent Sequences -- arxiv.org/abs/2201.04600 We show that transformers are great at predicting symbolic functions from values, and can predict the recurrence relation of sequences better than Mathematica. You can try it here: bit.ly/3niE5FS

Guillaume Lample @ NeurIPS 2024 · Jul 16, 2024 · 3:17 PM UTC

Today we are releasing two small models: Mathstral 7B and Codestral Mamba 7B. On the MATH benchmark, Mathstral 7B obtains 56.6% pass@1, outperforming Minerva 540B by more than 20%. Mathstral scores 68.4% on MATH with majority voting@64, and 74.6% using a reward model. Codestral Mamba is one of the first open source models with a Mamba 2 architecture. It is the best 7B code model available, and is trained with a context length of 256k tokens. Both models are released under the Apache 2 license. mistral.ai/news/mathstral/ mistral.ai/news/codestral-ma…

Mistral AI

@MistralAI

16 Jul 2024

mistral.ai/news/mathstral/ mistral.ai/news/codestral-ma…
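
Majority voting@64 means sampling 64 solutions per problem and keeping the most frequent final answer, whereas pass@1 grades a single sample. A minimal sketch of the voting step, assuming final answers have already been extracted and normalized into strings (`majority_vote` and `accuracy` are hypothetical helpers; answer extraction and the reward-model variant are not shown, and the data below is made up):

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Most frequent non-empty answer; ties go to the answer that appeared first."""
    valid = [a for a in answers if a is not None]  # None = no answer could be extracted
    if not valid:
        return None
    counts = Counter(valid)
    best = max(counts.values())
    return next(a for a in valid if counts[a] == best)


def accuracy(samples_per_problem, references, k):
    """Fraction of problems where the majority vote over the first k samples is correct."""
    hits = sum(majority_vote(s[:k]) == ref for s, ref in zip(samples_per_problem, references))
    return hits / len(references)


# Toy data (made up for illustration): three sampled answers per problem.
samples = [["12", "7", "12"], ["3", "3", None], ["5", "8", "8"]]
refs = ["12", "4", "8"]
print(accuracy(samples, refs, k=1), accuracy(samples, refs, k=3))
```

With k=1 only the first sample counts; with k=3 the vote recovers the third problem, which is the kind of gain majority voting is meant to capture.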

Guillaume Lample @ NeurIPS 2024 · Sep 19, 2018 · 10:49 PM UTC

We just received the best paper award at #emnlp2018 for our work on unsupervised machine translation!! @alex_conneau @LudovicDenoyer Paper: arxiv.org/abs/1804.07755 Code: github.com/facebookresearch/… Blog: code.fb.com/ai-research/unsu…

Phrase-Based & Neural Unsupervised Machine Translation

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which...

arxiv.org

emnlp2020 @emnlp2020

19 Sep 2018

best long paper 2/2: Phrase-Based & Neural Unsupervised Machine Translation. Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer and Marc'Aurelio Ranzato #emnlp2018

Guillaume Lample @ NeurIPS 2024 · Oct 16, 2024 · 3:00 PM UTC

We just released two small models, with 3B and 8B parameters. Ministral 3B is exceptionally strong, outperforming Llama 3 8B and our previous Mistral 7B on instruction following benchmarks. mistral.ai/news/ministraux/

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available datasets, making our work compatible with open-sourcing and reproducible, while most existing models rely on data which is either not publicly available or undocumented. 2/n

Guillaume Lample @ NeurIPS 2024 · Apr 17, 2024 · 2:19 PM UTC

Mixtral 8x22B Instruct is out. It significantly outperforms existing open models, and only uses 39B active parameters (making it significantly faster than 70B models during inference). 1/n

Guillaume Lample @ NeurIPS 2024 · Mar 11, 2023 · 4:09 PM UTC

LLaMA 65B can run on a MacBook! With a different model architecture it could probably run quite a bit faster (we didn't use multi-query attention, for instance)

Lawrence Chen

@lawrencecchen

11 Mar 2023

Replying to @ggerganov

65B running on m1 max/64gb! 🦙🦙🦙🦙🦙🦙🦙
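
Part of why multi-query attention (MQA) can speed up decoding is that all query heads share a single key/value head, which shrinks the key/value cache read at every generated token. A back-of-the-envelope sketch, assuming fp16 storage (2 bytes per value) and the LLaMA-65B shape reported in the LLaMA paper (80 layers, 64 heads of dimension 128); `kv_cache_bytes_per_token` is a hypothetical helper and the MQA variant is hypothetical:

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of key + value cache added per generated token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # 2 = one key and one value


mha = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=64, head_dim=128)  # as trained
mqa = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=1, head_dim=128)   # hypothetical MQA variant
print(mha, mqa, mha // mqa)  # 2621440 40960 64
```

That is about 2.5 MiB of cache per token versus 40 KiB. The 64× reduction matters most for long contexts and batched decoding; for a single short prompt on a laptop, reading the weights themselves usually dominates.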

Guillaume Lample @ NeurIPS 2024 · Aug 6, 2024 · 2:45 PM UTC

Mistral Large 2 (2407) is now on @lmsysorg. It performs extremely well in the Coding, Hard Prompts, Math, and Longer Query categories, where it outperforms GPT4-Turbo and Claude 3 Opus. It is also doing very well in Instruction Following where it ranks above Llama 3.1 405B. Extremely proud of the work accomplished by the @MistralAI team in such a short period of time. Of course, this is only the beginning; we haven't spent much compute yet. Blogpost: mistral.ai/news/mistral-larg… Model weights: huggingface.co/mistralai/Mis…

Guillaume Lample @ NeurIPS 2024 · Jun 10, 2025 · 2:47 PM UTC

Very excited to release our first reasoning model, Magistral. We released the weights of Magistral Small alongside a paper that presents our approach, online RL infrastructure, and findings.

Mistral AI

@MistralAI

10 Jun 2025

Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasoning.

Guillaume Lample @ NeurIPS 2024 · Jul 18, 2024 · 2:52 PM UTC

Very happy to release our new small model, Mistral NeMo, a 12B model trained in collaboration with @nvidia. Mistral NeMo supports a context window of 128k tokens, comes with an FP8-aligned checkpoint, and performs extremely well on all benchmarks. Check it out! mistral.ai/news/mistral-nemo… blogs.nvidia.com/blog/mistra…

Guillaume Lample @ NeurIPS 2024 · Feb 26, 2024 · 3:34 PM UTC

Due to an unexpected number of requests, Le Chat is temporarily unavailable. We apologize for the inconvenience -- we are working on getting it back up and running as soon as we can, thanks for your patience!

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

26 Feb 2024

Guillaume Lample @ NeurIPS 2024 · Mar 23, 2020 · 5:39 PM UTC

The code for our @iclr_conf paper, Deep Learning for Symbolic Mathematics, is now available in @PyTorch! We also provide our datasets and pretrained models Code: github.com/facebookresearch/… Paper: arxiv.org/abs/1912.01412

GitHub - facebookresearch/SymbolicMathematics: Deep Learning for Symbolic Mathematics

Deep Learning for Symbolic Mathematics. Contribute to facebookresearch/SymbolicMathematics development by creating an account on GitHub.

github.com

Guillaume Lample @ NeurIPS 2024 · Feb 6, 2025 · 7:49 PM UTC

Le Chat now runs Mistral Large at 1000+ tokens/s! chat.mistral.ai/

Vibe

Chat with Mistral AI's cutting edge language models.

chat.mistral.ai

Val

@onetwoval

6 Feb 2025

No this video is not sped up, genuinely mind blowing And yes this is available to all users right now.

Guillaume Lample @ NeurIPS 2024 · Nov 18, 2024 · 6:34 PM UTC

Le Chat now includes image generation with FLUX1.1, web search, canvas, Mistral Large with vision capabilities, PDF upload, etc. And it's 100% free! chat.mistral.ai/

Mistral AI

@MistralAI

18 Nov 2024

We're proud to introduce the next generation of le Chat. Search, PDF upload, coding, image generation, le Canevas... All in one place: chat.mistral.ai/ mistral.ai/news/mistral-chat…

Guillaume Lample @ NeurIPS 2024 · Dec 21, 2017 · 8:09 PM UTC

We just open-sourced MUSE, our library to align embedding spaces in a supervised or unsupervised way, along with multilingual embeddings for 30 languages aligned in the same vector space, and 110 large-scale ground truth bilingual dictionaries: github.com/facebookresearch/…

GitHub - facebookresearch/MUSE: A library for Multilingual Unsupervised or Supervised word Embedd...

A library for Multilingual Unsupervised or Supervised word Embeddings - facebookresearch/MUSE

github.com
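
For the supervised case (and the refinement step of the unsupervised one), MUSE aligns two embedding spaces with an orthogonal Procrustes mapping: given paired source and target vectors as the columns of X and Y, the orthogonal W minimizing ‖WX - Y‖_F is W = U V^T, where U Σ V^T is the SVD of Y X^T. A NumPy sketch on synthetic data, assuming a hidden rotation to recover; the `procrustes` helper is ours rather than MUSE's API, and MUSE's CSLS retrieval step is not shown:

```python
import numpy as np


def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||W X - Y||_F; columns of X and Y are paired embeddings."""
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt


rng = np.random.default_rng(0)
d, n = 10, 200
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # hidden "true" rotation
X = rng.normal(size=(d, n))                    # source embeddings, one word per column
Y = Q @ X + 1e-3 * rng.normal(size=(d, n))     # target embeddings, slightly noisy
W = procrustes(X, Y)
print(np.abs(W - Q).max())                     # small: the rotation is recovered
```

Constraining W to be orthogonal preserves distances within the source space, so monolingual neighbourhoods survive the mapping.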

Guillaume Lample @ NeurIPS 2024 · Feb 26, 2024 · 2:52 PM UTC

Our models are available on the @MistralAI API (La Plateforme). It supports JSON format and function calling. We are also making our commercial models available through Azure AI. Read more at: mistral.ai/news/mistral-larg… mistral.ai/news/le-chat-mist… Congrats to all the @MistralAI team for their amazing work on this release!

Au Large | Mistral AI

mistral.ai

Guillaume Lample @ NeurIPS 2024 · Dec 10, 2023 · 11:19 AM UTC

I will be at #NeurIPS2023 this week. Feel free to reach out if you want to talk about open source or if you want to know more about @MistralAI (also we are hiring!)

Guillaume Lample @ NeurIPS 2024 · Jul 24, 2024 · 3:42 PM UTC

You can use Mistral Large 2 on Le Chat -- it's free! chat.mistral.ai/

Mistral AI

@MistralAI

24 Jul 2024

mistral.ai/news/mistral-larg…

Guillaume Lample @ NeurIPS 2024 · Oct 9, 2020 · 4:02 PM UTC

Last year, we showed that you can outperform a 24-layer transformer in language modeling with just 12 layers and 1 product-key memory (PKM) layer. arxiv.org/abs/2010.03881 shows that these results also transfer to downstream tasks: BERT-large performance with a PKM-augmented BERT-base!

Large Product Key Memory for Pretrained Language Models

Product key memory (PKM) proposed by Lample et al. (2019) enables to improve prediction accuracy by increasing model capacity efficiently with insignificant computational overhead. However, their...

arxiv.org

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

12 Jul 2019

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

All our models were trained on at least 1T tokens, much more than what is typically used at this scale. Interestingly, even after 1T tokens the 7B model was still improving. 3/n

Guillaume Lample @ NeurIPS 2024 · May 20, 2020 · 10:41 PM UTC

Amazing thread by people who believe our "Deep Learning for Symbolic Math" paper was written to introduce abstract syntax trees... Source: half a sentence cherry picked from the abstract and carefully split into two

Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:24 PM UTC

Very proud of the small but amazing @MistralAI team for their outstanding work, and for building so quickly and efficiently. (n/n)

Guillaume Lample @ NeurIPS 2024 · Aug 27, 2019 · 5:14 PM UTC

Just released a small and simple implementation of our Product-Key Memory (PKM) layer. A 12-layer transformer with a single PKM layer outperforms a 24-layer transformer while being almost twice as fast! github.com/facebookresearch/…

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

12 Jul 2019

Guillaume Lample @ NeurIPS 2024 · Dec 24, 2017 · 1:20 PM UTC

PyTorch implementation of "Arnold", our DOOM AI that won the 2017 edition of the ViZDoom competition: github.com/glample/Arnold, with @dchaplot

GitHub - glample/Arnold: Arnold - DOOM Agent

Arnold - DOOM Agent. Contribute to glample/Arnold development by creating an account on GitHub.

github.com

Guillaume Lample @ NeurIPS 2024 · Jan 23, 2019 · 2:18 PM UTC

Check out our new paper on cross-lingual language model pretraining! We extend BERT to the cross-lingual setting. Huge improvements on XNLI, Supervised MT, Unsupervised MT. arxiv.org/abs/1901.07291 With @alex_conneau

Guillaume Lample @ NeurIPS 2024 · Sep 3, 2019 · 4:34 PM UTC

Two papers accepted to @NeurIPSConf this year, both with a spotlight presentation :) The first is on Cross-lingual Language Model Pretraining arxiv.org/abs/1901.07291 where we extend BERT to the multi-lingual setting, with @alex_conneau (1/3)

Cross-lingual Language Model Pretraining

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the...

arxiv.org

Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:20 PM UTC

Mixtral has a similar architecture to Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks. For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. (2/n)
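
The routing described in this thread can be sketched for a single token: the router scores all 8 experts, keeps the top 2, renormalizes their scores with a softmax, and returns the weighted sum of just those two experts' outputs. This follows the gating formula given in the Mixtral paper as we understand it (softmax over the top-2 router logits). The toy experts below are small ReLU MLPs standing in for Mixtral's SwiGLU feed-forward blocks, and `moe_layer` is a hypothetical name.

```python
import numpy as np


def moe_layer(x, router_w, experts, top_k=2):
    """Sparse mixture-of-experts feed-forward for one token.

    x: (d,) hidden state; router_w: (n_experts, d); experts: callables (d,) -> (d,).
    """
    logits = router_w @ x                        # one score per expert
    chosen = np.argsort(logits)[::-1][:top_k]    # indices of the top-k experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                         # softmax over the selected experts only
    out = np.zeros_like(x)
    for g, e in zip(gates, chosen):
        out += g * experts[e](x)                 # only top_k experts are evaluated
    return out, chosen


rng = np.random.default_rng(0)
d, n_experts = 16, 8
router_w = rng.normal(size=(n_experts, d))
# Toy experts: two-layer ReLU MLPs with random weights.
weights = [(rng.normal(size=(4 * d, d)) / np.sqrt(d), rng.normal(size=(d, 4 * d)) / np.sqrt(4 * d))
           for _ in range(n_experts)]
experts = [lambda v, w1=w1, w2=w2: w2 @ np.maximum(w1 @ v, 0.0) for w1, w2 in weights]

tokens = rng.normal(size=(3, d))
for t in tokens:
    _, chosen = moe_layer(t, router_w, experts)
    print(sorted(chosen.tolist()))  # the selected pair can differ from token to token
```

Per token, the compute is that of two experts, while the parameters of all eight remain available across tokens.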

Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:20 PM UTC

More details about Mixtral can be found at mistral.ai/news/mixtral-of-e… We are also very happy to announce "La plateforme" our early developer platform (in beta & limited access), to access our models through our API: mistral.ai/news/la-plateform… (7/n)

Guillaume Lample @ NeurIPS 2024 · Jun 5, 2024 · 6:03 PM UTC

Mistral fine-tuning API is out! You can now fine-tune your own Mistral models and deploy them efficiently on La Plateforme: mistral.ai/news/customizatio… In many cases, fine-tuning allows small models to match (and sometimes surpass) the performance of much larger models, at a significantly lower cost and with faster generation.

My Tailor is Mistral | Mistral AI

mistral.ai

Guillaume Lample @ NeurIPS 2024 · Apr 23, 2018 · 12:23 AM UTC

New paper on unsupervised MT! arxiv.org/abs/1804.07755 We propose two models (neural and phrase based) that both improve the state of the art by more than 11 BLEU. By combining them we reach up to 27 BLEU points on WMT14, without using a single parallel sentence.

Phrase-Based & Neural Unsupervised Machine Translation

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which...

arxiv.org

Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:20 PM UTC

Compared to Mistral 7B, Mixtral is significantly stronger in science, in particular in mathematics and code generation. (5/n)

Guillaume Lample @ NeurIPS 2024 · Oct 25, 2022 · 1:09 PM UTC

Super excited about this work! We showed that you can use large language models to translate informal mathematical proofs (e.g. written in LaTeX) into formal proof sketches (e.g. skeletons of proofs written in a formal system like Lean or Isabelle).

Albert Jiang @AlbertQJiang

25 Oct 2022

Large language models can write informal proofs, translate them into formal ones, and achieve SoTA performance in proving competition-level maths problems! LM-generated informal proofs are sometimes more useful than the human ground truth 🤯 Preprint: arxiv.org/abs/2210.12283 🧵

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

On Common Sense Reasoning, Closed-book Question Answering, and Reading Comprehension, LLaMA-65B outperforms Chinchilla 70B and PaLM 540B on almost all benchmarks. 4/n

Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:20 PM UTC

Mixtral has been trained on a lot of multilingual data and significantly outperforms Llama 2 70B on French, German, Spanish, and Italian benchmarks. (4/n)

Guillaume Lample @ NeurIPS 2024 · Dec 4, 2024 · 10:13 AM UTC

Pixtral models are performing well on LMsys:
- Pixtral 12B is on par with Llama 3.2 90B
- Pixtral Large 123B is the best open-weight vision model by a large margin

Arena.ai

@arena

3 Dec 2024

Arena update: Pixtral Large has now overtaken Qwen-VL-72B to become the #1 open model in Vision Arena👀 Congrats @MistralAI on the remarkable open release. Check out the leaderboard to see the latest rankings!

Guillaume Lample @ NeurIPS 2024 · Dec 11, 2023 · 2:20 PM UTC

Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, Mixtral decodes at the speed of a 12B model, while effectively having access to 45B parameters. (3/n)

Guillaume Lample @ NeurIPS 2024 · Sep 17, 2024 · 8:59 PM UTC

Today we are excited to announce:
- Pixtral 12B available on le Chat and la Plateforme
- A free tier on la Plateforme
- A significant price drop across all our models
- An updated Mistral Small
Release blogpost: mistral.ai/news/september-24…

Mistral AI

@MistralAI

17 Sep 2024

mistral.ai/news/september-24… 1/2

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

We also briefly tried instruction finetuning using the approach of Chung et al. (2022). The resulting model, LLaMA-I, outperforms Flan-PaLM-cont (62B) on MMLU and showcases some interesting instruct capabilities. 7/n

Guillaume Lample @ NeurIPS 2024 · Nov 7, 2019 · 12:50 PM UTC

We just obtained the Best Resource Paper award at @emnlp2019 for our paper "The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English." Check it out! aclweb.org/anthology/D19-163… (1/3)

Guillaume Lample @ NeurIPS 2024 · Jun 17, 2023 · 5:25 PM UTC

Somebody created a fake account "ai_mistral" and started following ML people. It now has 2k+ followers, but we have no idea who this is. If you follow it, please unfollow & report it!

Guillaume Lample @ NeurIPS 2024 · Sep 27, 2023 · 3:25 PM UTC

Very proud of the Mistral AI team who rebuilt a top-performance MLops stack, and designed a very sophisticated data processing pipeline, from scratch, in less than 3 months.

Guillaume Lample @ NeurIPS 2024 · Dec 17, 2019 · 3:58 PM UTC

Very happy to see our work featured on MIT Technology Review today!

MIT Technology Review

@techreview

17 Dec 2019

For the first time, @facebookai has trained a neural network to do symbolic reasoning tasks involved in advanced math. bit.ly/2M41RCO

Guillaume Lample @ NeurIPS 2024 · Aug 15, 2018 · 12:39 AM UTC

2/2 papers accepted at @emnlp2018! One is our paper on Unsupervised MT: arxiv.org/pdf/1804.07755.pdf for which we also open-sourced the code: github.com/facebookresearch/… The other one will come soon :) @alex_conneau @LudovicDenoyer

Guillaume Lample @ NeurIPS 2024 · Sep 27, 2023 · 3:25 PM UTC

We trained it with GQA and a sliding window of 4096 tokens, resulting in constant cache size and a linear decoding speed. Our changes to FlashAttention v2 and xFormers to support sliding window are available to the community.
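
With a sliding window of W tokens (4096 here), each position only attends to the previous W positions, so the key/value cache can be a fixed-size ring buffer in which position i is written to slot i mod W (the Mistral 7B paper calls this a rolling buffer cache). Below is a single-head NumPy sketch, assuming keys already carry their positional information (e.g. rotary embeddings applied before caching), which is why the order of entries inside the buffer does not matter. `RollingKVCache` is a hypothetical name, GQA is not shown, and the window is shrunk to 4 for readability.

```python
import numpy as np


class RollingKVCache:
    """Fixed-size key/value cache for sliding-window attention (one head, one layer)."""

    def __init__(self, window: int, dim: int) -> None:
        self.window = window
        self.keys = np.zeros((window, dim))
        self.values = np.zeros((window, dim))
        self.pos = np.full(window, -1)   # absolute position held by each slot, -1 = empty
        self.t = 0                       # number of tokens seen so far

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        slot = self.t % self.window      # overwrite the entry that just left the window
        self.keys[slot] = k
        self.values[slot] = v
        self.pos[slot] = self.t
        self.t += 1

    def attend(self, q: np.ndarray) -> np.ndarray:
        """Softmax attention of q over the (at most `window`) cached positions."""
        valid = self.pos >= 0
        scores = self.keys[valid] @ q / np.sqrt(q.shape[0])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.values[valid]


rng = np.random.default_rng(0)
cache = RollingKVCache(window=4, dim=3)
K, V = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
for k, v in zip(K, V):
    cache.append(k, v)
print(sorted(cache.pos.tolist()))  # [6, 7, 8, 9]: only the last 4 positions are kept
```

Memory stays at `window` entries no matter how long the sequence grows; because layers are stacked, information can still travel further back than W tokens, each layer extending the reach by up to W.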

Guillaume Lample @ NeurIPS 2024 · Aug 22, 2025 · 10:11 PM UTC

Mistral Medium 3.1 is 2nd on LMArena without style control. Very proud of the @MistralAI team !

Sophia Yang, Ph.D.

@sophiamyang

22 Aug 2025

🔥@MistralAI Mistral Medium 3.1: Our ‘minor’ update just landed 8th on the @lmarena leaderboard—competitive with models with much larger sizes. 🚀 Smaller, but mightier!

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

LLaMA-65B outperforms Minerva-62B on GSM8k, even though it has not been fine-tuned on any mathematical dataset. On the MATH benchmark, it outperforms PaLM-62B (but is well below Minerva-62B) 5/n

Guillaume Lample @ NeurIPS 2024 · May 29, 2024 · 2:13 PM UTC

We are also releasing the weights on HuggingFace! huggingface.co/mistralai/Cod… More details on our blogpost: mistral.ai/news/codestral/

mistralai/Codestral-22B-v0.1 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Guillaume Lample @ NeurIPS 2024 · Jun 5, 2024 · 11:46 AM UTC

good thread: teddit.net/r/LocalLLaMA/comm… Codestral is now the second most popular model on chat.mistral.ai/ (and it's free!)

From the LocalLLaMA community on Reddit: Codestral solved a problem in two messages that I couldn't...

Explore this post and more from the LocalLLaMA community

reddit.com

Rohan Paul

@rohanpaul_ai

4 Jun 2024

Codestral is underrated.

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

On code generation benchmarks, LLaMA-65B outperforms cont-PaLM (62B) as well as PaLM-540B.

Guillaume Lample @ NeurIPS 2024 · May 28, 2025 · 2:40 PM UTC

Very happy to release Codestral Embed, our first code embedder. It can use up to 3072 dimensions, ordered by relevance. Using embeddings of 256 dimensions with int8 precision is already sufficient to outperform all existing models.

Mistral AI

@MistralAI

28 May 2025

Introducing Codestral Embed, the new state-of-the-art embedding model for code.
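
If dimensions are "ordered by relevance", a shorter embedding can be obtained by keeping a prefix of the vector, and int8 storage then cuts memory further: 3072 float32 values take 12,288 bytes, while 256 int8 values take 256 bytes. The sketch below assumes exactly that: prefix truncation to 256 dimensions, L2 normalization, and symmetric per-vector int8 quantization. The quantization scheme and the helper names (`compress`, `cosine_int8`) are our assumptions, not the Codestral Embed API, and the input vectors are random stand-ins for real embeddings.

```python
import numpy as np


def compress(emb: np.ndarray, dims: int = 256) -> np.ndarray:
    """Keep the first `dims` coordinates, L2-normalize, and quantize to int8."""
    v = emb[:dims].astype(np.float64)
    v /= np.linalg.norm(v)
    scale = np.abs(v).max() / 127.0          # symmetric, per-vector scale
    return np.round(v / scale).astype(np.int8)


def cosine_int8(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity from int8 codes; the per-vector scales cancel out."""
    a32, b32 = a.astype(np.int32), b.astype(np.int32)  # avoid int8 overflow in the dot product
    return float(a32 @ b32) / float(np.linalg.norm(a32) * np.linalg.norm(b32))


rng = np.random.default_rng(0)
query = rng.normal(size=3072)                # stand-ins for 3072-d embeddings
docs = query + rng.normal(scale=[[0.5], [1.0], [3.0]], size=(3, 3072))
codes = [compress(doc) for doc in docs]
q_code = compress(query)
ranking = sorted(range(len(docs)), key=lambda i: -cosine_int8(q_code, codes[i]))
print(ranking)
```

Because cosine similarity ignores vector length, the int8 codes can be compared directly without storing their scales.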

Guillaume Lample @ NeurIPS 2024 · Mar 27, 2023 · 7:42 AM UTC

Clever prompt that enables chat interaction with the raw version of LLaMA, without requiring any fine-tuning of the model on instruction data!

Georgi Gerganov

@ggerganov

26 Mar 2023

Replying to @ggerganov

This is the prompt for anyone interested: github.com/ggerganov/whisper…

Guillaume Lample @ NeurIPS 2024 · Feb 24, 2023 · 4:08 PM UTC

With @HugoTouvron, @LavrilThibaut, @gizacard, @javier_m, @MaLachaux, @tlacroix6, @b_roziere, @NamanGoyal21, Eric Hambro, Faisal Azhar, @AurR0d, @armandjoulin, @EXGRV 8/8

Guillaume Lample @ NeurIPS 2024 · May 2, 2024 · 2:05 PM UTC

Nice paper. Very good performance by Mistral Large and Mixtral 8x22B on the new GSM1K dataset! Results from Table D (plotted in the figure below) are also a good reminder that models are very sensitive to the prompt. In general, it is also good to use sampling & majority voting to reduce the noise on these benchmarks.

Albert Jiang @AlbertQJiang

2 May 2024

Nice paper! Some surprising highlights: 1. Mixtral 8x22B is ~GPT4-turbo level on GSM8K and GSM1K. Mistral large is better on both. 2. On GSM1K, Mixtral-8x22B-Instruct (84.3%) > claude-2 (83.6%) >> claude-3-haiku (79.1%) >> claude-3-sonnet (72.4%) 🤔 Also worth highlighting how different results are with a different prompt.

Guillaume Lample @ NeurIPS 2024 · Oct 2, 2023 · 7:14 PM UTC

We will be at ICCV in Paris for the next couple of days. Feel free to reach out if you are interested in @MistralAI!

Guillaume Lample @ NeurIPS 2024 · Sep 27, 2023 · 3:25 PM UTC

Huge thanks to our compute providers @CoreWeave and the EuroHPC, @tri_dao and @d_haziza for their help with FlashAttention and xFormers, and @huggingface, vLLM, @skypilot_org, FastChat for their help and support with this release.

Guillaume Lample @ NeurIPS 2024 · Nov 7, 2019 · 10:16 PM UTC

XLM-R, the large-scale version of XLM. Super impressive results. A single model trained on 2.5TB of data handles 100 languages, and outperforms mBERT by more than 10% on several classification benchmarks, with gains of up to 21% accuracy on low-resource languages like Swahili and Urdu.

Alexis Conneau

@alex_conneau

7 Nov 2019

Our new paper: Unsupervised Cross-lingual Representation Learning at Scale arxiv.org/pdf/1911.02116.pdf We release XLM-R, a Transformer MLM trained in 100 langs on 2.5 TB of text data. Double digit gains on XLU benchmarks + strong per-language performance (~XLNet on GLUE). [1/6]

Guillaume Lample @ NeurIPS 2024 · Jul 24, 2024 · 3:38 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

24 Jul 2024

On HumanEval and on MultiPL-E, Mistral Large 2 outperforms Llama 3.1 405B instruct, and scores just below GPT-4o. On MATH (0-shot, without CoT) it only falls behind GPT-4o. (2/N)

Guillaume Lample @ NeurIPS 2024 · Dec 10, 2019 · 11:14 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

10 Dec 2019

We are at #NeurIPS2019 this week to present our two papers on Product Key Memory Layers, and Cross-lingual Language Model Pretraining. Please stop by our posters, Thursday at 5pm! Spotlight presentations are at 4:20 and 4:40pm with @alexsablay and @alex_conneau

Guillaume Lample @ NeurIPS 2024 · Jul 24, 2024 · 3:38 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

24 Jul 2024

The model is available (for research purposes only!) on HuggingFace: huggingface.co/mistralai/Mis… Blog post: mistral.ai/news/mistral-larg…

mistralai/Mistral-Large-Instruct-2407 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Guillaume Lample @ NeurIPS 2024 · Jun 12, 2020 · 2:20 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

12 Jun 2020

Could neural networks find alternatives to classical theories? We show that they can predict abstract mathematical properties of systems involving advanced notions like Fourier transforms, Jacobians, integration. 1/4 arxiv.org/abs/2006.06462 with @Amaury_Hayat and @f_charton

Guillaume Lample @ NeurIPS 2024 · Sep 13, 2024 · 9:03 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

13 Sep 2024

Super excited to have @b_roziere join Mistral to lead our code generation team and build the next generation of Codestral models!

Baptiste Rozière

@b_roziere

13 Sep 2024

I'm thrilled to announce that I've recently joined @MistralAI! While I'll miss my former colleagues at Meta, I'm excited to continue building models for code generation with the incredible team here.

Guillaume Lample @ NeurIPS 2024 · Nov 21, 2024 · 11:28 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

21 Nov 2024

Nice review (in French) of Le Chat: piped.video/PNGV9o_tsmQ?si=oplL… where Pixtral Large easily answers questions about complex PDFs (~100 pages, scanned, rotated 90°) that ChatGPT and Claude are unable to process.

Le Chat Mistral AI: Tutorial to Master the French ChatGPT

Mistral AI has just made a big splash in the world of AI by ...

youtube.com

Guillaume Lample @ NeurIPS 2024 · Aug 19, 2019 · 9:15 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

19 Aug 2019

XLM now provides cross-lingual BERT models pretrained on up to 100 languages! It's interesting to see that adding more languages minimally impacts the performance on high-resource languages, and even sometimes improves it.

Alexis Conneau

@alex_conneau

19 Aug 2019

Just released our new XLM/mBERT pytorch model in 100 languages. Significantly outperforms the TensorFlow mBERT OSS model while trained on the same Wikipedia data. bit.ly/2KItiC4 @GuillaumeLample @Thom_Wolf @PyTorch

Guillaume Lample @ NeurIPS 2024 · Oct 19, 2021 · 2:28 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

19 Oct 2021

Check out our latest paper on Unsupervised Code Translation: arxiv.org/abs/2110.06773 We show that we can use automatic unit tests to guide the back-translation process by filtering out invalid generations, and improve the translation accuracy by up to 35%! 1/4

Baptiste Rozière

@b_roziere

18 Oct 2021

New paper on unsupervised code translation! arxiv.org/abs/2110.06773 We show that by using automatically generated unit tests we can filter out invalid back-translation samples, and reduce the error rate by up to 35% in some language pairs!
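
The filtering idea in the two tweets above can be illustrated with a minimal sketch: record the source function's outputs on some inputs as unit tests, then keep only the candidate translations that reproduce them without crashing. For self-containment, the source function and the candidates are all Python callables here; the actual pipeline generates tests automatically and runs translated code in other programming languages. `make_tests`, `passes_tests`, and `filter_translations` are hypothetical names.

```python
from typing import Any, Callable, List, Sequence, Tuple

Test = Tuple[tuple, Any]  # (arguments, expected output)


def make_tests(reference: Callable[..., Any], inputs: Sequence[tuple]) -> List[Test]:
    """Record the source function's outputs on the given inputs as unit tests."""
    return [(args, reference(*args)) for args in inputs]


def passes_tests(candidate: Callable[..., Any], tests: Sequence[Test]) -> bool:
    """A candidate passes only if it reproduces every expected output without raising."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True


def filter_translations(reference, candidates, inputs):
    """Keep only the candidates that pass all tests derived from `reference`."""
    tests = make_tests(reference, inputs)
    return [c for c in candidates if passes_tests(c, tests)]


# Stand-ins: source_max plays the role of the source code `a > b ? a : b`,
# and the three candidates play the role of model-generated translations.
def source_max(a, b):
    return a if a > b else b

def good(a, b):
    return max(a, b)

def wrong(a, b):
    return a

def crashes(a, b):
    return a // 0

kept = filter_translations(source_max, [good, wrong, crashes], [(1, 2), (5, 3), (-1, -1)])
print([f.__name__ for f in kept])  # ['good']
```

Only the surviving (source, translation) pairs would then be used as back-translation training data. The design choice worth noting is that a candidate is only as trustworthy as the test inputs: `wrong` would pass if the only test were `(3, 1)`.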

Guillaume Lample @ NeurIPS 2024 · Jan 30, 2018 · 9:06 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

30 Jan 2018

Our 2 papers are accepted at #ICLR2018 !! * Word translation without parallel data: openreview.net/forum?id=H196… * Unsupervised Machine Translation Using Monolingual Corpora Only: openreview.net/forum?id=rkYT… (With @alex_conneau , @LudovicDenoyer )

Word translation without parallel data

Aligning languages without the Rosetta Stone: with no parallel data, we construct bilingual dictionaries using adversarial training, cross-domain local scaling, and an accurate proxy criterion for...

openreview.net

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2020 · 1:18 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2020

We leverage the same principles that we used to translate low-resource languages (arxiv.org/abs/1804.07755), i.e. pretraining, denoising auto-encoding, and back-translation. Although initially designed for natural languages, these methods perfectly apply to programming languages.

Phrase-Based & Neural Unsupervised Machine Translation

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which...

arxiv.org
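
Of the three principles listed above, denoising auto-encoding is the easiest to show in isolation. The sketch below corrupts a token sequence with word dropout followed by a local shuffle in which no token moves more than k positions, in the spirit of the noise model from the earlier unsupervised MT work; it is not necessarily the exact noise used for code. The function name and the defaults `p_drop=0.1`, `k=3` are illustrative assumptions.

```python
import random
from typing import List, Optional


def add_noise(tokens: List[str], p_drop: float = 0.1, k: int = 3,
              rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a sentence for denoising auto-encoding (illustrative values).

    1. Drop each token independently with probability p_drop.
    2. Locally shuffle the survivors by sorting on i + U(0, k + 1), which
       guarantees that no token moves more than k positions.
    """
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= p_drop]
    keys = [i + rng.uniform(0, k + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=keys.__getitem__)
    return [kept[i] for i in order]


sentence = "the model learns to translate without parallel data".split()
noisy = add_noise(sentence, rng=random.Random(0))
# Training pair for the denoising objective: noisy input -> original sentence.
print(noisy, "->", sentence)
```

The bound holds because a token at position j can only land before a token at position i < j if j - i < k + 1, so tokens more than k apart never swap.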

Guillaume Lample @ NeurIPS 2024 · Jul 24, 2024 · 3:38 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

24 Jul 2024

Compared to the previous Mistral Large, much more effort was dedicated to alignment and instruction capabilities. On WildBench, ArenaHard, and MT Bench, it performs on par with the best models, while being significantly less verbose. (4/N)

Guillaume Lample @ NeurIPS 2024 · Apr 30, 2018 · 2:28 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

30 Apr 2018

We will be at #ICLR2018 this week to present our 2 papers on Unsupervised Machine Translation (arxiv.org/pdf/1711.00043.pdf) and Word Translation Without Parallel Data (arxiv.org/pdf/1710.04087.pdf). Come check them out! With @alex_conneau

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2020 · 1:18 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2020

The model successfully translates more than 90% of C++ functions from @geeksforgeeks into Java, and around 57% of Python functions into C++. It outperforms commercial solutions at test time, although it requires no parallel data or expert knowledge.

Guillaume Lample @ NeurIPS 2024 · Jul 24, 2024 · 3:38 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

24 Jul 2024

On Multilingual MMLU, Mistral Large 2 significantly outperforms Llama 3.1 70B base (+6.3% average over 9 languages) and is on par with Llama 3.1 405B (0.4% below). (3/N)

Guillaume Lample @ NeurIPS 2024 · Nov 7, 2024 · 9:36 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

7 Nov 2024

Today we are releasing a Batch API and a Moderation API! The Batch API lets you process high-volume requests to all Mistral models at 50% lower cost. mistral.ai/news/batch-api/ mistral.ai/news/mistral-mode…

Mistral batch API

Lower cost API for AI builders.

mistral.ai

Mistral AI

@MistralAI

7 Nov 2024

Moderation API - mistral.ai/news/mistral-mode… Batch API - mistral.ai/news/batch-api/

Guillaume Lample @ NeurIPS 2024 · May 8, 2024 · 12:48 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 May 2024

We will be at ICLR-2024 in Vienna this week with a few people from the @MistralAI team. Feel free to DM if you want to catch up!

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2020 · 1:18 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2020

Thanks @geeksforgeeks for their amazing online resources! Parallel datasets for evaluation, pretrained models and code coming soon.

Guillaume Lample @ NeurIPS 2024 · Jun 13, 2020 · 7:24 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

13 Jun 2020

Another great summary of one of our papers by @ykilcher ! Once again, very clear and thorough presentation, with insightful comments at the end.

Yannic Kilcher 🇸🇨

@ykilcher

13 Jun 2020

This LANGUAGE MODEL determines stability properties of differential systems, a task that usually requires multiple steps of high-level math and at least three grad students! 😮 watch the video here piped.video/l12GXD0t_RE @f_charton @Amaury_Hayat @GuillaumeLample @facebookai

Guillaume Lample @ NeurIPS 2024 · Mar 20, 2023 · 9:03 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

20 Mar 2023

What is the difference between Stability.ai / Eleuther.ai / Carper.ai ? I always get confused.

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2020 · 1:18 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2020

We create a parallel test set of around 1000 parallel functions, along with associated unit tests. Unlike previous studies that typically evaluate translated functions with BLEU score, we compile and run translations to verify that they successfully pass the unit tests.

Guillaume Lample @ NeurIPS 2024 · Dec 4, 2019 · 10:52 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

4 Dec 2019

A purely neural approach is not sufficient, since it still requires a symbolic framework to check generated hypotheses. Yet, our models perform best on very long inputs, where computer algebra systems struggle. Symbolic computation may benefit from hybrid approaches. (7/7)

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2023 · 5:30 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2023

"No fancy prompting engineering, no fancy decoding, everything by default."

Yao Fu

@Francis_YAO_

8 Jun 2023

Is Falcon really better than LLaMA? Short take: probably not. Longer take: we reproduced LLaMA 65B eval on MMLU and we got 61.4, close to the official number (63.4), much higher than its Open LLM Leaderboard number (48.8), and clearly higher than Falcon (52.7). Code and prompt open-sourced at github.com/FranxYao/chain-of… No fancy prompting engineering, no fancy decoding, everything by default. ---- Full story: On the Open LLM Leaderboard (huggingface.co/spaces/Huggin…), Falcon is top 1, surpassing LLaMA, and was promoted by @Thom_Wolf (nitter.app/Thom_Wolf/status…) Yet later @karpathy expressed concern about why, on the Open LLM Leaderboard, the LLaMA 65B score is significantly lower than the official one (48.8 vs. 63.4), see nitter.app/karpathy/status/… We figured that a simple, quick, open-sourced evaluation script on LLaMA 65B would clarify, so we just did it: github.com/FranxYao/chain-of… Again, everything is default: official MMLU prompt, no fancy prompt engineering, no fancy decoding. LLaMA 65B simply can do it. We encourage everyone to try the eval script out. This result makes us continue to hold the belief that the best bet for the open-source community to get close to GPT-3.5 is to do RLHF on LLaMA 65B, per our previous discovery in Chain-of-thought Hub arxiv.org/abs/2305.17306 Yet we do not intend to start a war between LLaMA and Falcon; both are great open-sourced models and have made significant contributions to the field! Falcon also has the advantage of an easier license, which also gives it great potential to be awesome! 🍻🍻

Guillaume Lample @ NeurIPS 2024 · Dec 4, 2019 · 10:52 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

4 Dec 2019

Although neural networks struggle on simple arithmetic tasks such as addition and multiplication, we show that transformers perform surprisingly well on difficult mathematical problems such as function integration and differential equations. (2/7)

Guillaume Lample @ NeurIPS 2024 · Nov 2, 2017 · 11:02 AM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

2 Nov 2017

We just released our new paper on Unsupervised Machine Translation. We can now translate languages using monolingual corpora only! arxiv.org/abs/1711.00043

Guillaume Lample @ NeurIPS 2024 · Sep 3, 2019 · 4:34 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

3 Sep 2019

In the second arxiv.org/abs/1907.05242 we show that adding a Product-Key Memory Layer in a transformer is as efficient as doubling the number of layers in terms of performance, and has no impact on running time. with @alexsablay @hjegou @LudovicDenoyer Marc'Aurelio Ranzato (2/3)

Large Memory Layers with Product Keys

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by...

arxiv.org
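
To make the product-key idea concrete, here is a minimal pure-Python sketch of a single memory read. The query is split into two halves, each half is scored against its own set of n sub-keys, and only the k×k pairs built from each half's top-k are considered, so the n² memory slots are never scored exhaustively. The slot indexing `i * n + j`, the single query, and the omission of the learned query network and any multi-head or batched variants are simplifying assumptions; all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def product_key_lookup(
    query: Sequence[float],
    subkeys1: Sequence[Sequence[float]],
    subkeys2: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Read from a memory of len(subkeys1) * len(subkeys2) slots.

    Slot (i, j) has the implicit key concat(subkeys1[i], subkeys2[j]), so its
    score is dot(q1, subkeys1[i]) + dot(q2, subkeys2[j]). Because the score is
    a sum, the global top-k slots must lie among the k x k pairs built from
    each half's top-k.
    """
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    s1 = [dot(q1, c) for c in subkeys1]
    s2 = [dot(q2, c) for c in subkeys2]
    best1, best2 = top_k(s1, k), top_k(s2, k)
    n2 = len(subkeys2)
    candidates = [(s1[i] + s2[j], i * n2 + j) for i in best1 for j in best2]
    chosen = sorted(candidates, reverse=True)[:k]
    # Softmax over the k selected scores, then a weighted sum of their values.
    m = chosen[0][0]
    weights = [math.exp(s - m) for s, _ in chosen]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, (_, slot) in zip(weights, chosen):
        for d, v in enumerate(values[slot]):
            out[d] += w / total * v
    return out, [slot for _, slot in chosen]


rng = random.Random(0)
n, half, k = 16, 4, 4  # 16 * 16 = 256 memory slots, 8-dimensional queries
sub1 = [[rng.gauss(0, 1) for _ in range(half)] for _ in range(n)]
sub2 = [[rng.gauss(0, 1) for _ in range(half)] for _ in range(n)]
values = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n * n)]
query = [rng.gauss(0, 1) for _ in range(2 * half)]
out, slots = product_key_lookup(query, sub1, sub2, values, k)
print(slots)  # the k selected memory slots, best first
```

The read costs 2n sub-key scores plus k² sums instead of n² full-key scores, which is what lets the memory grow very large at little extra running time.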

Guillaume Lample @ NeurIPS 2024 · Jun 8, 2020 · 1:18 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

8 Jun 2020

The model learns to align functions and objects across libraries (std::unordered_set -> HashSet, printf -> System.out.println, std::vector<int> -> List<Integer>, Files.createDirectories -> os.makedirs), but also language specific patterns (a > b ? a : b -> a if a > b else b)

Guillaume Lample @ NeurIPS 2024 · Apr 17, 2024 · 2:19 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

17 Apr 2024

Mixtral 8x22B Instruct is available on our platform, along with all our models: console.mistral.ai/ The base and the instruct models are also both available for download on Hugging Face! huggingface.co/mistralai/Mix… huggingface.co/mistralai/Mix… 3/n

Guillaume Lample @ NeurIPS 2024 · Jan 14, 2022 · 4:19 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

14 Jan 2022

As a surprising by-product, our model is capable of approximating out-of-vocabulary constants and functions with its own building blocks. Feed it with sum(1/n^2), and it will predict pi^2/6. Feed it with bessel0, it will find an asymptotic estimate (sin(x)+cos(x))/sqrt(pi*x)
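
Both targets quoted in the tweet are classical results. The first is the Basel sum; the second follows from the standard large-x asymptotic form of the Bessel function of the first kind, assuming `bessel0` refers to J_0, after expanding the cosine of a difference:

```latex
\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}

J_0(x) \sim \sqrt{\frac{2}{\pi x}} \cos\left(x - \frac{\pi}{4}\right)
       = \sqrt{\frac{2}{\pi x}} \cdot \frac{\cos x + \sin x}{\sqrt{2}}
       = \frac{\sin x + \cos x}{\sqrt{\pi x}}, \qquad x \to \infty
```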

Guillaume Lample @ NeurIPS 2024 · Dec 4, 2019 · 3:25 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

4 Dec 2019

Replying to @AndrewTouchet @f_charton

Yes, we will open source our datasets and models soon!

Guillaume Lample @ NeurIPS 2024 · Feb 1, 2018 · 4:46 PM UTC

Guillaume Lample @ NeurIPS 2024

@GuillaumeLample

1 Feb 2018

Very nice visualization of our adversarial approach for Unsupervised Word Translation, where we can see the evolution of the similarity between some English / French word pairs! nitter.app/alex_conneau/status/95…