Very excited to release our second model, Mixtral 8x7B, an open weight mixture of experts model. Mixtral matches or outperforms Llama 2 70B and GPT3.5 on most benchmarks, and has the inference speed of a 12B dense model. It supports a context length of 32k tokens. (1/n)
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%https://nitter.app/t.co/g0m9cEUz0T%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24
77
555
4,086
2,241,678
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at research.facebook.com/public… 1/n
150
1,345
6,506
3,227,103
Today, we are releasing Mistral Large, our latest model. Mistral Large is vastly superior to Mistral Medium, handles 32k tokens of context, and is natively fluent in English, French, Spanish, German, and Italian. We have also updated Mistral Small on our API to a model that is significantly better (and faster) than Mixtral 8x7B. Lastly, we are introducing Le Chat (chat.mistral.ai/), a chat interface (currently in beta) on top of our models.
164
751
5,011
865,196
Unsupervised Translation of Programming Languages. Feed a model with Python, C++, and Java source code from GitHub, and it automatically learns to translate between the 3 languages in a fully unsupervised way. arxiv.org/pdf/2006.03511.pdf with @MaLachaux @b_roziere @LowikChanussot
51
965
3,247
Mistral 7B is out. It outperforms Llama 2 13B on every benchmark we tried. It is also superior to LLaMA 1 34B in code, math, and reasoning, and is released under the Apache 2.0 licence. mistral.ai/news/announcing-m…
magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=https%3A%2F%https://nitter.app/t.co/HAadNvH1t0%3A443%2Fannounce RELEASE ab979f50d7d406ab8d0b07d09806c72c
46
447
2,758
1,181,952
Today, we release Mistral Large 2, the new version of our largest model. Mistral Large 2 is a 123B-parameter model with a 128k context window. On many benchmarks (notably in code generation and math), it is superior or on par with Llama 3.1 405B. Like Mistral NeMo, it was trained on a very large amount of source code and multilingual data. (1/N)
47
259
2,167
534,892
Our new paper, Deep Learning for Symbolic Mathematics, is now on arXiv arxiv.org/abs/1912.01412 We added *a lot* of new results compared to the original submission. With @f_charton (1/7)
16
505
1,524
Today we are releasing Codestral-22B, our first code model! Codestral is trained on more than 80 programming languages and outperforms previous code models, including the largest ones. It is available on our API platform, through instruct and fill-in-the-middle endpoints, and can be easily integrated into VS Code plugins. You can also use it for free on Le Chat: chat.mistral.ai
43
157
1,151
178,946
Code is now available online with pretrained models! github.com/facebookresearch/…
Unsupervised Translation of Programming Languages. Feed a model with Python, C++, and Java source code from GitHub, and it automatically learns to translate between the 3 languages in a fully unsupervised way. arxiv.org/pdf/2006.03511.pdf with @MaLachaux @b_roziere @LowikChanussot
6
248
881
Excited to release our latest work: arxiv.org/abs/2205.11491 We present a new algorithm, HyperTree Proof Search (HTPS) inspired by the recent success of AlphaZero. Our model is able to prove mathematical theorems in a fully automated way and significantly outperforms the SOTA. 1/n
4
167
758
New paper on code de-obfuscation: arxiv.org/abs/2102.07492 We show that if you obfuscate the name of identifiers in source code, a model can retrieve the original names with very high accuracy. It even works when you remove the name of each variable / function! 1/3
10
164
743
If you want to train BERT from scratch in @PyTorch, you can check out our XLM repository! Our English model outperforms the original BERT on all GLUE tasks, although it's trained on the same data and without the next sentence prediction task github.com/facebookresearch/… @alex_conneau
4
162
738
Our new paper: Large Memory Layers with Product Keys arxiv.org/abs/1907.05242 We created a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster! 1/2
12
210
714
Deep Symbolic Regression for Recurrent Sequences -- arxiv.org/abs/2201.04600 We show that transformers are great at predicting symbolic functions from values, and can predict the recurrence relation of sequences better than Mathematica. You can try it here: bit.ly/3niE5FS
22
157
686
Today we are releasing two small models: Mathstral 7B and Codestral Mamba 7B. On the MATH benchmark, Mathstral 7B obtains 56.6% pass@1, outperforming Minerva 540B by more than 20%. Mathstral scores 68.4% on MATH with majority voting@64, and 74.6% using a reward model. Codestral Mamba is one of the first open source models with a Mamba 2 architecture. It is the best 7B code model available, and is trained with a context length of 256k tokens. Both models are released under the Apache 2 license. mistral.ai/news/mathstral/ mistral.ai/news/codestral-ma…
13
103
690
99,220
We just released two small models, with 3B and 8B parameters. Ministral 3B is exceptionally strong, outperforming Llama 3 8B and our previous Mistral 7B on instruction following benchmarks. mistral.ai/news/ministraux/
18
74
585
92,234
Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available datasets, making our work compatible with open-sourcing and reproducible, while most existing models rely on data which is either not publicly available or undocumented. 2/n
7
38
547
75,325
Mixtral 8x22B Instruct is out. It significantly outperforms existing open models, and only uses 39B active parameters (making it significantly faster than 70B models during inference). 1/n
15
69
553
60,461
LLaMA 65B can run on a MacBook! With a different model architecture it could probably run considerably faster (we didn't use multi-query attention, for instance)
Replying to @ggerganov
65B running on m1 max/64gb! 🦙🦙🦙🦙🦙🦙🦙
8
69
526
154,808
Mistral Large 2 (2407) is now on @lmsysorg. It performs extremely well in the Coding, Hard Prompts, Math, and Longer Query categories, where it outperforms GPT4-Turbo and Claude 3 Opus. It is also doing very well in Instruction Following where it ranks above Llama 3.1 405B. Extremely proud of the work accomplished by the @MistralAI team in such a short period of time. Of course, this is only the beginning; we haven't spent much compute yet. Blogpost: mistral.ai/news/mistral-larg… Model weights: huggingface.co/mistralai/Mis…
19
68
502
120,095
Very excited to release our first reasoning model, Magistral. We released the weights of Magistral Small alongside a paper that presents our approach, online RL infrastructure, and findings.
Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasoning.
10
61
514
58,732
Very happy to release our new small model, Mistral NeMo, a 12B model trained in collaboration with @nvidia. Mistral NeMo supports a context window of 128k tokens, comes with an FP8-aligned checkpoint, and performs extremely well on all benchmarks. Check it out! mistral.ai/news/mistral-nemo… blogs.nvidia.com/blog/mistra…
10
73
494
93,915
Due to an unexpected number of requests, Le Chat is temporarily unavailable. We apologize for the inconvenience -- we are working on getting it back up and running as soon as we can, thanks for your patience!
Today, we are releasing Mistral Large, our latest model. Mistral Large is vastly superior to Mistral Medium, handles 32k tokens of context, and is natively fluent in English, French, Spanish, German, and Italian. We have also updated Mistral Small on our API to a model that is significantly better (and faster) than Mixtral 8x7B. Lastly, we are introducing Le Chat (chat.mistral.ai/), a chat interface (currently in beta) on top of our models.
22
21
440
79,578
Le Chat now runs Mistral Large at 1000+ tokens/s! chat.mistral.ai/
No, this video is not sped up. Genuinely mind-blowing. And yes, this is available to all users right now.
18
34
415
43,758
Le Chat now includes image generation with FLUX1.1, web search, canvas, mistral large with vision capabilities, PDF upload, etc. And it's 100% free! chat.mistral.ai/
We're proud to introduce the next generation of le Chat. Search, PDF upload, coding, image generation, le Canevas... All in one place: chat.mistral.ai/ mistral.ai/news/mistral-chat…
13
58
413
59,679
We just open-sourced MUSE, our library to align embedding spaces in a supervised or unsupervised way, along with multilingual embeddings for 30 languages aligned in the same vector space, and 110 large-scale ground truth bilingual dictionaries: github.com/facebookresearch/…
1
159
361
Our models are available on the @MistralAI API (La Plateforme). It supports JSON format and function calling. We are also making our commercial models available through Azure AI. Read more at: mistral.ai/news/mistral-larg… mistral.ai/news/le-chat-mist… Congrats to all the @MistralAI team for their amazing work on this release!
10
17
321
44,951
I will be at #NeurIPS2023 this week. Feel free to reach out if you want to talk about open source or if you want to know more about @MistralAI (also we are hiring!)
9
15
333
55,078
Last year, we showed that you can outperform a 24-layer transformer in language modeling with just 12 layers and 1 Product-key memory layer. arxiv.org/abs/2010.03881 show that these results also transfer to downstream tasks: BERT large performance with a PKM-augmented BERT base!
Our new paper: Large Memory Layers with Product Keys arxiv.org/abs/1907.05242 We created a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster! 1/2
3
62
331
All our models were trained on at least 1T tokens, much more than what is typically used at this scale. Interestingly, even after 1T tokens the 7B model was still improving. 3/n
6
13
305
141,726
Amazing thread by people who believe our "Deep Learning for Symbolic Math" paper was written to introduce abstract syntax trees... Source: half a sentence cherry picked from the abstract and carefully split into two
4
29
319
Very proud of the small but amazing @MistralAI team for their outstanding work and building so quickly and efficiently. (n/n)
20
6
283
26,463
Just released a small and simple implementation of our Product-Key Memory (PKM) layer. A 12-layer transformer with a single PKM layer outperforms a 24-layer transformer while being almost twice as fast! github.com/facebookresearch/…
Our new paper: Large Memory Layers with Product Keys arxiv.org/abs/1907.05242 We created a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster! 1/2
2
86
303
Check out our new paper on cross-lingual language model pretraining! We extend BERT to the cross-lingual setting. Huge improvements on XNLI, Supervised MT, Unsupervised MT. arxiv.org/abs/1901.07291 With @alex_conneau
5
82
266
Mixtral has a similar architecture to Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks. For every token, at each layer, a router network selects two experts to process the current state and combines their outputs. (2/n)
2
7
210
30,901
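A minimal sketch of the top-2 routing described in this post, in plain Python. The toy experts (simple scaling functions), the router weights, and the choice to normalise the gate weights with a softmax over only the two selected logits are assumptions made for illustration; this is not Mistral's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_moe_layer(token, router_weights, experts):
    """Route one token vector through the two highest-scoring experts."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Indices of the two largest logits.
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Assumption: gate weights are a softmax over the two selected logits only.
    gates = softmax([logits[i] for i in top2])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, top2):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, top2

# Toy setup: 8 hypothetical "experts" that just scale the input,
# standing in for the 8 feedforward blocks of a layer.
experts = [lambda v, s=s: [s * x for x in v] for s in range(1, 9)]
router_weights = [[float(i), 0.0] for i in range(8)]  # expert i scores i * token[0]

output, chosen = top2_moe_layer([1.0, 2.0], router_weights, experts)
print(chosen, output)
```

Only the two selected experts are evaluated for the token, which is where the inference savings come from; the other six are never called.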
More details about Mixtral can be found at mistral.ai/news/mixtral-of-e… We are also very happy to announce "La plateforme" our early developer platform (in beta & limited access), to access our models through our API: mistral.ai/news/la-plateform… (7/n)
5
11
207
33,251
Mistral fine-tuning API is out ! You can now fine-tune your own Mistral models and deploy them efficiently on La Plateforme : mistral.ai/news/customizatio… In many cases, fine-tuning allows small models to match (and sometimes surpass) the performance of much larger models, but with a significantly lower cost and improved generation speed.
5
24
238
28,139
New paper on unsupervised MT! arxiv.org/abs/1804.07755 We propose two models (neural and phrase based) that both improve the state of the art by more than 11 BLEU. By combining them we reach up to 27 BLEU points on WMT14, without using a single parallel sentence.
2
59
217
Compared to Mistral 7B, Mixtral is significantly stronger in science, in particular in mathematics and code generation. (5/n)
3
10
192
23,959
Super excited about this work! We showed that you can use large language models to align informal mathematical proofs (e.g. written in LaTeX) to formal proof sketches (e.g. skeletons of proofs written in a formal system like Lean or Isabelle).
Large language models can write informal proofs, translate them into formal ones, and achieve SoTA performance in proving competition-level maths problems! LM-generated informal proofs are sometimes more useful than the human ground truth 🤯 Preprint: arxiv.org/abs/2210.12283 🧵
3
22
204
On Common Sense Reasoning, Closed-book Question Answering, and Reading Comprehension, LLaMA-65B outperforms Chinchilla 70B and PaLM 540B on almost all benchmarks. 4/n
1
8
196
39,025
Mixtral has been trained on a lot of multilingual data and significantly outperforms Llama 2 70B on French, German, Spanish, and Italian benchmarks. (4/n)
2
7
176
27,946
Pixtral models are performing well on LMsys:
- Pixtral 12B is on par with Llama 3.2 90B
- Pixtral Large 123B is the best open-weight vision model by a large margin
Arena update: Pixtral Large has now overtaken Qwen-VL-72B to become the #1 open model in Vision Arena👀 Congrats @MistralAI on the remarkable open release. Check out the leaderboard to see the latest rankings!
5
27
193
34,873
Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, Mixtral decodes at the speed of a 12B model, while effectively having access to 45B parameters. (3/n)
2
7
169
25,682
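A back-of-the-envelope reading of the 12B / 45B figures in this post, as a small Python sketch. It assumes a simplified model in which the total is `shared + 8 * per_expert` and the per-token active count is `shared + 2 * per_expert`; the function name and this two-term split are illustrative assumptions, not published numbers.

```python
def split_params(total_b, active_b, n_experts=8, n_active=2):
    """Solve total = shared + n_experts * e and active = shared + n_active * e for (shared, e)."""
    per_expert = (total_b - active_b) / (n_experts - n_active)
    shared = active_b - n_active * per_expert
    return shared, per_expert

# Figures quoted in the thread, in billions of parameters.
shared, per_expert = split_params(total_b=45, active_b=12)
print(f"shared: {shared}B, per expert: {per_expert}B")
```

Under these assumptions, almost all parameters live in the experts, so selecting two of eight per token leaves roughly a quarter of the weights active.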
Today we are excited to announce:
- Pixtral 12B available on le Chat and la Plateforme
- A free tier on la Plateforme
- A significant price drop across all our models
- An updated Mistral Small
Release blogpost: mistral.ai/news/september-24…
5
14
186
19,737
We also briefly tried instruction finetuning using the approach of Chung et al. (2022). The resulting model, LLaMA-I, outperforms Flan-PaLM-cont (62B) on MMLU and showcases some interesting instruct capabilities. 7/n
11
7
186
37,436
We just obtained the Best Resource Paper award at @emnlp2019 for our paper "The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English." Check it out! aclweb.org/anthology/D19-163… (1/3)
5
31
182
Somebody created a fake account "ai_mistral" and started following ML people. It now has 2k+ followers, but we have no idea who is behind it. If you follow it, please unfollow and report it!
17
29
177
80,551
Very proud of the Mistral AI team who rebuilt a top-performance MLops stack, and designed a very sophisticated data processing pipeline, from scratch, in less than 3 months.
4
3
174
20,144
Very happy to see our work featured on MIT Technology Review today!
For the first time, @facebookai has trained a neural network to do symbolic reasoning tasks involved in advanced math. bit.ly/2M41RCO
16
171
2/2 papers accepted at @emnlp2018! One is our paper on Unsupervised MT: arxiv.org/pdf/1804.07755.pdf for which we also open-sourced the code: github.com/facebookresearch/… Other one will come soon :) @alex_conneau @LudovicDenoyer
1
39
170
We trained it with GQA and a sliding window of 4096 tokens, resulting in constant cache size and a linear decoding speed. Our changes to FlashAttention v2 and xFormers to support sliding window are available to the community.
3
15
164
21,551
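To illustrate why a sliding window gives a constant cache size, here is a toy rolling key/value cache in Python. The class name `RollingKVCache` and the use of strings in place of tensors are assumptions for illustration; real implementations typically write into a fixed-size tensor buffer rather than a deque.

```python
from collections import deque

class RollingKVCache:
    """Keeps only the most recent `window` key/value pairs."""

    def __init__(self, window):
        self.window = window
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)

# Decode 10 tokens with a window of 4 (the post uses 4096).
cache = RollingKVCache(window=4)
for position in range(10):
    cache.append(f"k{position}", f"v{position}")

print(len(cache), list(cache.keys))
```

However long the sequence gets, each decoding step attends to at most `window` cached entries, so memory stays fixed and per-token cost does not grow with position.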
Mistral Medium 3.1 is 2nd on LMArena without style control. Very proud of the @MistralAI team !
🔥@MistralAI Mistral Medium 3.1: Our ‘minor’ update just landed 8th on the @lmarena leaderboard—competitive with models with much larger sizes. 🚀 Smaller, but mightier!
6
18
159
17,952
LLaMA-65B outperforms Minerva-62B on GSM8k, even though it has not been fine-tuned on any mathematical dataset. On the MATH benchmark, it outperforms PaLM-62B (but is quite below Minerva-62B) 5/n
2
6
153
35,479
On code generation benchmarks, LLaMA-65B outperforms cont-PaLM (62B) as well as PaLM-540B.
1
11
141
33,322
Very happy to release Codestral Embed, our first code embedder. It can use up to 3072 dimensions, ordered by relevance. Using embeddings of 256 dimensions with int8 precision is already sufficient to outperform all existing models.
Introducing Codestral Embed, the new state-of-the-art embedding model for code.
7
19
150
16,807
Clever prompt that enables chat interaction with the raw version of LLaMA, without requiring any fine-tuning of the model on instruction data !
Replying to @ggerganov
This is the prompt for anyone interested: github.com/ggerganov/whisper…
2
17
143
49,068
Nice paper. Very good performance by Mistral Large and Mixtral 8x22B on the new GSM1K dataset! Results from Table D (plotted in Figure below) are also a good reminder that models are very sensitive to the prompt. In general, it is also good to use sampling and majority voting to reduce the noise on these benchmarks.
Nice paper! Some surprising highlights: 1. Mixtral 8x22B is ~GPT4-turbo level on GSM8K and GSM1K. Mistral large is better on both. 2. On GSM1K, Mixtral-8x22B-Instruct (84.3%) > claude-2 (83.6%) >> claude-3-haiku (79.1%) >> claude-3-sonnet (72.4%) 🤔 Also worth highlighting how different results are with a different prompt.
18
139
44,223
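The sampling and majority-voting idea mentioned in this post can be sketched in a few lines of Python. The sampled answers below are made up for illustration; in practice they would be final answers extracted from several sampled model generations for the same question.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]

# Hypothetical final answers from 5 sampled generations for one question.
samples = ["42", "41", "42", "42", "7"]
final = majority_vote(samples)
print(final)
```

Scoring the voted answer instead of a single sample smooths out run-to-run variance, which is the noise reduction the post refers to.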
We will be at ICCV in Paris for the next couple of days. Feel free to reach out if you are interested in @MistralAI!
9
8
120
37,150
Huge thanks to our compute providers @CoreWeave and the EuroHPC, @tri_dao and @d_haziza for their help with FlashAttention and xFormers, and @huggingface, vLLM, @skypilot_org, FastChat for their help and support with this release.
5
5
120
20,557
XLM-R, the large scale version of XLM. Super impressive results. A single model trained on 2.5TB of data handles 100 languages, and outperforms mBERT by more than 10% on several classification benchmarks, with up to 21% accuracy on low-resource languages like Swahili and Urdu.
Our new paper: Unsupervised Cross-lingual Representation Learning at Scale arxiv.org/pdf/1911.02116.pdf We release XLM-R, a Transformer MLM trained in 100 langs on 2.5 TB of text data. Double digit gains on XLU benchmarks + strong per-language performance (~XLNet on GLUE). [1/6]
30
122
On HumanEval and on MultiPL-E, Mistral Large 2 outperforms Llama 3.1 405B instruct, and scores just below GPT-4o. On MATH (0-shot, without CoT) it only falls behind GPT-4o. (2/N)
1
11
115
19,852
We are at #NeurIPS2019 this week to present our two papers on Product Key Memory Layers, and Cross-lingual Language Model Pretraining. Please stop by our posters, Thursday at 5pm! Spotlight presentations are at 4:20 and 4:40pm with @alexsablay and @alex_conneau
16
112
Could neural networks find alternatives to classical theories? We show that they can predict abstract mathematical properties of systems involving advanced notions like Fourier transforms, Jacobians, integration. 1/4 arxiv.org/abs/2006.06462 with @Amaury_Hayat and @f_charton
2
21
101
Super excited to have @b_roziere join Mistral to lead our code generation team and build the next generation of Codestral models!
I'm thrilled to announce that I've recently joined @MistralAI! While I'll miss my former colleagues at Meta, I'm excited to continue building models for code generation with the incredible team here.
2
4
95
12,910
Nice review (in French) of le Chat: piped.video/PNGV9o_tsmQ?si=oplL… where Pixtral Large easily answers questions about complex PDFs (~100 pages, scanned, 90° rotated) that ChatGPT and Claude are unable to process.
3
16
89
15,825
XLM now provides cross-lingual BERT models pretrained on up to 100 languages! It's interesting to see that adding more languages minimally impacts the performance on high-resource languages, and even sometimes improves it.
Just released our new XLM/mBERT pytorch model in 100 languages. Significantly outperforms the TensorFlow mBERT OSS model while trained on the same Wikipedia data. bit.ly/2KItiC4 @GuillaumeLample @Thom_Wolf @PyTorch
2
17
92
Check out our latest paper on Unsupervised Code Translation: arxiv.org/abs/2110.06773 We show that we can use automatic unit tests to guide the back-translation process by filtering out invalid generations, and improve the translation accuracy by up to 35%! 1/4
New paper on unsupervised code translation! arxiv.org/abs/2110.06773 We show that by using automatically generated unit tests we can filter out invalid back-translation samples, and reduce the error rate by up to 35% in some language pairs!
2
14
86
We leverage the same principles that we used to translate low-resource languages (arxiv.org/abs/1804.07755), i.e. pretraining, denoising auto-encoding, and back-translation. Although initially designed for natural languages, these methods perfectly apply to programming languages.
2
9
78
Compared to the previous Mistral Large, much more effort was dedicated to alignment and instruction capabilities. On WildBench, ArenaHard, and MT Bench, it performs on par with the best models, while being significantly less verbose. (4/N)
1
3
77
16,423
We will be at #ICLR2018 this week to present our 2 papers on Unsupervised Machine Translation (arxiv.org/pdf/1711.00043.pdf) and Word Translation Without Parallel Data (arxiv.org/pdf/1710.04087.pdf). Come check them out! With @alex_conneau
2
22
78
The model successfully translates more than 90% of C++ functions from @geeksforgeeks into Java, and around 57% of Python functions into C++. It outperforms commercial solutions at test time, although it requires no parallel data or expert knowledge.
2
5
70
On Multilingual MMLU, Mistral Large 2 significantly outperforms Llama 3.1 70B base (+6.3% average over 9 languages) and is on par with Llama 3.1 405B (0.4% below). (3/N)
1
4
71
15,448
Today we are releasing a Batch API and a Moderation API! The Batch API lets you process high-volume requests to all Mistral models at 50% lower cost. mistral.ai/news/batch-api/ mistral.ai/news/mistral-mode…
2
5
72
12,784
We will be at ICLR-2024 in Vienna this week with a few people from the @MistralAI team. Feel free to DM if you want to catch up!
2
4
71
15,297
Thanks @geeksforgeeks for their amazing online resources! Parallel datasets for evaluation, pretrained models and code coming soon.
3
1
67
Another great summary of one of our papers by @ykilcher ! Once again, very clear and thorough presentation, with insightful comments at the end.
This LANGUAGE MODEL determines stability properties of differential systems, a task that usually requires multiple steps of high-level math and at least three grad students! 😮 watch the video here piped.video/l12GXD0t_RE @f_charton @Amaury_Hayat @GuillaumeLample @facebookai
1
6
67
What is the difference between Stability.ai / Eleuther.ai / Carper.ai ? I always get confused.
10
3
61
38,701
We create a parallel test set of around 1000 parallel functions, along with associated unit tests. Unlike previous studies that typically evaluate translated functions with BLEU score, we compile and run translations to verify that they successfully pass the unit tests.
2
4
61
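A minimal sketch of scoring translations by running unit tests rather than comparing text with BLEU. Python callables stand in for compiled translated functions here, and the helper names `passes_unit_tests` and `pass_rate` are hypothetical; the actual evaluation harness is not shown in the thread.

```python
def passes_unit_tests(candidate, test_cases):
    """True if the candidate returns the expected output on every test case."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crashing translation counts as a failure.
            return False
    return True

def pass_rate(candidates, test_cases):
    """Fraction of candidate translations that pass all unit tests."""
    passed = sum(passes_unit_tests(c, test_cases) for c in candidates)
    return passed / len(candidates)

# Unit tests for a reference "max of two numbers" function: (arguments, expected output).
test_cases = [((3, 5), 5), ((-1, -7), -1), ((2, 2), 2)]

def good(a, b):
    return a if a > b else b

def wrong(a, b):
    return a  # plausible-looking but incorrect translation

def crashing(a, b):
    raise ValueError("boom")

rate = pass_rate([good, wrong, crashing], test_cases)
print(rate)
```

A translation like `wrong` could share most of its tokens with the reference and score well on BLEU, yet it fails as soon as it is executed, which is the motivation for test-based evaluation.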
A purely neural approach is not sufficient, since it still requires a symbolic framework to check generated hypotheses. Yet, our models perform best on very long inputs, where computer algebra systems struggle. Symbolic computation may benefit from hybrid approaches. (7/7)
3
4
55
"No fancy prompting engineering, no fancy decoding, everything by default."
Is Falcon really better than LLaMA? Short take: probably not. Longer take: we reproduced LLaMA 65B eval on MMLU and we got 61.4, close to the official number (63.4), much higher than its Open LLM Leaderboard number (48.8), and clearly higher than Falcon (52.7). Code and prompt open-sourced at github.com/FranxYao/chain-of… No fancy prompting engineering, no fancy decoding, everything by default. ---- Full story: On OpenLLM Leaderboard (huggingface.co/spaces/Huggin…), Falcon is the top 1, surpassing LLaMA, and promoted by @Thom_Wolf (nitter.app/Thom_Wolf/status…) Yet later @karpathy expressed concern about why on Open LLM Leaderboard, the LLaMA 65B score is significantly lower than official (48.8 vs. 63.4), see nitter.app/karpathy/status/… We figured that a simple, quick, open-sourced evaluation script on LLaMA 65B would clarify, so we just did it github.com/FranxYao/chain-of… Again, everything is default, official MMLU prompt, no fancy prompt engineering, no fancy decoding. LLaMA 65B simply can do it. We encourage everyone to try the eval script out. This result makes us continue to hold the belief that the best bet for the open-source community to get close to GPT-3.5 is to do RLHF on LLaMA 65B, per our previous discovery in Chain-of-thought Hub arxiv.org/abs/2305.17306 Yet we do not intend to start a war between LLaMA and Falcon -- both are great open-sourced models and have made significant contributions to the field! Falcon also has the advantage of an easier license, which also gives it great potential to be awesome! 🍻🍻
1
4
59
20,154
Although neural networks struggle on simple arithmetic tasks such as addition and multiplication, we show that transformers perform surprisingly well on difficult mathematical problems such as function integration and differential equations. (2/7)
2
11
49
We just released our new paper on Unsupervised Machine Translation. We can now translate languages using monolingual corpora only! arxiv.org/abs/1711.00043
1
21
55
The model learns to align functions and objects across libraries (std::unordered_set -> HashSet, printf -> System.out.println, std::vector<int> -> List<Integer>, Files.createDirectories -> os.makedirs), but also language specific patterns (a > b ? a : b -> a if a > b else b)
2
2
46
Mixtral 8x22B Instruct is available on our platform, along with all our models: console.mistral.ai/ The base and the instruct models are also both available for download on Hugging Face ! huggingface.co/mistralai/Mix… huggingface.co/mistralai/Mix… 3/n
4
2
44
4,689
As a surprising by-product, our model is capable of approximating out-of-vocabulary constants and functions with its own building blocks. Feed it with sum(1/n^2), and it will predict pi^2/6. Feed it with bessel0, it will find an asymptotic estimate (sin(x)+cos(x))/sqrt(pi*x)
1
5
40
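For reference, the two predictions quoted in this post match standard identities, written out below in LaTeX. This assumes "bessel0" denotes the Bessel function of the first kind $J_0$, which the post does not state explicitly.

```latex
% Basel problem: the series the model maps to pi^2/6
\sum_{n=1}^{\infty} \frac{1}{n^{2}} = \frac{\pi^{2}}{6}

% Large-x asymptotic form of J_0, expanded with cos(a - b) = cos a cos b + sin a sin b
J_0(x) \sim \sqrt{\frac{2}{\pi x}}\,\cos\!\left(x - \frac{\pi}{4}\right)
       = \sqrt{\frac{2}{\pi x}} \cdot \frac{\cos x + \sin x}{\sqrt{2}}
       = \frac{\sin x + \cos x}{\sqrt{\pi x}}
\qquad (x \to \infty)
```

The last expression is exactly the estimate `(sin(x)+cos(x))/sqrt(pi*x)` reported in the post.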
Yes, we will open source our datasets and models soon!
2
44
Very nice visualization of our adversarial approach for Unsupervised Word Translation, where we can see the evolution of the similarity between some English / French word pairs! nitter.app/alex_conneau/status/95…
15
42