TimDarcet · Apr 21, 2023 · 3:31 PM UTC

TimDarcet

21 Apr 2023

1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.

112

703

124,778

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

Vision transformers need registers! Or at least, it seems they 𝘸𝘢𝘯𝘵 some… ViTs have artifacts in attention maps. It’s due to the model using these patches as “registers”. Just add new tokens (“[reg]”): - no artifacts - interpretable attention maps 🦖 - improved performance!

303

1,976

466,628

TimDarcet · Jan 7, 2025 · 2:14 PM UTC

TimDarcet @TimDarcet

7 Jan 2025

Thanks python, very helpful

821

893,483

TimDarcet · Apr 2, 2025 · 3:51 PM UTC

TimDarcet @TimDarcet

2 Apr 2025

"Massive activations in LLMs" is the paper you need and that everyone should read

Seunghyun Seo @SeunghyunSEO7

2 Apr 2025

what happens in the residual stream of gemma3? l2 norm of activation explodes at the end of every transformer block after x=x+res. key architectural difference between gemma2 and 3 is softcapping vs qknorm. 1b is not even multimodal (fig reps gemma2-2b vs 3-1b). what's wrong?

609

67,593

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.

108

604

160,878

TimDarcet · Oct 27, 2023 · 4:37 PM UTC

TimDarcet @TimDarcet

27 Oct 2023

DINOv2+registers=♥️ We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts! Simple one-liner, try it out: `dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')`

479

72,294

TimDarcet · May 11, 2025 · 7:52 PM UTC

TimDarcet @TimDarcet

11 May 2025

Is there a good reason we use softmax losses in contrastive learning, instead of just doing MSE? ie L = ||xi-xi'||² - lambda sum_k ||xi-xk'||² I'd guess the optimization dynamics are maybe friendlier, but does anyone have a good pointer? Both for CLIP and SSL btw

474

109,653
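
A minimal sketch of the two objectives compared in the question above, in plain Python. The function names, the `lam` and `temperature` defaults, the choice to exclude k = i from the negative sum, and the use of negative squared distance as the softmax logit are all assumptions made for illustration; CLIP-style losses typically use cosine similarity instead.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mse_contrastive(x, x_prime, i, lam=0.1):
    """L_i = ||x_i - x'_i||^2 - lam * sum_{k != i} ||x_i - x'_k||^2 (assumed k != i)."""
    pos = sq_dist(x[i], x_prime[i])
    neg = sum(sq_dist(x[i], x_prime[k]) for k in range(len(x_prime)) if k != i)
    return pos - lam * neg


def softmax_contrastive(x, x_prime, i, temperature=0.1):
    """Softmax (InfoNCE-style) loss with logits = -squared distance / temperature."""
    logits = [-sq_dist(x[i], xp) / temperature for xp in x_prime]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[i]


# Toy batch: two anchors, and two versions of the second view.
x = [[0.0, 0.0], [1.0, 0.0]]
x_near = [[0.1, 0.0], [1.0, 0.1]]    # negative at a moderate distance
x_far = [[0.1, 0.0], [100.0, 0.1]]   # same positive, negative pushed far away

mse_near = mse_contrastive(x, x_near, 0)
mse_far = mse_contrastive(x, x_far, 0)
soft_near = softmax_contrastive(x, x_near, 0)
soft_far = softmax_contrastive(x, x_far, 0)
print(mse_near, mse_far, soft_near, soft_far)
```

On this toy batch, pushing the negative far away keeps lowering the MSE-style loss, since the negative term has no lower bound, while the softmax loss saturates near zero. That is one plausible reading of the "friendlier optimization dynamics" guess, not an answer to the question.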

TimDarcet · Feb 14, 2025 · 5:55 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Also: yes, it's a JEPA. Yes, you hated on @ylecun , but he was right. Yes, as usual

TimDarcet @TimDarcet

14 Feb 2025

Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.

354

87,477

TimDarcet · Aug 22, 2025 · 8:46 AM UTC

TimDarcet @TimDarcet

22 Aug 2025

Qq has anyone ever seen the best AI researcher and the best sion euw in the same room because if not guys I've got a theory

352

34,135

TimDarcet · Mar 19, 2025 · 5:10 PM UTC

TimDarcet @TimDarcet

19 Mar 2025

Funniest bug of my PhD: the model loses 1 point if pretraining and eval use different conda envs. The difference was libjpeg vs libjpeg-turbo. IIUC the JPEG algo is not entirely standardized (wtf?) and libjpeg != libjpeg-turbo. Tiny differences in decoding artifacts caused a 1-point drop!

vik

@vikhyatk

17 Mar 2025

if you train a model exclusively on JPEG images, will performance drop on other image file formats?

335

25,490

TimDarcet · Jul 15, 2024 · 9:53 AM UTC

TimDarcet @TimDarcet

15 Jul 2024

Still not sure why the ML community adopted conda instead of plain old virtualenv

309

58,689

TimDarcet · Oct 19, 2024 · 10:42 PM UTC

TimDarcet @TimDarcet

19 Oct 2024

Alright actual serious post. Lingua := super simple codebase + torch.compile for speed --> clean, hackable, but still efficient *It can train a 7B >llama2 in 24h*. Crazy. If you got the gpus, not only can you train a good 7B, you can *iterate* on it. You can do *research*

TimDarcet @TimDarcet

18 Oct 2024

🚨 RELEASE ALERT ‼️ github.com/facebookresearch/… THIS CHANGES EVERYTHING $META just dropped a game-changing codebase! Now everyone can do LLM research! 😱 🧵10 best things people are already building with lingua 🔥👇

287

49,207

TimDarcet · Apr 22, 2025 · 3:19 PM UTC

TimDarcet @TimDarcet

22 Apr 2025

I did not realize people used frameworks for simple distributed training. Tip: for 80% of training runs you just need DDP, and it's trivial to set up. For the rest, go with FSDP (either PyTorch FSDP2 or the single-file FSDP in the CAPI repo)

Ben (no treats)

@andersonbcdefg

20 Apr 2025

wait. distributed training with pure pytorch is not that bad. why did we all collectively get gaslit into using accelerate...

261

37,408

TimDarcet · Feb 26, 2024 · 3:40 PM UTC

TimDarcet @TimDarcet

26 Feb 2024

Mistral's "Le Chat" logo is a design masterclass The two dots make a smol cat

225

17,245

TimDarcet · May 31, 2024 · 6:53 PM UTC

TimDarcet @TimDarcet

31 May 2024

Bonus trick: you can remove the gradient reduction of the first backward (which is useless) by wrapping it in no_sync(). Remember to also include the forward pass in the no_sync context, else it does not work

Gabriele Berton

@gabriberton

31 May 2024

This simple pytorch trick will cut in half your GPU memory use / double your batch size (for real). Instead of adding losses and then computing backward, it's better to compute the backward on each loss (which frees the computational graph). Results will be exactly identical

232

33,518

TimDarcet · Apr 27, 2025 · 4:52 PM UTC

TimDarcet @TimDarcet

27 Apr 2025

SIGBOVIK 2025 is out A live thread of papers that make me lightly smile: 1/ UPPERCASE IS ALL YOU NEED

218

30,468

TimDarcet · Jul 4, 2025 · 6:40 AM UTC

TimDarcet @TimDarcet

4 Jul 2025

Hey I'm a doctor now, neat

Andrei Bursuc @abursuc

2 Jul 2025

🚨New doctor in the house!🚨 Congrats to @TimDarcet for his tremendous work (DINOv2, registers, CAPI) & successful PhD defense followed by ~2 hrs of questions -- he's got stamina! Congrats to his incredible team of advisors from Inria & Meta: @julienmairal @p_bojanowski M. Oquab

201

11,601

TimDarcet · Jun 22, 2025 · 4:57 PM UTC

TimDarcet @TimDarcet

22 Jun 2025

In case there is any ambiguity: DINOv2 is 100% a product of dumb hill-climbing on ImageNet-1k k-NN accuracy (and linear too). Overfitting an eval can be bad. But sometimes the reward signal is reliable, and leads to truly good models. It's about finding a balance

samsja

@samsja19

19 Jun 2025

Replying to @lucasmaes_ @jxmnop

Oh I am a big fan of self supervised learning. Also ssl has never been benchmark maxing on imagenet afaik. I am mainly complaining about the supervised classification imagenet hill climb

199

25,862

TimDarcet · Aug 14, 2025 · 6:01 PM UTC

TimDarcet @TimDarcet

14 Aug 2025

hey we heard you liked dinov2 so we got you more of the same shit dinov3 is like dinov2 in the sense that it's much better than the things before rumor has it that plugging dinov3 on your benchmark is a low hanging sota but be quiet im not supposed to tell

Max Seitzer @maxseitzer

14 Aug 2025

Introducing DINOv3 🦕🦕🦕 A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale. High quality dense features, combining unprecedented semantic and geometric scene understanding. Three reasons why this matters…

197

16,119

TimDarcet · Mar 11, 2025 · 9:57 PM UTC

TimDarcet @TimDarcet

11 Mar 2025

The gaussian mixture fits MNIST in like 3 iterations and the fit is super good maybe EM GMM is all we needed after all

TimDarcet @TimDarcet

11 Mar 2025

lfg it's fitting

181

24,853
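
For readers who have not run it, expectation-maximisation for a Gaussian mixture can be sketched in a few lines. This is a toy 1-D version on synthetic data, not the MNIST fit from the tweet; the function names, initial values, and the small variance floor are assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=10):
    """Fit a 1-D Gaussian mixture with plain EM; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate the parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj + 1e-6
            weights[j] = nj / len(data)
    return means, variances, weights


# Synthetic data: two well-separated clusters around -2 and 3.
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(200)] + [random.gauss(3.0, 0.5) for _ in range(200)]

fitted_means, fitted_vars, fitted_weights = em_gmm_1d(
    data, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(fitted_means, fitted_weights)
```

On well-separated clusters like these, the means land close to the true centres after very few iterations, which matches the "fits in like 3 iterations" experience in spirit.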

TimDarcet · Mar 28, 2024 · 4:10 PM UTC

TimDarcet @TimDarcet

28 Mar 2024

If you need a replacement for an example image in a CV paper, you know what to do

Michael P. Frank 💻🔜♻️

@MikePFrank

27 Mar 2024

Just FYI, computer vision papers submitted to IEEE that include this image of Ms. Forsén will no longer be considered for publication

161

14,249

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

Intriguing new property: on some images, the different registers naturally adopt a “slot attention-like” behavior, each attending to a different object! Needless to say, this was never required of the model (or even encouraged). Cool future research direction!

156

9,061

TimDarcet · Apr 20, 2024 · 9:03 AM UTC

TimDarcet @TimDarcet

20 Apr 2024

In case some of you were (like me) curious about this stat for AI conferences: here it is for ICLR2024

Guido Salvaneschi @guidosalva

17 Apr 2024

Statistics from @ICSE2024. Authors submitting, *each*, 33, 27, 24, ... papers. Interactive dashboard: app.powerbi.com/view?r=eyJrI…

149

370,464

TimDarcet · May 7, 2024 · 11:53 AM UTC

TimDarcet @TimDarcet

7 May 2024

ViT need registers got an outstanding paper award! Many thanks to the committee for the honor

ICLR @iclr_conf

7 May 2024

Announcing the #ICLR2024 Outstanding Paper Awards: blog.iclr.cc/2024/05/06/iclr… Shoutout to the awards committee: @eunsolc, @katjahofmann, @liu_mingyu, @nanjiang_cs, @guennemann, @optiML, @tkipf, @CevherLIONS

149

16,374

TimDarcet · Nov 20, 2024 · 12:38 PM UTC

TimDarcet @TimDarcet

20 Nov 2024

Some people are still not using Fréchet DINOv2 distance?

Ethan

@torchcompiled

20 Nov 2024

Oh wow, FID is fragile...

137

13,666

TimDarcet · Apr 27, 2025 · 2:04 PM UTC

TimDarcet @TimDarcet

27 Apr 2025

I realized I have a strong opinion on experiment management that not everybody shares: when I launch an experiment, I want **zero** parameters on the command line. **All** information should be committed to the repo for full reproducibility. The only command is `./<scriptname>.sh`

TimDarcet @TimDarcet

23 Apr 2025

Replying to @davnords

I've come to really not like submitit, honestly. In practice what I was doing in my codebase is just writing my own sbatch files, and every exp is a different script (which is committed to the repo)

135

16,019

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

Our hypothesis is: the model recognizes useless patches, discards the info in them, and uses them as 𝘢𝘨𝘨𝘳𝘦𝘨𝘢𝘵𝘰𝘳𝘴 𝘰𝘧 𝘨𝘭𝘰𝘣𝘢𝘭 𝘪𝘯𝘧𝘰𝘳𝘮𝘢𝘵𝘪𝘰𝘯.

126

12,455

TimDarcet · Mar 26, 2024 · 10:37 AM UTC

TimDarcet @TimDarcet

26 Mar 2024

Hey! If you are using DINOv2, whether in a startup, in research or whatever, could you send me a DM? I want your feedback on the model. Reward for you? Simple: next model is gonna be 𝘦𝘷𝘦𝘯 𝘮𝘰𝘳𝘦 suited to your needs 👌

127

91,523

TimDarcet · May 15, 2024 · 4:41 PM UTC

TimDarcet @TimDarcet

15 May 2024

Current state of neurips abstract submissions This neurips is gonna be crazy

Chandan Singh @csinva

15 May 2024

2024 update

119

116,463

TimDarcet · Apr 24, 2024 · 12:51 PM UTC

TimDarcet @TimDarcet

24 Apr 2024

With satellite imagery, it’s hard to get labels. Solution? DINOv2! WRI+Meta trained a satellite DINOv2 for tree height estimation. They created an interactive map of tree height of the whole globe (!) at 1-meter res (!): meta-forest-monitoring-okw37… Quiz: Can you recognize this city?

117

12,977

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

What I mean when I say “registers”: additional learnable tokens (like the [CLS]), but these ones are not used at output. No additional info at input, not used at output: these tokens could seem useless!

115

33,220
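
The bookkeeping behind that definition can be sketched with plain Python lists standing in for tensors. The token order ([CLS], then registers, then patches), the helper names, and the `fake_transformer` stand-in are assumptions for illustration; in a real model the registers would be learnable parameters and the transformer would mix information between all tokens.

```python
def build_input_sequence(cls_token, register_tokens, patch_tokens):
    """Prepend [CLS] and the register tokens to the patch tokens."""
    return [cls_token] + list(register_tokens) + list(patch_tokens)


def split_output_sequence(output_tokens, num_registers):
    """Keep the [CLS] and patch outputs; the register outputs are discarded."""
    cls_out = output_tokens[0]
    patch_out = output_tokens[1 + num_registers:]
    return cls_out, patch_out


def fake_transformer(tokens):
    """Hypothetical stand-in for the ViT blocks: one output token per input token."""
    return [f"out({t})" for t in tokens]


patches = [f"patch{i}" for i in range(4)]
registers = [f"reg{i}" for i in range(2)]  # would be learnable parameters in a real model

sequence = build_input_sequence("cls", registers, patches)
cls_out, patch_out = split_output_sequence(fake_transformer(sequence), num_registers=len(registers))
print(sequence)
print(cls_out, patch_out)
```

The registers take part in every layer but carry no input information and are dropped at the output, which is why they could look useless at first glance.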

TimDarcet · Feb 8, 2025 · 9:39 AM UTC

TimDarcet @TimDarcet

8 Feb 2025

These visuals really highlight super well the differences between DINOv2 and CLIP: the latter has these text-induced abstractions that span across visual concepts, while the former has more advanced geometric concepts

Harry Thasarathan @HThasarathan

7 Feb 2025

Replying to @HThasarathan

Our method reveals model-specific patterns too: DinoV2 (left) shows specialized geometric features (depth, perspective), while SigLIP (right) captures unique text-aware visual concepts: This opens new paths for understanding model differences! (7/9)

119

8,472

TimDarcet · Apr 28, 2025 · 6:37 PM UTC

TimDarcet @TimDarcet

28 Apr 2025

Summary of "Massive activations in LLMs": - "artifact" tokens are in all transformers, ViTs and LLMs - their weirdness is ~only on 1 channel - they are the same as the quantization outliers - their purpose is *not* global information - there's a fix simpler than registers

Gabriele Berton

@gabriberton

28 Apr 2025

Replying to @gaur_manu

Could you give a summary for all the lazy readers who won't open the link?

114

66,050

TimDarcet · May 17, 2024 · 1:09 PM UTC

TimDarcet @TimDarcet

17 May 2024

echo "echo 'sleep 0.5' >> ~/.bashrc" >> ~/.bashrc

yobibyte @y0b1byte

17 May 2024

Every time a colleague of mine does not lock their laptop, I add something to their .bashrc. alias vim='nano' is a good one, but moving file to a random folder is even funnier. rm is too evil, don't do it!

109

15,610

TimDarcet · Dec 26, 2024 · 8:46 AM UTC

TimDarcet @TimDarcet

26 Dec 2024

Very happy to say DINOv2 got outstanding certification finalist at TMLR! The models had an amazing reception already, but this kind of award is the cherry on top 😁

Transactions on Machine Learning Research @TmlrOrg

19 Dec 2024

Replying to @TmlrOrg

Outstanding Finalist 2: “DINOv2: Learning Robust Visual Features without Supervision," by Maxime Oquab, Timothée Darcet (@TimDarcet), Théo Moutakanni (@TheoMoutakanni) et al. 5/n

113

9,532

TimDarcet · Jan 16, 2024 · 2:04 PM UTC

TimDarcet @TimDarcet

16 Jan 2024

ICLR results are out so it's bragging time: ViT need reg got an oral and very good scores (top-15), so that's cool. Thanks a lot to the reviewers who found it good. If you want to try a model with registers, we published some DINOv2 checkpoints earlier:

TimDarcet @TimDarcet

27 Oct 2023

105

11,125

TimDarcet · Apr 24, 2025 · 1:42 PM UTC

TimDarcet @TimDarcet

24 Apr 2025

I am once again asking you to use einops for this kind of operation

Gabriele Berton

@gabriberton

24 Apr 2025

Never thought I'd see a transpose with 8 numbers

109

4,357

TimDarcet · Dec 8, 2023 · 10:47 PM UTC

TimDarcet @TimDarcet

8 Dec 2023

PSA: when someone asks you a question including words such as "false positive rate", 𝗱𝗼 𝗻𝗼𝘁 𝗮𝗻𝘀𝘄𝗲𝗿 𝗿𝗶𝗴𝗵𝘁 𝗮𝘄𝗮𝘆. Simply state that you know your rights, and go on wikipedia to consult the 𝔐𝔞𝔡 𝕮𝔬𝔫𝔣𝔲𝔰𝔦𝔬𝔫 𝔐𝔞𝔱𝔯𝔦𝔵 𝔬𝔣 𝕳𝔢𝔩𝔩

Jeremy Kauffman 🦔🌲🌕

@jeremykauffman

8 Dec 2023

Fewer than 1 in 5 doctors can correctly answer a basic question about statistics

100

16,877
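
As a quick reference for the confusion-matrix vocabulary, here is a small sketch that derives the usual rates from the four cell counts. The screening numbers are hypothetical and chosen only for illustration (they are not taken from the quoted tweet): 100 sick people out of 100,000, a test that catches all of them, and a 5% false positive rate.

```python
def confusion_rates(tp, fp, fn, tn):
    """Common rates derived from the four cells of a binary confusion matrix."""
    return {
        "true_positive_rate": tp / (tp + fn),   # a.k.a. recall, sensitivity
        "false_positive_rate": fp / (fp + tn),  # share of negatives wrongly flagged
        "true_negative_rate": tn / (tn + fp),   # a.k.a. specificity
        "precision": tp / (tp + fp),            # a.k.a. positive predictive value
    }


# Hypothetical screening scenario (illustrative numbers only).
rates = confusion_rates(tp=100, fp=4995, fn=0, tn=94905)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the false positive rate is 5%, yet only about 2% of positive results belong to sick people, which is the kind of gap between "false positive rate" and "chance a positive is wrong" that makes the matrix worth consulting first.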

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

But in fact, the model learns to use them. And they work quite well: a single register entirely fixes the attention maps, and gives a boost on downstream tasks. Adding more further increases the scores a bit. We improve upon DINOv2, which was already quite stronk 💪

102

7,564

TimDarcet · Feb 14, 2025 · 5:58 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Worth mentioning that this clustering idea does not come from nowhere: The iBOT head comes from the DINO head, which comes from the SwAV prototypes, which is an online version of the DeepCluster clustering We've been doing clustering all along

TimDarcet @TimDarcet

14 Feb 2025

Replying to @TimDarcet

2. Loss? “DINO head”: good results, too unstable Idea: preds and targets have diff. distribs, so EMA head does not work on targets → need to separate the 2 heads So we just use a clustering on the target side instead, and it works

101

9,522

TimDarcet · Mar 14, 2025 · 7:41 PM UTC

TimDarcet @TimDarcet

14 Mar 2025

Re generative NN: To bypass the intractable log-likelihood, you can either: - optimize the wrong objective and hope it will work (GAN/VAE/diffusion) - use a NN that you can invert (surprisingly easy) Is this right?? Is there a downside to coupling flows?? Do they work??

100

22,015
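
To make the "NN that you can invert" option concrete, here is a minimal sketch of one affine coupling layer in plain Python. The `shift_and_log_scale` function is a hypothetical stand-in for a neural network, and using a single scalar scale for the whole second half is a simplification chosen for brevity.

```python
import math


def shift_and_log_scale(x_a):
    """Hypothetical stand-in for a neural net; it does not need to be invertible."""
    log_scale = math.tanh(sum(x_a))
    shift = sum(v * v for v in x_a)
    return shift, log_scale


def coupling_forward(x):
    """Leave the first half untouched; affinely transform the second half."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    shift, log_scale = shift_and_log_scale(x_a)
    y_b = [v * math.exp(log_scale) + shift for v in x_b]
    log_det = log_scale * len(x_b)  # log |det Jacobian|, needed for the exact likelihood
    return x_a + y_b, log_det


def coupling_inverse(y):
    """Exact inverse: the first half is unchanged, so shift and scale can be recomputed."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    shift, log_scale = shift_and_log_scale(y_a)
    x_b = [(v - shift) * math.exp(-log_scale) for v in y_b]
    return y_a + x_b


x = [0.3, -1.2, 0.5, 2.0]
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
print(y, log_det, x_rec)
```

The inverse never has to invert the network itself, only the elementwise affine map, and the log-determinant comes for free. Whether stacks of such layers are competitive in practice is exactly the open question in the tweet.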

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

Do check out the paper! It’s got much more detail than I can give here. Thanks to Maxime Oquab, Julien Mairal and Piotr Bojanowski who were patient enough to work with me, and competent enough to compensate for my mistakes 😅. arxiv.org/abs/2309.16588

7,441

TimDarcet · Jun 4, 2024 · 1:54 PM UTC

TimDarcet @TimDarcet

4 Jun 2024

fuck your fancy personal page template im rawdoggin the html and you wont even make me use css

27,693

TimDarcet · Apr 21, 2024 · 6:18 PM UTC

TimDarcet @TimDarcet

21 Apr 2024

Actually the accept rate decreases monotonically with number of 1st author submissions: the more prolific the first author is, the lower the quality of their paper.

Jon Barron

@jon_barron

21 Apr 2024

The acceptance rate among aspiring ICLR2024 first authors who submitted >= 4 papers was 15%! Contrast that with the base acceptance rate that year: 30.5%. Unsettling.

64,684

TimDarcet · Apr 27, 2025 · 2:12 PM UTC

TimDarcet @TimDarcet

27 Apr 2025

It's wild how few major modifications to the Transformer architecture took off. Most improvement papers stayed under the radar. Makes me wonder how many niche innovations actually work, but got lost because of community momentum

François Fleuret

@francoisfleuret

27 Apr 2025

As expected, that was popular. Here is my attempt at consolidating all the answers into a list. - Prenorm: normalization in the residual blocks before the attention operation and the FFN respectively - GQA (Group Query Attention): more Q than (K, V)

8,163

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

“Fine with me if you need global aggregators, but please don’t do this in my feature maps. I need those for downstream tasks! Here, have a few registers instead” - historical reconstruction of how it happened

8,374

TimDarcet · Feb 14, 2025 · 6:09 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Also of note: the repo contains a single-file fully standalone implem of FSDP in 500 LOC by the goat @fvsmassa . There's 0 guarantee associated with it, but if you want an understandable implem of FSDP it's a good one (and no it's not slow I've got 58% MFU lfg)

samsja

@samsja19

14 Feb 2025

Replying to @TimDarcet

nice work !! Did you roll out your own fsdp implementation? github.com/facebookresearch/…

16,322

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

This starts with a very simple observation: ~all ViTs have attention maps focused on a few seemingly random patches. DINO has clean attention maps, sure, but then why did the artifacts reappear in DINOv2? What 𝘢𝘳𝘦 these artifacts?

12,006

TimDarcet · Jul 17, 2024 · 9:27 PM UTC

TimDarcet @TimDarcet

17 Jul 2024

Okay this uiua thing is actually pretty fun

ludwig

@ludwigABAP

6 Jun 2024

uiua goes unbelievably hard wtf array-orientated, stack based, glyph programming language and now I wanna make the game of life in it this weekend

18,384

TimDarcet · Feb 28, 2024 · 4:55 PM UTC

TimDarcet @TimDarcet

28 Feb 2024

Hey guys quick update vision transformers don't need registers after all brb gotta test some stuff

Zhuang Liu

@liuzhuang1234

28 Feb 2024

LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in paper “Massive Activations in Large Language Models” LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)

15,025

TimDarcet · Jun 4, 2024 · 1:57 PM UTC

TimDarcet @TimDarcet

4 Jun 2024

You may not like it, but this is what peak personal page looks like

TimDarcet @TimDarcet

4 Jun 2024

fuck your fancy personal page template im rawdoggin the html and you wont even make me use css

13,901

TimDarcet · Oct 18, 2024 · 4:12 PM UTC

TimDarcet @TimDarcet

18 Oct 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. - facebookresearch/lingua

github.com

AI at Meta

@AIatMeta

18 Oct 2024

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from @jpineau1. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI).

What we’re releasing:
• Meta Spirit LM: An open source language model for seamless speech and text integration.
• Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
• Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
• SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
• Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
• Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
• MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
• Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.

Access to state-of-the-art AI creates opportunities for everyone. We’re excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️ go.fb.me/hgtkel

59,325

TimDarcet · Sep 29, 2023 · 4:20 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

Thanks @_akhaliq and @arankomatsuzaki for featuring our paper! It's great to see it 1st on the trending list on HF papers 😁 huggingface.co/papers

@_akhaliq

29 Sep 2023

Vision Transformers Need Registers paper page: huggingface.co/papers/2309.1… Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role. We show that this solution fixes that problem entirely for both supervised and self-supervised models, sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and most importantly leads to smoother feature maps and attention maps for downstream visual processing.

22,626

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

We find a few properties of these artifacts. 1. They appear on patches with useless information (redundant to their neighbors). 2. They contain little information about the original patch. It “forgot” its original value!

9,512

TimDarcet · Feb 1, 2024 · 1:10 PM UTC

TimDarcet @TimDarcet

1 Feb 2024

Happy to share that DINOv2 was accepted at TMLR! A special thanks to the reviewers and action editor. I found the review process to be actually pleasant and constructive. I believe that right now TMLR is possibly the best place to publish in ML

Accepted papers at TMLR @TmlrPub

22 Jan 2024

DINOv2: Learning Robust Visual Features without Supervision Maxime Oquab, Timothée Darcet, Théo Moutakanni et al.. Action editor: Abhishek Kumar. openreview.net/forum?id=a68S… #supervised #visual #features

8,090

TimDarcet · Apr 28, 2025 · 6:46 AM UTC

TimDarcet @TimDarcet

28 Apr 2025

I also view layernorm as hyperplane proj + hypersphere proj. Hyperplane proj makes no sense, hence we do RMSnorm now. Although don't forget the epsilon: we project onto the hyper*ball* actually

Saurabh Kumar

@drummatick

27 Apr 2025

Absolutely gold article. Changed the way I see Layer Norm

3,292
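
The decomposition in the tweet above can be checked numerically. This is a plain-Python sketch without the learnable scale and bias; the function names and the `eps` default are assumptions for illustration.

```python
import math


def center(x):
    """Projection onto the hyperplane of zero-mean vectors (sum of entries = 0)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def rms_norm(x, eps=1e-6):
    """Rescale so the root-mean-square is about 1, i.e. the norm is about sqrt(d)."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def layer_norm(x, eps=1e-6):
    """Standard LayerNorm without the affine parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


x = [1.0, 2.0, 4.0, 7.0]
ln = layer_norm(x)
composed = rms_norm(center(x))  # hyperplane projection, then (near-)sphere projection
out_norm = math.sqrt(sum(v * v for v in ln))
print(ln, composed, out_norm, math.sqrt(len(x)))
```

The two outputs agree, and because of the epsilon the output norm is strictly below sqrt(d): the result sits just inside the ball rather than exactly on the sphere.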

TimDarcet · Feb 3, 2025 · 6:59 PM UTC

TimDarcet @TimDarcet

3 Feb 2025

Llamadrama being discussed in public?

Yann LeCun

@ylecun

2 Feb 2025

Replying to @RawSucces @DAcemogluMIT

You misread. There had been multiple LLM projects within FAIR for years. Some were open sourced as research prototypes (e.g. OPT175B, Galactica, BlenderBot...). In mid-2022, FAIR started a large LLM project called Zetta, which was still going in late 2022 when ChatGPT came out. A small group at FAIR-Paris was working on theorem proving. They needed an LLM for their own purpose and thought Zetta was too big and not ready. They developed their own model, which eventually became Llama-1. What happened internally between Zetta and Llama is somewhat similar to what just happened between DeepSeek and the big US players: a small team of talented folks innovated and beat the large teams.

10,152

TimDarcet · Sep 29, 2023 · 2:49 PM UTC

TimDarcet @TimDarcet

29 Sep 2023

On the other hand, the output tokens seem to contain 𝗹𝗼𝘁𝘀 of global information. We probe on a few different classification datasets. We find that these tokens contain much more class information than other patch tokens, and almost as much as the [CLS]!

8,751

TimDarcet · Mar 17, 2025 · 2:19 PM UTC

TimDarcet @TimDarcet

17 Mar 2025

Guess what model Depth Anything V2 is based on? 🦖🦖🦖 (yes, I only have one tune. No, I won't stop)

Super Real Name @DebatableChild

16 Mar 2025

Replying to @giffmana

Depth anything V2 one shots this problem btw. All it requires is an algorithm to create a coherent world via past imagery and depth calculations.

9,667

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

ArXiv: arxiv.org/abs/2502.08769 Github: github.com/facebookresearch/… Detailed video explanation: piped.video/watch?v=dQw4w9Wg…

3,109

TimDarcet · Jan 18, 2025 · 7:58 PM UTC

TimDarcet @TimDarcet

18 Jan 2025

Object counting is a surprisingly unsolved problem so far, especially in terms of foundation models. AFAIK DINOv2 and CLIP-style models fail pretty hard. Of course, the VLMs on top can't do better than the encoder so they also fail there. One of the remaining things to solve

Jeremy Nguyen ✍🏼 🚢

@JeremyNguyenPhD

18 Jan 2025

What's a vision model I can use to count toy pieces like this? GPT-4o tells me things like, "um, about 20", or counts incorrectly. Bonus points if it's easy to use via API and can work with plain english prompts

5,857

TimDarcet · Aug 31, 2023 · 1:18 PM UTC

TimDarcet @TimDarcet

31 Aug 2023

Do try out the new depth estimation parallax view, it's trippy

104,918

TimDarcet · May 3, 2024 · 8:48 AM UTC

TimDarcet @TimDarcet

3 May 2024

Thanks to DINO's nice attention maps, the model's behavior is quite interpretable! That's really cool

TimDarcet @TimDarcet

3 May 2024

Another banger by @TheoMoutakanni : RayDINO, a DINO for chest X-ray. Excellent results on a ton of benchmarks with the frozen model, with great generalization and low bias. Check it out! arxiv.org/abs/2405.01469

20,492

TimDarcet · Apr 21, 2023 · 3:31 PM UTC

TimDarcet @TimDarcet

21 Apr 2023

6/ With these capabilities emerge new interesting properties. A very nice one is the ability to perform semantic keypoint matching between images simply by matching the closest features. This works across very different domains !

8,599
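
Matching "the closest features" can be sketched as a nearest-neighbour search between two sets of patch features. The toy 2-D vectors below are hypothetical stand-ins for real encoder features, and using cosine similarity as the closeness measure is an assumption made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return dot / (norm_a * norm_b)


def match_keypoints(features_a, features_b):
    """For each feature in image A, return the index of the most similar feature in image B."""
    matches = []
    for i, fa in enumerate(features_a):
        best_j = max(range(len(features_b)), key=lambda j: cosine_similarity(fa, features_b[j]))
        matches.append((i, best_j))
    return matches


# Toy features (hypothetical): three patches per image, two dimensions each.
features_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features_b = [[0.1, 0.9], [0.9, 0.8], [2.0, 0.1]]

matches = match_keypoints(features_a, features_b)
print(matches)
```

A common refinement, not shown here, is to keep only mutual nearest neighbours so that ambiguous matches are dropped.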

TimDarcet · Mar 11, 2025 · 6:49 PM UTC

TimDarcet @TimDarcet

11 Mar 2025

lfg it's fitting

TimDarcet @TimDarcet

11 Mar 2025

Damn expectation-maximisation of a GMM got hands (it's the easiest algo in stat learning im just bad)

29,506

TimDarcet · Apr 21, 2023 · 3:31 PM UTC

TimDarcet @TimDarcet

21 Apr 2023

2/ As opposed to other recent SSL works, the goal is to provide vision encoders that work off-the-shelf, without any fine-tuning. In this setup, we improve significantly over previous SSL works, and even match or surpass CLIP-type models on a variety of tasks

38,177

TimDarcet · Nov 28, 2023 · 12:39 PM UTC

TimDarcet @TimDarcet

28 Nov 2023

Published my first paper, and my second one. I like them. I used to feel anxious about not being able to publish anything. It's getting better.

Jacy, LPC | the TRAUMA queen

@ATMwithJacy

27 Nov 2023

BRAG ABOUT SOMETHING YOU’RE PROUD OF ACCOMPLISHING IN 2023 ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨

14,334

TimDarcet · Sep 28, 2024 · 10:17 PM UTC

TimDarcet @TimDarcet

28 Sep 2024

Hey everyone, will be in Milan for ECCV until Thursday. Always up for a chat! Also I'm gonna need a job after my PhD ends in a few months so if you have some opportunities I'm interested 😇

4,289

TimDarcet · Sep 9, 2024 · 5:51 AM UTC

TimDarcet @TimDarcet

9 Sep 2024

Replying to @giffmana

To my knowledge it's common to just ditch the torch scheduler, use an array of learning rates, and at each iter do something equivalent to `optim.set_lr(lrs[iter])` Eg github.com/facebookresearch/…

dino/main_dino.py at 7c446df5b9f45747937fb0d72314eb9f7b66930a · facebookresearch/dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO - facebookresearch/dino

github.com

2,988
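
The pattern described in the reply above can be sketched without any scheduler class: precompute one learning rate per iteration, then write it into the optimizer at each step. The cosine-with-warmup shape, the function names, and the hyperparameter values are assumptions for illustration; `param_groups` is a plain list of dicts standing in for a PyTorch optimizer's `optimizer.param_groups`.

```python
import math


def cosine_schedule(base_lr, final_lr, total_iters, warmup_iters=0):
    """Return one learning rate per iteration: linear warmup, then cosine decay."""
    lrs = []
    for it in range(total_iters):
        if it < warmup_iters:
            lrs.append(base_lr * it / warmup_iters)
        else:
            progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
            lrs.append(final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress)))
    return lrs


def set_lr(param_groups, lr):
    """Write the learning rate into every parameter group."""
    for group in param_groups:
        group["lr"] = lr


lrs = cosine_schedule(base_lr=1e-3, final_lr=1e-5, total_iters=100, warmup_iters=10)
param_groups = [{"lr": 0.0}]  # stand-in for optimizer.param_groups

for it in range(len(lrs)):
    set_lr(param_groups, lrs[it])
    # forward pass, backward pass, and optimizer step would go here

print(lrs[0], lrs[10], lrs[-1])
```

Because the schedule is just a list, it is easy to plot, log, or resume from an arbitrary iteration.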

TimDarcet · Mar 15, 2025 · 6:15 PM UTC

TimDarcet @TimDarcet

15 Mar 2025

ctrl+enter, run an epoch
scroll twitter for 30s
ctrl+enter, run another epoch
scroll twitter for 30s
the loss is not going down any more, divide lr by 3
ctrl+enter, run another epoch
...

5,919

TimDarcet · Jul 24, 2024 · 3:39 PM UTC

TimDarcet @TimDarcet

24 Jul 2024

Lmao they waited for the 405B release just to be able to 1-up it

Mistral AI

@MistralAI

24 Jul 2024

mistral.ai/news/mistral-larg…

4,436

TimDarcet · Apr 27, 2025 · 2:56 PM UTC

TimDarcet @TimDarcet

27 Apr 2025

Apple would be no.1 >>>all if they had just looked into LLMs for Siri 5 years ago Biggest blunder in the field so far

finbarr

@finbarrtimbers

26 Apr 2025

I don’t get why Apple hasn’t incorporated Whisper. They should be funding a team of, say, 5 researchers systematically iterating on it. Apple should have the world’s best voice-to-text and TTS models.

2,930

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

As always thank you to all the people who helped me, @BaldassarreFe , Maxime Oquab, @julienmairal , @p_bojanowski With this I’m wrapping up my PhD. An amazing journey, thanks to the excellent advisors and colleagues! And also I’m looking for a job lmao

ALT slop of a capybara graduating riding a dinosaur

3,122

TimDarcet · Mar 19, 2025 · 7:12 PM UTC

TimDarcet @TimDarcet

19 Mar 2025

Some plots are worth it just for the aesthetics

2,475

TimDarcet · Aug 22, 2025 · 8:50 AM UTC

TimDarcet @TimDarcet

22 Aug 2025

Mfw the shitpost with 7 people in the target audience gets no engagement

2,715

TimDarcet · Aug 19, 2025 · 3:05 AM UTC

TimDarcet @TimDarcet

19 Aug 2025

PLONKKK

Nicolas DUFOUR @nico_dufour

18 Aug 2025

🚀 DinoV3 just became the new go-to backbone for geoloc! It outperforms CLIP-like models (SigLip2, finetuned StreetCLIP)… and that’s shocking 🤯 Why? CLIP models have an innate advantage — they literally learn place names + images. DinoV3 doesn’t.

3,316

TimDarcet · Oct 4, 2023 · 6:45 AM UTC

TimDarcet @TimDarcet

4 Oct 2023

Quite a few people have been asking me "can registers work with LLMs?" Here is a paper that says yes !

Aran Komatsuzaki

@arankomatsuzaki

4 Oct 2023

Think before you speak: Training Language Models With Pause Tokens - Performing training and inference on LMs with a learnable pause token appended to the input prefix - Gains on 8 tasks, e.g., +18% on SQuAD arxiv.org/abs/2310.02226

5,294

TimDarcet · Nov 22, 2024 · 2:18 PM UTC

TimDarcet @TimDarcet

22 Nov 2024

AIMv2 looks great! When SSL and text-supervised training both work so well, it was inevitable that combining both would be a great idea Big congrats to @DonkeyShot21 @alaa_nouby @MustafaShukor1 and team!

Enrico Fini @DonkeyShot21

22 Nov 2024

We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥 Paper: arxiv.org/abs/2411.14402 Repo: github.com/apple/ml-aim Model Gallery: huggingface.co/collections/a…

3,237

TimDarcet · Apr 29, 2025 · 8:01 PM UTC

TimDarcet @TimDarcet

29 Apr 2025

Replying to @JFPuget

Chinese room argument? IMO consciousness is super badly defined but it may be something that emerges at a system level. The same way a collection of individual cells are "conscious", a collection of "not conscious" elements (layers etc) might be conscious

53,231

TimDarcet · Oct 2, 2024 · 12:39 PM UTC

TimDarcet @TimDarcet

2 Oct 2024

Btw, for those at ECCV: come to the DINOv2 demo at the Meta stand. @TheoMoutakanni is showing the literal "map of every tree" (spoiler: it's pretty cool)

1,731

TimDarcet · Aug 31, 2023 · 1:10 PM UTC

TimDarcet @TimDarcet

31 Aug 2023

Big news on the DINOv2 side! - Apache2 license (commercial use) - Releasing the segmentation and depth heads - significantly updated demo, with keypoint matching! - New fairness evaluations on FACET

1,637

TimDarcet · May 11, 2024 · 9:53 AM UTC

TimDarcet @TimDarcet

11 May 2024

The viennese street artists are a different breed

4,435

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Language modeling solved NLP. So vision people have tried masked image modeling (MIM). The issue? It’s 𝘩𝘢𝘳𝘥. BeiT/MAE are not great for representations. iBOT works well, but is too unstable to train without DINO. →Pure MIM lags behind DINOv2

4,334

TimDarcet · Apr 26, 2024 · 1:49 PM UTC

TimDarcet @TimDarcet

26 Apr 2024

The biggest step change in the DINOv2 project was a skillful yolo run by Maxime. Yoloing is a dangerous but powerful weapon

Jason Wei

@_jasonwei

25 Apr 2024

In AI research there is tremendous value in intuitions on what makes things work. In fact, this skill is what makes “yolo runs” successful, and can accelerate your team tremendously. However, there’s no track record on how good someone’s intuition is. A fun way to do this is “betting”, where researchers try to predict the results of an experiment, or whether an approach would ultimately be successful. When I was at Google Brain in 2022, I made a bet for what accuracy a 540B-parameter LLM would get on mate-in-one in chess after finetuning. I had great fun asking my friends to participate; their predictions ranged from 10% to 80% (I think it ended up being around 30%). I particularly enjoyed a few bets with @LiamFedus (now my manager at OpenAI). Back in the day when we were writing a paper on emergent abilities, we bet on whether he would be able to predict the final accuracy of a task based on the log-prob trends from smaller models, and I won that one. More recently, we had a bet on how much data would be needed for a model to reach a certain performance, and I lost that bet by an order of magnitude. It was a nice ego check for me. (Bro tip: if you bet a dinner, specify the price range before you lose) Having a track record holds you to be accountable for intuitions and helps you remember when you were wrong. The best researchers excite their peers about only a few things, and some of those things work well in a big way. You don’t want to be excited about everything, but then only a small portion of those things actually work. Finally, I think there is also a lot of value in correctly predicting that a research direction won't go well. These “negative bets” aren’t typically rewarded in today’s culture, but I believe there is a lot of value in saving your team time.

6,805

TimDarcet · Apr 13, 2024 · 9:36 AM UTC

TimDarcet @TimDarcet

13 Apr 2024

In case you haven't got it yet: google scholar pdf reader extension for chrome is a must chromewebstore.google.com/de…

3,773

TimDarcet · Mar 19, 2025 · 4:44 PM UTC

TimDarcet @TimDarcet

19 Mar 2025

Highly recommend scholar inbox: it has the highest SNR for papers that fit your tastes / topic

rami

@rami_mmo

19 Mar 2025

Replying to @cloneofsimo @jm_alexia

checkout scholar-inbox.com

1,759

TimDarcet · Apr 27, 2025 · 4:52 PM UTC

TimDarcet @TimDarcet

27 Apr 2025

Dropout: A Simple Way to Prevent Neurons from Depression

1,489

TimDarcet · Feb 1, 2024 · 12:51 PM UTC

TimDarcet @TimDarcet

1 Feb 2024

Next week I'll be talking about registers, what they are and why we need them, at Cohere for AI! More info: cohere.com/events/c4ai-Timot…

Cohere Labs

@Cohere_Labs

31 Jan 2024

Next week on Wednesday, February 7th, our Geo-Regional Asia Group is excited to welcome Timothée Darcet, PhD student, building large vision models at @Meta AI (FAIR) & @Inria to present "Vision Transformers need Registers." Learn more: cohere.com/events/c4ai-Timot…

4,606

TimDarcet · Aug 3, 2023 · 6:38 AM UTC

TimDarcet @TimDarcet

3 Aug 2023

Very clear and simple tutorial on how to use DINOv2 as an image featurizer. Check it out!

Niels Rogge @NielsRogge

31 Jul 2023

DINOv2, a SOTA ViT trained by @Meta on 142 million images, is now part of 🤗 Transformers! It's one of the strongest vision backbones at the moment, so I created a tutorial on training a linear classifier on top of it for semantic segmentation, using DINOv2's frozen features 1/2

3,644

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Code and weights are Apache2, so don’t hesitate to try it out! If you have torch you can load the models in a single line anywhere. The repo is a flat folder of like 10 files, it should be pretty readable

2,571

TimDarcet · May 14, 2024 · 6:44 PM UTC

TimDarcet @TimDarcet

14 May 2024

Replying to @torchcompiled @Ethan_smith_20

Contrastive losses in general push the model to use the whole space. In DINOv2 we used the specific KoLeo loss, which pushes the embedding distribution towards higher entropy. Higher entropy --> uniform distribution (on the hypersphere) --> full usage of the space

1,117
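To make the KoLeo idea in the tweet above concrete, here is a minimal sketch of a nearest-neighbour entropy regularizer of that kind: embeddings are L2-normalized, and the loss is the negative mean log distance from each point to its nearest neighbour in the batch. The function names, the `eps` constant, and the toy batches are assumptions for illustration; this is not the DINOv2 implementation.

```python
import math


def l2_normalize(vector):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def koleo_loss(embeddings, eps=1e-8):
    """Negative mean log distance to the nearest neighbour in the batch.

    Lower values mean the points are more spread out (higher entropy).
    `eps` is an illustrative constant that avoids log(0) for duplicates.
    """
    points = [l2_normalize(v) for v in embeddings]
    n = len(points)
    total = 0.0
    for i in range(n):
        nearest = min(math.dist(points[i], points[j]) for j in range(n) if j != i)
        total += math.log(nearest + eps)
    return -total / n


# Toy 2D batches: one clumped in a small arc, one spread around the circle.
clumped = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2], [0.97, 0.3]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

loss_clumped = koleo_loss(clumped)
loss_spread = koleo_loss(spread)
print(loss_clumped > loss_spread)  # the clumped batch is penalized more
```

Minimizing this term pushes each point away from its closest neighbour, which is one way to encourage a uniform spread on the hypersphere.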

TimDarcet · Apr 21, 2024 · 7:50 AM UTC

TimDarcet @TimDarcet

21 Apr 2024

Okay caveat of my last post: maybe those are all middle authorship? Let's look at the same plot but only for _first_ and _last_ authors. First authors: (1/2)

TimDarcet @TimDarcet

20 Apr 2024

In case some of you were (like me) curious about this stat for AI conferences: here it is for ICLR2024

104,321

TimDarcet · Nov 18, 2023 · 12:59 AM UTC

TimDarcet @TimDarcet

18 Nov 2023

Wait till they hear about selective checkpointing github.com/facebookresearch/…

Thomas Capelle @capetorch

17 Nov 2023

Gradient Checkpointing is the single most effective way of reducing GPU memory footprint. This thing is fantastic! Am I missing something, or is it that good?

15,243

TimDarcet · Jun 5, 2024 · 6:51 AM UTC

TimDarcet @TimDarcet

5 Jun 2024

Always check the image normalization! It can completely change results. eg CLIP uses its own specific norm, and openclip uses either the CLIP values or the inception values depending on the model. When in doubt, often you can check in timm

Gabriele Berton

@gabriberton

3 Jun 2024

Replying to @gabriberton

Notable models that use non-imagenet norm are Dust3r, OpenIBL, many image matching models, and some (many?) remote sensing models. This is an issue when you create a fair codebase to benchmark multiple models (where ideally you can simply swap the model to compute the results).

2,931
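A small sketch of why the normalization point above matters: the same pixel maps to noticeably different model inputs depending on which mean/std pair is applied. The constants below are the commonly used ImageNet, OpenAI CLIP, and Inception-style values as I know them; treat them as assumptions and check the config of the specific model you load.

```python
# (mean, std) per RGB channel, for inputs scaled to [0, 1].
# Assumed values: verify against the model's own preprocessing config.
IMAGENET = ((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
OPENAI_CLIP = (
    (0.48145466, 0.4578275, 0.40821073),
    (0.26862954, 0.26130258, 0.27577711),
)
INCEPTION = ((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))


def normalize(pixel, stats):
    """Apply per-channel (value - mean) / std to one RGB pixel."""
    mean, std = stats
    return tuple((c - m) / s for c, m, s in zip(pixel, mean, std))


white = (1.0, 1.0, 1.0)
for name, stats in [("imagenet", IMAGENET), ("clip", OPENAI_CLIP), ("inception", INCEPTION)]:
    print(name, [round(v, 3) for v in normalize(white, stats)])
```

A white pixel lands at roughly 2.2 on the red channel with the ImageNet statistics but at exactly 1.0 with the Inception ones, so feeding a model the wrong variant shifts and rescales every input it sees.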

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Qualitatively the features are pretty good imo. DINOv2+reg still has artifacts (despite my best efforts), while the MAE features are mostly color, not semantics (see shadow in the first image, or rightmost legs in the second). CAPI has both semantic and smooth feature maps

2,506

TimDarcet · May 14, 2025 · 11:22 AM UTC

TimDarcet @TimDarcet

14 May 2025

So the reason I was asking about this is that the squared L2 has the very pleasant property of reducing to "just push away from the avg", and that would eliminate all batch size issues (you can use an EMA avg). It's basically what DINO does, w/ softmax+CE loss instead of L2

TimDarcet @TimDarcet

11 May 2025

2,567
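One way to read the "just push away from the avg" remark above is through the identity mean_k ||x - y_k||² = ||x - ȳ||² + mean_k ||y_k - ȳ||², where ȳ is the average of the negatives. The second term does not depend on x, so the repulsion only sees x through its distance to the average. The sketch below checks this numerically on random vectors; the dimensions, seed, and variable names are arbitrary assumptions, and this is my interpretation of the tweet rather than a stated derivation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


random.seed(0)
dim, num_negatives = 8, 16
x = [random.gauss(0, 1) for _ in range(dim)]
negatives = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_negatives)]

avg = mean_vector(negatives)

# Left-hand side: average squared distance from x to every negative.
pairwise = sum(sq_dist(x, y) for y in negatives) / num_negatives

# Right-hand side: distance to the average, plus a spread term independent of x.
spread = sum(sq_dist(y, avg) for y in negatives) / num_negatives
via_mean = sq_dist(x, avg) + spread

print(abs(pairwise - via_mean) < 1e-9)
```

Because only `avg` enters the x-dependent part, a running (for example EMA) estimate of the average could stand in for the per-batch negatives, which is the batch-size argument made in the tweet.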

TimDarcet · Feb 14, 2025 · 1:17 PM UTC

TimDarcet @TimDarcet

14 Feb 2025

Let’s dissect a bit the anatomy of a masked image model. 1. Take an image, convert its patches to representations. 2. Given part of this image, train a model to predict the content of the missing parts. 3. Measure a loss between pred and target.

6,443
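The three steps listed above can be sketched on a toy example. Everything here is a stand-in chosen for brevity: the "representation" of a patch is just its raw pixel values, the "model" is a placeholder that predicts the average of the visible patches, and the loss is a plain MSE. It illustrates the generic pipeline only, not CAPI or any other specific method.

```python
def patchify(image, patch):
    """Step 1: split an HxW image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
            )
    return patches


def predict_masked(visible):
    """Step 2 (placeholder): predict a missing patch as the mean of visible ones.

    A real masked image model would be a trained network here.
    """
    count = len(visible)
    return [sum(column) / count for column in zip(*visible)]


def mim_loss(image, patch, masked_idx):
    """Step 3: mean squared error between predictions and masked targets."""
    targets = patchify(image, patch)
    visible = [p for i, p in enumerate(targets) if i not in masked_idx]
    pred = predict_masked(visible)
    errors = [(a - b) ** 2 for i in sorted(masked_idx) for a, b in zip(pred, targets[i])]
    return sum(errors) / len(errors)


# 4x4 checkerboard of 2x2 blocks; hide the bottom-right patch (index 3).
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
loss = mim_loss(image, patch=2, masked_idx={3})
print(round(loss, 4))
```

The design choices that distinguish actual methods live in exactly these three slots: what the target representation is, how the predictor sees the visible part, and which loss compares prediction and target.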