Amir Bar · Jun 11, 2019 · 2:39 AM UTC

Amir Bar

@_amirbar

11 Jun 2019

(1/2) New CVPR paper on speech-to-gesture prediction! Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound and synthesize a corresponding video of the speaker.

Amir Bar · Sep 2, 2022 · 5:48 PM UTC

📢 New paper alert! How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by #prompting in NLP, our new paper investigates Visual Prompting. (1/5)

Amir Bar · Oct 23, 2024 · 7:58 PM UTC

model = torch.compile(model) is magic. With only one line of code, I get ~40% speed up per training iteration.
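The tweet does not say how the ~40% figure was measured, and "40% speedup" can mean 40% more iterations per second or 40% less time per iteration, which are different numbers. A minimal stdlib sketch of a per-iteration timing harness that reports both readings; `time_per_iter` and `speedup` are hypothetical helper names, and the workload is a pure-Python stand-in rather than a PyTorch training step.

```python
import time
from typing import Callable, Dict


def time_per_iter(step: Callable[[], None], iters: int = 20, warmup: int = 3) -> float:
    """Average wall-clock seconds per call of `step`, measured after warmup calls.

    Warmup matters for torch.compile in particular: the first calls trigger
    compilation and would otherwise dominate the average.
    """
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    return (time.perf_counter() - start) / iters


def speedup(baseline_s: float, new_s: float) -> Dict[str, float]:
    """Two common readings of "X% speedup" for per-iteration times."""
    return {
        # iterations per second go up by this fraction
        "throughput_gain": baseline_s / new_s - 1.0,
        # seconds per iteration go down by this fraction
        "time_reduction": 1.0 - new_s / baseline_s,
    }


if __name__ == "__main__":
    # Stand-in workload; with PyTorch you would time an eager training step
    # against the same step run through torch.compile(model).
    t = time_per_iter(lambda: sum(i * i for i in range(10_000)))
    print(f"{t * 1e3:.3f} ms/iter")
    print(speedup(1.4, 1.0))
```

With a 1.4 s eager step and a 1.0 s compiled step, throughput rises by 40% while time per iteration falls by about 29%.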

Amir Bar · Apr 29, 2024 · 4:12 PM UTC

Animals are intelligent agents that plan and act to accomplish complex goals. Can we try learning from them? We present EgoPet, a new ego centric video dataset of animals scraped from YouTube and TikTok.

Amir Bar · Apr 2, 2025 · 3:52 PM UTC

CLIP is arguably the leading pretraining paradigm in computer vision. In a new preprint, we show that vision-only SSL models trained on web data can match CLIP on VQA tasks, despite not using language. Paper: arxiv.org/abs/2504.01017 Project Page: davidfan.io/webssl/

Amir Bar · Jul 29, 2024 · 9:13 PM UTC

Life update: Wrapping up my PhD and graduating in two weeks from @TelAvivUni and @berkeley_ai! Next up: moving to NYC to start a postdoc at @AIatMeta, where I will be working with @ylecun. 🚀 Also, looking to meet some new and old friends in the NYC area, DM me :)

Amir Bar · Jun 13, 2025 · 9:34 PM UTC

Navigation World Models won the Best Paper Honorable Mention Award at #CVPR2025 ☺️ It is my first postdoc paper since joining Yann's lab at @AIatMeta, so I am very excited. It was also extremely fun working with @GaoyueZhou, @dans_t123, @trevordarrell (and @ylecun) Fun story:

#CVPR2026 @CVPR

13 Jun 2025

Congratulations to the #CVPR2025 Honorable Mentions for Best Paper! @GoogleDeepMind, @UCBerkeley, @UMich, @AIatMeta, @nyuniversity, @berkeley_ai, #AllenInstituteforAI, @UW, #UniversityCollegeLondon, @UniversityLeeds, @ZJU_China, @NTUsg, @PKU1898, @Huawei Singapore Research Center

Amir Bar · Dec 5, 2024 · 5:22 PM UTC

Happy to share our new work on Navigation World Models! 🔥🔥 Navigation is a fundamental skill of agents with visual-motor capabilities. We train a single World Model across multiple environments and diverse agent data. w/ @GaoyueZhou, Danny Tran, @trevordarrell and @ylecun.

Amir Bar · Aug 9, 2025 · 3:49 PM UTC

a recipe to reproduce #Genie3: 1️⃣ collect a large egocentric video dataset and apply VGGT to get camera poses. Add more data from 3D reconstructed scenes. 2️⃣ train a Navigation World Model with long context → amirbar.net/nwm 3️⃣ distill to an efficient model for real-time (RT) use.

Amir Bar · Oct 25, 2025 · 4:22 PM UTC

also- it is a distraction. long horizon planning in pixel space doesn’t make sense.

C. Zhang @ChongZzZhang

25 Oct 2025

On world model / egocentric visual dynamics model, also on building robotic simulation, also on building robotic genAI models: Being visually realistic doesn't mean being physically accurate and semantically correct.

Amir Bar · Jul 20, 2024 · 11:11 AM UTC

#ICML2024 Flying to Vienna to present our paper "Stochastic Positional Embeddings Improve Masked Image Modeling" at @icmlconf. Masked Image Modeling is a popular SSL objective, but scaling MIM might suffer due to appearance and location uncertainties. (1/n)

Amir Bar · Apr 2, 2025 · 6:41 PM UTC

FAIR is probably the only lab outside of academia where research projects can start like this.

David Fan

@DavidJFan

2 Apr 2025

Replying to @DavidJFan

[7/8] This side project started in October when @TongPetersb, @_amirbar, and I were thinking about the rise of CLIP as a popular vision encoder for MLLMs. The community often assumes that language supervision is the primary reason for CLIP's strong performance. However, we realized that the pretraining data distribution and scale differ a lot. For example, CLIP models are often trained on billion-scale image-text pairs from the web, while SSL models are often trained on million-scale or hundred million-scale data from an ImageNet-like distribution. Thus, we really need apples-to-apples comparisons to study this question. We hope that our work will inspire a return to more controlled experimentation whenever possible!

Amir Bar · Apr 10, 2025 · 7:57 PM UTC

Excited to share that our paper on Navigation World Models was selected for an Oral presentation at CVPR! Code & models: github.com/facebookresearch/… huggingface.co/facebook/nwm

facebookresearch/nwm: Official code for the CVPR 2025 paper "Navigation World Models".

Amir Bar

@_amirbar

5 Dec 2024

Amir Bar · Oct 19, 2025 · 6:48 PM UTC

heading to #ICCV2025, anyone up for a ☕️? also, my team at FAIR has an internship opening on world modeling, planning, and their robotics applications. DM me if you’re interested.

Amir Bar · Feb 26, 2025 · 11:56 PM UTC

Navigation World Models was accepted to #CVPR2025 🎉 Congrats to co-authors @GaoyueZhou, Danny Tran, @trevordarrell and @ylecun. See you in Nashville!

Amir Bar

@_amirbar

5 Dec 2024

Amir Bar · Apr 10, 2024 · 8:20 PM UTC

How does visual in-context learning work? We find "task vectors" -- latent activations that encode task-specific information and can guide the model to perform a desired task without providing any task examples. 🧵 (1/n)

Aran Komatsuzaki

@arankomatsuzaki

9 Apr 2024

Finding Visual Task Vectors Find task vectors, activations that encode task-specific information, which guide the model towards performing a task better than the original model w/o the need for input-output examples arxiv.org/abs/2404.05729

Amir Bar · Oct 26, 2025 · 3:17 PM UTC

(1/2) say you want to plan your way back from honolulu to nyc using a WM in pixels- 1. the world is stochastic and partially observable. you can plan how to pack your suitcase and leave the hotel room, but 99% of your pixel-level plan afterwards is useless

Yossi Gandelsman

@YGandelsman

25 Oct 2025

Replying to @_amirbar

Why not?

Amir Bar · Jul 27, 2021 · 11:49 PM UTC

Thanks for featuring our work @ak92501 :) Also, we've just released the code for running most of the experiments github.com/amirbar/DETReg.

Official implementation of the CVPR 2022 paper "DETReg: Unsupervised Pretraining with Region Priors for Object Detection".

@_akhaliq

9 Jun 2021

DETReg: Unsupervised Pretraining with Region Priors for Object Detection pdf: arxiv.org/pdf/2106.04550.pdf abs: arxiv.org/abs/2106.04550 project page: amirbar.net/detreg/ unsupervised pretraining approach for object detection with transformers using region priors

Amir Bar · Nov 17, 2025 · 2:22 PM UTC

brace yourselves, CVPR preprints are dropping in 3, 2, 1…

Amir Bar · Jun 11, 2019 · 2:40 AM UTC

(2/2) We also release our full dataset and will make the code available. Joint work with @shiryginosar, Gefen Kohavi, Caroline Chan, Andrew Owens and Jitendra Malik. For more details see the project page: people.eecs.berkeley.edu/~sh…. @berkeley_ai @ZebraMedVision

Amir Bar · Oct 20, 2025 · 7:05 PM UTC

Workshop on World Modeling @ ICCV 2025. Starting now!

Amir Bar · Dec 15, 2024 · 10:23 PM UTC

to my Chinese friends and collaborators, you are great, we love you 🫶

Mandi Zhao @ZhaoMandi

14 Dec 2024

How hard is it to Not be racist towards exactly One nationality at a public keynote talk at a top conference? - Apparently extremely (for an MIT prof too)

Amir Bar · Jun 9, 2025 · 6:23 PM UTC

heading to Nashville to attend @CVPR tomorrow. looking forward to meeting old & new friends and chatting about #WorldModels

Amir Bar · Jan 11, 2023 · 6:34 PM UTC

Replying to @ylecun

And use the money collected to pay reviewers 😇

Amir Bar · Jun 9, 2021 · 5:55 PM UTC

Excited to share DETReg, our new work on unsupervised pretraining for object detection with transformers. Compared to previous works, the key idea in DETReg is to learn object detection already in the unsupervised pre-training stage. @berkeley_ai, @TelAvivUni, @NVIDIAAI

Amir Bar · Jun 23, 2022 · 4:22 PM UTC

If you are at CVPR, come visit our poster today! Thursday 2:30-5:00, poster 127.

Amir Bar

@_amirbar

9 Jun 2021

Amir Bar · May 2, 2025 · 7:26 PM UTC

Need a strong feature extractor for your upcoming NeurIPS paper? we got you 😉

Peter Tong

@TongPetersb

24 Apr 2025

We are open-sourcing all the models in Web-SSL, from ViT-L to ViT-7B! It was super fun to train and play with these massive ViTs. Models: huggingface.co/collections/f… Github: github.com/facebookresearch/… Huge credit to @DavidJFan for putting these models together!

Amir Bar · Jul 9, 2024 · 3:53 PM UTC

Mom, I’m on a podcast! Thanks for having me on your podcast, @samcharrington

The TWIML AI Podcast

@twimlai

9 Jul 2024

Today we're joined by @_amirbar, PhD candidate at @berkeley_ai and @TelAvivUni, to discuss his research on visual-based learning and his paper, “EgoPet: Egomotion and Interaction Data from an Animal’s Perspective.” We dig into:
🔹Caption-based dataset limitations
🔹The ‘Learning problem’ in robotics
🔹Visual Interaction Prediction (VIP), Vision to Proprioception Prediction (VPP), Locomotion Prediction (LP), and more!
🎧 / 🎥 Listen or watch the full episode on our page: twimlai.com/go/692.
📖 CHAPTERS
00:00 - Introduction
02:36 - Research interests
09:42 - Research projects
20:02 - EgoPet
27:31 - EgoPet dataset
29:25 - Visual Interaction Prediction (VIP) vs object recognition
31:09 - Findings on the model performance trained on EgoPet dataset
32:29 - Benchmark tasks (VIP, VPP, LP)
37:50 - Future directions

Amir Bar · Apr 29, 2024 · 4:12 PM UTC

Paper: arxiv.org/abs/2404.09991 Project Page: amirbar.net/egopet/ Data: github.com/bakhtiararya/EgoP… Code: github.com/DannyTran123/egop… Kudos to the authors Arya Bakhtiar, Danny Tran, @antoniloq, @jathushan, @ylecun, @amirgloberson and @trevordarrell.

Amir Bar · Feb 7, 2024 · 6:14 PM UTC

We're organizing the 1st workshop on Prompting in Vision at #CVPR2024 in Seattle. We have an amazing line-up of speakers, and we accept 8-page paper submissions. Stay tuned! Website: prompting-in-vision.github.i… OpenReview: openreview.net/group?id=thec…

Amir Bar · Jun 27, 2025 · 2:23 PM UTC

Check out PEVA 🌎, our recent attempt to build a world model for human body control.

Yutong Bai

@YutongBAI1002

27 Jun 2025

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to be annoyingly complex—in both the action and vision space—to even get close to real life. We did an initial attempt: Whole-Body Conditioned Egocentric Video Prediction. In collaboration with @dans_t123 , @_amirbar, @ylecun , @trevordarrell and @JitendraMalikCV. (For more details, check: arxiv.org/abs/2506.21552) What we did is very simple: Predict Egocentric Video from human Actions (PEVA) - Given the past video and a future action represented by relative 3D body pose, PEVA predicts how the world looks next—from the first-person view. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, it learns how physical actions shape perception.

Amir Bar · Nov 6, 2023 · 11:04 PM UTC

Replying to @RepRashida

There was a cease fire on October 6th, before Hamas killed over 1400 innocent Israelis.

Amir Bar · Jan 30, 2025 · 7:01 AM UTC

when the @CVPR rebuttal deadline collides with @icmlconf submission deadline… 😵‍💫☕

Amir Bar · Sep 2, 2022 · 5:48 PM UTC

Paper link: arxiv.org/abs/2209.00647. We will release the dataset & code soon on the project page: yossigandelsman.github.io/vi… Joint work with @YGandelsman, @trevordarrell, @amirgloberson and Alyosha Efros from @berkeley_ai & @TelAvivUni. (5/5)

Amir Bar · Jan 6, 2021 · 2:59 PM UTC

Replying to @TomerUllman @GaryMarcus

Interestingly, compositionality & spatial understanding isn't just learned with *almost infinite data*. This suggests that there is room for inductive biases and more structured models.

Amir Bar · Dec 7, 2024 · 6:27 PM UTC

(1/n) so why train a large world model end-to-end on large data for navigation, indeed? navigation is a first step in a larger vision: to build a model capable of simulating many tasks by planning in a single framework (e.g., think manipulation and more).

Andrew Davison @AjdDavison

7 Dec 2024

Devil's advocate mode on: Navigation World Models have existed for a long time... they're called maps! And there are plenty of good algorithms out there which enable robots to build them / render views from them / localise within them / use them for planning. #SLAM #SpatialAI :)

Amir Bar · Feb 19, 2024 · 6:25 PM UTC

Working on prompting in computer vision? Consider submitting your paper to our #CVPR2024 workshop. Paper submission deadline: March 15th. Website: prompting-in-vision.github.i…

Amir Bar · Aug 18, 2025 · 2:03 PM UTC

basically this - hunyuan-gamecraft.github.io/

Amir Bar

@_amirbar

9 Aug 2025

Amir Bar · Jan 3, 2025 · 4:07 PM UTC

if you like torch.compile, wait till you hear about FlexAttention. if you're using attention masks (e.g., causal masking), expect another 30%-40% boost. best to test using the recent torch nightly.

Amir Bar

@_amirbar

23 Oct 2024

model = torch.compile(model) is magic. With only one line of code, I get ~40% speed up per training iteration.
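FlexAttention (in `torch.nn.attention.flex_attention`) lets you express a mask as a small Python function of `(b, h, q_idx, kv_idx)` that returns whether a query position may attend to a key position. The sketch below does not call PyTorch: it is a pure-Python illustration, assuming that callback convention, of what a causal `mask_mod` means and how such a mask gates a softmax. The speedups mentioned above come from FlexAttention's fused kernels and block sparsity, which this toy does not model; `dense_mask` and `masked_attention` are hypothetical helpers.

```python
import math
from typing import Callable, List

MaskMod = Callable[[int, int, int, int], bool]


def causal(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
    # A query may attend to itself and to earlier positions only.
    return q_idx >= kv_idx


def dense_mask(mask_mod: MaskMod, q_len: int, kv_len: int, b: int = 0, h: int = 0) -> List[List[bool]]:
    """Materialize the boolean mask implied by a mask_mod for one (batch, head)."""
    return [[mask_mod(b, h, q, k) for k in range(kv_len)] for q in range(q_len)]


def masked_attention(scores: List[List[float]], values: List[float], mask: List[List[bool]]) -> List[float]:
    """Softmax over allowed positions only, then a weighted sum of scalar values.

    Assumes every row has at least one allowed position (true for a causal mask).
    """
    out = []
    for row_scores, row_mask in zip(scores, mask):
        peak = max(s for s, m in zip(row_scores, row_mask) if m)  # for numerical stability
        weights = [math.exp(s - peak) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out


if __name__ == "__main__":
    mask = dense_mask(causal, 4, 4)
    for row in mask:
        print("".join("x" if m else "." for m in row))
    # With uniform scores, each position averages the values it is allowed to see.
    print(masked_attention([[0.0] * 4] * 4, [1.0, 2.0, 3.0, 4.0], mask))
```

In real FlexAttention code the same `causal` function would be handed to the library's block-mask builder instead of being expanded into a dense list like this.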

Amir Bar · Feb 22, 2023 · 6:17 PM UTC

Replying to @chaimlevinson

Next step: put the Central Bureau of Statistics under Galit Distel. What are statistics worth if we don't control them?

Amir Bar · Dec 5, 2023 · 11:07 PM UTC

Introducing IMProv, our new multimodal prompting model trained on 200k Semantic Scholar figures & LAION-400M. Shows cool in-context learning capabilities for computer vision. Kudos to @Jerry_XU_Jiarui @YGandelsman @jw2yang4ai @JianfengGao0217 @trevordarrell @xiaolonw. 🧵👇

Xiaolong Wang

@xiaolonw

5 Dec 2023

Can a machine solve diverse computer vision tasks even on the ones it is not trained on? Introducing IMProv: It performs multimodal in-context learning for solving generic computer vision tasks. It formulates all tasks as an image inpainting problem. arxiv.org/abs/2312.01771

Amir Bar · Nov 29, 2024 · 5:18 AM UTC

2025 is going to be the year of world models.

Amir Bar · Apr 29, 2024 · 4:12 PM UTC

EgoPet contains 84 hours of video footage of mainly cats and dogs but also other exotic animals like turtles, eagles and snakes.

Amir Bar · Jun 30, 2020 · 4:07 PM UTC

Our new work "Compositional Video Synthesis with Action Graphs" is out! We focus on goal-oriented #video #generation and introduce the new task of *Action Graph to Video*. Project page: roeiherz.github.io/AG2Video Abstract: arxiv.org/abs/2006.15327 @BAIR @NVIDIAAI @TAU @Nvidia

Amir Bar · Oct 26, 2025 · 3:17 PM UTC

(2/2) 2. you’d need to generate hours of video for a single plan, which is highly inefficient. so you must lift the level of abstraction from pixels.
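The cost argument in this reply can be made concrete with a back-of-the-envelope frame count for a pixel-level rollout. The trip duration, frame rate, and number of candidate plans below are illustrative assumptions, not numbers from the thread or from any specific world model; `frames_for_plan` is a hypothetical helper.

```python
def frames_for_plan(hours: float, fps: float) -> int:
    """Number of video frames needed to roll out one plan of the given length."""
    return round(hours * 3600 * fps)


if __name__ == "__main__":
    # Assumed: ~12 hours door to door, rendered at a modest 4 frames per second.
    n = frames_for_plan(hours=12, fps=4)
    print(f"{n:,} frames for a single rollout")
    # Planning usually compares many candidate rollouts, multiplying the cost.
    candidates = 64  # assumed number of candidate plans
    print(f"{n * candidates:,} frames to evaluate {candidates} candidate plans")
```

Even under these generous assumptions a single rollout is six figures of frames, which is the inefficiency the reply points to when arguing for a higher level of abstraction than pixels.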

Amir Bar · Jun 14, 2024 · 9:26 PM UTC

Join us at the first workshop on Prompting in Vision at @CVPR on June 17th, starting at 9am. We have an amazing line-up of speakers, a poster session, and a panel moderated by @trevordarrell.

Kaiyang Zhou @kaiyangzhou

14 Jun 2024

Looking for a good prompt? Join our workshop at #CVPR2024 on June 17! 🚀✨ @CVPR @_amirbar @liuziwei7 @YGandelsman @SharonYixuanLi @hyojinbahng @LINJIEFUN @amirgloberson @zhang_yuanhan @BoLi68567011 @JingkangY

Amir Bar · Apr 24, 2025 · 3:38 AM UTC

Our code & pretrained models: github.com/facebookresearch/…

facebookresearch/webssl: Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).

Yann LeCun

@ylecun

3 Apr 2025

New paper from FAIR+NYU: Q: Is language supervision required to learn effective visual representations for multimodal tasks? A: No. ⬇️⬇️⬇️

Amir Bar · Jun 16, 2024 · 5:25 PM UTC

Replying to @jon_barron

These are actually hard to catch, since catching them requires collaboration between different organizing committees and a manual review. If major AI conferences converged on OpenReview, this could probably be automated.

Amir Bar · Apr 29, 2024 · 4:12 PM UTC

Together with the dataset, we provide a benchmark and evaluation suite of two in-domain tasks (interaction prediction, locomotion prediction) and a downstream robotic task (quadruped vision-to-proprioception prediction).

Amir Bar · Dec 8, 2023 · 10:39 PM UTC

Ping me if you're attending #NeurIPS2023 and want to grab some coffee or chat about visual prompting and large vision models! I'm also on the job market (graduating summer 2024).

Amir Bar · Apr 15, 2024 · 6:55 PM UTC

Replying to @srush_nlp

One potential explanation why random masking does not scale: different sentences have different information density. Using constant 15% masking might not extend to larger and more diverse datasets. Next token prediction is just simpler.
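A minimal sketch of BERT-style random masking at a fixed 15% rate, illustrating the point in this reply: the number of masked tokens depends only on sequence length, not on how much information each token carries. The `[MASK]` string, the `random_mask` helper, and the two example sentences are illustrative assumptions.

```python
import random
from typing import List, Tuple


def random_mask(tokens: List[str], rate: float = 0.15, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Replace a fixed fraction of tokens with [MASK]; return masked tokens and positions.

    Assumes a non-empty token list; at least one token is always masked.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = "[MASK]"
    return masked, positions


if __name__ == "__main__":
    dense = "the 1953 treaty ceded 412 hectares of delta farmland to the province".split()
    sparse = "yes yes yes yes yes ok ok ok ok ok ok ok".split()
    for sentence in (dense, sparse):
        masked, pos = random_mask(sentence)
        # Both 12-token sentences get the same number of masks, whatever their content.
        print(len(pos), " ".join(masked))
```

Masking a repetitive sentence barely removes information, while masking the same count in a fact-dense sentence can make the targets nearly unguessable; a single constant rate has to serve both cases.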

Amir Bar · Apr 29, 2024 · 4:12 PM UTC

We find that video models pre-trained on EgoPet perform better on these tasks than video models trained on other, larger egocentric datasets like Ego4D.

Amir Bar · May 2, 2024 · 6:49 PM UTC

fun fact -- EgoPet is heavily inspired by @ylecun's take that we're still far from cat level intelligence.

Amir Bar

@_amirbar

29 Apr 2024

Amir Bar · Oct 31, 2024 · 6:17 PM UTC

VLMs are the new cool kid, but what representations make instruction tuning and in-context learning work? TL;DR: No matter how you define the task (image examples, text examples, or instructions), VLMs convert it into a shared cross-modal task representation. More details 👇

Grace Luo @graceluo_

31 Oct 2024

In a new preprint, we show that VLMs can perform cross-modal tasks... ...since text ICL 📚, instructions 📋, and image ICL 🖼️ are compressed into similar task representations. See “Task Vectors are Cross-Modal”, work w/ @trevordarrell, @_amirbar. task-vectors-are-cross-modal…

Amir Bar · Apr 23, 2018 · 2:44 PM UTC

Learning to segment organs in abdomen CT

Amir Bar · Mar 21, 2024 · 7:37 PM UTC

Very cool work. A different way to think about it is via equivariance: upsampling the features should be consistent w.r.t. augmentations T, i.e. F(T(x)) = T(F(x)).

@_akhaliq

19 Mar 2024

FeatUp A Model-Agnostic Framework for Features at Any Resolution Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features
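The equivariance condition F(T(x)) = T(F(x)) can be checked directly for a simple upsampler. A minimal sketch, assuming nearest-neighbor 2x upsampling as F and a horizontal flip as T on a small 2D grid; FeatUp's learned upsamplers are not modeled here, and the helper names are hypothetical. Nearest-neighbor upsampling happens to be exactly flip-equivariant, which makes the property easy to see.

```python
from typing import Callable, List

Grid = List[List[float]]


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbor upsampling: each cell becomes a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def hflip(x: Grid) -> Grid:
    """Horizontal flip: reverse every row."""
    return [list(reversed(row)) for row in x]


def is_equivariant(f: Callable[[Grid], Grid], t: Callable[[Grid], Grid], x: Grid) -> bool:
    """Check F(T(x)) == T(F(x)) on one input."""
    return f(t(x)) == t(f(x))


if __name__ == "__main__":
    feats = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(is_equivariant(upsample2x, hflip, feats))
```

An upsampler that shifts its output by one pixel would fail this check, which is the kind of inconsistency an equivariance-style objective would penalize.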

Amir Bar · Dec 17, 2023 · 9:02 PM UTC

Enjoyed catching up with friends and colleagues at #NeurIPS2023! Wishing everyone safe travels.

Amir Bar · Aug 2, 2019 · 12:08 AM UTC

Happy to share the code for our #CVPR paper "Learning Individual Styles of Conversational Gesture" github.com/amirbar/speech2ge… @shiryginosar

Amir Bar · Mar 12, 2024 · 3:23 AM UTC

Deadline is coming up fast! Submit your prompting in vision paper by March 15th. @CVPR #CVPR2024

Amir Bar

@_amirbar

19 Feb 2024

Working on prompting in computer vision? Consider submitting your paper to our #CVPR2024 workshop. Paper submission deadline: March 15th. Website: prompting-in-vision.github.i…

Amir Bar · Oct 26, 2023 · 12:12 AM UTC

Replying to @hasolidit

I relate to your feelings. But... it's also important to me to say that as a student at Berkeley, a bastion of the progressive left, the feedback I get from professors and PhD students is one of support for Israel. There are also protests against us, mostly by undergraduate students (most of whom probably don't really know what they're protesting about).

Amir Bar · Sep 2, 2022 · 5:48 PM UTC

Given input-output image example(s) of a new task and a new input image, the goal is to produce the output image, consistent with the given examples. We pose this problem as simple image inpainting: literally just filling in a hole in a concatenated grid-like visual prompt image. (2/5)
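A minimal sketch of assembling such a 2x2 grid prompt: the example input and output in the top row, the new query input bottom-left, and an empty bottom-right cell marked as the hole for an inpainting model to fill. Images are tiny nested lists of pixel values here; `make_visual_prompt` and the 0/1 mask convention are assumptions for illustration, and the paper's exact layout, image sizes, and separators may differ.

```python
from typing import List, Tuple

Image = List[List[int]]


def make_visual_prompt(ex_in: Image, ex_out: Image, query: Image) -> Tuple[Image, Image]:
    """Return (canvas, mask) for a 2x2 grid prompt; mask is 1 where the model should inpaint."""
    h, w = len(query), len(query[0])
    for img in (ex_in, ex_out):
        assert len(img) == h and len(img[0]) == w, "all cells must share one size"
    blank = [[0] * w for _ in range(h)]
    top = [a + b for a, b in zip(ex_in, ex_out)]
    bottom = [a + b for a, b in zip(query, blank)]
    canvas = top + bottom
    # The hole is the bottom-right quadrant, where the answer for the query belongs.
    mask = [[1 if (r >= h and c >= w) else 0 for c in range(2 * w)] for r in range(2 * h)]
    return canvas, mask


if __name__ == "__main__":
    # Toy task: invert a binary image. One example pair, then a query.
    ex_in, ex_out = [[1, 0], [0, 0]], [[0, 1], [1, 1]]
    query = [[0, 1], [1, 1]]
    canvas, mask = make_visual_prompt(ex_in, ex_out, query)
    for row in canvas:
        print(row)
    print(sum(map(sum, mask)), "pixels to inpaint")
```

Because the task is specified only by the example pair inside the canvas, the same inpainting model can be reused for a new task just by changing the pictures in the top row.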

Amir Bar · May 14, 2025 · 7:10 AM UTC

a NeurIPS 2025 nightmare ☠️

Amir Bar · Sep 2, 2022 · 5:48 PM UTC

The secret ingredient to get this to work is the training data. To obtain image data that better resembles our visual prompts, we curated 88k unlabeled figures from paper sources on arXiv. (3/5)

Amir Bar · Aug 9, 2025 · 3:49 PM UTC

4️⃣ bonus: start from a strong base model like #Veo3.

Amir Bar · Dec 9, 2024 · 11:57 PM UTC

Thanks for featuring our work! We’ll answer questions on alphaXiv throughout the week. alphaxiv.org/abs/2412.03572

alphaXiv

@askalphaxiv

8 Dec 2024

Replying to @askalphaxiv

Navigation World Models A controllable video generation model that predicts future visual observations for navigation tasks based on past observations and actions. Problem: Visual-motor agents struggle with planning flexible navigation trajectories, especially in dynamic or unfamiliar environments. Method: Introduces a Conditional Diffusion Transformer (CDiT) scaled to 1 billion parameters, trained on egocentric videos from human and robotic agents, enabling trajectory simulation and evaluation. Insights: Dynamic trajectory planning benefits from learned visual priors, allowing adaptation to new constraints and environments, even imagining trajectories in unseen spaces. Results: Achieves superior trajectory planning by simulating or ranking options, improving navigation performance in both familiar and unfamiliar environments. Author @_amirbar is on alphaXiv this week to discuss the paper!

Amir Bar · Jun 26, 2022 · 6:03 PM UTC

Had an amazing time meeting all co-authors in person at #CVPR2022 ! @trevordarrell @GalChechik @amirgloberson @xinw_ai @colorado_reed @roeiherzig @vadimkantorov Anna Rohrbach.

Amir Bar · Aug 3, 2021 · 8:54 AM UTC

The code for our paper Compositional Video Synthesis with Action Graphs is now available here: github.com/roeiherz/AG2Video ICML 2021 camera ready: arxiv.org/abs/2006.15327 project page: roeiherz.github.io/AG2Video/

roeiherz/AG2Video: Code for "Compositional Video Synthesis with Action Graphs", Bar & Herzig et al., ICML 2021

Roei Herzig

@roeiherzig

30 Jun 2020

Our work *Compositional Video Synthesis with Action Graphs* is out! We introduce *Action Graphs*, a natural and convenient structure representing the dynamics of actions between objects over time and show we can synthesize goal-oriented videos on 2 datasets. #TAU #BAIR #NVIDIA

Amir Bar · Sep 2, 2022 · 5:48 PM UTC

We then trained an MAE to predict the VQGAN tokens of randomly masked image patches. (4/5)
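A minimal sketch of the training targets this step describes: split the canvas into a grid of patches, hide a random subset, and ask the model to predict the discrete VQGAN token id at each hidden patch. The token ids below are random integers standing in for a VQGAN encoder's output, and the 1024-entry codebook and 0.75 mask ratio are assumed values rather than numbers from the thread; the MAE itself is omitted, and `mask_patches` / `mae_targets` are hypothetical helpers.

```python
import random
from typing import Dict, List


def mask_patches(num_patches: int, ratio: float, seed: int = 0) -> List[int]:
    """Pick which patch indices to hide, uniformly at random."""
    rng = random.Random(seed)
    k = int(round(ratio * num_patches))
    return sorted(rng.sample(range(num_patches), k))


def mae_targets(token_ids: List[int], masked: List[int]) -> Dict[int, int]:
    """Targets are the codebook ids at masked positions only; visible patches are inputs."""
    return {i: token_ids[i] for i in masked}


if __name__ == "__main__":
    grid = 4  # a 4x4 grid of patches, flattened row-major
    rng = random.Random(1)
    # Hypothetical VQGAN codebook ids (assumed codebook size 1024), one per patch.
    token_ids = [rng.randrange(1024) for _ in range(grid * grid)]
    masked = mask_patches(grid * grid, ratio=0.75)
    visible = [i for i in range(grid * grid) if i not in masked]
    targets = mae_targets(token_ids, masked)
    print(f"{len(visible)} visible patches, {len(targets)} token targets")
    # A model would see the visible patches and be trained with cross-entropy
    # over the codebook at each masked index.
```

Predicting discrete token ids turns reconstruction into a classification problem over the codebook, rather than regressing raw pixel values.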

Amir Bar · Oct 3, 2024 · 1:25 PM UTC

Come visit our EgoPet poster #92 today at 4:30pm! Unfortunately I couldn’t make it to Milan, but say hi to the great @antoniloq!!!

Antonio Loquercio @antoniloq

3 Oct 2024

Today at 4:30p at #ECCV2024 in Milan, I'll present EgoPet, the first large collection of egocentric videos from animals' perspective! If you're curious about what we can learn from animals, come to poster #92! Project Website: amirbar.net/egopet/

Amir Bar · Dec 8, 2023 · 12:07 AM UTC

We train a large 3B LLaMA-like transformer on over 50 computer vision datasets with the sole objective of predicting the next VQGAN token. We visually prompt the model at test time and observe very interesting completions. Congrats to @YutongBAI1002 and @younggeng. 🧵👇

Yutong Bai

@YutongBAI1002

4 Dec 2023

How far can we go with vision alone? Excited to reveal our Large Vision Model! Trained with 420B tokens, effective scalability, and enabling new avenues in vision tasks! (1/N) Kudos to @younggeng @Karttikeya_m @_amirbar, @YuilleAlan Trevor Darrell @JitendraMalikCV Alyosha Efros!
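Next-token prediction over VQGAN tokens reduces to a shift-by-one on a flattened token sequence. A minimal sketch, assuming each image has already been tokenized into a fixed-length list of codebook ids and that several related images are concatenated into one sequence (the LVM paper calls these visual sentences); the tiny 4-token images and their ids are made up, the helper names are hypothetical, and the transformer itself is omitted.

```python
from typing import List, Tuple


def visual_sentence(images: List[List[int]]) -> List[int]:
    """Concatenate per-image token lists into one sequence."""
    return [tok for img in images for tok in img]


def next_token_pairs(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Inputs are seq[:-1]; the target at each step is the token that follows."""
    return seq[:-1], seq[1:]


if __name__ == "__main__":
    # Three hypothetical images, 4 tokens each (real tokenizers emit far more per image).
    frames = [[5, 9, 9, 2], [5, 9, 8, 2], [5, 8, 8, 2]]
    seq = visual_sentence(frames)
    x, y = next_token_pairs(seq)
    for i in range(3):
        print(f"given {x[: i + 1]} predict {y[i]}")
```

At test time, visual prompting amounts to feeding the tokens of a few prompt images as the prefix and letting the model generate the tokens of the next image.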

Amir Bar · Oct 14, 2023 · 8:37 AM UTC

Replying to @emilymbender

Burning babies, raping women, butchering innocent party goers, not to mention kidnapping over 100 civilians -- these are war crimes. Evacuating the civil population is a measure to protect them from being used as human shield for Hamas. But you know this.

Amir Bar · Dec 5, 2024 · 5:22 PM UTC

For more information, see our paper and project page: Project Page: amirbar.net/nwm Preprint: arxiv.org/abs/2412.03572. This work is a collaboration between @AIatMeta and @berkeley_ai 😊

Amir Bar · Aug 26, 2020 · 5:11 AM UTC

#ECCV2020 Come visit our Q&A session! we even have a virtual poster :)

Amir Bar · Aug 8, 2023 · 1:48 PM UTC

Replying to @chaimlevinson @shukisadeh

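The investigative report was about off-the-books income that is *laundered* against a check from the gemach (free-loan fund), not about the avrech (kollel student) stipend.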

Amir Bar · Jun 7, 2024 · 7:22 AM UTC

Replying to @_bondit_

Happened to me too not long ago, funny, right? A family doctor said that according to my BMI I need to lose weight. And it didn't help to explain that I work out regularly at the gym and my body fat percentage is fine 🤦‍♂️

Amir Bar · Jun 17, 2019 · 5:33 PM UTC

Very excited to share that we've received an FDA clearance for our intracranial hemorrhage triage algorithm! Trained on over 250,000 CT scans, we hope to augment radiologists in their day-to-day practice and ensure patients receive the best care. @ZebraMedVision

Amir Bar · Jun 14, 2024 · 9:07 PM UTC

Our Large Vision Model (LVM) paper code and interactive demo are finally live here: github.com/ytongbai/LVM huggingface.co/spaces/Emma02…

Yutong Bai

@YutongBAI1002

4 Dec 2023

Amir Bar · Jan 6, 2024 · 5:17 AM UTC

The papers I was assigned to review for #CVPR2024 feel distant from my expertise compared to #NeurIPS2023. Wondering if the matching system in CVPR is worse or if I've shifted away from being a vision person. 🤔

Amir Bar · Dec 19, 2024 · 3:54 AM UTC

MetaMorph extends instruction tuning (e.g., LLaVA-like models) to image generation, showing very appealing results 👇

Zhuang Liu

@liuzhuang1234

19 Dec 2024

How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit visual generation from an LLM, when trained jointly with visual understanding.

Amir Bar · Jan 8, 2019 · 10:07 AM UTC

We explore the nature of I.V. contrast using deep learning methodologies, driven by data and guided by clinical insights. Joint work with Raouf Muhamedrahimov, Jonathan Laserson, Ayelet Akselrod-Ballin, Eldad Elnekave openreview.net/forum?id=SylD…

Amir Bar · Jun 13, 2025 · 9:34 PM UTC

NWM intersects with multiple communities (video generation, 3D vision, robotics, RL, representation learning...) and it seemed to equally piss off everyone. I remember I told @GaoyueZhou and @dans_t123 - "lower expectations, this paper is a 99% reject".

Amir Bar · Apr 19, 2025 · 8:38 PM UTC

WORLDMEM: Adding memory to world models

Zeqi Xiao @zeqi_xiao

18 Apr 2025

Thanks for sharing! @_akhaliq For more information: 📜ArXiv: arxiv.org/abs/2504.12369 🤗 Hugging Face: huggingface.co/papers/2504.1… 🌐 xizaoqu.github.io/worldmem/ 🧑‍💻 GitHub: github.com/xizaoqu/WorldMem 🚀 Demo: huggingface.co/spaces/yslan/…

Amir Bar · Apr 10, 2020 · 11:06 PM UTC

(1/n) New work on Scene Graph to Image generation! We focus on input scenes that are more complex than those previously tackled and show improved performance on three different datasets: Visual Genome, COCO, CLEVR. pdf: arxiv.org/abs/1912.07414

Amir Bar · Jun 6, 2021 · 12:40 PM UTC

Replying to @CSProfKGD

This was actually suggested in the ICCV reviewer guidelines:

Amir Bar · Nov 16, 2019 · 5:10 AM UTC

Replying to @hardmaru

And overleaf didn't even crash..

Amir Bar · Oct 28, 2017 · 1:15 AM UTC

Replying to @dlowd @ryan_p_adams

Following up your approach, I was able to achieve superior performance using cp -r test-data....

Amir Bar · Sep 13, 2023 · 6:27 PM UTC

Replying to @jxmnop

Using static graphs that do not allow a convenient way to debug the code. By the time they moved to dynamic graphs it was too late.

Amir Bar · Dec 5, 2024 · 5:22 PM UTC

Our Navigation World Model can simulate trajectories by generating video. This capability unlocks planning: simply find the sequence of actions that leads from the input image to a target goal. In unknown environments, our model can hallucinate navigation trajectories.
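The planning idea here (simulate candidate action sequences with the world model and keep the one whose predicted outcome lands closest to the goal) can be sketched with random shooting. In this minimal sketch the "world model" is a toy function on 2D positions standing in for a learned video predictor, and the action space, horizon, candidate count, and distance-based score are all assumptions for illustration; NWM's actual planner and scoring are not reproduced here.

```python
import math
import random
from typing import Callable, List, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]


def toy_world_model(state: State, action: Action) -> State:
    """Stand-in for a learned model: predicts the next state from state and action."""
    return (state[0] + action[0], state[1] + action[1])


def rollout(model: Callable[[State, Action], State], start: State, actions: List[Action]) -> State:
    """Simulate a whole action sequence and return the predicted final state."""
    state = start
    for a in actions:
        state = model(state, a)
    return state


def plan_random_shooting(model: Callable[[State, Action], State], start: State, goal: State,
                         horizon: int = 5, n_candidates: int = 256, max_step: float = 1.0,
                         seed: int = 0) -> Tuple[List[Action], float]:
    """Sample action sequences, simulate each, keep the one ending nearest the goal."""
    rng = random.Random(seed)
    best: List[Action] = []
    best_dist = math.inf
    for _ in range(n_candidates):
        actions = [(rng.uniform(-max_step, max_step), rng.uniform(-max_step, max_step))
                   for _ in range(horizon)]
        dist = math.dist(rollout(model, start, actions), goal)
        if dist < best_dist:
            best, best_dist = actions, dist
    return best, best_dist


if __name__ == "__main__":
    actions, dist = plan_random_shooting(toy_world_model, start=(0.0, 0.0), goal=(3.0, 2.0))
    print(f"best plan ends {dist:.2f} units from the goal")
```

With a learned world model, "distance to goal" would instead compare a generated frame with the goal image, and smarter optimizers than random sampling can be dropped into the same loop.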

Amir Bar · Oct 31, 2024 · 6:31 PM UTC

Interested in our work but busy doing the dishes? No problem! you can listen to the generated #notebooklm podcast, which is 95% accurate 😉 notebooklm.google.com/notebo…

Amir Bar

@_amirbar

31 Oct 2024

Amir Bar · Sep 25, 2024 · 10:54 PM UTC

great example of how to stick to your research agenda despite temporary distractions.

Zhuang Liu

@liuzhuang1234

25 Sep 2024

Paper is rejected, but a followup paper that completely depends on the rejected paper is accepted #NeurIPS

Amir Bar · Jun 13, 2025 · 9:34 PM UTC

(Un)surprisingly, @ylecun didn't mind 😅 and neither did @trevordarrell, which was a bit reassuring. Anyway, it's nice to see the outcome is an award. If you're interested in hearing more, come tomorrow (Sat) to Oral Session 4B (ExHall A2, 1:00-2:15) and visit poster #396 (Hall D, 5-7pm)

Amir Bar · Sep 22, 2023 · 5:57 AM UTC

You’re nailing it @iclr_conf

ICLR @iclr_conf

15 Sep 2023

Replying to @qberthet

Just be free and live your best age.

Amir Bar · Jul 11, 2018 · 6:40 PM UTC

Our first 510(k) for a deep learning based algorithm! businesswire.com/news/home/2…

Amir Bar · May 27, 2019 · 6:56 AM UTC

Had a great time yesterday giving a talk on the challenges for AI in Radiology at the Hebrew University computer vision seminar. @CseHuji @ZebraMedVision.

Amir Bar · Oct 17, 2023 · 6:35 AM UTC

Replying to @timnitGebru

Jews and Palestinians, whether they like it or not, are Semites: closely related “cousins”. Anything else is your projection. Handcuffing, raping, then burning your victims… the cruelty of Hamas does not exist in nature 💔

Amir Bar · Dec 5, 2024 · 6:11 PM UTC

Replying to @_amirbar @AIatMeta

A lot of our work is also built on the work of others. Our Conditional Diffusion Transformer model (CDiT) extends DiT by @billpeeb and @sainingxie, and much of the data and inspiration is based on the works of our @berkeley_ai colleagues Noriaki Hirose and @shahdhruv_

Amir Bar · Oct 25, 2022 · 12:15 AM UTC

Replying to @yoavgo

With transformers, NLP and CV have become "downstream tasks".

Amir Bar · Mar 26, 2023 · 8:50 PM UTC

Replying to @avigrin10

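Enough with playing innocent, Avishai. Incapacity laws, gifts, abolishing breach of trust, the Deri law, politicization of judicial appointments, and on and on. Radically changing the rules of the game without broad consensus, in a country without a constitution, while the prime minister, who is under investigation, violates his conflict-of-interest agreement.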

Amir Bar · Mar 6, 2023 · 2:25 AM UTC

Replying to @ICCVConference

[GIF: Please Begging]

Amir Bar · Aug 18, 2025 · 9:41 PM UTC

Replying to @bindureddy

🤣🤣🤣
