Mike Shou · Dec 12, 2025 · 4:16 AM UTC

@MikeShou1

(1/6) X-Humanoid 🤖: Scaling up data for Humanoid Robots. We convert human daily activity videos (from Ego-Exo4D) into humanoid videos (i.e., Tesla Optimus) performing tasks like cooking or fixing a bike. This data could potentially be used to train robot policies and world models. 🔥 Project page: showlab.github.io/X-Humanoid… Paper link: arxiv.org/abs/2512.04537

Mike Shou · Aug 23, 2024 · 5:11 AM UTC

In early 2024, many people asked me whether to go for AR or Diffusion. This pushed me to think about a deeper question: why do AR and Diffusion conflict with each other? In theory, they don't have to! This motivated us to explore how to integrate both into Show-o, one single 1.3B LLM.

@_akhaliq

23 Aug 2024

Show-o One Single Transformer to Unify Multimodal Understanding and Generation discuss: huggingface.co/papers/2408.1… We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation. Across various benchmarks, it demonstrates comparable or superior performance to existing individual models with an equivalent or larger number of parameters tailored for understanding or generation. This significantly highlights its potential as a next-generation foundation model.
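
A minimal sketch of one attention mask consistent with this description, assuming that text tokens attend causally (autoregressive) while image tokens additionally attend bidirectionally within their own image, as a mask-prediction (discrete diffusion) model needs. The function name `omni_attention_mask` and the `"txt"`/`"img"` labels are hypothetical; this is not code from the Show-o repository.

```python
from typing import List


def omni_attention_mask(modalities: List[str]) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key k.

    Assumed rule (a simplification, not Show-o's actual code):
    * every position attends causally to itself and all earlier positions;
    * image tokens also attend to every token in the same contiguous run
      of image tokens (bidirectional attention within the image).
    """
    # Give each contiguous run of "img" tokens its own id; text gets -1.
    block_id: List[int] = []
    next_id = 0
    for i, m in enumerate(modalities):
        if m != "img":
            block_id.append(-1)
            continue
        if i == 0 or modalities[i - 1] != "img":
            next_id += 1  # a new image block starts here
        block_id.append(next_id)

    n = len(modalities)
    return [
        [
            k <= q or (block_id[q] != -1 and block_id[q] == block_id[k])
            for k in range(n)
        ]
        for q in range(n)
    ]


if __name__ == "__main__":
    seq = ["txt", "txt", "img", "img", "img", "txt"]
    for row in omni_attention_mask(seq):
        print("".join("1" if allowed else "0" for allowed in row))
```

For the six-token example the printed rows are `100000`, `110000`, `111110` (three times) and `111111`: each image token sees the whole image but not the text that follows it, while the trailing text token sees everything before it.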

Mike Shou · Dec 19, 2023 · 2:55 PM UTC

We are thrilled to release a 3-hour tutorial for beginners who would like to quickly get into Video Diffusion Models 😀👉 Check it out: piped.video/watch?v=0K56LA82…

Mike Shou · Aug 21, 2024 · 1:38 PM UTC

Very excited to have @JeffDean at NUS for the first time! Thanks for sharing Google’s perspective about multi-modal and video.

Mike Shou · Sep 2, 2024 · 1:22 PM UTC

Show-o update🔥: 1. We have released the training code on GitHub, including both pre-training and instruction tuning! 🔥 2. Added a FlexAttention implementation for a great speed-up. Thanks @cHHillee 🚀 github.com/showlab/Show-o/bl… 3. Gradio demo is up 🤗 huggingface.co/spaces/showla… Have fun!

Show-o/training/omni_attention.py at main · showlab/Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. - showlab/Show-o

github.com

Mike Shou · Aug 22, 2023 · 9:01 AM UTC

This paper was rejected by NeurIPS, ICLR and CVPR before finally being accepted by ICCV. Thanks to @jayzhangjiewu for never giving up, and I appreciate the help from a dozen reviewers. TL;DR: a more realistic setting for Online Continual Video Object Detection with far fewer labels.

Mengmi Zhang @MengmiZhang

22 Aug 2023

ICCV paper in collaboration with Jay Zhangjie Wu from ShowLab @MikeShou1 is up! If you wanna know: how to use fast-slow complementary learning systems to continuously learn from video streams in a label-efficient manner. Check out our paper here: arxiv.org/pdf/2206.00309.pdf

Mike Shou · Sep 28, 2021 · 9:28 AM UTC

Wondering how to start your faculty career, grow in a company, or secure PhD offers? Check out our #ICCV'21 workshop on Share Stories and Lessons Learned. As a warm-up, we have released some recorded talks from @dimadamen @deviparikh @xinshuoweng @zhoubolei, and a few live talks are upcoming!

Mike Shou · Apr 27, 2024 · 12:52 PM UTC

In the era of #Sora, it’s important to detect whether a video is AIGC and who owns it. Happy to introduce RingID, a diffusion-based watermark identification method that can identify not only whether an image/video is generated, but also by whom: arxiv.org/abs/2404.14055

Mike Shou · Apr 27, 2024 · 12:32 PM UTC

Honored and humbled to receive this award in my 3rd year at NUS. Time flies.. Many thanks to my mentors, students and collaborators 🙏

Mike Shou · Oct 7, 2023 · 11:15 AM UTC

Show Lab from National University of Singapore had a wonderful week at #ICCV2023 Merci @ICCVConference and goodbye Paris!

Mike Shou · Sep 29, 2024 · 8:49 AM UTC

Ciao🇮🇹 My first time attending #ECCV2024 and looking forward to catching up! I will be talking about Show-o, one single transformer that unifies multimodal understanding and generation, tomorrow (Sep 30) at 9am in room Amber 3, venue workshop.

Mike Shou · Sep 12, 2024 · 3:19 AM UTC

After the initial release, there were questions about where the diffusion is. We are familiar with continuous diffusion like SD, but there is another type of diffusion that is discrete. Today, we updated our arXiv paper to add preliminaries about discrete diffusion🔥 Feedback would be appreciated :)
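
Discrete diffusion can be read as an absorbing-state (masking) process: instead of adding Gaussian noise to continuous latents as in SD, tokens are progressively replaced by a special [MASK] token, and the model is trained to recover the originals. A minimal sketch of the forward (corruption) step, assuming a MaskGIT-style cosine schedule; `MASK`, `mask_ratio` and `corrupt` are hypothetical names, and Show-o's exact schedule may differ.

```python
import math
import random
from typing import List, Tuple

MASK = -1  # hypothetical id reserved for the [MASK] token


def mask_ratio(t: float) -> float:
    """Assumed cosine schedule: fraction of tokens masked at time t in [0, 1].

    t = 0 leaves the sequence clean; t = 1 masks (numerically) everything.
    """
    return 1.0 - math.cos(0.5 * math.pi * t)


def corrupt(tokens: List[int], t: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Replace round(mask_ratio(t) * n) randomly chosen tokens with MASK.

    Returns the corrupted sequence and the sorted masked positions, i.e. the
    positions where a training loss would ask the model for the original token.
    """
    n = len(tokens)
    k = round(mask_ratio(t) * n)
    positions = sorted(rng.sample(range(n), k))
    chosen = set(positions)
    noisy = [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
    return noisy, positions


if __name__ == "__main__":
    rng = random.Random(0)
    image_tokens = [12, 7, 7, 3, 9, 15, 4, 4]
    for t in (0.0, 0.5, 1.0):
        noisy, pos = corrupt(image_tokens, t, rng)
        print(t, noisy, pos)
```

At t = 0.5 the assumed schedule masks about 29% of the tokens (1 - cos(π/4) ≈ 0.293), so 2 of the 8 example tokens; generation runs the process in reverse, unmasking tokens step by step.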

Mike Shou · Feb 6, 2023 · 8:49 AM UTC

Do you want to quickly turn pre-trained image models like #StableDiffusion and #DreamBooth into a text-to-video / GIF generation model? Excited to share that our #TuneAVideo, which takes only 5-10 mins of training on 1 A100, is now open sourced: github.com/showlab/Tune-A-Vi… (~500 stars so far)

Mike Shou · Apr 7, 2024 · 9:08 AM UTC

Recently, I have heard the term “World Model” many, many times, but there seem to be two major types of WM: (1/5) The more authentic and technical definition comes from big figures like Jürgen and LeCun: future prediction + action + real world, details below👇

Mike Shou · Aug 17, 2023 · 12:36 PM UTC

Happy to share our #ICCV2023 work by @weijiawu7 that turns Stable Diffusion into a surprisingly good segmentation model for open-vocabulary words, like Eiffel Tower or Ultraman. Paper: arxiv.org/abs/2303.11681 Code: weijiawu.github.io/Diffusion…

DiffuMask: Synthesizing Images with Pixel-level Annotations for...

Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In contrast, synthetic data can be freely available using a generative model (e.g., DALL-E, Stable...

arxiv.org

weijia wu @weijiawu7

16 Aug 2023

"📚 Excited to share our work presented at ICCV23! We explored leveraging free cross-attention masks from Stable Diffusion for semantic segmentation labeling. The results are not very impressive, but intriguing. weijiawu.github.io/Diffusion… #ICCV23 #SemanticSegmentation"

Mike Shou · Jun 16, 2024 · 1:49 AM UTC

Sadly, I am unable to attend #CVPR2024🥹But ShowLabers are!🙌 Talk to @jayzhangjiewu @LingminR @Jia_Wei_LIU @YuchaoGu @Sierkinhane1 Eric on diffusion/generation and @ziteng_v @ZechenBai Stan on MLLM/understanding. BTW, our lab at #NUS has new PhD openings for *Jan 2025*😀

Mike Shou · Jun 25, 2021 · 8:31 AM UTC

How to go beyond 1s and make sense of a longer video? Join us in a few hours today (Jun 25 8am-12:30pm PST California time) at our @CVPR LOVEU workshop for the live talks and QA from our invited speakers and challenge winners! Agenda: sites.google.com/view/loveuc…

Mike Shou · Apr 7, 2024 · 4:43 AM UTC

#CVPR2024 oral/highlight decisions are out. Some papers were already famous, but I always find some other "surprising" papers that share something in common: no high numbers, no complex system/engineering, just a very interesting idea that excites and inspires me to keep thinking🤔

Mike Shou · Jul 4, 2022 · 5:10 AM UTC

Congrats to Benita, an undergraduate / Year-3 student, for getting her paper into ECCV, a top-tier AI conference! Towards a personal AI assistant on smart glasses👀 vision-language, AR/VR, video. Still a lot of opportunities and large room for improvement: arxiv.org/abs/2203.04203

Mike Shou · Oct 12, 2023 · 10:33 AM UTC

Show-1 is finally open sourced! We marry the strengths of pixel-based and latent-based VDMs, producing high-quality videos with precise text-video alignment and high GPU memory efficiency during inference. Congrats to the awesooooome team @JunhaoZHANG19 @jayzhangjiewu @Jia_Wei_LIU

Jay Wu @jayzhangjiewu

12 Oct 2023

🔥 We're thrilled to announce the release of the code and model weights of Show-1! 🤖 💡 Unleash your creativity and embark on an exciting journey into the future of AI video generation with Show-1 now! 🚀 🔗 Code: github.com/showlab/Show-1; Model weights: huggingface.co/showlab

Mike Shou · Sep 1, 2023 · 8:59 AM UTC

Watching my student deliver his award-winning talk filled me with immense pride.. Congrats to @KevinQHLin for winning the PREMIA Best Paper Award 2023! Many thanks to the Pattern Recognition and Machine Intelligence Association for recognising Kevin’s EgoVLP work.

Mike Shou · Jul 3, 2024 · 7:10 AM UTC

List of recent papers for screen GUI automation with video-language-action model👇

Kevin Lin

@KevinQHLin

3 Jul 2024

💻We live in the digital era, where screens (PC/Phone) are integral to our lives. 🧐Curious about how AI assistant can help with computer tasks? 🚀Check out the latest progress in repo: github.com/showlab/Awesome-G… ✨A collection of the up-to-date GUI-related papers and resources.

ALT https://github.com/showlab/Awesome-GUI-Agent

Mike Shou · Oct 11, 2023 · 4:29 AM UTC

Code has been released as well: github.com/showlab/DatasetDM

GitHub - showlab/DatasetDM: [NeurIPS2023] DatasetDM:Synthesizing Data with Perception Annotations...

[NeurIPS2023] DatasetDM:Synthesizing Data with Perception Annotations Using Diffusion Models - showlab/DatasetDM

github.com

weijia wu @weijiawu7

11 Oct 2023

Excited to introduce our paper at NeurIPS 2023: DatasetDM! 🚀(weijiawu.github.io/DatasetDM…) Our diffusion model produces an unlimited quantity of synthetic images, accompanied by perception annotations like depth, segmentation, and human pose. 🌐📊 #NeurIPS2023 #AI #Research

Mike Shou · Jul 14, 2023 · 11:35 PM UTC

Happy to introduce EgoVLPv2 to appear in #ICCV2023! Congrats to @Shramanpramani2 and the team. Project page: shramanpramanick.github.io/E…

Shraman Pramanick

@Shramanpramani2

14 Jul 2023

Thanks @_akhaliq for highlighting our work. EgoVLPv2 is accepted in #ICCV2023. Joint work with @PengchuanZ, @yalesong, @nagsayan112358, @KevinQHLin, @MikeShou1, and my advisor Prof. Rama Chellappa. More details on shramanpramanick.github.io/E…. @MetaAI @JohnsHopkins

Mike Shou · Aug 31, 2024 · 1:13 AM UTC

We have released code for RingID which safeguards diffusion models with multi-user watermarks github.com/showlab/RingID Looking forward to discussions at ECCV next month!

Mike Shou · Nov 3, 2023 · 11:13 AM UTC

Thanks to @RisingSayak from Hugging Face for delivering a wonderful tutorial talk about diffusers at NUS. Nice to finally “see” a friend we had only collaborated with online, and so happy for their great achievements!

Mike Shou · Sep 19, 2023 · 2:44 AM UTC

Congrats to @ZHHHYuan for getting his first paper accepted by IJCV 🙌 He proposes a highly parameter-efficient fine-tuning technique, which only needs to tune 0.11M additional parameters (0.13% of the full model). Paper: arxiv.org/abs/2309.08513 Code: github.com/showlab/SCT

Henry Henry Zhao

@ZHHHYuan

18 Sep 2023

Happy to share our #IJCV2023 work that proposes a straightforward technique for parameter-efficient fine-tuning. This work represents a remarkable reduction of 780 times in parameter costs compared to its full fine-tuning counterpart. Paper: arxiv.org/abs/2309.08513. @MikeShou1
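
A quick consistency check on the two figures quoted in this thread (0.11M tuned parameters = 0.13% of the full model; a 780x reduction). Neither tweet states the full model's size; the values below are only what the quoted numbers imply, up to rounding.

```python
tuned = 0.11e6          # additional tuned parameters (0.11M, from the tweet)
fraction = 0.13 / 100   # 0.13% of the full model (from the tweet)
reduction = 780         # "reduction of 780 times" (from the quoted tweet)

implied_full_from_fraction = tuned / fraction    # about 84.6M parameters
implied_full_from_reduction = tuned * reduction  # about 85.8M parameters
fraction_from_reduction = 100 / reduction        # about 0.128%, i.e. 0.13% rounded

print(f"{implied_full_from_fraction / 1e6:.1f}M vs {implied_full_from_reduction / 1e6:.1f}M")
print(f"1/{reduction} = {fraction_from_reduction:.3f}%")
```

The two implied sizes differ by about 1.4%, which is within the rounding of "0.11M" and "0.13%", so the two statements agree with each other.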

Mike Shou · Jun 17, 2023 · 5:10 PM UTC

Hey @CVPR what is the wifi password? Thanks!

Mike Shou · Sep 15, 2022 · 4:16 PM UTC

Congrats to @KevinQHLin and co-authors! It's a nice start and we are keen to improve egocentric pre-training down the road, and we welcome suggestions and further collaborations with the community!

Kevin Lin

@KevinQHLin

15 Sep 2022

I am excited to share that our work EgoVLP has been accepted to #NeurIPS2022!🎉The first paper in my first year of Ph.D.! Many thanks to my supervisor @MikeShou1 and collaborators: @dimadamen @mwray0 @mattia_soldan @BernardSGhanem Preprint: arxiv.org/abs/2206.01670 @NeurIPSConf

Mike Shou · Aug 23, 2024 · 5:32 AM UTC

Code and model have been open sourced at github.com/showlab/Show-o. More features are on the way and we welcome collaborations.

Mike Shou · Mar 29, 2024 · 6:37 AM UTC

VideoSwap released👇

YUCHAO GU @YuchaoGu

29 Mar 2024

Thanks @_akhaliq for previously featuring our work. VideoSwap has now been accepted by CVPR 2024. The code is now available at github.com/showlab/VideoSwap. Welcome to try it!

Mike Shou · Jul 5, 2024 · 7:51 PM UTC

Thanks a lot @dimadamen and @mwray0 for inviting and hosting me. I have learned and enjoyed so much from the 1:1s with MaVi members too!

Dima Damen @dimadamen

5 Jul 2024

It was great to have @MikeShou1 @NUSingapore here @BristolUniEng @bristolcs for a great #MaVi seminar on his group’s efforts to advance understanding and generation in video!

Mike Shou · Aug 23, 2024 · 5:18 AM UTC

As we worked along this direction, we realised this one single LLM can be even more general: it supports not only generation but also multimodal understanding. Interestingly, this 1.3B-parameter LLM achieves results comparable to other unified models that are 10x larger.

Mike Shou · Aug 26, 2024 · 1:37 PM UTC

We created a Discord space to facilitate discussions, feature requests and collaborations about Show-o! Join us at discord.com/invite/Z7xdzYDa A WeChat group link can be found on our GitHub page too, see you there 😊

Mike Shou · May 10, 2023 · 2:24 AM UTC

Quantitative evaluation is notoriously hard for video editing -- at LOVEU@CVPR 2023, we provide a clean dataset & human annotators (free!) for participants🤩

Forrest Iandola @fiandola

10 May 2023

🚨Exciting news!🚨 A CVPR competition for AI-based video editing! 💻🎥 With a clean dataset and baseline code provided, there's no better time to dive into this challenge. Submissions due Jun 5. Don't miss out! 🤩 #CVPR2023 #AI #videoediting Details: sites.google.com/view/loveuc…

Mike Shou · Feb 6, 2023 · 8:54 AM UTC

Also a shoutout to @huggingface @AK for creating a great model zoo library for #TuneAVideo models conceptualized and fine-tuned by the community at huggingface.co/Tune-A-Video-…

Tune-A-Video-library (Tune a video concepts library)

Mike Shou · Dec 1, 2023 · 7:19 AM UTC

🤩🤩🤩

AI at Meta

@AIatMeta

30 Nov 2023

Replying to @AIatMeta

1️⃣ Ego-Exo4D A new foundational dataset + benchmark suite to support research on video learning & multimodal perception, co-developed with 14 university partners. Details ➡️ bit.ly/47Wk8cF Core to the work is videos of skilled human activities, simultaneously capturing both first-person "egocentric" + multiple “exocentric” views.

Mike Shou · May 23, 2024 · 2:57 AM UTC

V2 is also on the way 🔜

Jürgen Schmidhuber

@SchmidhuberAI

21 May 2024

Counter-intuitive aspects of text-to-image diffusion models: only a few steps require cross-attention; most don’t. Skipping the extras gives a great speed-up! Many stars on GitHub :-) github.com/HaozheLiu-ST/T-GA… arxiv.org/abs/2404.02747

Mike Shou · Oct 3, 2023 · 10:20 PM UTC

TL;DR — Too Large; Data Reduction 😆

Jinpeng Wang

@awinyimgprocess

3 Oct 2023

Excited for #ICCV2023! Does more data truly mean better results in Vision-Language Pre-Training? Join me tomorrow for my presentation on 'Too Large; Data Reduction for Vision-Language Pre-Training' at Room Foyer Sud, 096 (10:30 AM - 12:30 PM). See you there! 📚🔬 #AI #Research

Mike Shou · Jun 18, 2024 · 1:05 AM UTC

Many thanks @y_m_asano for the summary and sharing! Paper links: arxiv.org/abs/2312.13108, fingerrec.github.io/visincon…

Yuki @y_m_asano

18 Jun 2024

Fun talk from @MikeShou1. From using LLMs in a structured way (planner/actor/critic) for vision-language tasks, to using rendered images of text to extend context in MLLMs in a funky way. Advantage of increasing context and reducing compute. Vibes from PIXEL and CLIPPO papers.

Mike Shou · Aug 23, 2024 · 5:20 AM UTC

For generation, Show-o is much faster than AR-based methods because the unified LLM uses full attention / diffusion for visual generation.

Mike Shou · Aug 24, 2024 · 2:55 AM UTC

This is fantastic!

Horace He

@cHHillee

24 Aug 2024

> I see a new model architecture on Twitter > I look to see if they have a new attention variant > They do! > I look at their code > They fully materialize the mask :( > I implement it in 4 lines with FlexAttention > It's 9x faster github.com/showlab/Show-o/is…
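
The point of the quoted tweet is that a dense N x N boolean mask costs O(N²) memory, whereas FlexAttention (PyTorch's `torch.nn.attention.flex_attention` module) lets you describe the mask as a predicate over indices with the argument order `(b, h, q_idx, kv_idx)`, which `create_block_mask` turns into a block-sparse mask. The sketch below only mirrors that idea in plain Python, assuming a causal-text / bidirectional-image rule; it is not the actual Show-o patch, real mask_mod functions receive tensor indices rather than ints, and `make_mask_mod` and `block_id` are hypothetical names.

```python
from typing import Callable, List

MaskMod = Callable[[int, int, int, int], bool]


def make_mask_mod(block_id: List[int]) -> MaskMod:
    """Build an index predicate with the (b, h, q_idx, kv_idx) argument order.

    block_id[i] is -1 for a text token and a shared positive id for tokens of
    the same image. Only this O(N) list is stored, never an N x N mask.
    """

    def mask_mod(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
        causal = kv_idx <= q_idx
        same_image = (block_id[q_idx] >= 0) & (block_id[q_idx] == block_id[kv_idx])
        # `|` and `&` mirror the element-wise tensor ops used in real mask_mods.
        return causal | same_image

    return mask_mod


if __name__ == "__main__":
    ids = [-1, 1, 1, -1, 2, 2]  # text, image 1 (2 tokens), text, image 2 (2 tokens)
    mod = make_mask_mod(ids)
    n = len(ids)
    # Materialised here only for display; a kernel would query it per block.
    for q in range(n):
        print("".join("1" if mod(0, 0, q, k) else "0" for k in range(n)))
```

With two images in the sequence, tokens of the first image cannot see the second one (`mod(0, 0, 1, 4)` is `False`), while every token of the second image sees that whole image.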

Mike Shou · Dec 19, 2023 · 2:56 PM UTC

Many thanks to Yang and Wu @jayzhangjiewu for helping with the slides!

Mike Shou · Oct 3, 2023 · 10:34 PM UTC

Some NeurIPS news in the midst of ICCV… Congrats to @YuchaoGu on the code release of his NeurIPS paper, Mix-of-Show. We know how to use LoRA to augment diffusion models with a single new concept, but how about multiple concepts? Check it out 👇

YUCHAO GU @YuchaoGu

3 Oct 2023

Happy to announce that Mix-of-Show has been accepted by NIPS 2023. See you in New Orleans. We just released the codebase for the community (github.com/TencentARC/Mix-of…). Welcome to try it. Below are some results (left is the reference concept, right is our generation result):

Mike Shou · Aug 23, 2024 · 5:26 AM UTC

More interestingly, Show-o is more flexible than AR methods for image in-painting and extrapolation, because it does not need to predict tokens one by one.
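
With a mask-prediction model, in-painting can keep the known image tokens fixed, mask the region to fill, and predict all masked tokens in parallel over a few refinement steps. A minimal sketch of MaskGIT-style confidence-based decoding, assuming a linear unmasking schedule; `toy_predict` is a hypothetical stand-in for the model, and this is not Show-o's actual sampler.

```python
from typing import Callable, List, Tuple

MASK = -1  # hypothetical [MASK] token id
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def inpaint(tokens: List[int], predict: Predictor, steps: int) -> List[int]:
    """Fill every MASK position using at most `steps` model calls.

    Known tokens are never modified. At each step the model returns a
    (token, confidence) pair per position; the most confident predictions at
    still-masked positions are committed, following an assumed linear
    schedule so that nothing is masked after the last step.
    """
    tokens = list(tokens)
    total = sum(1 for t in tokens if t == MASK)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = predict(tokens)
        still_masked = total * (steps - step - 1) // steps
        n_commit = len(masked) - still_masked
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens: List[int]) -> List[Tuple[int, float]]:
    """Hypothetical model: guesses token = position, less confident further right."""
    return [(i, 1.0 / (1 + i)) for i in range(len(tokens))]


if __name__ == "__main__":
    image = [5, MASK, MASK, 8, MASK, MASK]  # 5 and 8 are the known context
    print(inpaint(image, toy_predict, steps=2))
```

With `steps=2` the example commits the two most confident masked positions first and prints `[5, 1, 2, 8, 4, 5]`; the known tokens 5 and 8 are untouched, and the number of model calls is set by `steps`, not by the number of masked tokens.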

Mike Shou · Feb 28, 2023 · 7:22 AM UTC

Thank you @dimadamen for your coordination and guidance - your emails (e.g. the one for next steps) were very helpful!

Dima Damen @dimadamen

27 Feb 2023

As a Senior AC for #CVPR2023 @CVPR want to thank all ACs I worked with, including first-time ACs, who have put tremendous effort to deliver their best set of decisions. I had the pleasure to do synchronous meetings (pics) with all triplets - meet old friends &new colleagues. 1/2

Mike Shou · Feb 17, 2024 · 8:48 AM UTC

Code released for X-Adapter, our first attempt at diffusion model upgradability 🎉🎉

Lingmin Ran @LingminR

17 Feb 2024

We released the inference code for X-Adapter: github.com/showlab/X-Adapter. Sorry for the wait.

Mike Shou · Jan 4, 2022 · 4:32 PM UTC

1st day of working in 2022.. So happy and humbled to receive the feedback for my lectures last semester 😀 educating the next-gen is really fulfilling!

Mike Shou · Oct 10, 2021 · 9:54 AM UTC

Join us tomorrow for the live talk by Manohar Paluri on sharing stories! Mano is a director at Facebook AI Research. He has led and grown Facebook AI teams over the years to make an impact on both research and products. Zoom link: google.com/url?q=https%3A%2F…

Mike Shou · Aug 23, 2024 · 5:28 AM UTC

When it comes to video keyframe generation, Show-o is still temporally auto-regressive, while using full attention in space. Also, when generating keyframes, we can generate interleaved textual descriptions, naturally supporting mixed-modality generation.
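
One reading of "temporally auto-regressive, full attention in space" is a block-causal mask: each token attends to every token of its own frame and of all earlier frames, but never to later frames. A minimal sketch under that assumption (how Show-o places the interleaved text tokens is not modelled here); `block_causal_mask` and the per-token `frame_of` index are hypothetical.

```python
from typing import List


def block_causal_mask(frame_of: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when token q may attend to token k.

    frame_of[i] is the frame index of token i (assumed non-decreasing along
    the sequence). Attention is full within a frame and causal across frames.
    """
    n = len(frame_of)
    return [[frame_of[k] <= frame_of[q] for k in range(n)] for q in range(n)]


if __name__ == "__main__":
    # Three keyframes with two spatial tokens each.
    frames = [0, 0, 1, 1, 2, 2]
    for row in block_causal_mask(frames):
        print("".join("1" if allowed else "0" for allowed in row))
```

The printed rows are `110000` twice, `111100` twice and `111111` twice: both tokens of a frame see each other, and no frame sees the future.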

Mike Shou · Jul 4, 2022 · 1:55 AM UTC

Congrats to Kevin and the team 😀 Thanks to the EPIC organisers @dimadamen for providing a fun workshop experience!

Dima Damen @dimadamen

3 Jul 2022

Replying to @dimadamen

EPIC-KITCHENS 2022 Challenge Winners Multi-Instance Retrieval - 1st rank 🥇(Joint) @NUSingapore Kevin Qinghong Lin, A Wang, Rui Yan, Eric Zhongcong Xu, Rongcheng Tu, .., Mike Zheng Shou @MikeShou1 Talk: piped.video/watch?v=kLRn-Q48… 14/17

Mike Shou · Sep 1, 2022 · 2:36 PM UTC

large-scale models, representation learning, ViTs, AI for healthcare, continual learning, etc.

The AI Talks @TheAITalksOrg

1 Sep 2022

Replying to @TheAITalksOrg

We host talks in AI, machine learning, computer vision, and more. Please subscribe to our newsletter at tinyletter.com/aitalks. We'll send you news about the latest talks, including a zoom link with which you can join the talk and interact with our speakers.

Mike Shou · Oct 14, 2021 · 3:14 PM UTC

Happy to introduce the #Ego4D dataset: 3K hours of in-the-wild egocentric video. Great pleasure to work with and learn from members across the globe in the academic consortium. Read more about our @NUSingapore team's part at news.nus.edu.sg/nus-facebook…

NUS, Facebook AI and other world-class universities collaborate to teach AI to understand the world...

There is a marked difference between viewing and interacting with the world as a third-party spectator, and experiencing the action intimately from a first-person point of view. This difference is...

news.nus.edu.sg

AI at Meta

@AIatMeta

14 Oct 2021

We’re announcing #Ego4D, an ambitious long-term project we’ve embarked on w/13 universities in 9 countries to advance first-person perception. This work will catalyze research to build more useful AI assistants, robots & other future innovations. ai.facebook.com/blog/teachin…

Mike Shou · Sep 29, 2021 · 1:17 PM UTC

Join us tomorrow for the live talk by Yaser Sheikh @subail on sharing the stories of FRL lab and tips for growing in a company! Yaser founded and directs Facebook Reality Lab in Pittsburgh. More details of other talks & panels: sites.google.com/view/1st-ss…

Mike Shou · Aug 23, 2024 · 5:41 AM UTC

Kudos to the amazing team and collaborators! It is always such a pleasant and unforgettable experience to work with great teammates on something exciting. @Sierkinhane1 @maoweijia @ZechenBai @JunhaoZHANG19 @KevinQHLin @YuchaoGu

Mike Shou · Oct 17, 2021 · 1:50 PM UTC

Thank you Serge, and a big shout-out to our panelists and the organization team for making this hybrid ICCV workshop happen!

Serge Belongie @SergeBelongie

17 Oct 2021

Thank you @MikeShou1 & Co. for organizing a successful #ICCV2021 workshop on Sharing Stories and Lessons Learned! It was fun having hybrid Zoom/in-person participation in Shanghai.

Mike Shou · Jul 4, 2021 · 8:00 AM UTC

Thank you all for participating in our workshop; the recordings, slides and reports can be found at sites.google.com/view/loveuc…

LOVEU@CVPR'21 - Program

Date: June 25, 2021 Time Zone: Pacific Time morning Location: online 08:00 - 08:05 AM: Kickoff - Mike Shou 08:05 - 08:15: Opening remark - Matt Feiszli [video] [slides] 08:15 - 08:45: Invited Talk 1...

sites.google.com

Mike Shou · Aug 22, 2024 · 10:33 AM UTC

Replying to @tinner_he @JeffDean

Looks like it's not publicly available, but on YouTube you can find a similar talk Jeff gave at Purdue recently: piped.video/watch?v=L9CM1u-x…

Mike Shou · Oct 6, 2021 · 1:30 PM UTC

Join us tomorrow for the live talk by Rahul Sukthankar on sharing stories! Rahul is a distinguished scientist & senior director at Google Research, where he co-leads the Perception org. Prior to joining Google, he was a faculty member at CMU. Zoom link: google.com/url?q=https%3A%2F…

Mike Shou · Aug 25, 2022 · 8:10 AM UTC

Kindly consider submitting to BigMM 2022: IEEE Int Conf on Multimedia Big Data 2022, deadline Sept 24, 2022. The conference will be held December 5-7, 2022 in Naples, Italy (format: hybrid). Check more at bigmm.org/ and CfP: easychair.org/cfp/BigMM-2022

Mike Shou · Apr 7, 2024 · 9:09 AM UTC

(3/5) In LeCun’s version, action is another input to the WM: different actions would result in different futures. In Jürgen’s version, the action is predicted based on the WM’s output: humans predict the future and then decide how to act.
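
The two readings differ only in where the action enters. A toy sketch, assuming a hypothetical 1-D world with a ball at position x moving at velocity v; none of these functions come from LeCun's or Schmidhuber's work, they only mirror the description above: in the first reading the action is an input to the world model, in the second the model predicts the future first and the action is chosen from that prediction.

```python
from typing import Tuple

State = Tuple[float, float]  # (position x, velocity v) of a ball in a toy 1-D world


def wm_with_action(state: State, push: float) -> State:
    """LeCun-style reading: the action (a push) is an extra input to the
    world model, so different actions yield different predicted futures."""
    x, v = state
    v_next = v + push
    return (x + v_next, v_next)


def wm_without_action(state: State) -> State:
    """Predicts the next state from the current state alone."""
    x, v = state
    return (x + v, v)


def act_on_prediction(state: State) -> float:
    """Jürgen-style reading: predict the future first, then decide how to act.

    Here the hypothetical 'action' is where to place a hand: at the ball's
    predicted position.
    """
    predicted_x, _ = wm_without_action(state)
    return predicted_x


if __name__ == "__main__":
    s: State = (0.0, 2.0)
    for push in (-1.0, 0.0, 1.0):
        print("push", push, "->", wm_with_action(s, push))
    print("place hand at", act_on_prediction(s))
```

With the ball at x = 0 moving at v = 2, pushes of -1, 0 and +1 give predicted states (1.0, 1.0), (2.0, 2.0) and (3.0, 3.0), and the hand is placed at 2.0.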

Mike Shou · Dec 3, 2022 · 4:08 PM UTC

Mengmi has been doing wonderful research at the intersection of cognitive science and AI. Highly recommended if you are applying for a PhD or postdoc!

Mengmi Zhang @MengmiZhang

29 Nov 2022

I will be a tenure-track assistant professor in Nanyang Technological University (NTU), Singapore, starting from August, 2023. Our lab is currently recruiting (PhD students and postdocs): a0091624.wixsite.com/deepneu…

Mike Shou · Aug 23, 2024 · 8:56 AM UTC

Replying to @violet_zct @gazorp5

Congrats on the great work of Transfusion! We initially also considered the idea of continuous image + discrete text, but then opted for discrete diffusion so that we can have a more unified training-loss format, the same for image and text.
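
What "the same training-loss format for image and text" can look like: with discrete image tokens, both modalities reduce to cross-entropy over token ids, text via next-token prediction and images via predicting the original token at masked positions. A minimal sketch, assuming per-position probability tables from some model; the weighting `alpha` and all function names are hypothetical, and this is neither Show-o's nor Transfusion's code (Transfusion instead pairs a language-modeling loss on text with a continuous diffusion loss on images).

```python
import math
from typing import Dict, List

Dist = Dict[int, float]  # predicted probability of each token id at one position


def cross_entropy(dist: Dist, target: int) -> float:
    """Negative log-likelihood of the target token under one predicted distribution."""
    return -math.log(dist[target])


def next_token_loss(preds: List[Dist], tokens: List[int]) -> float:
    """Text: the prediction at position i is scored against token i + 1."""
    terms = [cross_entropy(preds[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
    return sum(terms) / len(terms)


def masked_token_loss(preds: List[Dist], original: List[int], masked: List[int]) -> float:
    """Image: only masked positions are scored, against their original tokens."""
    terms = [cross_entropy(preds[i], original[i]) for i in masked]
    return sum(terms) / len(terms)


def unified_loss(text_loss: float, image_loss: float, alpha: float = 1.0) -> float:
    """Both terms are cross-entropies over token ids, so they can simply be summed."""
    return text_loss + alpha * image_loss


if __name__ == "__main__":
    uniform: Dist = {t: 0.25 for t in range(4)}  # a maximally unsure 4-token model
    text = [0, 1, 2, 3]
    image = [3, 3, 1, 0]
    lt = next_token_loss([uniform] * len(text), text)
    li = masked_token_loss([uniform] * len(image), image, masked=[1, 3])
    print(lt, li, unified_loss(lt, li))
```

With a uniform 4-way prediction every term equals log 4 ≈ 1.386, so both losses sit on the same scale and can be added directly.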

Mike Shou · Oct 5, 2021 · 6:46 AM UTC

Join us tomorrow for the live talk by Mandela Patrick on sharing his PhD journey! @mandelapatrick just finished his PhD in the VGG computer vision lab at Oxford. More details of other talks & panels: sites.google.com/view/1st-ss…

Mike Shou · Aug 23, 2024 · 8:33 AM UTC

Replying to @cHHillee

Thanks a lot for the pointer! This FlexAttention looks cool and we will dive more into it :)

Mike Shou · Oct 12, 2023 · 11:47 PM UTC

Replying to @VictorKaiWang1 @JunhaoZHANG19 @jayzhangjiewu @Jia_Wei_LIU

Thanks Kai!

Mike Shou · Oct 16, 2021 · 3:04 PM UTC

Thank you @3scorciav for the nice summary!!!

Victor Escorcia @3scorciav

16 Oct 2021

Niiiice.... the Share Stories and Lessons Learned workshop in #iccv2021 took the 1st step on doing a hybrid event. Some panelists and attendees met up on-site others in Shanghai & others in zoom Happening now! sites.google.com/view/1st-ss…

Mike Shou · Apr 27, 2024 · 2:05 PM UTC

Mike Shou · Feb 17, 2024 · 1:14 AM UTC

Replying to @Haofan_Wang

Thanks very much for your talk, wonderful work!

Mike Shou · Sep 28, 2021 · 4:30 PM UTC

Replying to @zhoubolei

Thanks a lot Bolei for sharing! Good luck with NeurIPS!

Mike Shou · Jun 12, 2024 · 3:42 PM UTC

Replying to @mohitban47 @UNC

Heartiest congrats Mohit, well deserved!

Mike Shou · Apr 7, 2024 · 9:09 AM UTC

(5/5) This WM could be a video generation model, as SORA claims itself to be a world simulator. It could also be a video understanding model, like “World Model on Million-Length Video”, which uses an LLM to do both long video understanding and generation.

Mike Shou · Apr 7, 2024 · 9:08 AM UTC

(2/5) Given the current observation and memory/previous world state as inputs, the WM predicts the future world state, i.e. some representation of the world in our mind. There are two other key components: action, and the real world, which is changed by the action, leading to a new observation.

Mike Shou · Apr 7, 2024 · 9:09 AM UTC

(4/5) A more general definition of WM is a model that has learned a lot of knowledge about the world, enabled by learning from a lot of diverse videos. There is no action or real world here, so it is more like a “Video Foundation Model”.

Mike Shou · Apr 7, 2024 · 9:10 AM UTC

Most of the early WM works were conducted on game simulation systems because it is hard to take actions in the real world. Recent works from LeCun made attempts to use real data (JEPA, V-JEPA). Once we have more mature robots (near future?), this would be easier.

Mike Shou · Apr 25, 2023 · 2:22 AM UTC

Replying to @VictorKaiWang1

Congrats Kai!

Mike Shou · Aug 28, 2024 · 2:30 AM UTC

Replying to @ShumingHu

Thanks for your interest! This is Discrete Diffusion. The book "Probabilistic Machine Learning: Advanced Topics" explains both discrete and continuous diffusion thoroughly. We will add a preliminary section related to this in the paper later this week. cc @Sierkinhane1

Mike Shou · Feb 6, 2023 · 8:52 AM UTC

Joint work with @jayzhangjiewu @YuchaoGu @ge_yixiao @_Xintao_ and we welcome further discussions and collaborations!

Mike Shou · Apr 13, 2021 · 8:05 AM UTC

Welcome to LOVEU! Our Kinetics-GEBD dataset has the largest number of boundaries, which are in-the-wild, open-vocabulary, cover generic event changes, and respect the diversity of human perception. We also have an innovation track where you can explore whatever downstream task you like!

AI at Meta

@AIatMeta

12 Apr 2021

Heading to #CVPR21? We're hosting a challenge. Check out our workshop LOVEU: Long Form Video Understanding sites.google.com/view/loveuc…. We annotated generic event boundaries on popular computer vision datasets. Details in our paper: arxiv.org/abs/2101.10511.

Mike Shou · Nov 9, 2023 · 4:42 AM UTC

Replying to @YangYou1991

Congratulations!

Mike Shou · Jun 27, 2021 · 3:24 PM UTC

Replying to @CSProfKGD

Thanks for the interest - Yes, we will post them in a few days.

Mike Shou · Jun 29, 2024 · 3:46 PM UTC

Replying to @serinachang5 @UCBerkeley

Heartiest congratulations Serina!

Mike Shou · Aug 26, 2024 · 11:18 AM UTC

Replying to @hyungjin_chung

Thank you for the feedback! Sure, happy to. Can you help post an issue on our github so that we can track requests? Thanks.

Mike Shou · Sep 28, 2021 · 4:29 PM UTC

Replying to @deviparikh

Thanks a lot Devi for sharing!

Mike Shou · Jul 12, 2024 · 11:17 PM UTC

Replying to @CTLimLab

Heartiest congratulations Prof Lim!

Mike Shou · Apr 7, 2024 · 12:18 PM UTC

Replying to @Ashkan28489079

Thanks for sharing!

Mike Shou · Oct 16, 2021 · 10:36 AM UTC

Join us today for 3 panels of sharing lessons learnt - respectively for early-career researcher/faculty, junior student, fresh graduate/company new hire! sites.google.com/view/1st-ss…

SSLL @ ICCV'21 - Workshop Panels

We cannot meet in person for ECCV, CVPR. But this time at ICCV, we will revive the in-person conference! At Oct 16, we will have concurrent online zoom meeting + onsite in-person meetup of 200...

sites.google.com

Dima Damen @dimadamen

16 Oct 2021

My schedule today 16Oct @ICCV_2021 8am - exciting panel at 1st W on Share Stories and Lessons learnt w @deviparikh, @SergeBelongie sites.google.com/view/1st-ss… V exciting initiative by @MikeShou1 &co-organisers My recorded talk on story of EPIC-KITCHENS: piped.video/RjmdzS1DNFI