Asst Prof at NUS. Forbes 30 under 30 Asia. Previously at Facebook AI and Columbia U. Passionate about video, multi-modal, AI assistant.

(1/6) X-Humanoid 🤖: Scaling up data for Humanoid Robots. We convert human daily activity videos (from Ego-Exo4D) into humanoid videos (i.e., Tesla Optimus) performing tasks like cooking or fixing a bike. This data can potentially be used to train robot policies and world models. 🔥 Project page: showlab.github.io/X-Humanoid… Paper link: arxiv.org/abs/2512.04537
23
71
417
87,883
In early 2024, many people asked me whether to go for AR or Diffusion. This pushed me to think about a deeper question - why would AR and Diffusion conflict with each other? In theory, they don't! This motivated us to explore how to integrate both into Show-o, one single 1.3B LLM
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. Discuss: huggingface.co/papers/2408.1… We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation. Across various benchmarks, it demonstrates comparable or superior performance to existing individual models with an equivalent or larger number of parameters tailored for understanding or generation. This significantly highlights its potential as a next-generation foundation model.
11
84
440
148,428
We are thrilled to release a 3-hour tutorial for beginners who would like to quickly get into Video Diffusion Models 😀👉 Check it out: piped.video/watch?v=0K56LA82…
2
65
325
42,416
Very excited to have @JeffDean at NUS for the first time! Thanks for sharing Google’s perspective about multi-modal and video.
1
3
178
15,714
Show-o update🔥: 1. We have released training code on GitHub, including both pre-training and instruction tuning! 🔥 2. Added a FlexAttention implementation for a great speed-up. Thanks @cHHillee 🚀 github.com/showlab/Show-o/bl… 3. Gradio demo is up 🤗 huggingface.co/spaces/showla… Have fun!
In early 2024, many people asked me whether to go for AR or Diffusion. This pushed me to think about a deeper question - why would AR and Diffusion conflict with each other? In theory, they don't! This motivated us to explore how to integrate both into Show-o, one single 1.3B LLM
2
23
150
27,892
This paper was rejected by NeurIPS, ICLR, and CVPR before finally being accepted by ICCV.. Thanks to @jayzhangjiewu for never giving up, and we appreciate the help from a dozen reviewers. TL;DR: a more realistic setting for Online Continual Video Object Detection with far fewer labels.
ICCV paper in collaboration with Jay Zhangjie Wu from ShowLab @MikeShou1 is up! If you wanna know: how to use fast-slow complementary learning systems to continuously learn from video streams in a label-efficient manner. Check out our paper here: arxiv.org/pdf/2206.00309.pdf
2
5
119
27,513
Wonder how to start your faculty career? grow in a company? secure PhD offers? Check out our #ICCV'21 workshop on Share Stories and Lessons Learned. As a warm-up, we have released some recorded talks from @dimadamen @deviparikh @xinshuoweng @zhoubolei, with a few live talks upcoming!
2
22
85
In the era of #Sora, it’s important to detect whether a video is AIGC and who the owner is. Happy to introduce RingID, a diffusion-based watermark identification method that can identify not only whether an image/video is generated, but also by whom: arxiv.org/abs/2404.14055
3
18
83
11,741
Honored and humbled to receive this award in my 3rd year at NUS. Time flies.. Many thanks to my mentors, students and collaborators 🙏
1
1
80
4,290
Show Lab from National University of Singapore had a wonderful week at #ICCV2023 Merci @ICCVConference and goodbye Paris!
5
78
8,208
Ciao🇮🇹 My first time attending #ECCV2024 and looking forward to catching up! I will be talking about our Show-o, one single transformer that unifies multimodal understanding and generation, tomorrow at 9am (Sep 30), room Amber 3, at the workshop venue.
1
76
3,660
After the initial release, there were queries about where the diffusion is. We are familiar with continuous diffusion like SD, while another type of diffusion is discrete. Today, we updated our arXiv paper to add preliminaries about discrete diffusion🔥 Feedback would be appreciated :)
In early 2024, many people asked me whether to go for AR or Diffusion. This pushed me to think about a deeper question - why would AR and Diffusion conflict with each other? In theory, they don't! This motivated us to explore how to integrate both into Show-o, one single 1.3B LLM
3
10
67
6,776
Do you want to quickly turn pre-trained image models like #StableDiffusion and #DreamBooth into a text-to-video / GIF generation model? Excited to share that our #TuneAVideo takes only 5-10 mins on 1 A100 for training and is now open sourced: github.com/showlab/Tune-A-Vi… (~500 stars so far)
3
14
67
25,788
Recently, I have heard of “World Model” many many times. But there seem to exist two major types of WM: (1/5) A more authentic and technical definition is from big figures like Jurgen and LeCun: future prediction + action + real world details below👇
1
7
66
10,609
Happy to share our #ICCV2023 work by @weijiawu7 that turns Stable Diffusion into a surprisingly good segmentation model for open-vocab words, like Eiffel Tower or Ultraman. Paper: arxiv.org/abs/2303.11681 Code: weijiawu.github.io/Diffusion…
"📚 Excited to share our work presented at ICCV23! We explored leveraging free cross-attention masks from Stable Diffusion for semantic segmentation labeling. The results are not very impressive, but intriguing. weijiawu.github.io/Diffusion… #ICCV23 #SemanticSegmentation"
11
63
8,500
Sadly, I am unable to attend #CVPR2024🥹But ShowLabers are!🙌 Talk to @jayzhangjiewu @LingminR @Jia_Wei_LIU @YuchaoGu @Sierkinhane1 Eric on diffusion/generation and @ziteng_v @ZechenBai Stan on MLLM/understanding. BTW, our lab at #NUS has new PhD openings for *Jan 2025*😀
4
48
5,231
How to go beyond 1s and make sense of a longer video? Join us in a few hours today (Jun 25 8am-12:30pm PST California time) at our @CVPR LOVEU workshop for the live talks and QA from our invited speakers and challenge winners! Agenda: sites.google.com/view/loveuc…
1
10
43
#CVPR2024 oral/highlight decisions are out. Some papers were already famous, but I always find some other "surprising" papers that share sth in common - no high numbers, no complex system/eng, but just a very interesting idea that excites and inspires me to keep thinking🤔
1
2
36
4,993
Congrats to Benita, an undergraduate / Year-3 student, for getting her paper into ECCV, a top-tier AI conference! -- towards personal AI Assistant on smart glasses👀 vision-language, AR/VR, video. Still a lot of opportunities & large room for improvement: arxiv.org/abs/2203.04203
4
34
Show-1 is finally open sourced! We marry the strengths of pixel-based and latent-based VDMs, producing high-quality videos of precise text-video alignment and high GPU memory efficiency during inference. Congrats to the awesooooome team @JunhaoZHANG19 @jayzhangjiewu @Jia_Wei_LIU
🔥 We're thrilled to announce the release of the code and model weights of Show-1! 🤖 💡 Unleash your creativity and embark on an exciting journey into the future of AI video generation with Show-1 now! 🚀 🔗 Code: github.com/showlab/Show-1; Model weights: huggingface.co/showlab
8
31
5,092
Watching my student deliver his award-winning talk filled me with immense pride.. Congrats to @KevinQHLin for winning the PREMIA Best Paper Award 2023! Many thanks to the Pattern Recognition and Machine Intelligence Association for recognising Kevin’s EgoVLP work.
26
3,111
List of recent papers for screen GUI automation with video-language-action model👇
💻We live in the digital era, where screens (PC/Phone) are integral to our lives. 🧐Curious about how AI assistant can help with computer tasks? 🚀Check out the latest progress in repo: github.com/showlab/Awesome-G… ✨A collection of the up-to-date GUI-related papers and resources.
2
26
3,825
Code has been released as well: github.com/showlab/DatasetDM
Excited to introduce our paper at NeurIPS 2023: DatasetDM! 🚀(weijiawu.github.io/DatasetDM…) Our diffusion model produces an unlimited quantity of synthetic images, accompanied by perception annotations like depth, segmentation, and human pose. 🌐📊 #NeurIPS2023 #AI #Research
2
24
3,222
Happy to introduce EgoVLPv2 to appear in #ICCV2023! Congrats to @Shramanpramani2 and the team. Project page: shramanpramanick.github.io/E…
Thanks @_akhaliq for highlighting our work. EgoVLPv2 is accepted in #ICCV2023. Joint work with @PengchuanZ, @yalesong, @nagsayan112358, @KevinQHLin, @MikeShou1, and my advisor Prof. Rama Chellappa. More details on shramanpramanick.github.io/E…. @MetaAI @JohnsHopkins
3
21
4,189
We have released code for RingID which safeguards diffusion models with multi-user watermarks github.com/showlab/RingID Looking forward to discussions at ECCV next month!
4
20
4,500
Thanks to @RisingSayak from Huggingface for delivering a wonderful tutorial talk about diffusers at NUS. Nice to finally “see” the friend with whom we had only collaborated online, and so happy for their great achievements!
2
18
1,759
Congrats to @ZHHHYuan for getting his first paper accepted by IJCV 🙌 He proposes a highly parameter-efficient fine-tuning technique, which only needs to tune 0.11M additional parameters (0.13% of the full model). Paper: arxiv.org/abs/2309.08513 Code: github.com/showlab/SCT
Happy to share our #IJCV2023 work that proposes a straightforward technique for parameter-efficient fine-tuning. This work represents a remarkable reduction of 780 times in parameter costs compared to its full fine-tuning counterpart. Paper: arxiv.org/abs/2309.08513. @MikeShou1
17
1,229
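As a rough sanity check on the SCT numbers quoted above (0.11M extra parameters, 0.13% of the full model, about 780x fewer tuned parameters), the arithmetic can be reproduced as follows. This is only a sketch based on the rounded figures in the two posts; the variable names are illustrative and the actual backbone size is not stated here.

```python
extra_params_m = 0.11   # additional tuned parameters, in millions (from the post)
share_percent = 0.13    # share of the full model, in percent (from the post)

# Full-model size implied by the two rounded figures.
full_model_m = extra_params_m / (share_percent / 100)
reduction = full_model_m / extra_params_m

# 0.13% is a rounded value, so the true share lies in [0.125%, 0.135%),
# which bounds the reduction factor from below and above.
reduction_low = 100 / 0.135
reduction_high = 100 / 0.125

print(round(full_model_m, 1), round(reduction), round(reduction_low), round(reduction_high))
```

The rounded inputs imply a full model of roughly 85M parameters, and the quoted 780x reduction sits inside the range that the rounding allows.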
Hey @CVPR what is the wifi password? Thanks!
3
14
11,778
Congrats to @KevinQHLin and co-authors! It's a nice start and we are keen to improve egocentric pre-training down the road -- welcome suggestions and further collaborations with the community!
I am excited to share that our work EgoVLP has been accepted to #NeurIPS2022!🎉The first paper in my first year of Ph.D.! Many thanks to my supervisor @MikeShou1 and collaborators: @dimadamen @mwray0 @mattia_soldan @BernardSGhanem Preprint: arxiv.org/abs/2206.01670 @NeurIPSConf
1
16
VideoSwap released👇
Thanks @_akhaliq for previously featuring our work. VideoSwap is now accepted by CVPR 2024. The code is now available at github.com/showlab/VideoSwap. Welcome to try!
15
2,021
Thanks a lot @dimadamen and @mwray0 for inviting and hosting me. I have learned and enjoyed so much from the 1:1s with MaVi members too!
It was great to have ⁦@MikeShou1⁩ ⁦@NUSingapore⁩ here ⁦@BristolUniEng⁩ ⁦@bristolcs⁩ for a great #MaVi seminar on his group’s efforts to advance understanding and generation in video!
14
2,150
As we worked along this direction, we realised this 1 single LLM can be even more general -- not only for generation, but can also support multimodal understanding. Interestingly, this 1.3B parameter LLM achieves results comparable to other unified models that are 10x larger.
1
1
15
2,377
We created a discord space to facilitate discussions, feature requests, collaborations about Show-o! Join us at discord.com/invite/Z7xdzYDa A WeChat group link can be found in our GitHub page too, see you there 😊
Replying to @MikeShou1
Code and model have been open sourced at github.com/showlab/Show-o More features are on the way and we welcome collaborations.
1
14
3,035
Quantitative evaluation is notoriously hard for video editing -- at LOVEU@CVPR 2023, we provide a clean dataset & human annotators (free!) for participants🤩
🚨Exciting news!🚨 A CVPR competition for AI-based video editing! 💻🎥 With a clean dataset and baseline code provided, there's no better time to dive into this challenge. Submissions due Jun 5. Don't miss out! 🤩 #CVPR2023 #AI #videoediting Details: sites.google.com/view/loveuc…
3
13
1,970
🤩🤩🤩
Replying to @AIatMeta
1️⃣ Ego-Exo4D A new foundational dataset + benchmark suite to support research on video learning & multimodal perception, co-developed with 14 university partners. Details ➡️ bit.ly/47Wk8cF Core to the work is videos of skilled human activities, simultaneously capturing both first-person "egocentric" + multiple “exocentric” views.
13
1,172
V2 is also on the way 🔜
Counter-intuitive aspects of text-to-image diffusion models: only a few steps require cross-attention; most don’t. Skipping the extras gives a great speed-up! Many stars on GitHub :-) github.com/HaozheLiu-ST/T-GA… arxiv.org/abs/2404.02747
13
2,085
TL;DR — Too Large; Data Reduction 😆
Excited for #ICCV2023! Does more data truly mean better results in Vision-Language Pre-Training? Join me tomorrow for my presentation on 'Too Large; Data Reduction for Vision-Language Pre-Training' at Room Foyer Sud, 096 (10:30 AM - 12:30 PM). See you there! 📚🔬 #AI #Research
12
1,464
Many thanks @y_m_asano for the summary and sharing! Paper links: arxiv.org/abs/2312.13108, fingerrec.github.io/visincon…
Fun talk from @MikeShou1. From using LLMs in a structured way (planner/actor/critic) for vision-language tasks, to using rendered images of text to extend context in MLLMs in a funky way. Advantage of increasing context and reducing compute. Vibes from PIXEL and CLIPPO papers.
2
12
2,408
For generation, Show-o is much faster than AR-based methods because such a unified LLM uses full attention / diffusion for visual generation.
1
11
1,792
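A rough way to picture the speed gap described above: an AR decoder needs one forward pass per image token, while a mask-based (discrete-diffusion style) decoder commits many tokens per pass. The sketch below is an illustration under assumptions, not the Show-o sampler: the cosine schedule, the function name `tokens_revealed_per_step`, and the 1024-token / 16-step sizes are all hypothetical.

```python
import math


def tokens_revealed_per_step(num_tokens: int, num_steps: int) -> list[int]:
    """Return how many masked tokens are committed at each parallel decoding step.

    Hypothetical cosine schedule: after step s, roughly
    num_tokens * cos(pi/2 * s / num_steps) tokens remain masked.
    """
    if not 1 <= num_steps <= num_tokens:
        raise ValueError("need 1 <= num_steps <= num_tokens")
    revealed = []
    remaining = num_tokens
    for step in range(1, num_steps + 1):
        target = math.floor(num_tokens * math.cos(math.pi / 2 * step / num_steps))
        # Reveal at least one token now, and leave at least one per later step.
        keep_masked = max(num_steps - step, min(target, remaining - 1))
        revealed.append(remaining - keep_masked)
        remaining = keep_masked
    return revealed


num_tokens, num_steps = 1024, 16  # illustrative sizes only
schedule = tokens_revealed_per_step(num_tokens, num_steps)
ar_passes = num_tokens            # AR: one forward pass per token
parallel_passes = len(schedule)   # masked decoding: one forward pass per step
print(ar_passes, parallel_passes, sum(schedule))
```

With these toy numbers the AR decoder runs 1024 passes while the masked decoder runs 16, yet both end up filling every token. Real speed-ups depend on the model, the schedule, and caching.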
This is fantastic!
> I see a new model architecture on Twitter > I look to see if they have a new attention variant > They do! > I look at their code > They fully materialize the mask :( > I implement it in 4 lines with FlexAttention > It's 9x faster github.com/showlab/Show-o/is…
11
2,400
Many thanks to Yang and Wu @jayzhangjiewu for helping with the slides!
9
1,418
Some NeurIPS news in the midst of ICCV… Congrats to @YuchaoGu on the code release of his NeurIPS paper called Mix-of-Show -- we know how to use LoRA to augment diffusion models with a new single concept; but how about multiple concepts? Check it out 👇
Happy to announce Mix-of-Show is accepted by NeurIPS 2023. See you in New Orleans. We just released the codebase for the community (github.com/TencentARC/Mix-of…). Welcome to try. Following are some results (left is reference concept, right is our generation results):
10
1,589
More interestingly, Show-o is more flexible than AR methods for image in-painting and extrapolation, because it does not need to predict tokens one by one.
1
1
9
1,461
Thank you @dimadamen for your coordination and guidance - your emails (e.g. the one for next steps) were very helpful!
As a Senior AC for #CVPR2023 @CVPR want to thank all ACs I worked with, including first-time ACs, who have put tremendous effort to deliver their best set of decisions. I had the pleasure to do synchronous meetings (pics) with all triplets - meet old friends &new colleagues. 1/2
8
2,164
Code released for X-Adapter, our first attempt at diffusion model upgradability 🎉🎉
We released the inference code for X-Adapter: github.com/showlab/X-Adapter. Sorry for waiting.
1
9
1,908
1st day of working in 2022.. So happy and humbled to receive the feedback for my lectures last semester 😀 educating the next-gen is really fulfilling!
9
Join us tomorrow for the live talk by Manohar Paluri on sharing stories! Mano is a director at Facebook AI Research. He has led and grown Facebook AI teams over years to make impact in both research and products. Zoom link: google.com/url?q=https%3A%2F…
7
When it comes to video keyframe generation, Show-o is still temporally auto-regressive while using full attention in space. Also, when generating keyframes, we can generate interleaved textual descriptions, naturally supporting mixed-modality generation.
2
1
7
1,469
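The "auto-regressive in time, full attention in space" pattern mentioned above can be pictured as a block-causal attention mask: tokens attend to everything in their own frame and in earlier frames, but not to later frames. The snippet below is a minimal sketch of that pattern only; `block_causal_mask` is a hypothetical helper, not code from the Show-o repository, and it ignores the interleaved text tokens.

```python
def block_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """mask[q][k] is True when query token q may attend to key token k."""
    total = num_frames * tokens_per_frame
    return [
        # A key is visible if its frame index is not later than the query's frame.
        [k // tokens_per_frame <= q // tokens_per_frame for k in range(total)]
        for q in range(total)
    ]


mask = block_causal_mask(num_frames=2, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

For two frames of two tokens each, the first frame's rows read `xx..` and the second frame's rows read `xxxx`: full attention inside a frame, causal across frames.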
Congrats to Kevin and the team 😀 Thanks to the EPIC organisers @dimadamen for providing a fun workshop experience!
Replying to @dimadamen
EPIC-KITCHENS 2022 Challenge Winners Multi-Instance Retrieval - 1st rank 🥇(Joint) @NUSingapore Kevin Qinghong Lin, A Wang, Rui Yan, Eric Zhongcong Xu, Rongcheng Tu, .., Mike Zheng Shou @MikeShou1 Talk: piped.video/watch?v=kLRn-Q48… 14/17
1
7
large-scale models, representation learning, ViTs, AI for healthcare, continual learning, etc.
Replying to @TheAITalksOrg
We host talks in AI, machine learning, computer vision, and more. Please subscribe to our newsletter at tinyletter.com/aitalks. We'll send you news about the latest talks, including a zoom link with which you can join the talk and interact with our speakers.
7
Happy to introduce #Ego4D dataset - 3K hours egocentric videos in-the-wild. Great pleasure to work with and learn from members across the globe in the academic consortium. Read more about our @NUSingapore team's part at news.nus.edu.sg/nus-facebook…
We’re announcing #Ego4D, an ambitious long-term project we’ve embarked on w/13 universities in 9 countries to advance first-person perception. This work will catalyze research to build more useful AI assistants, robots & other future innovations. ai.facebook.com/blog/teachin…
5
Join us tomorrow for the live talk by Yaser Sheikh @subail on sharing the stories of FRL lab and tips for growing in company! Yaser founded and directs Facebook Reality Lab in Pittsburgh. More details of other talks & panels: sites.google.com/view/1st-ss…
5
Kudos to the amazing team and collaborators! It is always such a very pleasant and unforgettable experience to work together with great teammates on something exciting. @Sierkinhane1 @maoweijia @ZechenBai @JunhaoZHANG19 @KevinQHLin @YuchaoGu
6
1,578
Thank you Serge and big shout-out to our panelists and the organization team to make this hybrid ICCV workshop happen!
Thank you @MikeShou1 & Co. for organizing a successful #ICCV2021 workshop on Sharing Stories and Lessons Learned! It was fun having hybrid Zoom/in-person participation in Shanghai.
5
Thank you all for participating in our workshop; the recordings & slides & reports can be found at sites.google.com/view/loveuc…
How to go beyond 1s and make sense of a longer video? Join us in a few hours today (Jun 25 8am-12:30pm PST California time) at our @CVPR LOVEU workshop for the live talks and QA from our invited speakers and challenge winners! Agenda: sites.google.com/view/loveuc…
5
Replying to @tinner_he @JeffDean
Looks like it is not publicly available, but on YouTube you can find a similar talk Jeff gave at Purdue recently: piped.video/watch?v=L9CM1u-x…
1
1
5
242
Join us tomorrow for the live talk by Rahul Sukthankar on sharing stories! Rahul is a distinguished scientist & senior director at Google Research, where he co-leads the Perception org. Prior to joining Google, he was a faculty member at CMU. Zoom link: google.com/url?q=https%3A%2F…
4
Kindly consider submission to BigMM 2022: IEEE Int Conf on Multimedia Big Data 2022, deadline Sept 24, 2022. The conf will be December 5-7, 2022 at Naples, Italy (format: hybrid). Check more at bigmm.org/ and CfP: easychair.org/cfp/BigMM-2022
1
1
3
(3/5) In LeCun’s version, action is another input for WM -- different actions would result in different futures; In Jurgen’s version, the action is predicted based on WM’s output -- humans predict future and then decide how to act.
2
3
1,389
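The difference between the two views in (3/5) comes down to where the action sits relative to the world model. The toy sketch below illustrates this with a one-dimensional state; every function name and the additive dynamics are hypothetical, chosen only to make the two interfaces concrete, and are not taken from either author's work.

```python
def action_conditioned_world_model(state: float, action: float) -> float:
    """Action is an input: different actions lead to different predicted futures."""
    return state + action  # toy dynamics


def action_free_world_model(state: float) -> float:
    """No action input: the model only predicts how the world evolves."""
    return state + 1.0  # toy dynamics: the world drifts by +1 per step


def act_on_prediction(predicted_state: float, goal: float) -> float:
    """Action is decided afterwards, based on the model's predicted future."""
    return goal - predicted_state


# First view: the same state with two candidate actions gives two futures.
futures = [action_conditioned_world_model(0.0, a) for a in (-1.0, 1.0)]

# Second view: predict the future first, then choose how to act.
predicted = action_free_world_model(0.0)
chosen_action = act_on_prediction(predicted, goal=3.0)
print(futures, predicted, chosen_action)
```

In the first interface the caller can compare futures across candidate actions (planning); in the second the action is a downstream decision made after the prediction.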
Mengmi has been doing wonderful research in the intersection of cognitive science and AI. Highly recommend if you are applying for PhD or PostDoc!
I will be a tenure-track assistant professor in Nanyang Technological University (NTU), Singapore, starting from August, 2023. Our lab is currently recruiting (PhD students and postdocs): a0091624.wixsite.com/deepneu…
4
Replying to @violet_zct @gazorp5
Congrats on the great work of Transfusion! We initially also considered this idea of continuous image + discrete text, but then opted for discrete diffusion, so that we can have a more unified training loss format, the same for image and text.
3
181
Join us tomorrow for the live talk by Mandela Patrick on sharing his PhD journey! @mandelapatrick just finished his PhD in the VGG computer vision lab at Oxford. More details of other talks & panels: sites.google.com/view/1st-ss…
3
Replying to @cHHillee
Thanks a lot for the pointer! This FlexAttention looks cool and we will dive more into it :)
3
468
Thank you @3scorciav for all the nice summary!!!
Niiiice.... the Share Stories and Lessons Learned workshop in #iccv2021 took the 1st step on doing a hybrid event. Some panelists and attendees met up on-site others in Shanghai & others in zoom Happening now! sites.google.com/view/1st-ss…
3
Replying to @Haofan_Wang
Thanks very much for your talk, wonderful work!
2
177
Replying to @zhoubolei
Thanks a lot Bolei for sharing! Good luck for NeurIPS!
2
Replying to @mohitban47 @UNC
Heartiest congrats Mohit, well deserved!
1
2
198
(5/5) This WM could be video generation models -- like SORA claims itself to be a world simulator. This WM can also be video understanding models -- like “World Model on Million-Length Video” which uses a LLM to do both long video understanding and generation.
1
2
1,051
(2/5) Given the current observation and memory/previous world state as inputs, WM predicts the future world state -- some representation of the world in our mind. There are two other key components: action and the real world, which will be changed by action, leading to a new observation
1
2
922
(4/5) A more general definition of WM is a model that has learned a lot of knowledge about the world. This is enabled by learning from a lot of diverse videos. It does not have action or real world here -- more like “Video Foundation Model”.
1
1
869
Most of the early WM works were conducted on game simulation systems because it is hard to take actions in the real world. Recent works from LeCun made attempts to use real data (JEPA, V-JEPA). Once we have more mature robots (near future?), this would be easier.
1
1
1,031
Replying to @VictorKaiWang1
Congrats Kai!
1
189
Replying to @ShumingHu
Thanks for your interest! This is Discrete Diffusion. The book "Probabilistic Machine Learning: Advanced Topics" explains thoroughly about discrete and continuous diffusion. We will add a preliminary section related to this in the paper later this week. cc @Sierkinhane1
1
1
911
Joint work with @jayzhangjiewu @YuchaoGu @ge_yixiao @_Xintao_ and we welcome further discussions and collaborations!
1
541
Welcome to LOVEU! Our Kinetics-GEBD dataset has the largest number of boundaries which are in-the-wild, open-vocabulary, cover generic event change, and respect human perception diversity. We also have an innovation track where you can explore whatever downstream task you like!
Heading to #CVPR21? We're hosting a challenge. Check out our workshop LOVEU: Long Form Video Understanding sites.google.com/view/loveuc…. We annotated generic event boundaries on popular computer vision datasets. Details in our paper: arxiv.org/abs/2101.10511.
1
Replying to @YangYou1991
Congratulations!
1
617
Replying to @CSProfKGD
Thanks for the interest - Yes, we will post them in a few days.
1
Heartiest congratulations Serina!
1
282
Replying to @hyungjin_chung
Thank you for the feedback! Sure, happy to. Can you help post an issue on our github so that we can track requests? Thanks.
1
73
Replying to @deviparikh
Thanks a lot Devi for sharing!
1
Replying to @CTLimLab
Heartiest congratulations Prof Lim!
1
1
205
Replying to @Ashkan28489079
Thanks for sharing!
1
1
111
Join us today for 3 panels of sharing lessons learnt - respectively for early-career researcher/faculty, junior student, fresh graduate/company new hire! sites.google.com/view/1st-ss…
My schedule today 16Oct @ICCV_2021 8am - exciting panel at 1st W on Share Stories and Lessons learnt w @deviparikh, @SergeBelongie sites.google.com/view/1st-ss… V exciting initiative by @MikeShou1 &co-organisers My recorded talk on story of EPIC-KITCHENS: piped.video/RjmdzS1DNFI
1