Junyi Zhang · Mar 9, 2026 · 3:07 PM UTC

Junyi Zhang

@junyi42

Mar 9

One memory can’t rule them all. We present LoGeR, a new hybrid memory architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 10k frames / kilometer scale, with linear-time scaling in sequence length, fully feedforward inference, and no post-optimization. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI

445

3,392

561,709

Junyi Zhang · Oct 7, 2024 · 3:15 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Excited to share MonST3R! -- a simple way to estimate geometry from unposed videos of dynamic scenes. We achieve competitive results on several downstream tasks (video depth, camera pose) and believe this is a promising step toward feed-forward 4D reconstruction. monst3r-project.github.io

138

723

131,546

Junyi Zhang · Feb 11, 2025 · 3:22 PM UTC

Junyi Zhang

@junyi42

11 Feb 2025

MonST3R is accepted by ICLR'25 as Spotlight! We have also added a fully feed-forward reconstruction mode that runs in real-time for video input (samples at: monst3r-paper.github.io/page…), check more details here: github.com/Junyi42/monst3r/p…

Junyi Zhang

@junyi42

7 Oct 2024

327

21,908

Junyi Zhang · Apr 21, 2025 · 4:30 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

Introducing St4RTrack!🖖 Simultaneous 4D Reconstruction and Tracking in the world coordinate feed-forwardly, just by changing the meaning of two pointmaps! st4rtrack.github.io

277

52,442

Junyi Zhang · Oct 21, 2024 · 5:08 PM UTC

Junyi Zhang

@junyi42

21 Oct 2024

Code for inference, visualization, training, and evaluation is released! - GitHub.com/Junyi42/monst3r

GitHub - Junyi42/monst3r: Official Implementation of paper "MonST3R: A Simple Approach for Estima...

Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion" - Junyi42/monst3r

github.com

Junyi Zhang

@junyi42

7 Oct 2024

224

21,518

Junyi Zhang · May 21, 2025 · 12:55 PM UTC

Junyi Zhang

@junyi42

21 May 2025

Very impressive! At VideoMimic.net, we already learn from 3rd-person human videos + RL, for locomotion. Excited to see where this path goes next!

Milan Kovac

@_milankovac_

21 May 2025

One of our goals is to have Optimus learn straight from internet videos of humans doing tasks. Those are often 3rd person views captured by random cameras etc.   We recently had a significant breakthrough along that journey, and can now transfer a big chunk of the learning directly from human videos to the bots (1st person views for now). This allows us to bootstrap new tasks much faster compared to teleoperated bot data alone (heavier operationally).  Many new skills are emerging through this process, are called for via natural language (voice/text), and are run by a single neural network on the bot (multi-tasking).  Next: expand to 3rd person video transfer (aka random internet), and push reliability via self-play (RL) in the real-, and/or synthetic- (sim / world models) world.  If you’re great at AI and want to be part of its biggest real-world applications ever, you really need to join Tesla right now.

207

17,872

Junyi Zhang · May 7, 2025 · 7:02 PM UTC

Junyi Zhang

@junyi42

7 May 2025

Humanoids need to perceive the environment in the real world. Using 4D reconstruction techniques, we turn casual human videos into training data for an environment-aware humanoid policy. Super excited to share: VideoMimic.net

Arthur Allshire

@arthurallshire

7 May 2025

our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)

132

11,352

Junyi Zhang · Jun 11, 2025 · 2:18 AM UTC

Junyi Zhang

@junyi42

11 Jun 2025

Just arrived in Nashville for #CVPR25! 🥰 I'll present St4RTrack tomorrow morning (10:30–12:30) at the 4D Vision Workshop, poster #137 in Hall 104 B. Feel free to come and chat!

Junyi Zhang

@junyi42

21 Apr 2025

Introducing St4RTrack!🖖 Simultaneous 4D Reconstruction and Tracking in the world coordinate feed-forwardly, just by changing the meaning of two pointmaps! st4rtrack.github.io

8,862

Junyi Zhang · Mar 21, 2024 · 1:07 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

🚀Introducing “Telling Left from Right” at #CVPR2024
- 🔍Identify the problem of geometry-aware semantic correspondence (SC)
- 📐Evaluate foundation model features’ geometric awareness
- 🏆Achieve SOTA with a lightweight post-processor
🔗 (w/ code!): telling-left-from-right.gith…

9,582

Junyi Zhang · Jun 16, 2024 · 9:00 AM UTC

Junyi Zhang

@junyi42

16 Jun 2024

On my way to Seattle ✈️ for my first ever #CVPR! Excited to meet old and new friends. 😄 I'll be presenting our work telling-left-from-right.gith… on Wed. (19th) morning at #284. If you're interested in how a plug-in processor can enhance the Geo-aware SC of SD+DINO, please stop by.

7,086

Junyi Zhang · Apr 24, 2025 · 1:40 AM UTC

Junyi Zhang

@junyi42

24 Apr 2025

I'll be presenting MonST3R at ICLR! 🇸🇬 Friday 25th, 10am-12:30pm Hall 3+2B #97 Come by if you are interested!

Junyi Zhang

@junyi42

11 Feb 2025

3,087

Junyi Zhang · Nov 28, 2024 · 5:19 PM UTC

Junyi Zhang

@junyi42

28 Nov 2024

The results are so cool! 4D reconstruction is a very challenging task - I tried to explore it before MonST3R but couldn't make it work. I'm thrilled to see MonST3R contributing a part to this reconstruction pipeline!

Rundi Wu @ChrisWu6080

28 Nov 2024

🚀 Introducing CAT4D! 🚀 CAT4D transforms any real or generated video into dynamic 3D scenes with a multi-view video diffusion model. The outputs are dynamic 3D models that we can freeze and look at from novel viewpoints, in real-time! Be sure to try our interactive viewer!

5,170

Junyi Zhang · Oct 7, 2024 · 3:52 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Hard to see the details in the figure? Check it out for yourself 😍: monst3r-project.github.io/pa… We’ve created an interesting 4D online demo that you can easily explore!

7,008

Junyi Zhang · Mar 31, 2025 · 3:22 AM UTC

Junyi Zhang

@junyi42

31 Mar 2025

Nice work! Very cool results by carefully-designed generative inpainting on MonST3R's partial pointmaps. Glad to see MonST3R/dynamic 3d reconstruction is playing an important role.

Tianqi Liu @TianqiLiu664

30 Mar 2025

🔥Free4D creates explicit 4D Gaussian scene representations from a single image, enabling high-quality, controllable, and real-time rendering. 👉Project (with interactive demo): free4d.github.io/ Paper: arxiv.org/abs/2503.20785 Code (open-sourced): github.com/TQTQliu/Free4D

5,242

Junyi Zhang · Dec 10, 2023 · 6:42 PM UTC

Junyi Zhang

@junyi42

10 Dec 2023

Super excited to attend #NeurIPS2023 from 11th to 16th! 🥰 I'll be presenting our work 'A Tale of Two Features' (sd-complements-dino.github.i…) at the 'Thu Morning session #212'. Looking forward to meeting new and old friends in New Orleans! 🌟

3,736

Junyi Zhang · Dec 8, 2024 · 8:00 PM UTC

Junyi Zhang

@junyi42

8 Dec 2024

In "telling left from right", we showed it's important to make 2D semantic correspondence geometry-aware. In DenseMatcher, we further lift it to 3D, and with this we enable robots to acquire generalizable skills across categories! Fun collaboration with Joseph, Yuanchen, and the team!

Yuanchen Ju @ju_yuanchen

8 Dec 2024

🍌We present DenseMatcher！ 🤖️DenseMatcher enables robots to acquire generalizable skills across diverse object categories by only seeing one demo, by finding correspondences between 3D objects even with different types, shapes, and appearances.

3,737

Junyi Zhang · Jun 12, 2023 · 3:41 PM UTC

Junyi Zhang

@junyi42

12 Jun 2023

Appreciate the share @sstj389! Code is now live at: github.com/Junyi42/sd-dino. We exploit complementarity of SD and DINOv2 features, achieving superior results via a simple fusion. Surprisingly, fused features outperform supervised methods on SPair-71k. Welcome to explore further!

GitHub - Junyi42/sd-dino: Official Implementation of paper "A Tale of Two Features: Stable Diffus...

Official Implementation of paper "A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence" - Junyi42/sd-dino

github.com

Stefan Stojanov @sstj389

25 May 2023

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence Super cool results on their webpage! arxiv: arxiv.org/abs/2305.15347 project page: sd-complements-dino.github.i…

3,304
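The "simple fusion" mentioned in the tweet above can be illustrated roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the released sd-dino code: it assumes the two feature maps have already been extracted and flattened to one descriptor per pixel or patch, fuses them by L2-normalizing each and concatenating, and matches by cosine similarity. The function names (`l2_normalize`, `fuse`, `nearest_neighbors`) and the weight `alpha` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each descriptor (row) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse(feat_sd, feat_dino, alpha=0.5):
    """Concatenate the two normalized descriptors; alpha is a hypothetical balance weight."""
    return np.concatenate(
        [alpha * l2_normalize(feat_sd), (1.0 - alpha) * l2_normalize(feat_dino)],
        axis=-1,
    )

def nearest_neighbors(src, tgt):
    """For every source descriptor, return the index of the most similar target descriptor."""
    sim = l2_normalize(src) @ l2_normalize(tgt).T  # cosine similarity, shape (N_src, N_tgt)
    return sim.argmax(axis=1)

# Toy check: the target image is a shuffled copy of the source descriptors.
rng = np.random.default_rng(0)
n, d_sd, d_dino = 6, 16, 8
src = fuse(rng.normal(size=(n, d_sd)), rng.normal(size=(n, d_dino)))
perm = rng.permutation(n)
tgt = src[perm]
matches = nearest_neighbors(src, tgt)
```

In this toy setup the matches simply undo the shuffle; with real features the same nearest-neighbor step yields the semantic correspondences.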

Junyi Zhang · Oct 7, 2024 · 4:05 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Huge thanks to my amazing collaborators! -- Charles Herrmann+, @JunhwaHur, @jampani_varun, @trevordarrell, @forrestercole, @DeqingSun*, and Ming-Hsuan Yang* Special thanks to @brenthyi for the support in setting up the online demo Model is released at: github.com/Junyi42/monst3r

1,645

Junyi Zhang · Apr 16, 2024 · 3:59 AM UTC

Junyi Zhang

@junyi42

16 Apr 2024

Great study! This is a much more comprehensive analysis of the 3D/geometric awareness of vision models than our telling-left-from-right, while we focus more on correspondence scenarios and how to improve them.

Aran Komatsuzaki

@arankomatsuzaki

15 Apr 2024

Google presents Probing the 3D Awareness of Visual Foundation Models Visual foundation models can learn representations that encode the depth and orientation of the visible surface but struggle with multiview consistency possibly because they are learning view-dependent representations repo: github.com/mbanani/probe3d abs: arxiv.org/abs/2404.08636

2,864

Junyi Zhang · Jul 1, 2024 · 4:43 PM UTC

Junyi Zhang

@junyi42

1 Jul 2024

Great to see a much more clever unsupervised way to fuse SD & DINO features compared to concatenation. 😃 The smooth and globally consistent correspondence from these features is really nice!

Congyue Deng @CongyueD

1 Jul 2024

Our paper "Zero-Shot Image Feature Consensus with Deep Functional Maps" is accepted at #ECCV2024! @eccvconf Want better image correspondences with noisy and inaccurate features? Let's go to the spectral space with Laplacian eigenfunctions! ArXiv: arxiv.org/abs/2403.12038

3,767

Junyi Zhang · Oct 2, 2023 · 4:41 AM UTC

Junyi Zhang

@junyi42

2 Oct 2023

Just arrived in lovely Paris 🇫🇷🗼 for my first in-person conference #ICCV2023! Thrilled about the valuable input and networking ahead!! 🥰

1,469

Junyi Zhang · Oct 7, 2024 · 3:21 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

We handle dynamic videos with DUSt3R's pointmap representation: estimate xyz coordinates for two frames, both aligned in the camera coordinate frame of the first frame ➡️ No constraint on dynamic/static scenes in the representation! But how does DUSt3R actually work for dynamic scenes?🤔

2,905
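To make the pointmap idea in the tweet above concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not how the model works internally: the network predicts pointmaps directly from images, whereas this sketch builds them from a depth map, pinhole intrinsics `K`, and a known relative pose. The helper names `depth_to_pointmap` and `to_first_frame` are hypothetical.

```python
import numpy as np

def depth_to_pointmap(depth, K):
    """Unproject an (H, W) depth map into an (H, W, 3) pointmap in its own camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                        # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                                       # rays with z = 1
    return rays * depth[..., None]                                        # scale rays by depth

def to_first_frame(pointmap, R_21, t_21):
    """Express a frame-2 pointmap in frame-1 camera coordinates: X1 = R_21 @ X2 + t_21."""
    return pointmap @ R_21.T + t_21

# Toy data: two fronto-parallel planes at depth 2, camera 2 shifted 0.5 along x.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth1 = np.full((3, 4), 2.0)
depth2 = np.full((3, 4), 2.0)
R_21 = np.eye(3)
t_21 = np.array([0.5, 0.0, 0.0])

pm1 = depth_to_pointmap(depth1, K)                                  # frame 1, own coordinates
pm2_in_1 = to_first_frame(depth_to_pointmap(depth2, K), R_21, t_21)  # frame 2, frame-1 coordinates
```

Both outputs are per-pixel xyz arrays in the same coordinate frame, so nothing in the representation itself requires the scene to be static.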

Junyi Zhang · Oct 7, 2024 · 3:43 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

What's more exciting is our "joint dense reconstruction & camera pose estimation" result, obtained while being 10x faster than the previous method. We visualize the optimized global point cloud and estimated camera poses:

1,557

Junyi Zhang · Oct 7, 2024 · 4:00 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Last, we also show the results of feed-forward pairwise pointmaps prediction, compared with DUSt3R: Row 1: we can still handle dynamic focals; Rows 2,3: we can do "impossible matching" in dynamic scenes; Rows 4,5: we can better estimate geometry in large scenes

2,148

Junyi Zhang · Oct 7, 2024 · 3:24 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

It doesn't work out of the box. But since this is primarily a data issue, we propose a simple approach to adapt DUSt3R to dynamic scenes: fine-tuning on a small set of dynamic videos, which works surprisingly well.

2,024

Junyi Zhang · Dec 12, 2023 · 4:29 PM UTC

Junyi Zhang

@junyi42

12 Dec 2023

Thanks to Björn Ommer for covering our work at the #NeurIPS23 opening keynote yesterday! Interested in more details? Check our work (sd-complements-dino.github.i…) and other concurrent works (diffusionfeatures.github.io, ubc-vision.github.io/LDM_cor…, diffusion-hyperfeatures.gith…) at this NeurIPS!😄

875

Junyi Zhang · Oct 7, 2024 · 3:28 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

For a video input consisting of more than two frames, we can aggregate all the pairwise pointmap results to build a global point cloud. With this unified representation, we can simply pull out per-frame camera poses, intrinsics, and video depth.

1,673
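As a rough illustration of "pulling out" quantities from a pointmap, the sketch below reads depth and a focal length off a single pointmap expressed in its own camera frame. It is a simplified stand-in under assumptions (pinhole camera, known principal point, plain least squares) and does not reproduce the global alignment over all frame pairs described in the thread. The function names are hypothetical.

```python
import numpy as np

def depth_from_pointmap(pointmap):
    """Depth is the z coordinate when the pointmap lives in its own camera frame."""
    return pointmap[..., 2]

def focal_from_pointmap(pointmap, cx, cy):
    """Least-squares focal length f from the pinhole relation (u - cx, v - cy) = f * (X/Z, Y/Z)."""
    H, W, _ = pointmap.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    a = np.stack([u - cx, v - cy], axis=-1).astype(np.float64)  # pixel offsets from the principal point
    b = pointmap[..., :2] / pointmap[..., 2:3]                  # normalized image coordinates
    return float((a * b).sum() / (b * b).sum())

# Synthetic pointmap generated with a known focal length of 120 pixels.
H, W, f_true, cx, cy = 6, 8, 120.0, 3.5, 2.5
u, v = np.meshgrid(np.arange(W), np.arange(H))
Z = 2.0 + 0.1 * u + 0.05 * v                                    # slanted surface, all depths positive
pm = np.stack([(u - cx) / f_true * Z, (v - cy) / f_true * Z, Z], axis=-1)

f_est = focal_from_pointmap(pm, cx, cy)
depth = depth_from_pointmap(pm)
```

On noisy predictions a robust estimator would likely be preferable to plain least squares; the closed form is used here only to keep the example short.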

Junyi Zhang · Oct 7, 2024 · 3:34 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

We achieve competitive results compared to task-specific methods, e.g., DepthCrafter in video depth, LEAP-VO in camera pose estimation

1,571

Junyi Zhang · Jun 11, 2025 · 2:23 AM UTC

Junyi Zhang

@junyi42

11 Jun 2025

I will also be presenting VideoMimic at the Agents in Interaction workshop: Poster #182–#201 | June 12 (Thu), 11:45–12:15 | ExHall D @redstone_hong will also give a spotlight talk on VideoMimic on Thu — come check it out! More details ⬇️

Hongsuk Benjamin Choi

@redstone_hong

10 Jun 2025

Excited to present VideoMimic this week at #CVPR2025! 🎥🤖 📌 POETs Workshop "Embodied Humans" Spotlight Talk | June 12, Thu, -10:10 | Room 101B 📌 Agents in Interaction: From Humans to Robots Poster #182-#201 | June 12, Thu, -12:15 | ExHall D Come by and chat! #VideoMimic #Humanoids #Robotics

860

Junyi Zhang · Dec 5, 2024 · 7:56 AM UTC

Junyi Zhang

@junyi42

5 Dec 2024

This is a very simple, reasonable, and effective method to improve diffusion features! Nice gains over "telling left from right" and "tale of two features"!

Nick Stracke

@rmsnorm

4 Dec 2024

Replying to @rmsnorm

We show you can, with just 30 minutes of task-agnostic finetuning on a single GPU. 🤯 No noise. Better features. Better performance. Across many tasks. And no timestep searching headaches! 👇

1,035

Junyi Zhang · Apr 21, 2025 · 4:31 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

From static to dynamic, MonST3R reconstructs pointmaps at their own times. St4RTrack instead estimates both at the same moment—predicting how points in frame1 move to frame2 and reconstructing geometry of frame2. Same architecture, now for simultaneous tracking + reconstruction.

1,122

Junyi Zhang · Oct 3, 2023 · 5:17 AM UTC

Junyi Zhang

@junyi42

3 Oct 2023

🚀Excited about applying DM to heterogeneous data? Check out our #ICCV2023 work "LayoutDiffusion"! And it excels in graphic layout generation. 🗓️Presenting on Wed. (4th) at 2:30pm, Foyer Sud. Drop by and let's chat! Paper: arxiv.org/abs/2303.11589 Code: layoutdiffusion.github.io

798

Junyi Zhang · Jul 14, 2023 · 5:50 AM UTC

Junyi Zhang

@junyi42

14 Jul 2023

One paper accepted to #ICCV2023🥳 Hope to see you in Paris!!

946

Junyi Zhang · Oct 7, 2024 · 7:08 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @Vinc3nt_Leroy

Thanks, Vincent! Big thanks to the DUSt3R team for providing a great foundation to build on!

706

Junyi Zhang · Oct 7, 2024 · 7:26 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @JeromeRevaud

Thanks Jerome, DUSt3R work is amazing!

571

Junyi Zhang · Apr 21, 2025 · 4:35 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

We present "more interactive" results on our webpage. Come and check them out! st4rtrack.github.io/page1

673

Junyi Zhang · Sep 27, 2023 · 7:23 AM UTC

Junyi Zhang

@junyi42

27 Sep 2023

Replying to @sainingxie @MokadyRon @bria_ai_

A reason seems to be given in Yang Song's ICLR21 work. 🤔 They term this as "Uniquely identifiable encoding" (encoding of the same input is only determined by the data distribution b/c the SDE doesn't rely on trainable params). They also provide an empirical example on CIFAR.

366

Junyi Zhang · Feb 15, 2024 · 9:01 PM UTC

Junyi Zhang

@junyi42

15 Feb 2024

🤯

Tim Brooks

@_tim_brooks

15 Feb 2024

Sora is our first video generation model - it can create HD videos up to 1 min long. AGI will be able to simulate the physical world, and Sora is a key step in that direction. thrilled to have worked on this with @billpeeb at @openai for the past year openai.com/sora

582

Junyi Zhang · Nov 20, 2023 · 5:48 PM UTC

Junyi Zhang

@junyi42

20 Nov 2023

So true 😂

Kosta Derpanis (sabbatical in Zurich)

@CSProfKGD

20 Nov 2023

Me after the #CVPR2024 paper deadline

732

Junyi Zhang · Apr 21, 2025 · 4:32 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

Such a representation can be learned with supervision from small-scale, synthetic 4D datasets. But to better generalize to real scenes, St4RTrack can also adapt to new videos *without any 4D labels*, using only 2D reprojection cues like trajectories & monocular depth.

682
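One way to picture "2D reprojection cues" is a loss that projects predicted 3D points into the image and compares them with 2D tracks and a monocular depth estimate. The sketch below is a hypothetical illustration under assumptions, not the St4RTrack objective: it assumes points already expressed in the camera frame, known intrinsics `K`, and a monocular depth that has already been aligned in scale. `project`, `reprojection_loss`, and `depth_weight` are made-up names.

```python
import numpy as np

def project(points_cam, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_loss(points_cam, tracks_2d, mono_depth, K, depth_weight=1.0):
    """Mean L1 error against 2D tracks plus mean L1 error against (scale-aligned) depth."""
    err_2d = np.abs(project(points_cam, K) - tracks_2d).mean()
    err_z = np.abs(points_cam[:, 2] - mono_depth).mean()
    return float(err_2d + depth_weight * err_z)

K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
points = np.array([[0.1, -0.2, 2.0], [0.4, 0.3, 3.0], [-0.5, 0.1, 4.0]])

# "Pseudo labels": 2D tracks and depth that agree exactly with the points above.
tracks = project(points, K)
mono_depth = points[:, 2].copy()

loss_perfect = reprojection_loss(points, tracks, mono_depth, K)
loss_shifted = reprojection_loss(points + np.array([0.1, 0.0, 0.0]), tracks, mono_depth, K)
```

The loss vanishes when the 3D prediction agrees with both 2D cues and grows once the points drift, which is what lets such cues supervise a model without 4D labels.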

Junyi Zhang · Feb 29, 2024 · 6:34 PM UTC

Junyi Zhang

@junyi42

29 Feb 2024

Fantastic work! It's also really great to see that fusing diffusion and DINOv2 features shines in other tasks! 😄

Niladri Dutt @niladridutt

28 Feb 2024

🧵I'm excited to share that our paper "Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features" has been accepted at @CVPR 2024! A big thank you to my co-authors. Project page- diff3f.github.io/ arXiv- arxiv.org/abs/2311.17024 #CVPR2024

736

Junyi Zhang · May 8, 2025 · 4:34 AM UTC

Junyi Zhang

@junyi42

8 May 2025

Replying to @ch3njus

We tried mimicking this sequence in simulation with RL. It does a decent job jumping off, but struggles with climbing up due to insufficient arm strength. Unfortunately, we only have one G1 (without hands), so testing risky motions in the real world is quite limited for now 👀 @UnitreeRobotics

309

Junyi Zhang · Apr 21, 2025 · 4:36 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

Great collaboration with @HavenFeng (co-lead), @QianqianWang5, @yufei_ye, @pengcheng_147, @Michael_J_Black, @trevordarrell, @akanazawa 🥰

652

Junyi Zhang · Apr 24, 2025 · 1:43 AM UTC

Junyi Zhang

@junyi42

24 Apr 2025

I'll also be around the poster of DenseMatcher Friday afternoon at Hall 3+2B #569 with @hkz222! Check the poster @ju_yuanchen made 👇

Yuanchen Ju @ju_yuanchen

22 Apr 2025

#ICLR2025 Thrilled for our ICLR 2025 Spotlight: DenseMatcher🍌！📍 Hall 3 + Hall 2B #569, Fri 25 Apr, 3-5:30 AM EDT. Meet my awesome collaborators Junzhe, Junyi @junyi42 , Kaizhe @hkz222 & our advisor Huazhe @HarryXu12 to discuss! ☺️

947

Junyi Zhang · Mar 21, 2024 · 1:08 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

Current foundation model features (SD & DINO) deliver impressive SC results. But their matching accuracy peaks at ~60%, a leap away from human level. What gaps remain?🤔 We've uncovered a key challenge: these features struggle with geometric ambiguity, or "telling left from right".

320

Junyi Zhang · Apr 21, 2025 · 10:31 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

Check more insights about St4RTrack from Haven's thread!

Haven Feng

@HavenFeng

21 Apr 2025

When it comes to recovering a dynamic world, many researchers focus on extracting geometry and camera parameters—via SfM or SLAM—while others concentrate on motion estimation, whether through correspondences, optical flow, or point tracking. 🧵1/4

1,158

Junyi Zhang · Apr 21, 2025 · 4:33 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

More interesting are our joint reconstruction and tracking results: even the fully feed-forward mode gives promising results ⬇️

484

Junyi Zhang · Oct 7, 2024 · 6:59 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @vincesitzmann

Thanks for sharing, Vincent!! 🥰 I totally agree; data is definitely a key to reconstruction.

369

Junyi Zhang · Mar 21, 2024 · 1:09 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

We took a deep dive into "geometry-aware SC" by introducing a specialized geo-aware SC subset. Our findings?📊 A striking performance gap between our Geo. subset (dashed bar) and the conventional Std. set (solid bar) in SOTA methods. (Note that this subset accounts for ~50% of total keypoints.)

232

Junyi Zhang · Mar 21, 2024 · 1:30 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

More qualitative comparisons with prior art. Our method successfully establishes geometrically correct semantic correspondence even in cases of extreme view variation. Please see the webpage for more results.

205

Junyi Zhang · Apr 21, 2025 · 4:33 PM UTC

Junyi Zhang

@junyi42

21 Apr 2025

To evaluate the method, we propose a benchmark, *WorldTrack*, for world-coordinate tracking and dynamic 3D reconstruction.

541

Junyi Zhang · Dec 6, 2024 · 6:01 AM UTC

Junyi Zhang

@junyi42

6 Dec 2024

This is really impressive!! Congrats on the great work @zhengqi_li!

Zhengqi Li @zhengqi_li

6 Dec 2024

Introducing MegaSaM! 🎥 Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes! MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!

857

Junyi Zhang · Jun 12, 2023 · 3:54 PM UTC

Junyi Zhang

@junyi42

12 Jun 2023

Huge thanks to the amazing collaborators! Charles Herrmann, @JunhwaHur, Luisa Polania, @jampani_varun, @DeqingSun, Ming-Hsuan Yang

329

Junyi Zhang · Apr 29, 2025 · 10:17 AM UTC

Junyi Zhang

@junyi42

29 Apr 2025

Replying to @amw7

Thanks Andrew! Very nice to meet you :-)

186

Junyi Zhang · May 8, 2025 · 4:36 AM UTC

Junyi Zhang

@junyi42

8 May 2025

Replying to @junyi42 @ch3njus @UnitreeRobotics

That being said, we believe there’s still huge potential for sim2real. There’s plenty of juice left in the reconstruction pipeline that we haven’t fully squeezed yet, such as handling more challenging motions and ego-view rendering. Excited about what’s ahead!

270

Junyi Zhang · Mar 21, 2024 · 1:40 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

More on our webpage:
- Proposed a new large-scale, challenging SC benchmark for pretraining and evaluation
- Found that large foundation features grasp the global pose of instances
- Leveraged this info for improved correspondence
- Details & analysis of our proposed Geo. subset

481

Junyi Zhang · Mar 21, 2024 · 1:52 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

Huge thanks to my incredible collaborators for another round of interesting work😄: Charles Herrmann, @JunhwaHur, Eric Chen, @jampani_varun, @DeqingSun, and Ming-Hsuan Yang. Excited to delve deeper with this follow-up to our previous work at sd-complements-dino.github.i…!

492

Junyi Zhang · Oct 7, 2024 · 7:04 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @hangg70

Thanks Hang! Totally agree on the point.

191

Junyi Zhang · Jun 13, 2025 · 3:56 AM UTC

Junyi Zhang

@junyi42

13 Jun 2025

Replying to @jianyuan_wang

Congrats Jianyuan!! Sorry to hear that you had to go through that..😅

371

Junyi Zhang · Apr 24, 2025 · 1:46 AM UTC

Junyi Zhang

@junyi42

24 Apr 2025

Replying to @junyi42 @hkz222 @ju_yuanchen

@HavenFeng and I are both around the conference. Feel free to talk to us if you are interested in our latest work, St4RTrack!

322

Junyi Zhang · Mar 21, 2024 · 1:16 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

Are these problems an innate failing of these features, or can they be alleviated through better post-processing? Yes, they can be! We developed a highly efficient post-processor 𝐟(·) that boosts the raw features with just 0.32% extra runtime

181

Junyi Zhang · Mar 21, 2024 · 1:29 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

Ours (above). A side-by-side comparison with the previous method (DIFT) shows that, although it also generalizes well, it falls short on resolving geometric confusion (e.g., matching the left/front leg to the right/back leg). DIFT (below).

180

Junyi Zhang · Mar 21, 2024 · 1:24 PM UTC

Junyi Zhang

@junyi42

21 Mar 2024

We tried to make f(·) small to best retain the raw feature information and ensure generalizability to OOD cases. E.g., the processor is trained on real images yet generalizes to anime images; it is trained with keypoint annotations but extends to query points beyond the supervision.

167

Junyi Zhang · Jul 14, 2023 · 4:02 AM UTC

Junyi Zhang

@junyi42

14 Jul 2023

Replying to @ion_barrel

Thank you, time traveller 😂

827

Junyi Zhang · Feb 12, 2025 · 7:12 PM UTC

Junyi Zhang

@junyi42

12 Feb 2025

Replying to @Michael_J_Black

Thanks, Michael!!

226

Junyi Zhang · Mar 22, 2024 · 9:46 AM UTC

Junyi Zhang

@junyi42

22 Mar 2024

Replying to @ju_yuanchen

Thanks so much for sharing our work, Yuanchen!

160

Junyi Zhang · Oct 7, 2024 · 8:25 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @ndsong95

Thanks for sharing, Chonghyuk!

138

Junyi Zhang · Jan 21, 2023 · 9:40 AM UTC

Junyi Zhang

@junyi42

21 Jan 2023

Reports of 2022

306

Junyi Zhang · May 1, 2025 · 1:35 PM UTC

Junyi Zhang

@junyi42

1 May 2025

Replying to @davidrmcall @the_legitamit

Congrats!!

125

Junyi Zhang · May 25, 2025 · 4:34 PM UTC

Junyi Zhang

@junyi42

25 May 2025

Replying to @ruilong_li @berkeley_ai @akanazawa @NVIDIAAI

Congrats, Ruilong!!

305

Junyi Zhang · Nov 6, 2023 · 7:40 AM UTC

Junyi Zhang

@junyi42

6 Nov 2023

Replying to @theo_gervet

Cool!! It seems that there's also a concurrent work with this idea: yanjieze.com/GNFactor/

141

Junyi Zhang · Oct 7, 2024 · 7:37 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @C016SMITH @Vinc3nt_Leroy

Thanks, Chris! Currently, it only supports monocular video input since we treat different timestamps of a video as "multiview", but I think it is not challenging to adapt to multiple videos as input 🙂

105

Junyi Zhang · Oct 27, 2024 · 10:12 PM UTC

Junyi Zhang

@junyi42

27 Oct 2024

Replying to @Just_Me1313

Thanks for the feedback! There's a typo in the loading code. I just pushed a commit to the GitHub repo and it should be working now. :-)

Junyi Zhang · Oct 19, 2024 · 9:26 PM UTC

Junyi Zhang

@junyi42

19 Oct 2024

Replying to @PETEcemetery

Yes, the output .glb file contains both the point cloud and (optional) camera frustums.

133

Junyi Zhang · Oct 7, 2024 · 7:03 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @janusch_patas

Thanks for sharing our work!!

Junyi Zhang · Oct 19, 2024 · 9:29 PM UTC

Junyi Zhang

@junyi42

19 Oct 2024

Replying to @AlxandreRufino

Thanks, Alexandre! We have released the code for inference, and the camera trajectory could be exported in a certain format (please refer to github.com/Junyi42/monst3r/i… for more details) 🙂

Can I export point clouds and camera values? · Issue #5 · Junyi42/monst3r

Can I export point clouds and camera values? Will I be able to use this data in other 3D tools?

github.com

387

Junyi Zhang · Apr 8, 2025 · 10:56 PM UTC

Junyi Zhang

@junyi42

8 Apr 2025

Replying to @Songwei_Ge @jbhuang0604 @akanazawa @tomgoldsteincs @reveimage

Congrats, Songwei!

288

Junyi Zhang · Oct 7, 2024 · 8:32 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @ndsong95 @Nik__V__

Yes, we are currently limited by data annotation. We require ground-truth depth, camera poses, and intrinsics for the training data, which limits the data we can use. Even the only real-world dataset we use, Waymo, is domain-specific to driving.

111

Junyi Zhang · Apr 26, 2025 · 12:53 PM UTC

Junyi Zhang

@junyi42

26 Apr 2025

Replying to @zhifan_zhu

Thanks for the question! Stereo4D is a great data contribution, and we believe training St4RTrack on it could further boost performance. We're excited to explore this in the future!

116

Junyi Zhang · Aug 6, 2022 · 2:14 AM UTC

Junyi Zhang

@junyi42

6 Aug 2022

Hi, new to Twitter~ I'm looking for an AI MPhil/PhD position starting in Fall 2024, and everyone is welcome to be friends🥰

Junyi Zhang · Oct 7, 2024 · 7:22 PM UTC

Junyi Zhang

@junyi42

7 Oct 2024

Replying to @ChuanxiaZ

Thanks for the kind words, Chuanxia!

Junyi Zhang · Aug 7, 2022 · 3:37 AM UTC

Junyi Zhang

@junyi42

7 Aug 2022

Replying to @ZeYanjie

Sob, I want to go soak up the sun in California 😭

Junyi Zhang · Jan 27, 2023 · 8:47 AM UTC

Junyi Zhang

@junyi42

27 Jan 2023

Replying to @LightQuantumhah

Blind guess: this semester's Theory of Computation / Cryptography 😇

401

Junyi Zhang · Oct 8, 2024 · 5:50 AM UTC

Junyi Zhang

@junyi42

8 Oct 2024

Replying to @HarryXu12

Thanks Huazhe! Will try to release the code soon 😃

228

Junyi Zhang · Apr 23, 2025 · 2:51 AM UTC

Junyi Zhang

@junyi42

23 Apr 2025

Replying to @baifeng_shi @HavenFeng

Thanks, Baifeng!!

139

Junyi Zhang · Sep 14, 2023 · 8:18 PM UTC

Junyi Zhang

@junyi42

14 Sep 2023

🤣🤣

Rylan Schaeffer @RylanSchaeffer

14 Sep 2023

Excited to announce my newest breakthrough project!! 🔥🔥 State-of-the-art results (100%!!) on widely used academic benchmarks (MMLU, GSM8K, HumanEval, OpenbookQA, ARC Challenge, etc.) 🔥🔥 1M param LLM trained on 100k tokens 🤯 How?? Introducing **phi-CTNL** 🧵👇 1/6

445

Junyi Zhang · Aug 30, 2022 · 4:24 PM UTC

Junyi Zhang

@junyi42

30 Aug 2022

Replying to @LightQuantumhah

Those doing political work even get to happily skip some of the training, really nice 😇

Junyi Zhang · Jul 14, 2023 · 10:32 PM UTC

Junyi Zhang

@junyi42

14 Jul 2023

Replying to @ChiehHubertLin

Congratulations!