3D Computer Vision at Apple. I like 3D vision and training neural networks.

America summarized
7
31
589
95,702
Pretty fun paper, finetuning llama to produce blender code for synthetic renderings
6
87
604
64,032
> "we create a large-scale 3D dataset with over 1M images" > look inside > it's 4 slightly different rooms with 200k images each
8
9
237
22,090
Actually this is the future
4
20
199
25,226
The 3D vision community really hates the bitter lesson. Dust3r is what you get when you take the lesson seriously.
10
11
193
28,145
F.interpolate bilinear gang
11
7
139
15,370
Replying to @peterwildeford
Famously the tractor was created by predicting all recorded horse movements from the beginning of history, and then posttrained on horsing around satisfactorily.
2
101
6,432
👀 getting hyped 👀
Can't wait for Noah Snavely (@Jimantha)'s keynote talk: MegaScenes: Reconstructing All the World's Landmarks at our #USM3D workshop at @CVPR on June 17th!
2
12
98
18,176
DeDoDe is now on arxiv! 🥳 arxiv.org/abs/2308.08479
Say hi to DeDoDe 🎶! DeDoDe is a keypoint detector trained to detect 3D tracks. The reverse, DeDoDe, is a descriptor that matches the tracks. DeDoDe and DeDoDe are simple to train, and show great performance 📈 Code: github.com/parskatt/dedode
3
23
89
9,590
I don't understand why #CVPR2024 put all related papers in the same session. As an author it means I can't go to the posters I'm actually interested in.
5
6
93
17,244
github.com/jetd1/condense Nice method and numbers are seriously impressive
9
86
8,214
Replying to @frncswllms
2
15
69
4,978
Replying to @NastrondScourge
It's lucky for Japan that they do not have mountain ranges.
1
1
65
889
Rejected from #ECCV24 by AC with motivation that we can't add experiments (eval, requested by reviewers, not improvement of method) in the rebuttal. Pretty frustrating.
3
52
13,304
Replying to @eric_brachmann
The ugly template.
3
1
56
2,250
Finally managed to cite myself 100 times
2
50
2,985
At 10:30 we're presenting RoMa! Poster #25, see you there :D
2
4
48
3,675
I feel like the reason people do incremental work is that it's way easier to get accepted. Would be nice if we could tip the balance a little bit towards novelty (and novelty doesn't mean complicated).
4
2
51
5,516
Replying to @vikhyatk
"And you call it tf32 despite the fact that it's obviously 19 bits?"
2
46
2,552
Objaverse is the most important 3D vision paper in the last 5 years; if Google made Street View data accessible, that would be the most important in 10 years.
4
3
40
7,835
DL3DV looks good :) @LuLing26466911 thanks for making the colmap caches available!
1
5
42
5,929
And today I'm presenting this work at #CVPR2025! 🗓️ Date: 16:00-18:00, Fri, Jun 13 (Today) 📍Place: Poster #115 in Session 2 (ExHall D) 💻 Code: github.com/ericssonresearch/…
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration @Parskatt , André Mateus, Alberto Jaenal tl;dr: in title, learning to register SfM point clouds. arxiv.org/abs/2503.17093
2
43
3,328
Image matching is pretty great to work on because there's like 5 people working on it, so any idea you come up with is probably new.
6
40
9,196
Yushan's work GMSF: Global Matching Scene Flow is accepted to NeurIPS 2023!🥳 We propose a simple but powerful approach to scene flow estimation through global matching that achieves state-of-the-art performance. Paper: arxiv.org/abs/2305.17432 Code: github.com/ZhangYushan3/GMSF
9
39
3,602
DeDoDe v2 coming soon with some improvements to the detector! Colab with @BokmanGeorg and @zhenjun_zhao
3
7
38
5,793
Replying to @gabriberton
Counterpoint: Even without further significant breakthroughs in capabilities, there will be significant value in getting an edge in efficiency/perf due to the size of the market.
1
38
6,543
No better feeling than dataset providing pose as 4x4 matrix in documented coordinate system.
3
2
35
2,177
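A minimal sketch of why a 4x4 pose is so convenient: composing and inverting become plain matrix operations. The names cam_to_world and world_to_cam, and the example pose itself, are assumptions for illustration, not taken from any particular dataset.

```python
# Sketch: invert a rigid 4x4 pose [R | t; 0 0 0 1] without a general matrix inverse.
# The cam_to_world / world_to_cam naming is an illustrative assumption.

def invert_pose(T):
    """Return the inverse of a rigid transform given as a 4x4 nested list."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    # For an orthonormal rotation, the inverse is the transpose.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    # The inverse translation is -R^T t.
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Example pose: 90 degree rotation about z, translation (1, 2, 3).
cam_to_world = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
world_to_cam = invert_pose(cam_to_world)
identity = matmul4(cam_to_world, world_to_cam)
```

The matrix alone still does not say which direction it maps or which axis convention it uses, which is why the documented coordinate system matters as much as the 4x4 layout.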
First version of tiny roma available here: github.com/Parskatt/roma
7
5
37
6,158
DeDoDe now in kornia 😀
0.7.2 is out! - Added DeDoDe features (thanks @Parskatt ) - LightGlue models, available nowhere else - DeDoDe (B/G), KeyNet-HardNet - KMeans implementation - New augmentations: RandomGaussianIllumination, RandomLinearIllumination, RandomLinearCorner 1/2 github.com/kornia/kornia/rel…
1
4
33
3,485
At least in Sweden it can also mean that the place is too local, only regulars go there and rate the place highly. Typical example: local pizzeria.
33
1,604
There are seriously undervalued parts of 3D vision. Top among these is auto-calibration.
2
2
33
3,678
Trying out, let's see.
2
1
29
3,354
Never get tired of looking at depth maps.
2
36
1,785
RoMa can do MVS well :)
3
2
31
4,015
Check out my "50% done" PhD seminar, where I talk about my recent works in 3D Reconstruction using Neural Networks! There are new things in there! Watch to the end! If you have questions, ask them in this thread :) Link: nextcloud.liu.se/s/fqj99km9S…
2
2
28
5,434
When I do nothing and end up as last author.
3
1
28
3,467
The real problem with COLMAP is that it's impossible to ctrl+F stuff. There's always 3 layers of abstraction for everything. Makes it so hard when you don't have 5 years of experience with it.
2
24
2,960
Choose your imports carefully lol. Maybe I should change name to RoMatch... Or they should change to RotMan
RoMa: an easy-to-use, stable and efficient library to deal with rotations and spatial transformations in PyTorch. Read all about this PyTorch Ecosystem Tool in our latest Medium post ⚡hubs.la/Q02hFsVf0
3
1
24
4,091
This is being slept on. MVS + Splats
RPBG: Towards Robust Neural Point-based Graphics in the Wild Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng tl;dr: image restoration->reform the CNN-based neural renderer from NPBG arxiv.org/pdf/2405.05663
4
1
22
9,743
Replying to @andrew_n_carr
Big matrix make computer owow
23
2,686
Replying to @eliebakouch
Our data consists of different parts, and we use techniques to train models.
22
714
On Thursday I'll give a talk on image matching over in Prague. Hope to meet you there :)
1
4
20
2,884
>find bug in training code >fix implementation and rerun experiment >the performance is now worse
2
20
1,158
We need to make some simple 3D geometric benchmarks so 2D people can evaluate their new shiny backbones. Annoying to see the 15 benchmark eval where it's all 2D classification.
1
20
2,001
Ok, so to torch.compile DDP things, what you need to do is make some fake inputs and run forward once before you wrap in DDP. Trying to compile after DDP is completely broken, don't even try.
>1 year later and torch.compile still crashes :( Maybe my code is just cursed.
8
21
4,357
vkitti2 flow tracks the rotation of the wheels, might be somewhat challenging to match lol
3
20
2,194
This is probably because it converts images to be square, in the preprocessing step. Try it with a cropped image instead and see if it improves.
19
Replying to @cHHillee @the_aiju
* Thanks.
17
1,789
Can someone pls make better MVS than patchmatch, I'm begging you. Otherwise I'm making it this autumn.
6
18
4,041
Why do people store depth in float16 :( Give me some more bits, I'm begging you.
3
1
18
3,440
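A quick sketch of what float16 does to metric depth, using only the standard library (struct's "e" format is IEEE 754 half precision). The depth values are made-up examples and the unit of metres is an assumption.

```python
import struct

def to_float16(x):
    """Round-trip a Python float through IEEE 754 binary16 storage."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hypothetical depth value in metres.
depth_m = 100.03
stored = to_float16(depth_m)
error_m = abs(stored - depth_m)

# float16 has 11 significant bits, so the grid spacing grows with magnitude:
# 0.0625 m for values in [64, 128) and 0.5 m for values in [512, 1024).
far_stored = to_float16(1000.3)
```

With these inputs, stored comes back as 100.0 (about 3 cm lost) and far_stored as 1000.5. Values above 65504 cannot be represented in float16 at all.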
I'll be presenting this paper today at 13:00 at the Image Matching Workshop at #CVPR2025! 🗓️ Date: 13:00, Wed, Jun 11 (Today) 📍Room: 108 📖 Slides: github.com/Parskatt/storage/…
Less Biased Noise Scale Estimation for Threshold-Robust RANSAC @Parskatt tl;dr: estimate RANSAC th from data, almost optimal. P.S. "χ2 model holds pretty well for correspondences, which is lucky, as our life would otherwise be significantly more difficult" arxiv.org/abs/2503.13433
2
2
20
3,070
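For context, a sketch of the classical fixed-threshold rule that the χ2 model leads to. This is the textbook rule and not the estimator proposed in the paper. It assumes isotropic Gaussian noise with a known standard deviation sigma on 2D points, so the squared residual divided by sigma^2 follows a χ2 distribution with 2 degrees of freedom. The function name and the default confidence are illustrative assumptions.

```python
import math

def chi2_inlier_threshold(sigma, confidence=0.95):
    """Residual threshold (same units as sigma) that keeps `confidence` of true inliers.

    Assumes 2-dof chi-square residuals, whose CDF is 1 - exp(-x / 2),
    so the quantile has the closed form -2 * ln(1 - confidence).
    """
    quantile = -2.0 * math.log(1.0 - confidence)
    return sigma * math.sqrt(quantile)

# Example: 1 px noise gives a threshold of roughly 2.45 px at 95% confidence.
threshold_px = chi2_inlier_threshold(sigma=1.0)
```

The catch is that sigma is rarely known in practice, which is the gap that estimating the noise scale from data is meant to fill.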
Replying to @jxmnop
No
1
17
1,984
If I just predict relative pose, I don't need matches.
If I get 10 Nature publications, I can get a tenure-track faculty job.
17
1,555
Coordinate system tier list based on how easy they are to make with your hand. Blender S-tier, COLMAP/OpenCV F-tier.
2
1
17
1,777
brb 15 min solving 3D vision
1
15
2,583
Using COLMAP partial reconstructions beyond 0 is like going to the second page of google search results.
1
14
830
Replying to @tenderizzation
To write autodiff from scratch you must first invent calculus.
15
1,523
COLMAP undistort randomly removed 2px from images. I removed the output and reran the exact same script. It's working now. I'm scared.
3
15
2,126
Roma presentation v2
3
14
1,184
Replying to @chrisoffner3d
Stop being so mean to it :(
1
15
1,223
Replying to @docmilanfar
Dust3r has some theories.
1
12
2,372
writing a new paper and found that I didn't do a single self-cite. not sure if I should be proud.
5
14
1,786
Got a solid 3 hours of sleep. Definitely ready for a long day of cvpr.
2
14
1,612
It do be like that
1
13
1,075
Replying to @AlexGodofsky
Hahaha omg it's reproducible
13
1,832
Replying to @MikeFHay @rakyll
Car goes on a nice highway, public transport consists of 3 different buses with 20 stops. Also you're lucky if they show up at all.
1
13
2,864
My mildly controversial take is that keypoint detectors aren't, and probably shouldn't be, scale invariant.
5
13
2,062
Replying to @francoisfleuret
Funny how these clips always end after about 5 seconds, wonder why.
3
14
445
While you were partying I studied the Fundamental Matrix
1
12
1,019
Replying to @pesarlin
I agree for senior PhD students who already have several papers published. I don't agree for new students. Your first paper won't be your best research (usually), and you need practice through quantity.
2
13
522
Downside is you have to spend 1 week downloading megadepth.
1
13
775
After spending 5 hours on improving the loss
1
13
1,048
Wake up babe, new matching dataset just dropped
Our DL3DV-10K dataset paper has been accepted by #CVPR2024🎉! It provides scene-level videos at 4K resolution, RGB images, camera poses, and point clouds. The DL3DV-3K is currently available and more versions are coming soon. Feel free to check our project page: dl3dv-10k.github.io/DL3DV-10…
1
13
1,497
Replying to @ducha_aiki
Seen multiple labs seemingly very proud that they don't have any GPUs. And these are not poor labs.
2
13
2,458
Spring in Linköping
1
11
800
Replying to @refreshstream
Both nerfs and gsplats follow the bitter lesson in the sense that they are simple and scale well with data.
1
13
1,758
Making the cutest little dense matcher
3
12
1,561
Quaternions should never be exposed outside of a library; happy that pycolmap banned them. Coordinate systems are difficult enough as is.
4
11
2,359
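One way to see the problem, as a hedged sketch: the same four numbers give different rotations depending on whether the caller assumes (w, x, y, z) or (x, y, z, w) ordering. The function below is a generic unit-quaternion to rotation-matrix conversion written for illustration. It is not pycolmap's API.

```python
def quat_to_rotmat(w, x, y, z):
    """Rotation matrix for a unit quaternion with scalar part w."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

# The same raw values, as they might arrive from a file with no documented order.
raw = (0.0, 0.0, 0.0, 1.0)

# Read as (w, x, y, z): a 180 degree rotation about the z axis.
R_wxyz = quat_to_rotmat(raw[0], raw[1], raw[2], raw[3])

# Read as (x, y, z, w): the identity rotation.
R_xyzw = quat_to_rotmat(raw[3], raw[0], raw[1], raw[2])
```

Both readings are valid unit quaternions, so nothing fails loudly. A rotation matrix has no such ordering ambiguity.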
Replying to @chrisoffner3d
Feature matching is a crucial component of structure-from-motion (SfM) that aims to match features. While traditional handcrafted methods~\cite[sift] are traditionally used, recently learning-based methods~\cite[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] are becoming increasingly po
12
244
Replying to @vikhyatk
Also shoutout to tf32 for ruining precision in my experiments due to deps enabling it on matmul by default. I really hate tf32.
1
12
1,790
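A rough sketch of how much precision tf32 throws away, simulated in pure Python. TF32 keeps float32's sign bit and 8 exponent bits but only 10 mantissa bits, which is where the 19 bits come from. The truncation used here is a simplifying assumption, since real hardware may round instead, and to_tf32 is a hypothetical helper name.

```python
import struct

# 1 sign bit + 8 exponent bits + 10 mantissa bits.
TF32_BITS = 1 + 8 + 10

def to_tf32(x):
    """Approximate TF32 by zeroing the low 13 of float32's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, exponent, and the top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# An increment of 2**-10 survives, but 2**-11 is lost entirely.
kept = to_tf32(1.0 + 2.0 ** -10)
lost = to_tf32(1.0 + 2.0 ** -11)
```

That leaves roughly three decimal digits in each matmul input. In PyTorch the relevant switch is torch.backends.cuda.matmul.allow_tf32, which a dependency can flip without the caller noticing.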
Replying to @jon_barron
Just add enough hole papers until underflow
1
11
2,772
Replying to @docmilanfar
The method of combining rocks into a henge is simple, and while a simple method is not grounds for rejection, the method is not general. If the authors put some henges in Sudan and Thailand I might reconsider.
11
620