Unsupervised 3D perception (object detection) w/ 2D vision-language distillation #ICCV2023
tl;dr: generate amodal 3D boxes and tracklets (for both static and moving objects), then distill vision-language model (VLM) features from images into point clouds. Works well for both closed-set and open-set detection. arxiv.org/abs/2309.14491
Oct 23, 2023 · 3:12 PM UTC
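To make "distill VLM features from images into point clouds" concrete, here is a minimal sketch of the generic 2D-to-3D distillation step. It is not the paper's implementation. It assumes a pinhole camera with intrinsics `K`, LiDAR points already transformed into the camera frame, a dense per-pixel VLM feature map (e.g. CLIP-style) as the teacher, and a cosine-distance loss. The function names `gather_pixel_features` and `cosine_distill_loss` are hypothetical.

```python
import numpy as np


def gather_pixel_features(feat_map, points_cam, K):
    """Hypothetical helper: look up teacher image features for 3D points.

    feat_map:   (H, W, C) per-pixel VLM features (assumed teacher).
    points_cam: (N, 3) points in the camera frame (assumed already transformed).
    K:          (3, 3) pinhole intrinsics (assumed).
    Returns (features for visible points, boolean visibility mask of length N).
    """
    H, W, _ = feat_map.shape
    in_front = points_cam[:, 2] > 1e-6           # drop points behind the camera
    pts = points_cam[in_front]
    uvw = pts @ K.T                              # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    u = np.floor(uv[:, 0]).astype(int)           # pixel column
    v = np.floor(uv[:, 1]).astype(int)           # pixel row
    in_img = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    mask = np.zeros(len(points_cam), dtype=bool)
    mask[np.flatnonzero(in_front)[in_img]] = True
    return feat_map[v[in_img], u[in_img]], mask


def cosine_distill_loss(student, teacher, eps=1e-8):
    """Mean (1 - cosine similarity) between student (3D) and teacher (2D) features."""
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))
```

In use, a 3D backbone would predict one feature per point; the loss is applied only where `mask` is true, e.g. `cosine_distill_loss(point_feats[mask], teacher_feats)`. Points outside the camera frustum receive no 2D supervision in this sketch, which is one reason a real system may aggregate features across multiple views or frames.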