Benlin Liu · Aug 6, 2024 · 3:03 PM UTC

@LiuBenlin

🌟Introducing 𝐂𝐨𝐚𝐫𝐬𝐞 𝐂𝐨𝐫𝐫𝐞𝐬𝐩𝐨𝐧𝐝𝐞𝐧𝐜𝐞 (coarse-correspondence.github…), a 𝘀𝗶𝗺𝗽𝗹𝗲, 𝗴𝗲𝗻𝗲𝗿𝗮𝗹, 𝗮𝗻𝗱 𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 visual prompting method that elicits multimodal LLMs’ 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐨𝐟 𝟑𝐃 𝐬𝐩𝐚𝐜𝐞𝐭𝐢𝐦𝐞! We believe AI’s understanding of the world should also be a joint understanding of 3D space and time, achieving the Spatial Intelligence @drfeifei envisions. For the first time, we demonstrate that 𝐚 𝐠𝐞𝐧𝐞𝐫𝐚𝐥-𝐩𝐮𝐫𝐩𝐨𝐬𝐞 𝐌𝐋𝐋𝐌 𝐟𝐨𝐫 𝟐𝐃 𝐢𝐦𝐚𝐠𝐞𝐬 can also develop a strong understanding of 𝟑𝐃 𝐬𝐜𝐞𝐧𝐞𝐬 𝐚𝐧𝐝 𝐥𝐨𝐧𝐠 𝐯𝐢𝐝𝐞𝐨𝐬, achieving 𝐒𝐎𝐓𝐀 results without task-specific model design or fine-tuning. And all this stems from the traditional wisdom of computer vision: correspondence. Details below🧵: (1/n)
