Our latest work shows that learning to colorize videos causes visual tracking to emerge automatically! Blog: ai.googleblog.com/2018/06/se… Paper: arxiv.org/abs/1806.09594 @alirezafathi @kevskibombom @sguada @abhi2610
10
137
441
Fantastic conditional GAN results by Isola et al phillipi.github.io/pix2pix/
6
236
401
The future is hard to anticipate! In our latest #CVPR2021 paper, we introduce a framework for learning *what* is predictable in the future. Rather than committing up front to categories to predict, our approach learns how to hedge the bet. hyperfuture.cs.columbia.edu
9
44
269
Finding Tiny Faces -- had to zoom in quite a bit to parse how cool the results are! arxiv.org/pdf/1612.04402.pdf
1
136
246
Learning unsupervised machine translation is easier if you open your eyes! Image distributions create transitive relations between languages. This creates incidental supervision for learning multilingual representations on 50 unpaired languages arxiv.org/pdf/2012.04631.pdf @Surisdi
4
43
236
Neural networks fooled by unusual poses arxiv.org/pdf/1811.11553.pdf
7
81
211
Learning Features by Watching Objects Move by Pathak et al arxiv.org/pdf/1612.06370.pdf
3
100
217
amazing generations of the video future! Red border means output, green is input. sites.google.com/a/umich.edu…
1
93
205
SoundNet: Learning natural sound representations with convnets and 2 million unlabeled videos. web.mit.edu/vondrick/soundne…
1
67
165
Recognizing objects and scenes from sound only. Turn on your speakers! More visualizations: projects.csail.mit.edu/sound…
1
94
162
Unsupervised Learning by Predicting Noise by Bojanowski and Joulin. Cool yet simple idea that works quite well!! arxiv.org/pdf/1704.05310.pdf
3
59
143
What causes adversarial examples? Latest #ECCV2020 paper from @ChengzhiM and Amogh shows that deep networks are vulnerable partly because they are trained on too few tasks. Just by increasing tasks, we strengthen robustness for each task individually. arxiv.org/pdf/2007.07236.pdf
4
35
141
Oops! Dave+Bo introduce a dataset of unconstrained videos showing unintentional action. We study self-supervised approaches for learning video representations of intentionality. #CVPR2020 Poster 93, Tue 10am PST Website: oops.cs.columbia.edu Paper: arxiv.org/abs/1911.11206
2
23
114
Learning from Unlabeled Video (#LUV❤️‍🔥) starts today at 1:50pm EDT / 10:50am PDT! sites.google.com/view/luv202… You will LUV the speaker lineup and the curated papers! 😍 featuring @pathak2206 @akanazawa @SongShuran and more #CVPR2021 #CVPR21 #CVPR
2
14
109
predicting the future, with semantic segmentations! by Neverova and Luc arxiv.org/pdf/1703.07684.pdf
1
41
95
Cross-Modal Scene Networks: learning aligned representations across several different modalities cmplaces.csail.mit.edu/
1
46
90
CVPR workshop on negative results! negative.vision
61
82
Our predictive model is hyperbolic, which naturally encodes hierarchical structure. When the model is most confident, it will predict at a concrete level of the hierarchy. But when not confident, the *mean* solution automatically selects a higher level!
2
13
76
Learning from Unlabeled Video Workshop -- starting now! First up: Andrea Vedaldi (Oxford) on Learning Representations and Geometry from Unlabelled Videos. sites.google.com/view/luv202…
2
13
77
Got many replies. I don't believe the problem has to do with neural nets. The problem is the paradigm of supervised classification and closed datasets. We need models that learn from an open world, with self-supervision, never stop learning, and transfer between tasks.
Neural networks fooled by unusual poses arxiv.org/pdf/1811.11553.pdf
5
16
66
See, hear, read: deep representations shared over 3 natural modalities. Units activate on objects in each modality. goo.gl/kNVJr4
1
22
63
With just a few hours of experimentation in the physical world, a robot can learn on its own to design and throw paper airplanes further than a person, and even learn to build robot grippers out of cheap paper. No foundation models. No simulation. No language.
Humans can design tools to solve various real-world tasks, and so should embodied agents. We introduce PaperBot, a framework for learning to create and utilize paper-based tools directly in the real world. paperbot.cs.columbia.edu/
2
57
6,477
Videos of the full workshop are now available on YouTube: sites.google.com/view/luv202…. Thanks everyone, especially the speakers, for a great workshop!
Learning from Unlabeled Video Workshop -- starting now! First up: Andrea Vedaldi (Oxford) on Learning Representations and Geometry from Unlabelled Videos. sites.google.com/view/luv202…
8
61
Our new paper (w/@Surisdi,Dave) shows Transformers can meta-learn a process for language acquisition from vision. At inference, the policy adapts to new words and generalizes better. #CVPR2020 Paper: arxiv.org/abs/1911.11237 Talk: Mon 11:40am PST mindsvsmachines.com
16
58
Sssshhh!! There is so much noise in cities today. Ruilin and Rundi introduce a new approach that removes ambient noise from audio, letting the speech come through loud and clear. Let's have a listen... 🔊Turn on your speakers! 🔗cs.columbia.edu/cg/listen_to…
3
7
56
I am so excited to be part of this dream team. We will be investigating the next generation of ML and predictive models for truly planetary scale problems. If you are passionate about cutting-edge ML coupled with societal impact, please apply to Columbia for various positions!
Hurricane Ida made one thing clear: we are not prepared for the extreme weather caused by #climatechange. A new climate modeling center is designed to improve climate projections and encourage societies to plan for the inevitable disruptions ahead. bit.ly/3hhXJil @NSF
1
54
great high-res image manipulation by interpolating in feature space -- simple, no GAN required (Upchurch et al) arxiv.org/pdf/1611.05507.pdf
1
13
54
Network visualization, dissection, and interpretability by David Bau and Bolei Zhou at MIT! netdissect.csail.mit.edu/ @zhoubolei
24
56
Learn about Learning from Unlabeled Videos at #CVPR2019, Sunday in Room E, 9:00am Fresh posters and keynotes: Antonio Torralba, Noah Snavely, Andrew Zisserman, Bill Freeman, Abhinav Gupta, Kristen Grauman sites.google.com/view/luv201…
1
10
55
I had a joke about homography, but it was too plane.
1
49
Announcing the Workshop on Learning from Unlabeled Video at CVPR 2019. Come for dynamite speakers, and stay for the abstracts! Abstract deadline is March 4. Topics include self-supervised learning, sound and vision, visual anticipation, active vision, etc sites.google.com/view/luv201…
10
47
Self-supervised learning is prediction, and unsupervised learning is compression (in my view)
Replying to @jmhessel
“Self-supervised” is a rebranding for “unsupervised” to avoid confusing people who ask Qs like “how can LMs be unsupervised if you give them the next token to predict”? I dislike rebranding, but I dislike even more arguing about whether LMs are unsupervised. So,🤷‍♂️?
5
1
47
The "deep inversion" quiz by Oxford: how well do you understand neural network visualizations? robots.ox.ac.uk/~vgg/researc…
2
20
45
Don’t want pi day to end? Come to the hyperbolic world! In hyperbolic space, pi has no upper bound. You can eat pie for the rest of the year.
Replying to @cvondrick
Our predictive model is hyperbolic, which naturally encodes hierarchical structure. When the model is most confident, it will predict at a concrete level of the hierarchy. But when not confident, the *mean* solution automatically selects a higher level!
4
42
learning to recognize objects with only a few examples -- exciting 'low data' paradigm arxiv.org/pdf/1606.02819.pdf
12
43
Turn any container into a smart container — all you need is noise!
How can we tell "what is where" inside a container, after dropping something into it? Can we generate visual scenes from sound? Excited to share our latest work: The Boombox: Visual Reconstruction from Acoustic Vibrations. (boombox.cs.columbia.edu)
2
1
42
An important warning about non-peer-reviewed papers: the public can lose trust in science and research if too much low-quality work is posted.
2
10
42
Replying to @jbhuang0604
Conclusion is where you accidentally tell reviewers how to reject your paper!
1
1
39
"Do Good Research" by Fredo Durand thecomputationalphotographer…
1
22
39
CVPR should be "Computer Vision, Prediction, and Robotics"
1
35
cool idea to interactively reconfigure pretrained CNNs in order to recognize unseen classes, by Krishnan and Ramanan arxiv.org/pdf/1612.04901.pdf
17
34
The computer vision group at Columbia is looking for a postdoctoral fellow. Come wrangle pixels with us in the big city. More info: cs.columbia.edu/~vondrick/po…
15
29
Excellent piece, but I disagree we should give up our datasets. To get commonsense and generalization, we should create rich & diverse multi-modal datasets that span a huge number of tasks. Probably need new data collection means, eg interaction and self-supervision (not MTurk)
5
25
released 35 million video clips! stabilized, natural video. 1 year! fun dataset for generative video models goo.gl/3CnFKR
15
22
Learning camouflaged QR codes
A nice paper from our lab on learning visual codes, to appear at NIPS sites.skoltech.ru/compvision…
7
23
Replying to @rzhang88
I should see a doctor ASAP! AI is going to save my life!
1
23
Learning visual and auditory representations simultaneously from video!
My first paper at DeepMind: What can be learnt by looking at and listening to a large amount of unlabelled videos? arxiv.org/abs/1705.08168
2
21
Learning to find moving objects irrespective of camera motion, by Tokmakov et al. arxiv.org/pdf/1612.07217v1.p…
1
8
19
Predictive models on physical robots learn rich features about their surroundings -- they learn about obstacles and even the policy of other robots. Latest paper with @BoyuanChen1 and @hodlipson, out today!
Can a robot be empathetic? @MechCU Prof @hodlipson thinks so: his lab has created a robot that learns to visually predict how its partner robot will behave. engineering.columbia.edu/pre… @Columbia
3
19
Most predictive models operate in Euclidean space. However, when there is uncertainty or multiple modes, the optimal solution is to regress the mean, which often lacks any interpretation. Our idea: Let’s make the mean mean something!
1
2
17
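A toy illustration of the point about regressing the mean, as a minimal sketch: the two target values and the candidate grid are hypothetical, chosen only to show that the squared-error minimizer falls between the modes rather than on either one.

```python
def mse(prediction, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# Hypothetical setup: two equally likely futures, encoded as -1 and +1.
targets = [-1.0, 1.0]
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]

# The squared-error minimizer is the mean of the targets, which matches neither mode.
best = min(candidates, key=lambda p: mse(p, targets))
```

Here `best` lands between the two modes, which is the uninterpretable Euclidean mean the post refers to.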
Hyperbolic geometry for machine learning and computer vision is a young and rapidly growing area. We are not the first to work with this geometry, and we will not be the last! Code, models, data, visuals, and links to tutorials are on our project website: hyperfuture.cs.columbia.edu
1
16
what do we visualize when we visualize ConvNets? important question: arxiv.org/pdf/1606.04801.pdf
7
16
Photos with the smart phone removed ericpickersgill.com/Removed
5
15
"Person analysis using cheap and large-scale synthetic data" in Learning from Synthetic Humans by Varol et al arxiv.org/pdf/1701.01370.pdf
5
16
preprint for "Generating Videos with Scene Dynamics": adversarial nets for video generation & learning & prediction web.mit.edu/vondrick/tinyvid…
1
7
13
Low-power vision mode that produces lower-quality image data suitable only for computer vision -- by Buckler et al arxiv.org/pdf/1705.04352.pdf
2
14
Replying to @tetraduzione
We are optimistic we can offer reviewers a relatively low load, which we hope translates into high quality reviews.
2
15
659
Replying to @haldaume3
you can silently and randomly add questions that have a single, well-defined answer that you also know. then, discard all workers that fail those "quiz" questions
1
12
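The quiz-question filter suggested in the reply above could look roughly like this. It is a sketch under assumptions: the data layout and the names `filter_workers` and `gold_answers` are hypothetical and do not come from any crowdsourcing platform's API.

```python
def filter_workers(responses, gold_answers):
    """Return the workers who answered every quiz question they saw correctly.

    responses: dict of worker id -> dict of question id -> answer (hypothetical layout).
    gold_answers: dict of quiz question id -> the known correct answer.
    """
    kept = set()
    for worker, answers in responses.items():
        seen_gold = [q for q in answers if q in gold_answers]
        # A worker who saw no quiz question is kept; tighten this rule if needed.
        if all(answers[q] == gold_answers[q] for q in seen_gold):
            kept.add(worker)
    return kept


responses = {
    "w1": {"q1": "cat", "gold1": "4"},
    "w2": {"q1": "dog", "gold1": "5"},
    "w3": {"q2": "cat", "gold2": "blue"},
}
gold_answers = {"gold1": "4", "gold2": "blue"}
trusted = filter_workers(responses, gold_answers)
```

In this example `w2` fails `gold1`, so all of that worker's answers would be discarded.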
NIPS 2014: 3 reviews, 6k char rebut. 2015: 4 reviews, 5k char rebut. 2016: 6 reviews, 3k char rebut. 2017: 9 reviews, tweet rebuttal ?!
4
13
learns nice convolutional filters for raw waveforms, without ground truth labels
3
2
12
Last keynote: Alyosha Efros and Allan Jabri on learning space-time correspondences, starting in 5 min sites.google.com/view/luv202…
1
1
12
For example, a hidden unit automatically emerges for dogs. It activates on images of dogs, sentences about dogs, or sounds of barking
11
Dear twitter, how do you take notes and jot down ideas for research? Do you use an app, pen/paper, memory?
10
2
11
Saturday at #ICML2019!
Excited to announce our #ICML2019 Workshop on Self-Supervised Learning! Covering- Vision, NLP, Audio, Robotics, RL ... sites.google.com/view/self-s… Submissions now open - deadline April 25! Speakers: @ylecun, @chelseabfinn, Andrew Zisserman, Alexei Efros, Jacob Devlin, Abhinav Gupta
11
The main idea: Natural audio will contain intervals of silence, which we can leverage as incidental supervision for learning to denoise. By learning to first detect these pauses, we can estimate a profile for the noise, and suppress it throughout the audio.
1
10
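To make the idea in the post above concrete, here is a deliberately crude sketch. It is not the method from the paper: it swaps in a classical energy threshold for the pause detector and a single broadband Wiener-style gain for the noise profile. The frame length, the quantile, and the synthetic signal are all assumptions.

```python
import math
import random


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def denoise(signal, frame_len=100, silence_quantile=0.2):
    """Estimate noise power from the quietest frames, then attenuate every frame.

    Samples past the last full frame are dropped.
    """
    energies = frame_energies(signal, frame_len)
    # Assumption: the quietest frames are pauses that contain only noise.
    n_silent = max(1, int(len(energies) * silence_quantile))
    noise_power = sum(sorted(energies)[:n_silent]) / n_silent
    out = []
    for k, energy in enumerate(energies):
        # Wiener-style gain: near 0 for noise-only frames, near 1 for loud frames.
        gain = max(0.0, 1.0 - noise_power / energy) if energy > 0 else 0.0
        out.extend(gain * s for s in signal[k * frame_len:(k + 1) * frame_len])
    return out


random.seed(0)
noise = [random.gauss(0.0, 0.05) for _ in range(1000)]
# Hypothetical "speech": a tone that is present only in the middle of the clip.
clean = [math.sin(0.3 * n) if 400 <= n < 700 else 0.0 for n in range(1000)]
noisy = [c + v for c, v in zip(clean, noise)]
cleaned = denoise(noisy)
```

The pauses before and after the tone supply the noise estimate, which is then used to suppress noise across the whole clip.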
Here’s an example. As the model observes more of the video, the future becomes more and more predictable. Our model makes increasingly specific forecasts of the future.
1
10
Andrew Zisserman on leveraging temporal coherence and sound to learn from video!
1
2
9
Fine-grained sound recognition, plus a fun dataset collection!
2
10
Learning to see in the dark: super cool results! piped.video/watch?v=qWKUFK7M…
2
9
Just by changing predictive models to work in hyperbolic space instead of Euclidean space, the model automatically learns to select the right level of abstraction under uncertainty!
2
10
Replying to @dimadamen
There are also multiple modes, eg multimodal prediction
2
10
Great results from the Scene Parsing Challenge with Places Database csail.mit.edu/csail_computer…
1
9
Generative video models facing the physical world 👇 #CVPR2024
Recently released video generation models are amazing😍 How can we use them in robotics to learn generalizable visuomotor policies? Come find out in my talks at these 4 CVPR workshops next week, where I will talk about recent works in 3D, generative models, and robotics! RHOBIN (rhobin-challenge.github.io/s…) Holistic Video Understanding (holistic-video-understanding…) AI3DG (ai3dg.github.io/index.html) 3D Foundation Model (cvpr.thecvf.com/virtual/2024…) Shout out to all the amazing organizers!
9
2,517
Cool paper from Berkeley: learn 3D flow from unlabeled stereo videos
1
9
Congratulations Dídac!! @Surisdi
Didac Suris (@Surisdi), one of our PhD students, won a Microsoft Research Fellowship (@MSFTResearch)! Learn more about him and his PhD experience here - bit.ly/PhDDidacS
9
Videos are coming soon!!
1
8
While each language represents a bicycle with a different word, the underlying visual representation remains consistent. A bicycle has a similar appearance in the UK, France, Japan, and India. We leverage this natural property for translating unpaired languages.
1
1
8
Good bye X 👋. Join me on BlueSky! bsky.app/profile/cvondrick.b…
3
9
3,750
basic idea is: visual recognition networks teach networks for sound, enabling learning from tons of unlabeled video
1
2
7
Antonio Torralba on multi-modal learning and self-supervised learning
1
7
Video should be on YouTube next week. Thanks everyone for attending and great questions, and especially Yale Song for leading the behind the scenes!
7
To be fair, I’m a human who also needs instructions to solve a Rubik's Cube!
6
It learns how to translate individual words across 50 languages... even without paired language supervision
1
5
Since hyperbolic space is continuous, the hierarchy is actually continuous as well! This lets us work with hierarchies of any depth. Here’s 3 levels deep.
1
1
6
The approach finds very interesting transitive paths between languages via vision, which we show below. When there is a strong path, the final score is high (top row), and it's low when the path is not aligned well (bottom row)
1
6
Starting now: Ivan Laptev from INRIA sites.google.com/view/luv202…
1
2
5
Poster happening now on Zoom! cvpr20.com/event/oops-predic…
Oops! Dave+Bo introduce a dataset of unconstrained videos showing unintentional action. We study self-supervised approaches for learning video representations of intentionality. #CVPR2020 Poster 93, Tue 10am PST Website: oops.cs.columbia.edu Paper: arxiv.org/abs/1911.11206
1
5
We can also translate sentences, not just individual words! Of course, it works best on concrete visual concepts
1
5
Replying to @CSProfKGD @alfcnz
I ended up using Screenflow, and I found it fantastic. It jointly records your screen, audio, and webcam. There is a simple UI to create different scenes.
1
5
We show pairwise performance between source and target languages. As you might expect, languages within the same family are easier to translate between. But our approach is language agnostic, and makes no assumptions on grammar or vocab. The full dataset is available online!
1
5
The predictions are initially near the origin of the space, which corresponds to predicting the “root” node of the hierarchy. But over time, the prediction moves closer to the boundary of the space, corresponding to more specific forecasts.
1
4
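One way to read "near the origin" versus "near the boundary" numerically, as a minimal sketch: it assumes the Poincaré ball model with curvature -1, which the posts do not specify, and the two example points are hypothetical predictions rather than model outputs.

```python
import math


def distance_from_origin(point):
    """Hyperbolic distance from the origin in the Poincaré ball (curvature -1)."""
    norm = math.sqrt(sum(x * x for x in point))
    if norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 * math.atanh(norm)


# Hypothetical predictions early and late in a video.
early = distance_from_origin([0.05, 0.0])  # close to the origin: generic, "root"-like
late = distance_from_origin([0.9, 0.3])    # close to the boundary: more specific
```

Because the distance grows without bound as the norm approaches 1, there is room for arbitrarily many levels of specificity between the origin and the boundary.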