Former CEO and co-founder of @udiomusic. ex Google DeepMind

Excited to be launching udio at long last! 1/6
Introducing Udio, an app for music creation and sharing that allows you to generate amazing music in your favorite styles with intuitive and powerful text-prompting. 1/11
29
12
156
47,367
Definitely did not intend to create a standup comedy model!
Incredible things happening in json chat
17
20
178
66,955
The code for our paper, "Object-based attention for spatio-temporal reasoning" (arxiv.org/abs/2012.08508), is now available: github.com/deepmind/deepmind…. In addition to model code, we provide trained parameters so anyone can re-run our model.
CLEVRER introduced difficult reasoning tasks that previous deep neural nets couldn’t solve, leading to calls for a paradigm-shift to neuro-symbolic AI. This work shows that, in fact, an object-based transformer significantly outperforms NS models, 75% vs 46% on the hardest task.
1
17
110
From day 1, we wanted to ensure that our platform is useful for professional creators. Inpainting is a first step in this direction, more to come!
Today we're delighted to launch Audio Inpainting, an innovative feature that allows you to seamlessly edit and refine your audio tracks. With Audio Inpainting, you can select a portion of a track to re-generate based on the surrounding context. This makes it easy to edit single vocal lines, correct errors, or smooth over transitions, so you can create the perfect track. The interface is experimental, and will continue to be updated over the next few weeks. Inpainting is available for subscribers starting today (only on desktop). We're excited to see you make the best of it. 1/5
14
9
110
14,184
Definitely appreciate the support we received to scale up @udiomusic!
love watching the openai team rally to provide great support for a customer
4
6
83
25,793
CLEVRER introduced difficult reasoning tasks that previous deep neural nets couldn’t solve, leading to calls for a paradigm-shift to neuro-symbolic AI. This work shows that, in fact, an object-based transformer significantly outperforms NS models, 75% vs 46% on the hardest task.
Can neural networks learn to perform explanatory & counterfactual reasoning? Researchers find that an object-centric transformer substantially outperforms leading neuro-symbolic models on two reasoning tasks thought to be challenging for deep neural nets: bit.ly/3mmMLrl
7
12
69
Hats off to our engineers, who worked around the clock to make this possible 🫡
We're rolling out new features to help you create longer, more coherent tracks. First, extensions now use a context-window of up to two minutes, increased from 30 seconds. This means verse and chorus structures are more consistent. Check out the example below! (h/t user DrGoldenPants)
5
2
55
7,178
Some exciting new features, such as a two-minute model and the ability to follow your favorite creators. Less flashy but just as useful is the ability to control the random seed.
Today we’re delighted to launch more new features, starting with a model capable of two-minute generations, udio-130. This makes it much easier to create tracks with long-term coherence and structure. Check out the track below for a taste of what’s possible using a single prompt. udio-130 will launch alongside our existing model. For now, two-minute generation is an experimental feature available at a discounted credit-rate to pro-subscribers only, but we will be rolling it out more broadly over the coming weeks. 1/4
3
2
31
3,026
Also really grateful to our support team at @googlecloud for working around the clock to provision more compute. Our launch wouldn’t have been possible without them!
Definitely appreciate the support we received to scale up @udiomusic!
2
1
26
7,103
Udio’s new audio prompting feature allows you to transform your solo piano playing into a full song with vocals, orchestra, … Featuring our cofounder @conormdurkan, who’s not only an amazing researcher but also a talented pianist!
Audio-prompting, live now on Udio. Show us how you're using it below 👇
4
2
26
1,535
Wagnerian operas coming soon!
Replying to @udiomusic
Next, you can now extend tracks up to a maximum of 15 minutes. This is ideal for creating longer mixes, ambient tracks, or prog-rock epics. udio.com/songs/27cf6708-09f2…
4
23
3,682
Excited that this is out! When Kazuya and I started training the first versions of this model last year, we had no idea that it would turn out this good. The progress the team has made in the last year has been incredible. Thanks @avdnoord for being a fantastic team lead!
Thrilled to share #Lyria, the world's most sophisticated AI music generation system. From just a text prompt Lyria produces compelling music & vocals. Also: building new Music AI tools for artists to amplify creativity in partnership w/YT & music industry deepmind.google/discover/blo…
2
18
5,405
Since we launched @udiomusic, people have been using it in creative ways. Some of my favorite uses of Udio: 1. Replying to Drake.
6
4
17
7,737
The amazing team behind udio: my fellow co-founders @conormdurkan, @charlietcnash, @yaroslav_ganin, @avincentsanchez, as well as our engineering team: @justinkchen, Anthony C., Bernhard F. 5/6
2
16
1,136
🤣
ai will probably most likely lead to the end of the world, but in the meantime there will be great stand-up comedy
1
16
1,657
Will has been giving us amazing feedback, and we look forward to continued collaboration with him to make this a top-tier tool for artists and producers!
The best tech on earth!!! And the company is really aiming to be an ally for creatives and artist… wow wow wow wow
16
1,512
Previously, systematic search only worked for problems where the dynamics model was known ahead of time and easy to simulate. MuZero extends it to learned models of the env, bringing the power of search to a much wider class of problems. Exciting work!
In 2016, AlphaGo was introduced. Two years later, its successor - AlphaZero - showed significant progress in Go, chess and shogi. Today in @Nature, our team describes MuZero, a significant step forward in the pursuit of general-purpose algorithms: go.nature.com/38qb2I8
1
2
14
The last four months have been an incredible journey: putting together an amazing team, starting a new codebase, late night debugging, and watching the product progress from infancy to its current state. I’m incredibly proud of what we have built and the team we’ve assembled.
1
12
3,140
It’s great seeing creators make amazing music on Udio and getting recognition from legends!
I wasn’t aware king but thank you for your contribution to history Y’all show this man some love 🙏🏾
2
10
2,960
Another one of our talented cofounders, the bass guitarist of the Foo Birds! @charlietcnash
Endless creativity through audio-prompting. Live now on Udio.
1
11
984
Moreover, the approach is flexible -- with minimal tweaks, the model also worked for CATER, an object-tracking dataset, getting 90% vs 70% top-5 accuracy for object tracking even when the camera constantly moves.
7
Replying to @FelixHill84
At some point the difference between "reasoning" and mere "pattern matching" is purely philosophical. As scientists we should focus on concrete tasks probing the abilities of neural nets, not abstract debates on whether a certain model can really “reason”.
2
9
Stay tuned for further updates! 4/6
1
8
899
Amazing work by @Aaditya6284 demonstrating how to adapt vision language models to different listeners *without* supervision!
Ever made an inside joke? Humans adapt their language to each person they talk to. We take exciting steps towards agents who can do the same in "Know your audience: specializing grounded language models with the game of Dixit" (arxiv.org/abs/2206.08349). Read on 🔎⏬
6
As well as part-time contributors and closed-beta testers who helped us push it over the finish line!
7
1,054
There is still a lot to do for us to achieve our vision of empowering ordinary people to make extraordinary music. We want to give people more control over their music, to iterate on tracks they’ve already created, to edit individual stems, and more. 3/6
1
7
835
Sander’s blogs are always a joy to read, full of insight but also accessible. Highly recommended for anyone who’s interested in diffusion models!
The noise schedule seems like a pretty important design choice for any diffusion model, but I have sometimes found this concept to be a greater source of confusion than insight😵‍💫 In this blog post, I try to explain why. sander.ai/2024/06/14/noise-s…
8
1,114
Perceiver IO is a scalable architecture that can handle diverse inputs and outputs with minimal domain knowledge. To make it easier for you to use Perceiver, we released our model code, training pipeline, and colabs to play around with our pre-trained models.
To tackle all the challenges we meet while solving intelligence, we need tools that are as adaptable as possible. Announcing the paper & code for Perceiver IO, an architecture that handles a wide range of data and tasks, all while scaling gracefully: dpmd.ai/perceiver-IO 1/4
1
1
4
Trump has been relentlessly punishing CA for daring to vote against him -- first by withholding wildfire aid, now by withholding healthcare aid. Disturbing to see this era of revenge politics!
Nothing like the “pro-life” party eliminating healthcare during a GLOBAL PANDEMIC. California will survive without this $$ for now -- but their frail, pathetic patriarchal system they are so desperate to protect won’t.
5
Replying to @VojtechKulhavy
We are working towards more advanced tooling!
1
1
3
368
Replying to @LumaLabsAI
Congrats, this looks super impressive!
3
1,377
Maybe a difference is: with symbolic, you're designing the knowledge nodes yourself, whereas with distributed representations you design how the knowledge is to be acquired, which can be more flexible?
2
3
Replying to @MicahBerkley
Congrats, Micah, glad you're finding our tool helpful!
2
106
4. Using Udio with other tools to create music videos.
MAXX The Robot: An adopted robot named Max struggles to fit in with his new idealistic family in an Indiana suburb in the 1950s. Music by @udiomusic. Images using @midjourney. Video using @runwayml. #ai #aiart #generativeai #sitcom #1950s #midjourney
2
2
637
This feature is released for all subscribers, so you can try it out for yourself. Can’t wait to hear what y’all produce!
Today we’re announcing a set of updates, starting with a new experimental feature for paid subscribers, audio uploads. You can upload an audio clip of your choice, and extend this clip either forward or backward by 32 seconds using up to 2 minutes of context. Audio uploads greatly enrich your prompting vocabulary. You can use audio to set tempo and mood, and explore from there. Maybe you’ve got a great intro but don’t know where to go next, or a full mix that’s missing the perfect bridge–in both cases, Udio can provide inspiration. Check out the video below for some examples (we’ve had a lot of fun with this).
1
3
436
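A rough sketch of the arithmetic in the announcement above: each extension adds 32 seconds and can draw on up to 2 minutes of the existing clip as context. The function name, the dictionary layout, and the assumption that context is taken from the edge of the clip nearest the new audio are all illustrative and are not Udio's actual API or behavior.

```python
EXTENSION_SECONDS = 32      # length added per extension (from the announcement)
MAX_CONTEXT_SECONDS = 120   # up to 2 minutes of context (from the announcement)


def extension_plan(clip_seconds, direction="forward"):
    """Hypothetical helper: which part of the clip conditions the extension.

    Assumes the context is the portion of the clip adjacent to the new audio.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")

    # Short clips are used in full; long clips are capped at the context limit.
    context = min(clip_seconds, MAX_CONTEXT_SECONDS)

    if direction == "forward":
        # New audio is appended, so the context is the end of the clip.
        context_span = (clip_seconds - context, clip_seconds)
    else:
        # New audio is prepended, so the context is the start of the clip.
        context_span = (0, context)

    return {
        "context_span": context_span,
        "new_length": clip_seconds + EXTENSION_SECONDS,
    }
```

Under these assumptions, a 45-second upload is used in full as context, while a 5-minute upload extended forward would only be conditioned on its final two minutes.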
“Harvard’s generous financial aid program was cited as a factor by student and families in Zoom sessions as a reason for applying” — Indeed, really grateful for the generous aid I received while a student!
Under the early action program, Harvard College has accepted 747 students to the Class of #Harvard2025 hrvd.me/earlyaction20t
3
One way to ensure your preferred models remain SOTA -- reject all papers that go against your viewpoint.
Replying to @yaringal
We had a paper rejected with 8,7,6,6, with thorough reviews and lots of discussion. The one-sentence reason for rejection -- that training on data is the wrong way to instill knowledge in an algorithm -- feels like something out of AAAI 1993. openreview.net/forum?id=QHUU…
2
In fact, you can find the claim in the abstract of the paper :) "While these models [previous neural nets for VQA] thrive on the perception-based task [...]", from the CLEVRER paper, Yi et al (2020).
2
Thanks for the reference -- very nice work, and I enjoyed your talk about it at NeurIPS!
2
By fixing the seed, you can tweak the prompt and sampling hyperparameters of a sample that you like, and generate close variants. It’s another way of iterating on your song until you get exactly what you’re looking for.
2
229
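To illustrate why fixing the seed gives close variants, here is a toy sketch. It is not Udio's sampler: `sample_variant`, the prompt-to-offset mapping, and the `temperature` parameter are hypothetical stand-ins. The only point is that the random noise depends on the seed alone, so changing the prompt or a sampling hyperparameter perturbs the same underlying draw instead of producing an unrelated one.

```python
import random


def sample_variant(prompt, seed, temperature=1.0, length=8):
    """Toy stand-in for a generative model (hypothetical, for illustration)."""
    # A private generator seeded explicitly: same seed, same noise sequence.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]

    # Deterministic, prompt-dependent shift (an arbitrary toy mapping).
    offset = (sum(ord(ch) for ch in prompt) % 7) / 10.0

    # Temperature scales the shared noise; it does not change the draw itself.
    return [offset + temperature * n for n in noise]


base = sample_variant("jazz piano", seed=42)
repeat = sample_variant("jazz piano", seed=42)                    # identical to base
softer = sample_variant("jazz piano", seed=42, temperature=0.5)   # close variant
other = sample_variant("jazz piano", seed=43)                     # unrelated draw
```

Here `repeat` reproduces `base` exactly, `softer` follows the same noise pattern at half the scale, and `other` uses a different draw altogether.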
3. Creating new interpretations of songs.
Have yall heard the original version of Pound Town???? 🤯🫣
1
2
664
E.g., Stockfish plays very good chess, but it still relies on human judgements like "doubled pawns are bad", whereas AlphaZero was sacrificing knights left and right based on its own novel positional judgement.
2
Replying to @minchoi @udiomusic
Thanks for highlighting some of the amazing songs people have made! I’m continually amazed by the talent of our creators
1
2
1,446
Congrats Jeff! So happy to see this work released!
2
153
No we did not, partly because we didn't have ground truth object segmentations to train the Mask-RCNN (these weren't provided in the initial upload of the dataset). I agree it would be interesting to try!
2
2
(we covered this in more detail in Appendix B of our paper, arxiv.org/pdf/2012.08508.pdf)
2
Right, it turns out that about half the CLEVRER counterfactual q's could be answered just by looking at what happened in the video, e.g. if the q asks "what happens if the red cube is removed" but the red cube did not collide with anything in the video.
1
2
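The shortcut described above can be sketched as a simple check. This is a hypothetical illustration and not the analysis code from the paper: the representation of collisions as pairs of object names is an assumption, and so is the simplification that an object which never collides has no effect on the others.

```python
def answerable_from_video(removed_object, observed_collisions):
    """Return True if a "what if X is removed" question needs no counterfactual
    reasoning, i.e. X took part in none of the observed collisions.

    observed_collisions: list of (object_a, object_b) name pairs (assumed format).
    """
    return all(removed_object not in pair for pair in observed_collisions)


collisions = [("blue sphere", "green cylinder")]

# The red cube never touched anything, so removing it changes nothing:
# the answer can be read directly off the observed events.
shortcut = answerable_from_video("red cube", collisions)

# The blue sphere did collide, so the outcome genuinely has to be re-imagined.
needs_reasoning = not answerable_from_video("blue sphere", collisions)
```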
Creating new datasets and tasks is super important for advancing the field. Really appreciate the great work your team has done and looking forward to the new dataset!
Cool work from DeepMind! SOTA results from the object-centric attention model indicate the bias in our own CLEVRER dataset. We are working on CLEVRER v2 with richer physics and better-controlled bias. Stay tuned!
2
Cool results on training on a really interesting dataset!
We train a self-supervised net "through the eyes" of one baby across 2 years of development. At #NeurIPS2020, Emin Orhan shows how high-level visual representations emerge. Paper & pre-trained net github.com/eminorhan/baby-vi… Poster Thurs Noon EST neurips.cc/virtual/2020/prot…
2
Replying to @recursus
Do you think this is due to a fundamental flaw of Bayesian epistemology, or is it a practical problem of how hard it is to estimate credences and utilities?
1
1
Indeed, and when we move to real world datasets, "what is an object" depends on "what is the task".
1
1
Congrats! Amazing to see this work released!
1
1
157
Re generalization, the symbolic part of the NS model is custom designed for CLEVRER questions. Applying to a new task would require at minimum re-designing the symbolic module along with re-training. Here, just retraining obtains SOTA results on CATER.
1
2. Audio beyond music: standup comedy, sports commentary, nature sounds, ...
Tried to see what I could get Udio to generate beyond music. It can do comedy, speeches, npc dialogue, sports analysis, commercials, radio broadcasts, asmr, nature sounds, etc. It’s basically an AI audio engine. Pretty wild. Watch for examples of each.
2
1
355
MAGA: Make AI Great Again
1
So... basically another lockdown? The flip flopping is pretty confusing
From Sunday 20 December, some areas in England will enter Tier 4: Stay At Home. [Tap to expand the poster]
1