I seek to understand intelligence, agency and awareness, and to build AI aligned with compassion, freedom, universal human empowerment, and scientific progress.

London, England
The Llama 3 paper is a must-read for anyone in AI and CS. It’s an absolutely accurate and authoritative take on what it takes to build a leading LLM, the tech behind ChatGPT, Gemini, Copilot, and others. The AI part might seem small in comparison to the gargantuan work on *data* and *scale engineering*. I hope professors in distributed systems, high performance computing, algorithms, databases, HCI, etc. use it as an example of bleeding-edge CS in their classes. So many exciting open problems! @UBC_CS @CompSciOxford @berkeley_ai @Cambridge_Eng @WitsUniversity @NSERC_CRSNG @NSF @ERC_Research @UKRI_News
Why do 16k GPU jobs fail? The Llama3 paper has many cool details -- but notably, has a huge infrastructure section that covers how we parallelize, keep things reliable, etc. We hit an overall 90% effective-training-time. ai.meta.com/research/publica…
Hmmm, from what I see my colleagues in AI at Google London work bloody long hours and are extremely committed. This guy once came to London and told us to abandon Torch and use TensorFlow. That set the field of AI back by at least 6 months.
I’ve walked through poor neighbourhoods in India, Africa and LatAm many times. Yet, I recently walked through one of the most depressing ones in terms of poverty, drug abuse, and sheer hopelessness: San Francisco. Giant tech AI companies promise to make the world a better place, but their backyard is in a deplorable human state. Self-driving Waymo Jaguar luxury cabs roam through a city full of homeless people on the sidewalks. If we can’t fix this, what hope do we have for the future? It’s time for ALL Big Tech to take this more seriously. Excuses like weather, etc, are rather mediocre explanations. Also, it’s beyond excuses. Maybe time for big tech and the many million dollar startups to demonstrate some responsibility, and illustrate the values, which they enforce on employees, by example. Do the right thing. Maybe a conference in SF with all big tech CEOs, government bodies, and a few people from the streets could be a good start. A commitment to solve the problem is the first step. That would restore hope.
I believe I have written more papers than Alan Turing + John Nash! The number of papers alone is a misleading metric. Please focus instead on writing good papers that advance the field, help the world, and that you’ll be proud of when you look back in 20 or 50 years.
Yes, @GoogleAI (well, all of @AlphabetINC) produces a lot of awesome AI research, but @Stanford + @MIT together produce more (judging by @NeurIPSConf papers!), and @Stanford + @MIT + @UCBerkeley + @CarnegieMellon produces more than @AlphabetINC + @Microsoft + @facebook
It’s time to say thank you and goodbye to @GoogleDeepMind. I had the immense fortune of working there for 10 years. They were undoubtedly the most exciting years in the history of AI, and I feel that I grew beyond all my expectations thanks to my uniquely smart, generous and helpful colleagues. DeepMind has been the epicentre of AI in terms of innovation, but it is also the place from which notable researchers left to found @OpenAI @MistralAI @xai @udiomusic @inflectionAI and more. In fact, ex-DeepMinders are at the heart of most successful AI companies, including @AnthropicAI @cohere and more. I believe that no organisation has been more influential in technological innovation since Xerox PARC. DeepMind has truly made history and created a new future. At DeepMind I never felt alone. My first manager @demishassabis was an enormous source of inspiration, scientific freedom, and caring support. I will never forget all the support I received from him and Helen King when I was going through a very difficult personal loss ❤️ I really hope Demis, John and team get their much-deserved Nobel Prize soon. They strongly deserve it. I am so proud and thankful for having been part of the ML team. You gave me so much happiness and made dreams become reality. I learned so much from you. I also learned so much from my generous colleagues all over the organisation, from the AlphaCode team, the AlphaGo team, and so many more. Thank you 🙏 I also must thank the AVS and GenMedia teams. I treasured being able to serve you. You achieved so much in such a short time: Lyria, Imagen, Veo and more. I always looked up to most of you, and I can’t wait to see the amazing things you’ll produce over the next few months and years. You are truly exceptional, talented, hard-working and generous. I was very lucky to have been part of your teams.
Thank you 🙏 I am very sad, and admittedly even a bit tearful as I write this, but as I posted recently “In order to grow and to improve you have to be there a bit at the edge of uncertainty.” (Mallman). It is time for me to embrace a bit of discomfort and a new episode. Love you all DeepMinders! Good luck and thank you ❤️
Can AI researchers please tweet: I am against racism, sexism, bullying and cancelling, and I believe in improving diversity, equity and inclusion in our AI community. We need to hear your voices! The students and the public need to know what most believe.
RL is not all you need, nor attention nor Bayesianism nor free energy minimisation, nor an age of first-person experience. Such statements are propaganda. You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive benchmarks and data, tons of research and engineering on generative models, data mixtures, ablations, RL/self-training, etc., and we will probably need lots of people working hard to figure out safety, causal world models, awareness, models that create abstractions comparable to infinity and zero and use these to predict the existence of things like black holes and suggest experiments to verify such hypotheses, or come up with novel engineering designs to generate energy more efficiently, robotics, etc. It takes thousands of people and many ideas. In the end some simple ideas might become obvious, but such obviousness only happens in retrospect. Yes, there is a bitter lesson, but if we had followed it, we’d still be doing linear regression with RL. Let’s not oversimplify, but rather honour the research and engineering of thousands of people. Also, people keep rewriting history. When our language understanding startup (darkbluelabs) was acquired by Google about 10 years ago, we joined DeepMind, where the AGI documents were all about concepts, RL and episodic memories, and made it clear that there was no room for language. To be honest, back then such a position wasn’t so crazy. Now it seems silly, but only because of the benefit of hindsight. There are no 1 or 10 heroes in the history of AI. There are many 1000s of hard-working students, profs, engineers, operations and support people, product folks, managers, even hedge funds, among others. Let’s honour the whole community and not just CEOs or the philosophers of Bayes, RL, deep learning, etc. I look forward to learning from the next generation and seeing what they will achieve. To them: Don’t buy the existing narratives blindly; innovate.
Remember that just like mathematics, AI will advance one grave at a time.
I still remember that 2013 @NeurIPSConf party with Mark Zuckerberg. He had a bottle of water at that first NeurIPS corporate party. I thought it was out of character for a NeurIPS party - what was the matter with this kid? And why did he speak like that? We were so naive! … but it turns out they were even more naive than us. Had they been smart, they could have hired ALL of the NeurIPS scientists at that party, perhaps minus a small British startup called DeepMind, which was sought after by Google. Instead, they hired only a small group - but that group included one of the greatest and most influential engineers of all time: @ylecun. Would I have said yes to a 500K offer? Hell yes!! As a (high-paid) Oxford prof I made 85K and was struggling to get a mortgage for my growing family. AI people until then didn’t do it for the money. Now we hear about people making 10 and 20 million per year - still nothing comparable to what the corporate executives make. That party changed everything. I’m happy to see Yann moving on to a new chapter. He’s so creative. I’m looking forward to seeing what he does next 🙂 When I taught my course at Oxford that year, explaining why neural networks were modular like Lego and how to use automatic differentiation (backprop) to get global consistency from local messages, it became super popular on YouTube. I thought my students and I were the first to teach this amazing generality with Torch. It turns out Yann had done it before - it just wasn’t as easy to find. Yann was not only a pioneer of convnets but also of the software we all use to this very day. We owe him a lot, a lot more than most realise.
Viewpoint invariance is an important inductive bias in how we perceive objects - here tested to the limit by a smart artist.
There appears to be a mismatch between publishing criteria in AI conferences and "what actually works". It is easy to publish new mathematical constructs (e.g. new models, new layers, new modules, new losses), but as Apple's MM1 paper concludes:
1. Encoder Lesson: Image resolution has the highest impact, followed by model size and training data composition.
2. Vision-Language (VL) Connector Lesson: Number of visual tokens and image resolution matters most, while the type of VL connector has little effect.
3. Data Lesson 1: Interleaved data is instrumental for few-shot and text-only performance, while captioning data lifts zero-shot performance.
4. Data Lesson 2: Text-only data helps with few-shot and text-only performance.
5. Data Lesson 3: Careful mixture of image and text data can yield optimal multimodal performance and retain strong text performance.
6. Data Lesson 4: Synthetic (caption) data helps with few-shot learning.
I suspect it would be very hard to publish a paper that says "we made the images bigger and got better results". There is great value in careful ablations, scaling studies (as this paper shows when determining learning rates), data science, data engineering and engineering in general. Huge work goes into constructing the data pipelines. For a long time engineering had a poor reputation in AI conferences - many papers were rejected with "this is just engineering!". In the end, just engineering is working beautifully. There is room for scientific invention, but just engineering also allows for an incredible amount of innovation. arxiv.org/pdf/2403.09611.pdf
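A rough illustration of why image resolution and the number of visual tokens go together: a ViT-style encoder that cuts an image into non-overlapping square patches emits one token per patch, so the token count grows quadratically with resolution. The patch size of 14 and the resolutions below are illustrative assumptions, not MM1's documented configuration, and connectors that pool or resample tokens would change the numbers.

```python
def visual_token_count(height: int, width: int, patch: int) -> int:
    """Patch tokens produced by a ViT-style encoder for one image.

    Assumes non-overlapping square patches and no pooling/resampling in the
    vision-language connector (illustrative assumptions).
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


# Doubling the side length (224 -> 448) quadruples the number of visual tokens.
for res in (224, 336, 448):
    print(res, visual_token_count(res, res, patch=14))
```

With a 14-pixel patch this gives 256, 576 and 1,024 tokens, which is why "bigger images" is also "more tokens" and more compute per example.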
Mike Jordan defending ML engineering! Good engineering gave us the GPU convnets, the transformers, torch, numpy, etc. The popular diminishing meme “it’s-just-engineering” is silly, and holds us back. I ❤️ creative, rigorous, robust, safe engineering. flip.it/epDjxP
I’ve joined @Microsoft AI to advance the frontier of large scale multimodal AI research and to build products for people to achieve meaningful goals and dreams. The MAI team is small, but well resourced and ambitious. We are now looking for exceptional ICs who like to ship. If you’re interested in multimodal AI, both recognition and generation, love to collaborate and empower others, believe in diversity and inclusion, have a growth mindset, and want to impact the future of AI in a positive and profound way, please message me directly. I believe this is a rare and unique opportunity to join a new AI team that will shape the future. @black_in_ai @WiMLworkshop @_LXAI
Game over. Scale is essential to AI.
Someone’s opinion article. My opinion: It’s all about scale now! The Game is Over! It’s about making these models bigger, safer, compute efficient, faster at sampling, smarter memory, more modalities, INNOVATIVE DATA, on/offline, … 1/N thenextweb.com/news/deepmind…
Dear friends, this year I will not attend #NeurIPS2019. This is the last year before my littlest daughter starts going to school. It is important for dads to spend time with their little ones! I wish you a great conference.
This is by far the best non-technical Natural and Artificial Intelligence book anyone could read. This comprehensive, well-researched, crisply clear, sharply focused and illuminating book is a thing of beauty. It is the book I wish I had had when I started my AI career 30 years ago. The book tells the story of steering, emotions, reinforcement, world models, generative intelligence, counterfactual thinking, planning, awareness, theory of mind, tool use, language, GPT4 and more. It is not only a history of intelligence but also a beacon for the future of AI. Thank you @maxsbennett for this jewel. Thanks @serkancabi for sharing it. abriefhistoryofintelligence.…
Beautiful explanation of transformer neural networks jalammar.github.io/illustrat…
I would like to hire exceptional engineers (code, math, science, games, video). Essential: people who want to transform the world in positive ways by advancing AI. Email me at: JoinAITeam@microsoft.com Preference for generalists who can work with data, model ablations, inference and evals. Exceptional coding skills a must.
Last decade in AI was about solving big associative learning: Language models, image labelling, speech recognition, lipreading, Starcraft, etc. A triumph! The recipe was always the same: 1. Big net 2. Massive curated dataset 3. Many iterations. Building the dataset is underrated.
The OpenAI letters: lesswrong.com/posts/5jjk4CDn… Some of what is said here is absolutely shocking. The politics, hysteria, incompetence, power hunger, gaslighting, etc. are beyond any HBO show. I was a leading researcher at DeepMind at the time, reporting to Demis. Most of what is said about DeepMind in these letters is absolute rubbish. We were simply scientists, figuring out intelligence and trying to figure out how to do good things with it. Anyone else could do the same and we loved competition. We opened our labs to all these people, especially Elon, and they abused our openness. I made that clear to Greg Brockman at ICLR, where I also tried to patch up the newly created animosity among researchers, among colleagues, among friends. This is all the more reason for open AI. The people must decide, and not a bunch of billionaires playing with scientists. Ilya got it prophetically right: we did all the training, ideas and code, but now our only power left is to protest in apps owned by the same billionaires. But protest we will. AI must be for the people, for all nation states.
This is such an important tweet for new researchers: A Turing award winner’s public admission that failure is ok. It’s through trying and failing (ie falsifying some hypotheses) that we make scientific progress. Thanks Geoff for setting a brilliant example.
I thought I had a very good idea about perceptual learning and accepted several invitations to give talks about it next week. But I have just discovered a fatal flaw in the idea, so I am cancelling all those talks. I apologize.
For anyone wanting to make their lectures freely available to everyone: it’s highly rewarding when people approach me at conferences and tell me they got into machine learning by watching my UBC and Oxford lectures. Every course should be online and translated into all languages.
Most RL for LLMs involves only 1 step of RL. It’s a contextual bandit problem and there’s no covariate shift because the state (question, instruction) is given. This has many implications, eg DAgger becomes SFT, and it is trivial to design Expectation Maximisation (EM) maximum likelihood solutions that do exactly the same as RL. Of course, RL and multiagent systems will be needed as the picture illustrates.
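A toy check of the single-step claim: with the prompt (the state) fixed, RL reduces to a bandit over responses. For a softmax policy over a small, finite set of candidate responses and non-negative rewards, the exact policy gradient is a positive multiple of the gradient of an EM-style, reward-weighted maximum-likelihood objective, so both point the same way. The candidate set, logits, rewards and function names are hypothetical, and reward-weighted likelihood is only one of several EM formulations.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(logits, rewards):
    """Exact gradient of E_{y~pi}[r(y)] w.r.t. the logits for one fixed prompt.

    One-step (contextual bandit) RL: the prompt is given, so the policy's own
    actions never change which states it sees (no covariate shift).
    """
    pi = softmax(logits)
    baseline = sum(p * r for p, r in zip(pi, rewards))  # expected reward
    return [p * (r - baseline) for p, r in zip(pi, rewards)]


def reward_weighted_mle_gradient(logits, rewards):
    """Gradient of sum_y q(y) log pi(y), with q(y) proportional to pi(y) r(y)
    and held fixed (an EM-style E-step target). Needs non-negative rewards."""
    pi = softmax(logits)
    z = sum(p * r for p, r in zip(pi, rewards))
    q = [p * r / z for p, r in zip(pi, rewards)]
    return [qj - pj for qj, pj in zip(q, pi)]


logits = [0.5, -1.0, 2.0]  # hypothetical policy over 3 candidate responses
rewards = [1.0, 0.0, 0.3]  # hypothetical non-negative rewards
g_rl = policy_gradient(logits, rewards)
g_em = reward_weighted_mle_gradient(logits, rewards)
z = sum(p * r for p, r in zip(softmax(logits), rewards))
# The RL gradient equals z times the maximum-likelihood gradient: same direction.
print(all(abs(a - z * b) < 1e-12 for a, b in zip(g_rl, g_em)))  # True
```

Algebraically, both gradients are proportional to pi(y)(r(y) - E[r]); the EM version is just divided by the expected reward z, which is why a maximum-likelihood update on reward-weighted samples can stand in for the one-step RL update.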
Our field lacks diversity. This is the biggest danger of AI. As we witnessed this week, it is not easy to tear the chains of history. Few of us are able to rise above our environments and see our biases. Fortunately colleagues like @timnitGebru have bravely helped us 1/
The only bitter lesson is that LLMs have succeeded beyond any expert expectations. Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into the learning process to obtain more capable models. We are still far from hitting the limits in this. We are still compute hungry because there is a ton more we could achieve if only we had more compute, from experimental ablations to data acquisition and curation. Scaling is largely about data and evals. The models are now trained on almost all the web and equally large (but growing) self-generated synthetic data. Sifting through such vast quantities of data (the whole of human creation) requires formidable engineering and intelligent ideas. This is what differentiates most models. AI is finally in the hands of billions of users, and with it come billions of tasks: every reasonable user need. This scaling in tasks and evaluations is many orders of magnitude larger than before LLMs. Having the right architecture matters, but we know several alternatives could all work well, e.g. replacing attention in Transformers with RNNs and interleaving such layers with local layers. What matters is fine ablations to maximise hardware usage. This is the realm of sophisticated high-precision engineering. It encompasses semiconductor design, datacenter design, distributed systems, MFU, etc. There is fascinating work on flow matching, JEPA, sparser MoEs, etc., that is all consistent with scaling. I’m terrible at predictions, but in this we have stayed the course. There have been pleasant surprises like the effectiveness of reasoning, which, while allowing for fewer parameters, still demands even more compute. Sparser multimodal MoEs will also allow for better continual learning. This is an old idea, e.g. arxiv.org/pdf/1108.3298, which is finally being done at scale.
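MFU (model FLOPs utilization), mentioned above, is the fraction of the hardware's peak throughput that goes into useful model arithmetic. A back-of-the-envelope sketch, assuming the widely used rule of thumb of about 6·N FLOPs per token for dense transformer training (which ignores attention FLOPs); the run below is hypothetical.

```python
def model_flops_utilization(n_params: float, tokens_per_second: float,
                            n_accelerators: int,
                            peak_flops_per_accelerator: float) -> float:
    """Estimate MFU for dense transformer training.

    Uses the ~6 * N FLOPs-per-token rule of thumb (about 2N for the forward
    pass and 4N for the backward pass), ignoring attention FLOPs.
    """
    achieved_flops = 6 * n_params * tokens_per_second
    peak_flops = n_accelerators * peak_flops_per_accelerator
    return achieved_flops / peak_flops


# Hypothetical run: an 8B-parameter model processing 8M tokens/s on
# 1,000 accelerators, each with a nominal peak of 1e15 FLOP/s.
mfu = model_flops_utilization(8e9, 8e6, 1000, 1e15)
print(f"MFU ~ {mfu:.1%}")  # MFU ~ 38.4%
```

Raising tokens per second on fixed hardware is exactly what pushes this ratio up, which is one concrete sense in which scaling means maximising the throughput of data ingestion.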
Successful scaling is mostly about organising people into effective teams for research, development and production. They have to be teams of happy and ambitious people who put the team first. Yes, tech VCs and CEOs: work-life balance matters to achieve prolonged success, something I think @demishassabis did really well at @GoogleDeepMind and which I promote at @MicrosoftAI. Bitter lesson: it really is all about scaling and hard work by thousands of amazing people. Hardly bitter, but hopeful and inspiring.
You were never alone, Gary, though you were the first to bite the bullet, to fight the good fight, and to make the argument well, again and again, for the limitations of LLMs. I salute you for this good service!
We are hiring star research and data engineers to invent the future of AI. JoinAITeam@microsoft.com If you’re finishing your undergrad or PhD at Imperial, Cambridge, Oxford, UCL, Toronto, MIT, MILA, UBC, ETH, Stanford, Caltech, UCLA, Berkeley, CMU, UW, NYU, Princeton, Columbia, Harvard, Yale or any other top school in STEM, please apply too. I love working with energetic people, who are prepared to work on what is needed to shape AI, make it safe, make it brilliant, make it creative, and make it useful in math, science, healthcare, education, energy and environment.
Look forward to having @satyanadella & @sama on @BG2Pod tomorrow. The deal. The skeptics. The re-industrialization of America. Power. Chips. Models. Agents. AGI. Regulation. Jobs. And more… 🧐🚀🇺🇸
Some companies can turn the most motivated scientists and engineers into unproductive, complacent hamsters. They do this by introducing a large number of levels and process-heavy performance reviews several times a year. People become obsessed with their level and with comparing it against the levels of their peers, and they choose not to solve hard problems because they would rather do something easy and get promoted to the next level. Managers benefiting from this pretend these levels are real. Managers who’d rather focus on products and engineering have to stop working for a few weeks to write performance reviews, mostly with LLMs these days. The whole thing is toxic and an aberration of real feedback, learning and motivation.
The way that Jensen Huang runs Nvidia is wild: 40 direct reports. No 1:1s. No formal planning cycles. And no status reports. In a recent interview, he went in-depth on his Leadership style. Every entrepreneur must understand why it works:
Let us please talk more about mental health in the AI community. I was shocked and reminded of this by the sad and tragic death of this young colleague with so much talent. Many of the people in our community are likely on the spectrum: ADHD, autism, Asperger’s and so on. This rich neural diversity is likely responsible for great progress in AI, but these people, including myself, are also very vulnerable. AI used to be a small community, where universities provided shelter. But now the stakes are very high. There is huge competition among AI corporations in the AI race, leading to routine mergers and reorgs, which cause great uncertainty and disruption. Stressed executives apply pressure and pass the stress down to the ICs. Researchers no longer enjoy the freedom to publish at most corporations, which is a huge change from what they did before. I’m not judging whether this is good or bad, just that it is a huge change. Researchers get paid a lot. They are sometimes told by managers to go on the job market to ascertain their value before applying for promotion, and they are forced to sign 6-month to 1-year non-competes and notice periods to be able to accept a deserved promotion. This appears to me to be a modern form of feudalism. Put simply, people are treated just like any other resource, as stuff. The financial stakes are very high for everyone. Many AI scientists are now media stars. They enjoy huge media exposure, and thousands of followers on social media, but many crave more fame. The potential for huge negative or positive impact also raises the stakes. AI ICs often see themselves as game pieces in a game among nations and corporations that is fraught with uncertainty and power trips. It is hard to tell right from wrong because laws often lag behind. Working in AI is a privilege. I repeat, it is a huge privilege. Yet, when people suffer depression, endure micro-aggressions, or have suicidal thoughts, it doesn’t feel like any of the privileges matter.
If you’re feeling any of this, please find a therapist. It may take a few tries, and it may take time. It is worth it. Take it from one of your colleagues who has benefited a lot from PTSD therapy. I believe it has made me a more productive researcher, a more effective collaborator, and someone who appreciates work-life balance and differences in working styles. Please use the help that exists proudly, you’re not alone - you are special and you are loved. Rest in peace, Suchir Balaji. Thank you for everything you gave us ❤️
OpenAI whistleblower Suchir Balaji, who accused the company of breaking copyright law, found dead in apparent suicide
Predicting the next word "only" is sufficient for language models to learn a large body of knowledge that enables them to code, answer questions, understand many topics, chat, and so on. This is clear to many researchers now, and there are nice tutorials on why this works by @ilyasut, drawing on compression (piped.video/watch?v=AKMuA_TV…), and by @geoffreyhinton (piped.video/watch?v=iHCeAotH…). However, the emergence of types of understanding is not unique to language models. In arxiv.org/pdf/1804.06318.pdf by @notmisha and @brandondamos, the authors trained models to predict the next few time steps of over a hundred robot hand sensors (Touch, Gyro, Accelerometer, Joint Info, Actuator Info, etc.). They then found that they could regress the shape of the object the hand was touching from the activations of the neural networks using probes. That is, the model developed an internal representation of shapes even though it was simply trained to predict "only" the next few sensor readings. Awareness follows from simple predictions and interaction with the world.
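The probing step can be sketched as follows: keep the trained network frozen, collect its hidden activations, and fit a simple (here linear) regressor from activations to the property of interest; if the probe predicts well on held-out data, the property is decodable from the representation. This toy uses synthetic "activations" and a synthetic shape target, and the function names are hypothetical; it is not the probe or data from the cited paper.

```python
import random


def fit_linear_probe(features, targets, lr=0.1, epochs=500):
    """Fit y ~ w.x + b by full-batch gradient descent on mean squared error.

    Only the probe's (w, b) are trained; the network that produced
    `features` is assumed frozen.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(dim):
                grad_w[i] += 2 * err * x[i] / n
            grad_b += 2 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def r_squared(features, targets, w, b):
    """Coefficient of determination of the probe's predictions."""
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
    mean = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean) ** 2 for y in targets)
    return 1 - ss_res / ss_tot


# Synthetic stand-in for frozen hidden activations: 4-d vectors in which a
# scalar "shape" property is linearly encoded, plus a little noise.
rng = random.Random(0)
acts, shape = [], []
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in range(4)]
    acts.append(x)
    shape.append(0.8 * x[0] - 0.5 * x[2] + rng.gauss(0, 0.05))

# Train the probe on the first 150 examples, evaluate on the held-out 50.
w, b = fit_linear_probe(acts[:150], shape[:150])
r2 = r_squared(acts[150:], shape[150:], w, b)
print(f"held-out R^2 = {r2:.3f}")
```

Keeping the probe linear is a deliberate design choice: a weak probe that still succeeds is evidence that the representation itself carries the information, rather than the probe computing it.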
Building a Deep Neural Network to play FIFA 18 codementor.io/deepgamingai/b…
What role can the AI community play in a world where bullies attack peaceful democratic countries 🇺🇦 and threaten the world? I’m really curious to hear from everyone.
For the first time in my life I can explain what the physics @NobelPrize is about! In fact, if you’d like to learn what a Hopfield net is and how it relates to NP-hard satisfiability, Boltzmann machines, autoencoders, score matching, Maxwell demons, maximum likelihood, generative AI, quantum computing, unsupervised learning and neural networks, see these slides and video lectures from a course I taught at @ipam_ucla in 2012: helper.ipam.ucla.edu/publica… piped.video/watch?v=XYEs7k… piped.video/watch?v=JlONAaoW… piped.video/watch?v=t9sXdA…
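For readers who want the core idea in code: a Hopfield net stores ±1 patterns in symmetric weights via a Hebbian rule and retrieves them from a corrupted cue by asynchronous sign updates, none of which increases an energy function. This is a minimal textbook sketch, not code from the linked course; the 8-unit patterns are made up for illustration.

```python
def hebbian_weights(patterns):
    """Store +/-1 patterns with W = (1/P) sum_p x_p x_p^T and a zero diagonal."""
    n, num = len(patterns[0]), len(patterns)
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / num
    return weights


def energy(weights, s):
    """Hopfield energy E(s) = -1/2 sum_ij W_ij s_i s_j (no bias terms)."""
    n = len(s)
    return -0.5 * sum(weights[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(weights, state, max_sweeps=10):
    """Asynchronous updates s_i <- sign(sum_j W_ij s_j) until no unit changes.

    With symmetric W and a zero diagonal, no single update increases the energy.
    """
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w_ij * s_j for w_ij, s_j in zip(weights[i], s))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


# Two made-up, mutually orthogonal 8-unit patterns.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = hebbian_weights([p1, p2])

noisy = [-p1[0]] + p1[1:]  # corrupt the first unit of p1
restored = recall(weights, noisy)
print(restored == p1, energy(weights, noisy), energy(weights, restored))  # True -6.0 -12.0
```

The stored patterns sit at minima of the energy, so recall is a descent on that landscape; the same energy view is what links Hopfield nets to Boltzmann machines and the other topics listed above.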
27,600 GPUs, 1/2 PB data, and a neural net with 220,000,000 weights. More please! arxiv.org/pdf/1909.11150.pdf
It is remarkable that anyone can now train a 124M parameter LLM in about real-time on a MacBook M3. So easy to experiment. This would have been the stuff of dreams when I was in school. I ❤️ training neural nets, but I really admire the people who build the hardware.
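For a sense of scale, a 124M parameter count matches GPT-2 small. Assuming that configuration (12 layers, width 768, a 50,257-token vocabulary, 1,024 learned positions, output embedding tied to the input embedding; the post itself does not name the architecture), the parameters can be tallied layer by layer. The function name is hypothetical.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_ctx, tied_embeddings=True):
    """Count parameters of a GPT-2-style decoder, including biases and LayerNorms."""
    d = d_model
    embeddings = vocab_size * d + n_ctx * d  # token + learned position embeddings
    per_block = (
        2 * d                  # ln_1 (scale + shift)
        + d * 3 * d + 3 * d    # fused Q, K, V projection
        + d * d + d            # attention output projection
        + 2 * d                # ln_2
        + d * 4 * d + 4 * d    # MLP up-projection
        + 4 * d * d + d        # MLP down-projection
    )
    final_ln = 2 * d
    lm_head = 0 if tied_embeddings else vocab_size * d
    return embeddings + n_layer * per_block + final_ln + lm_head


print(gpt2_param_count(n_layer=12, d_model=768, vocab_size=50257, n_ctx=1024))  # 124439808
```

Roughly 85M of these weights live in the 12 transformer blocks and about 39M in the embeddings, so a model of this size is small enough to train on a laptop while still being a real transformer.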
Yann @ylecun is a visionary who advocated for SGD, GANs, convnets, contrastive losses, deep nonlinear models, autodiff modular software, etc. when most thought he was joking. As a scientist, don’t just follow the crowd, but innovate, think, test. amp.timeinc.net/fortune/2019…
Replying to @airkatakana
I think you’re missing a lot of history. @ylecun also championed online learning (SGD) - yes it sounds crazy, but there was a time when this wasn’t the majority view. Yann also spearheaded the modular approach to NN training, which led to Torch, PyTorch, etc. He championed energy-based methods and Siamese nets. He was one of the first few to push for training nets with GPUs. So it wasn’t just backprop for convnets, but numerous contributions over the years as well as mentoring many impactful researchers. Yann is simply one of the greatest engineers of our time, and when he speaks, I suggest you listen… or live to regret it as I have in the past.
I’m with @ylecun and @AndrewYNg on this. I feel it is more responsible to devote greater effort to solving today’s problems (e.g. climate, health, energy, poverty, safety, communication, bias and discrimination, education) than to cultist AI long term speculation.
“I think the brain isn’t concerned with squeezing a lot of knowledge into a few connections, it’s concerned with extracting knowledge quickly using lots of connections.” Geoff Hinton. wired.com/story/googles-ai-g…
Feeling pressure to work more than 5 days per week? Don’t. I’ve been super productive all my life and I never worked on the weekend unless I was excited about something. I don’t regret a single day of vacation. Quite the opposite. Life is too short. Don’t let CEO talk and corporate bullshit ruin your life and that of your loved ones. I make sure to work with bosses that care about what I deliver and not how hard I should be working. Between 17 and 21 I used to work 14 hours per day, 7 days per week. I was underpaid and got very little out of it. Working smarter has been more effective. Weekends are awesome 🤩
The poster has no affiliation because in Guatemala AI research is what committed researchers do at night on their own, after a day of work. Touching and inspiring @Khipu_AI
Dear @GoogDeepMind ers, First, congrats on the new impressive models. Every week one of you reaches out to me in despair to ask me how to escape your notice periods and noncompetes. Also asking me for a job because your manager has explained this is the way to get promoted, but I digress. Please don’t reach out to me. Rather reach out to each other. Your leads are responsible for this. Talk to them. @koraykv and @douglas_eck have both said they’re against it, so maybe start there. Above all don’t sign these contracts. No American corporation should have that much power, especially in Europe. It’s abuse of power, which does not justify any end.
Work-life balance is a top priority. Yes, there was nearly a decade in my life when I worked 98 hours a week, but I did it out of need and aspiration, not because anyone forced me. During my Cambridge PhD I published more than anyone around me, but I never worked a single weekend. I focused, worked smart, and delivered. I’ve continued delivering at Berkeley, UBC, CIFAR, Oxford, DeepMind, MAI, but I only work hard when I want to achieve something important. I love holidays and my social life. Only my wife is allowed to ask me to work harder 🥰 It’s about results, focus, attention to detail, vision, dreams, collaboration, and not bloody hours. Hours management is pathetic, easy to game, and a guaranteed path to mediocrity. If you like my philosophy, we’re hiring engineers of all kinds - message linkedin.com/company/microso… The ‘9-9-6 Work Schedule’ Could Be Coming To Your Workplace Soon via @forbes forbes.com/sites/bryanrobins…
Our meta-learning approach is state-of-the-art in Text-2-Speech, and does it in 5mins instead of 4 hours. This shows that neural nets can work with little data when we embrace many tasks, and work better! Now a poster 😊 openreview.net/forum?id=rkzj…
I recommend this paper with theoretical and algorithmic insights on metalearning to researchers interested in hierarchical Bayes, MAML, and Reptile. It addresses the idea of learning reusable fixed and adaptive modules across many tasks. arxiv.org/abs/1909.05557
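The basic Reptile update can be sketched on toy 1-D regression tasks: run a few SGD steps on a sampled task starting from the shared initialization w, then move w a fraction of the way toward the adapted weight. The task family (y = a·x with a drawn uniformly from [2, 4]), the step sizes and the function names are hypothetical choices for illustration; this is plain Reptile, not the method of the recommended paper.

```python
import random


def adapt_to_task(w, slope, rng, steps=5, lr=0.05, batch=10):
    """Inner loop: a few SGD steps fitting y = slope * x with the model y_hat = w * x."""
    for _ in range(steps):
        xs = [rng.uniform(-1, 1) for _ in range(batch)]
        # d/dw of the batch mean of (w*x - slope*x)^2
        grad = sum(2 * (w - slope) * x * x for x in xs) / batch
        w -= lr * grad
    return w


def reptile(meta_iters=2000, outer_lr=0.1, seed=0):
    """Reptile outer loop: nudge the shared initialization toward each task's adapted weight."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_iters):
        slope = rng.uniform(2.0, 4.0)           # sample a task
        adapted = adapt_to_task(w, slope, rng)  # fast adaptation on that task
        w += outer_lr * (adapted - w)           # meta-update: w <- w + eps * (phi - w)
    return w


w_init = reptile()
print(f"meta-learned initialization: {w_init:.2f}")  # expected near the mean task slope, 3
```

Because every task's optimum lies in [2, 4], the learned initialization settles near the middle of that range, from which a few inner steps reach any task quickly; in richer models the same update favours parameters that are cheap to adapt.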
Deep learning and TF, without a Ph.D. by Martin Görner. Brilliant teaching resource @DeepIndaba cloud.google.com/blog/big-da…
If you have 10K data instances, would you: 1. SFT an LLM with 10K data, or 2. Learn a reward with 5K, and RL the LLM on the remaining 5K with the learned reward 3. Other (explain)?
I'm agnostic and not a fan of religion, but I refuse to visit a country that introduces bans on Muslims and refugees. #nips2016 #nips2venues
At 17 from Caracas to Joburg, I stopped in Rio for 3 days. I bought a book. It changed me. It was my company in dark hardworking days selling beer for a living. At 21 I went to university dreaming of understanding the universe. Thanks for writing the book Stephen Hawking. RIP
Why do we label emotions as positive and negative? It’s unsatisfactory that such a key component of cognition is modelled with such a crass binary classifier. In AI we need to get more serious about this.
Where can we find other cool examples of continuous systems like this one being used to output discrete symbols? Could be potentially nice for neural networks. Thanks!
Geneva Drive: A mechanism that converts continuous motion into discrete motion. The name was derived from its usage in mechanical watches which were popularized in Geneva. This mechanism can also be found in movie projectors, banknote counting machines...
This will be a long thread. It represents my views solely. Many are puzzled by why I feel it possible to support both @JeffDean and @timnitGebru so I’d like to explain. I will start by saying that this in no way denies any current or past injustices. 1/n
I agree. This was a phenomenal paper. I’m hoping it will inspire researchers to probe further.
Looking back over the year, the one paper that gave me the best "aha" moment was... Reconciling Modern Machine Learning and the Bias-Variance Tradeoff: arxiv.org/abs/1812.11118 The "bias-variance" you knew was just the first piece of the story!
The @OpenAI paper is excellent. It is an achievement and it highlights the biggest AGI challenges (perception, long-horizon, exploration, motor control, compositionality, meta and continual learning, embodiment). What humans find obvious are the hardest things to solve in AGI 1/2
We've trained an AI system to solve the Rubik's Cube with a human-like robot hand. This is an unprecedented level of dexterity for a robot, and is hard even for humans to do. The system trains in an imperfect simulation and quickly adapts to reality: openai.com/blog/solving-rubi…
ICLR 2019 lessons thus far: The deep neural nets have to be BIGGER and they’re hungry for data, memory and compute. GANs, Res-blocks, LSTMs, convnets, & multiagent tricks are doing the job.
My slides for the #NeurIPS2018 Meta-Learning are now up. Big thanks to the organisers! metalearning.ml/2018/
It is beyond any doubt that over the next few years we will perfect the technology for automatically generating a video of anyone saying anything we type, with the right voice too. What implications do you think this will have? What are the applications? How do we mitigate risks?
What happens to a company is not only the result of how good and committed its tech people are; it is greatly influenced by business decisions, leadership, and the operating environment. Work-life balance is important, whether in a company or a startup (speaking from experience in both). Would you want to miss an important event in your child’s life so that, a decade later, the boss for whom you worked so hard comes out and blames your going home in time for dinner as the reason the company is doing poorly? This is not to deny that work-from-home policies are influential. However, they are only one of many factors that impact a company’s ranking on a leaderboard. From what I see, Google continues to be extremely pioneering and impactful. Its scientists and engineers are truly exceptional and worthy of admiration, and of a bit more respect from previous leads.
For anyone interested in meta-learning / learning to learn / continual learning / robotics / imitation, I’ll be giving a talk covering these topics at the Turing Institute in London.
“The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn.” - Alvin Toffler We welcome @NandoDF, whose talk will focus on building tools that learn how to learn. MORE INFO: bit.ly/NandoTalk
The @OpenAI o1 models represent one of the smartest advances in AI in a long time. Having just joined @Microsoft AI, one of the things I really look forward to is being able to contribute to some of these fruitful ideas to advance OpenAI’s mission. The opportunity to work together with the many clever engineers and researchers at OpenAI and Microsoft and contribute to their projects is a huge privilege, and humbling 😅
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introducing…
One of the most important papers of the year.
🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/DeepS… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning
Machines that can predict what their sensors (touch, cameras, keyboard, temperature, microphones, gyros, …) will perceive are already aware and have subjective experience. It’s all a matter of degree now. More sensors, data, compute, tasks will lead without any doubt to the “I think therefore I am” moment for computers, and we’re not ready for it yet. arxiv.org/pdf/1804.06318 share.google/kxx6WyqHpwPmo6Q…
It amazes me that so many (even tech) people still don't get the transformative power of generative AI. The best text-to-image and video models in existence today all use synthetically generated captions (see e.g. the DALL-E 3 paper). Human-generated data is substantially inferior. Generated text, images, video, touch, sound, radar, graphs, etc. will simply become the data for training new, more powerful foundation models. Moreover, if one has finetuned (or prompted) foundation models to evaluate the generated data, then RL (aka self-learning) is trivial to implement. Finally, if one can imagine what will happen, one can control. This is the basis of model predictive control (MPC). So, we have only started to witness the impact of Gen AI for iterative improvement and control. This is the AlphaGo story repeating itself, but now with all types of human tasks, knowledge and modalities.
Scaling Synthetic Data Creation with 1,000,000,000 Personas - Presents a collection of 1B diverse personas automatically curated from web data - Massive gains on MATH: 49.6 ->64.9 repo: github.com/tencent-ailab/per… abs: arxiv.org/abs/2406.20094
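"If one can imagine what will happen, one can control" is the core loop of model predictive control. A minimal random-shooting MPC sketch, assuming a hypothetical 1D point mass whose model is known exactly: imagine many action sequences with the model, score each imagined trajectory, execute only the first action of the best one, then replan. The horizon, sample count, cost weights and action bounds are illustrative choices.

```python
import random

DT = 0.1  # time step in seconds (assumed)

def model(pos, vel, action):
    """Hypothetical dynamics model of a 1D point mass: predicts the next state."""
    vel = vel + action * DT
    pos = pos + vel * DT
    return pos, vel

def cost(pos, vel, target):
    """Penalise distance to the target and, lightly, speed."""
    return (pos - target) ** 2 + 0.1 * vel ** 2

def mpc_action(pos, vel, target, rng, horizon=10, n_samples=200):
    """Random shooting: imagine n_samples action sequences, return the first action of the best."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        p, v, total = pos, vel, 0.0
        for a in actions:
            p, v = model(p, v, a)          # imagine what will happen
            total += cost(p, v, target)
        if total < best_cost:
            best_cost, best_first = total, actions[0]
    return best_first

def run(steps=100, target=1.0, seed=0):
    """Closed loop: replan at every step (receding horizon)."""
    rng = random.Random(seed)
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        a = mpc_action(pos, vel, target, rng)
        pos, vel = model(pos, vel, a)      # the real system is assumed to match the model
    return pos, vel
```

In the setting the tweet describes, the hand-written `model` would be replaced by a learned generative model and `cost` by a learned evaluator; random shooting is just the simplest planner to show the idea.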
Replying to @ylecun
I partly disagree, Yann. First, to be clear, I called on both big tech and government. Second, tech corporations are partly responsible and cannot be excused. SF is where they operate at large, and they have huge influence on who gets elected and what laws are passed. They lobby and have a strong voice. Why, in a city where so many are either in tech or unemployed, do tech companies try to excuse themselves? Also, why do corporations advertise missions like “make the world a better place” and do nothing about the problems in their neighbourhood? Corporations are exploitative entities unless employees have a say. For example, California companies enforce outrageous 6-month non-competes for junior AI scientists and engineers in London even though they cannot do so in their Bay Area home. The consequence of this behaviour for the future of AI is profound: the AIs in London will enforce non-competes, the AIs in some countries will be homophobic, and so on. There are complex ethical issues here that we are ignoring, like the homeless in SF. Tech is influential enough that if it chooses to do something about it, things will improve. It will also be better for them in the long run because it’s their home.
Many people work through the weekend, without time to announce it on Twitter. I spent my Sunday happily with my family and friends, supporting and sharing love. I feel as a result energised and I’m looking forward to a productive week. Having time off is wonderful for physical and mental health. I too spent yesterday learning because I enjoy it. I don’t even think of it as work and I can’t wait to apply it next week. Yet, I do feel for the millions who worked through the weekend because they had no other option. I look forward to a world where more people can have the weekend off and to do something meaningful.
So glad to see the value of scientific software frameworks being properly recognised.
The NumPy paper is out! nature.com/articles/s41586-0…
I agree with @DrJimFan. Life, with all its mind-blowing structure, is about creating order in a universe of increasing disorder, see e.g. newscientist.com/article/232… for an easy intro. Like a cell, a neural network during training takes energy to minimise disorder, that is, to predict and generalise better. In fact we even call the loss negative entropy. Like life, the net is part of a bigger environment that gives it data and feedback. Like life, the process results in a lot of disorder for the universe (TPU and GPU heat). In summary, we have all the ingredients for intelligence (an emergent property of life), including our understanding of physics. I’d be thankful if someone made a crisper version of this argument. The only way a *finite sized* neural net can predict what will happen in any situation is by learning internal models that facilitate such predictions, including intuitive laws of physics. Given this intuition, I cannot find any reason to justify disagreeing with @DrJimFan. With more data of high quality, electricity, feedback (aka fine tuning, grounding), and parallel neural net models that can efficiently absorb data to reduce entropy, we will likely have machines that reason about physics better than humans, and hopefully teach us new things. Incidentally, we are the environment of the neural nets too, consuming energy to create order (e.g. increasing quality of datasets for neural net training). These are old ideas going back to Boltzmann and Schrödinger among others. They provide the theoretical foundations. Now, it’s about building the code and conducting the experiments, and doing so *responsibly* and *safely* because these are very powerful technologies.
I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D". I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings". Well, what transformers do is just manipulating a sequence of integers (token IDs). What neural networks do is just manipulating floating numbers. That's not the right argument. Sora's soft physics simulation is an *emergent property* as you scale up text2video training massively. - GPT-4 must learn some form of syntax, semantics, and data structures internally in order to generate executable Python code. GPT-4 does not store Python syntax trees explicitly. - Very similarly, Sora must learn some *implicit* forms of text-to-3D, 3D transformations, ray-traced rendering, and physical rules in order to model the video pixels as accurately as possible. It has to learn concepts of a game engine to satisfy the objective. - If we don't consider interactions, UE5 is a (very sophisticated) process that generates video pixels. Sora is also a process that generates video pixels, but based on end-to-end transformers. They are on the same level of abstraction. - The difference is that UE5 is hand-crafted and precise, but Sora is purely learned through data and "intuitive". Will Sora replace game engine devs? Absolutely not. Its emergent physics understanding is fragile and far from perfect. It still heavily hallucinates things that are incompatible with our physical common sense. It does not yet have a good grasp of object interactions - see the uncanny mistake in the video below. Sora is the GPT-3 moment. Back in 2020, GPT-3 was a pretty bad model that required heavy prompt engineering and babysitting. But it was the first compelling demonstration of in-context learning as an emergent property. Don't fixate on the imperfections of GPT-3. Think about extrapolations to GPT-4 in the near future.
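The entropy argument above can be made concrete. In practice the loss a predictor minimises is usually the cross-entropy between the data distribution and the model, which by Gibbs' inequality never goes below the entropy of the data and reaches it only when the model matches the data. A minimal sketch with a hypothetical three-symbol distribution:

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q): expected negative log-likelihood of samples from p under model q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

data = [0.7, 0.2, 0.1]  # hypothetical "true" next-symbol distribution

# Better models (closer to the data) get lower loss, bottoming out at the data entropy.
for model in ([1 / 3, 1 / 3, 1 / 3], [0.6, 0.25, 0.15], [0.7, 0.2, 0.1]):
    print(f"H(data, model) = {cross_entropy(data, model):.4f}  >=  H(data) = {entropy(data):.4f}")
```

The gap between the two numbers is the KL divergence from data to model, which is what training actually drives towards zero.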
It is incredibly transformative when a tech leader like @JeffDean participates at events like @Khipu_AI and @DeepIndaba He gives hope, builds confidence, and helps create new opportunities and collaborative communities across the world. #inspiring #masakhane @GoogleAI
I just met the amazing @JeffDean from @GoogleAI at @Khipu_AI. He was so nice to hear about my project and to agree in taking a picture with me! Thank you so much! This moment is one of the bests in my life.
Diffusion as a neural net, a language model in JAX, attention and transformers: some slides from my @Khipu_AI tutorial
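For readers following the attention part of the tutorial, the core operation is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. A minimal sketch on plain Python lists (the slides themselves use JAX; this version avoids any library so the arithmetic is visible):

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query takes a softmax-weighted average of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out
```

When all keys look the same to a query, it averages the values uniformly; when one key aligns strongly with the query, that key's value dominates the output.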
Google Brain’s new super fast and highly accurate AI: the Mixture of Experts Layer. medium.com/@thoszymkowiak/go…
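The Mixture of Experts layer routes each input to a small subset of expert networks chosen by a gating network, so capacity grows without every expert running on every input. A minimal top-k gating sketch in plain Python; the linear gate and the three scaling "experts" are hypothetical, and the noisy gating and load-balancing losses used in the Google Brain paper are deliberately omitted.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_weights, experts, k=2):
    """Sparse MoE: a linear gate scores every expert; only the top-k are evaluated and mixed."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])   # renormalise over the selected experts only
    outputs = [experts[i](x) for i in top]      # the remaining experts are never run
    dim = len(outputs[0])
    y = [sum(g * out[j] for g, out in zip(gates, outputs)) for j in range(dim)]
    return y, top

# Usage with three hypothetical "experts" that just scale their input.
experts = [lambda v: [1.0 * a for a in v],
           lambda v: [100.0 * a for a in v],
           lambda v: [3.0 * a for a in v]]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
y, chosen = moe_layer([1.0, 0.0], gate_weights, experts, k=2)
```

Here the gate picks experts 2 and 0, and expert 1 is skipped entirely; that skipped computation is where the efficiency comes from.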
Two years in the making by a talented, collaborative, and fun team, and with enormous help and support from many others at @DeepMind. No better place to be! Congrats @scott_e_reed on this step.
Gato🐈a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: dpmd.ai/Gato Paper: dpmd.ai/Gato-paper 1/
This type of intelligence always amazes me.
When scientists put slime mold over a map of Tokyo, with food used to represent urban areas, and after a day the mold created a network nearly identical to Tokyo's rail network: all this without any brain ow.ly/7CA730o4Sjk
It would help the discussion if everyone first 1. reads a causal inference book, eg oapen.org/download?type=docu…, 2. watches a deep learning course emphasising modularity, compositionality and automatic differentiation, 3. implements the CI book examples in eg @PyTorch
I am retweeting this reply, for it crystallizes my position in the latest conversation on the relationships between DL (deep learning) and CI (causal inference) with @tdietterich , @ylecun, @GaryMarcus, @rodneyabrooks and significant others. #Bookofwhy
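In the spirit of step 3, here is the kind of small causal inference example worth implementing: a confounder Z drives both a treatment X and an outcome Y, so conditioning on X overstates its effect, while the backdoor adjustment P(Y | do(X=x)) = Σ_z P(Y | x, z) P(z) recovers it. A minimal sketch in plain Python rather than PyTorch; every probability below is a hypothetical choice for illustration.

```python
P_Z1 = 0.5                          # P(Z=1), hypothetical
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}     # the confounder makes treatment more likely

def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1

def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]

def p_y1(x, z):
    """P(Y=1 | X=x, Z=z): X adds 0.2, the confounder Z adds 0.6."""
    return 0.1 + 0.2 * x + 0.6 * z

def observational(x):
    """P(Y=1 | X=x): plain conditioning, which mixes in the confounder via P(z | x)."""
    joint = {z: p_x_given_z(x, z) * p_z(z) for z in (0, 1)}
    norm = sum(joint.values())
    return sum(p_y1(x, z) * joint[z] / norm for z in (0, 1))

def interventional(x):
    """P(Y=1 | do(X=x)) via backdoor adjustment: sum_z P(Y=1 | x, z) P(z)."""
    return sum(p_y1(x, z) * p_z(z) for z in (0, 1))
```

Conditioning suggests X raises P(Y=1) by 0.56 (0.78 vs 0.22), while the adjusted, interventional effect is the true 0.2 built into `p_y1`.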
Why don’t you compete with ChatGPT? Last time I checked Gemini app was an order of magnitude behind. More seriously, this arrogance and comparative disparaging is toxic and not needed in our community. Do the best work you can to help those with less privilege. That’s it.
Competing with ourselves is getting a bit boring
Thanks @osanseviero and @huggingface for inviting me to a wonderful AI dinner, where I had the pleasure of catching up with old friends, @sarahookr @neilzegh @laurentsifre @ylecun, meet new amazing people, and do one of the things I absolutely love: brainstorm about AI with people who are passionate about it and its impact. 🤗
What an amazing, authoritative tutorial on deep reinforcement learning at #nips2016: a must-read set of slides! people.eecs.berkeley.edu/~pa…
#ICLR 2019 could happen in Cape Town, South Africa. It’s time for an ML conference to go to Africa. It’s the right thing to do, and we all know it.
Here is a great challenge for quadruped robotics — Super Cat Intelligence. 🐭 risky, but nonetheless good benchmark tasks
This is one of the most thought-provoking and transformative books I have read recently. Solid research, heart-warming. Highly recommend it.
“This book had a profound impact on my day-to-day life.” Order #EmotionalAgility here: buff.ly/2c8uYnm
Neural Ordinary Differential Equations, explained in this blog post: blog.acolyer.org/2019/01/09/…
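The central idea of Neural ODEs is to replace a stack of discrete layers with a learned vector field dh/dt = f(h, t) and compute the forward pass by integrating it with an ODE solver. A minimal sketch using a fixed-step Euler solver and a hypothetical one-layer tanh vector field; the paper itself uses adaptive black-box solvers and the adjoint method for gradients, both omitted here.

```python
import math

def odeint_euler(f, h0, t0, t1, steps=1000):
    """Fixed-step Euler solver: integrates dh/dt = f(h, t) from t0 to t1."""
    h, t = list(h0), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        dh = f(h, t)
        h = [hi + dt * dhi for hi, dhi in zip(h, dh)]
        t += dt
    return h

def make_dynamics(W, b):
    """A tiny 'neural' vector field f(h, t) = tanh(W h + b) with hypothetical parameters W, b."""
    def f(h, t):
        return [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi) for row, bi in zip(W, b)]
    return f

# Forward pass of a 2D Neural ODE "layer": integrate the learned field from t=0 to t=1.
f = make_dynamics([[0.0, -1.0], [1.0, 0.0]], [0.0, 0.0])
h1 = odeint_euler(f, [1.0, 0.0], 0.0, 1.0)
```

Each Euler step is one residual update h + dt * f(h, t), which is why a ResNet can be read as a coarse discretisation of such an ODE.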
I find it funny folks are focusing on the symbolic challenge. The big challenge is attaching that hand to a moving controllable robot arm, and preferably having two coordinated hands learning diverse behaviours by RL, from sensors, with low sample complexity and in a safe manner.
Since @OpenAI still has not changed misleading blog post about "solving the Rubik's cube", I attach detailed analysis, comparing what they say and imply with what they actually did. IMHO most would not be obvious to nonexperts. Please zoom in to read & judge for yourself.
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) jalammar.github.io/illustrat…
I loved this research paper on Flow Matching, the most popular approach for video gen. TLDR: More data means it is harder to fit any specific data point (say, an image), better generalisation, and greater coverage of learned concepts. Pretraining: use billions of data points to learn many things, none too well. Post-training: use little data, e.g. to quickly overfit to a certain style. Too few or too many won’t work. Question for Twitter: Does this apply to text LLMs? What have you experienced? arxiv.org/abs/2506.03719
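For readers new to flow matching: a network v(x, t) is trained to regress the velocity that carries noise samples x0 to data samples x1, and generation integrates that velocity field from t = 0 to t = 1. A minimal sketch, assuming the common straight-line path x_t = (1 - t) x0 + t x1 (not necessarily the exact setup of the linked paper); the single-point "dataset" and its closed-form ideal velocity are hypothetical, chosen so the result can be checked by hand.

```python
import random

def interpolate(x0, x1, t):
    """Straight-line path between noise x0 and data x1, plus its (constant) velocity."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]          # d x_t / dt along the line
    return xt, u

def flow_matching_loss(velocity_model, pairs, rng):
    """Monte Carlo conditional flow-matching loss: mean of ||v(x_t, t) - (x1 - x0)||^2."""
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()
        xt, u = interpolate(x0, x1, t)
        v = velocity_model(xt, t)
        total += sum((vi - ui) ** 2 for vi, ui in zip(v, u))
    return total / len(pairs)

def generate(velocity_model, x0, steps=100):
    """Sample by Euler-integrating dx/dt = v(x, t) from t = 0 (noise) towards t = 1 (data)."""
    x = list(x0)
    for i in range(steps):
        t = i / steps
        v = velocity_model(x, t)
        x = [xi + vi / steps for xi, vi in zip(x, v)]
    return x

# Toy check: if the "dataset" is the single point x1, the ideal velocity is (x1 - x) / (1 - t).
x1 = [2.0, -1.0]
ideal = lambda x, t: [(a - b) / (1 - t) for a, b in zip(x1, x)]
rng = random.Random(0)
pairs = [([rng.gauss(0, 1), rng.gauss(0, 1)], x1) for _ in range(10)]
```

With one data point the ideal field achieves zero loss and sends any starting noise exactly to that point: the extreme "overfit to a style" regime from the tweet. With many data points no single velocity can fit every pair, so the model must average, which is where generalisation comes from.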
Today I felt very positive about the work I’ve been doing with students at UBC and Oxford, and with colleagues at DeepMind when (1) a complete stranger approached me at a Camden gym and said “thank you for your lectures” - a nice gesture that made my day, and
One of the most important deep learning papers of the year, thus far.
Excited about our new paper on relational reasoning, with @dnraposo @PeterWBattaglia and others at DeepMind. arxiv.org/abs/1706.01427
This is a superb and inspiring artificial intelligence talk. The best I’ve heard this year. Anyone interested in vision, control, robotics or managing AI projects should watch this. Well done @karpathy piped.video/g2R2T631x7k via @YouTube
I used a metaphor that upset some people. I welcomed the feedback, and I have deleted the tweet. I also apologise to anyone I might have offended. The rest of the tweet is about something very important that we scientists and engineers in AI need to talk about. California corporations are forcing existing employees in the UK and Europe to sign 6 month to 1 year non-competes. These contracts are not signed at the start, but after many years when the companies have disproportionate power over the employees. Imagine receiving a promotion because of merit and hard work, being happy about it because you’re starting a new family, but being told that you won’t get it unless you sign a 1 year non-compete. It is not easy to leave the job then. It’s hopeless for the scientist, engineer or any other worker. So you sign. You then cannot leave because you won’t be able to do AI research for a long time, and some of your promised money isn’t paid. It is emotionally abusive and terrible for scientific progress and technological development. The AI companies that enforce this in London and European capitals don’t do it in California. This puts Europe at a huge disadvantage too. It’s non-competitive. These are the companies claiming they are building the good AIs. Good? No, they are building AIs to exploit the laws of countries, created for other purposes, to exploit people. @GoogleDeepMind is particularly bad at this. They have been forcing researchers going up for promotion based on merit to sign 6 month non-competes, 1 year non solicits, and 6 month notice periods (garden leaves). This has made it clear that they don’t trust their own employees and will do whatever it takes to stifle their competition. Researchers leaving have been forced to comply. The ironic thing is that they can’t enforce this in their California home. The executives don’t believe in it. They are exploiting our laws to their benefit. Colleagues and friends (@icmlconf @NeurIPSConf @theinformation). 
Do NOT sign these contracts. They can’t make you do it. Without us engineers and AI people, their stock is worthless. Don’t let them abuse you. This applies to people joining my team, do not sign these restrictions on your freedom to do research and work. @GoogleDeepMind has wonderful people and very dear friends of mine who I look up to, but this practice of @Google should be illegal. It should be illegal for other corporations too because it stifles research in AI and progress. It is also pathetic. @GoogleDeepMind can do much better: Do the right thing and make us proud please. The European Union @vestager should do something about this. It should be illegal and companies should be punished if they choose to stifle competition in Europe.
If you’re advertising a machine learning or AI scholarship or job on Twitter, please consider announcing it to @QueerinAI @AiDisability @black_in_ai @Khipu_AI @DeepIndaba @_LXAI @WiMLworkshop @women_in_ai and other groups who care about diversity and inclusion. Thanks
This is EPIC!! It will go down in history as one of the best #neurips talks of all time. @isbellHFh and colleagues 💜💙💚💛🧡❤️ on: Can’t Escape Hyperparameters and Latent Variables: Machine Learning as a Software Engineering Enterprise nips.cc/virtual/2020/public/…
Once upon a time at a #NeurIPS party. So much changed. Thanks @sirbayes for finding this.
After 55, one reflects on life. For me, mathematics was the most beautiful world I encountered.
Shaking the foundations: delusions in sequence models for interaction and control. I learned so much from Pedro Ortega in this thought-provoking AI project. Great way to spend time with a friend at a London pub. arxiv.org/pdf/2110.10819.pdf