Professor: CMU/@acmi_lab, Cofounder: @AbridgeHQ, Creator: @d2l_ai & approximatelycorrect.com, Relapsing 🎷

San Francisco, CA
deep learning research was the original vibe math
Many ML practitioners have no idea where to start getting a foothold in learning theory. I certainly didn't. "Understanding Machine Learning" by Shai Shalev-Shwartz & Shai Ben-David is an absolute treasure. Beautiful hardcover & free download @ cs.huji.ac.il/~shais/Underst…
I don’t know why so many people need to hear this, but if you actually want to do a PhD, don’t only apply to “top 10” schools. The cutthroat-ness of admissions falls off rapidly but the quality of faculty does not.
By default, logistic regression in scikit-learn runs w L2 regularization on and defaulting to magic number C=1.0. How many millions of ML/stats/data-mining papers have been written by authors who didn't report (& honestly didn't think they were) using regularization?
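A minimal pure-Python sketch of why that default matters. It reimplements the objective rather than calling scikit-learn, and the toy data, learning rate, and step count are illustrative assumptions. On linearly separable data, gradient descent on unregularized log-loss keeps growing the weight, while an L2 objective of the form scikit-learn documents for this penalty, ½w² + C·Σ log-loss, settles at a finite value when C=1.0.

```python
import math

# Toy 1-D, linearly separable data (illustrative assumption); labels y in {-1, +1}.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [-1, -1, 1, 1]

def fit(C=None, lr=0.05, steps=5000):
    """Gradient descent on a single weight w (no intercept, for brevity).

    C=None  -> plain, unregularized log-loss.
    C=float -> 0.5*w**2 + C * sum(log-loss): smaller C means a stronger penalty.
    """
    w = 0.0
    for _ in range(steps):
        # d/dw log(1 + exp(-y*w*x)) = -y*x / (1 + exp(y*w*x))
        grad_loss = sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in zip(X, Y))
        grad = grad_loss if C is None else w + C * grad_loss
        w -= lr * grad
    return w

w_plain = fit(C=None)  # keeps drifting upward (roughly logarithmically) with more steps
w_l2 = fit(C=1.0)      # converges to a fixed point near w = 1
print(f"unregularized w = {w_plain:.2f}, L2 (C=1.0) w = {w_l2:.2f}")
```

Raising `steps` makes `w_plain` larger still, while `w_l2` does not move: the reported model silently depends on C.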
pro·fes·sor — noun ***the member of a research lab with the deepest intuitions about how LaTeX \vspace, page breaks, and figure placement work.***
Well here goes. Our ICML Debates paper is live: "Troubling Trends in Machine Learning Scholarship". If anyone needs me, I'll be in witness protection. 🙄 dropbox.com/s/ao7c090p8bg1hk…
So much fucking respect to @ylecun for being the one big-tech blue chip name to step up & champion open research, open source & the startup community. Unflinchingly. For remembering what made this moment possible in the first place & putting his name on the line to protect it ❤️
The field is already self-correcting. Good departments/labs are clearing their eyes, caring less about paper count, seeing through the noise. Don't worry so much about the ICML deadline. Slow down, relax, try to do work you're proud of, submit when it's ready.
Our ***free deep learning book*** (d2l.ai) now has a ***modern NLP chapter draft***, complete w. the omnipresent BERT & friends. All drafted via interactive Jupyter notebooks.
We have re-organized Chapter: NLP pretraining (d2l.ai/chapter_natural-langu…) and Chapter: NLP applications (d2l.ai/chapter_natural-langu…), and added sections on BERT (model, data, pretraining, fine-tuning, application) and natural language inference (data, model).
When an entire sub-field of ML has failed (thus far) to produce any method that could actually be used, there is no "state of the art", only the "state of the game".
Tomorrow, we launch "The Art of the Paper", a course on the principles, mechanics, & culture of scientific writing. While core to our work, this material seldom gets a formal treatment. I'm excited, nervous, & grateful to @mldcmu for the creative freedom. github.com/acmi-lab/cmu-1071…
Thrilled to release our greatest paper yet in Nature Machine Learning: ***Multimodal Multitask Multidomain Multi-Attention for Multilabel Classification with Multiple Adversaries*** arxiv.org/pdf/12343212343212…
If you follow known fraudster & plagiarist @sirajraval, ***please take a moment to unfollow him***. Especially prominent researchers, journalists, and institutions. Don’t confer credibility on this scoundrel and thereby help to defraud his students.
Instead of receiving a boring paper diploma at the end of their PhDs, Google PhD graduates receive a priceless note handwritten by Jeff Dean on a napkin that says, “You have a PhD now.” medium.com/halting-problem/g…
Just landed in Switzerland, "home of the @Toblerone, @rogerfederer, & the LSTM".
Type I and type II errors are right up there with “bandits” as the most obnoxious, least descriptive technical terms that have ever been adopted.
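For anyone the names fail to help: a Type I error is a false positive (rejecting a null hypothesis that is true), and a Type II error is a false negative (failing to reject a null that is false). A small sketch counting both; the 0/1 label convention is an assumption, so swap it if your null is the other class.

```python
def type_i_ii_counts(y_true, y_pred):
    """Return (type_I, type_II) error counts for a binary test.

    Assumed convention: 0 = null hypothesis true / negative,
                        1 = null hypothesis false / positive.
    """
    type_i = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)   # false positives
    type_ii = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    return type_i, type_ii

# One false alarm (index 0) and two misses (indices 2 and 4).
print(type_i_ii_counts([0, 0, 1, 1, 1], [1, 0, 0, 1, 0]))  # (1, 2)
```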
The dominant practice of the applied machine learnist has shifted from ad-hoc feature hacking (2000s) to ad-hoc architecture hacking (2010s) to ad-hoc pre-training hacking (2020s).
If you want to go into CS academia, first be a lousy jazz musician for 10 years. You will never find the pay low (even in PhD), the amount to learn onerous (even in mature subfields), the feedback cruel (even from R2), or the rejection rate high (even at NeurIPS/ICML/COLT).
***In the first thrilling installment, the Superheroes of Deep Learning find themselves up against an ordinary problem. Will their extraordinary abilities carry the day?*** New on Approximately Correct! @AndrewYNg @drfeifei approximatelycorrect.com/202…
Guys, I have a burning question. What's the best language for data science, Python or R?
For as long as I've been a computer scientist, I've been convinced that I am the worst possible mathematician one can be while still hoping to do meaningful ML research. This feeling hasn't changed at all as my abilities have fluctuated.
Physics is searching for a *theory of everything.* Deep learning is searching for a *theory of anything*
***ML Tragic Pattern*** (1) Propose algorithm that makes no sense. (2) Ignore obvious examples that illustrate the incoherence. (3) Implement it anyway. (4) Fool reviewers @ a top conference. (5) Celebrate. (6) "Discover" method has problems. (7) Write "critical" paper.
If I ever shared how abusive & toxic my experience of the business technologies group at Tepper was, the group would cease to exist. It still rattles me every day. I couldn’t even share my joy when I escaped from that hellhole. Too traumatic to acknowledge I ever worked there.
***OpenAI Trains Language Model, Mass Hysteria Ensues*** New Post on Approximately Correct digesting the code release debate and media shitstorm. approximatelycorrect.com/201…
The double descent phenomenon is described in ~1000 papers & talks over past year. It's featured in at least 1 slide per talk @ last summer's Simons workshop on Foundations of DL. Why is this @OpenAI post getting so much attention as if it's a new discovery? Am I missing smtg?
A surprising deep learning mystery: Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time. openai.com/blog/deep-double-…
BREAKING NEWS: Many of you are panicking about where to find more toilet paper. Turns out every general audience AI book from the last 5 years contains 250+ sheets. 🧻🚽
When I grow up, my dream is to become the i-th (3 < i < N) author of a major @Google paper for an intern-level contribution, then set $1B on fire. Later, I plan to play Call of Duty in a villa bought w secondary sales while employees pay the cost as it all implodes.
10x researchers: Professors, if you ever come across this rare breed of researchers, grab them. If you have a 10x researcher as part of your lab, you increase the odds of winning a Turing Award significantly. OK, here is a tough question. How do you spot a 10x researcher?
Here’s a puzzling fact about the mainstream ML/AI community: a bunch of ostensibly creative dreamers were handed job security, massive amounts of cash, and unprecedented creative license—and the only thing most ppl can think to do with this freedom: train bigger networks. 🤣
How does a university based researcher keep feeling relevant in this fast changing and compute driven field? Some good discussion here ➝ teddit.net/comments/iezgsc
Anyone want to guess what deep learning hotshot drives this hot rod?
It’s hard to overstate just how strange the state of “explainable AI” is today. For nearly all “local explanation” techniques, the only people who understand them would never use them, and the only people who use them do not understand them.
Overjoyed to announce that I have N papers accepted to the [name of international conference on X]! [no details about the work or coauthors follow] #conferencenameYYYY
There’s been a rash of deep learning & VC bozos saying things like “with enough data, correlation is causation”. This is incorrect, damaging to the discourse, & makes us (empirical ML researchers/practitioners & tech broadly) look like idiots to the science-literate world.
Curious fact abt the LLM era: it's strangely difficult to distinguish the methodological innovations of elite research labs from the efforts of random bloggers.
New life plan: (1) launch a fledgling AI startup; (2) somehow get publicly listed with no business plan & few routes to solvency; (3) convince every hedge fund to short the shit out of my stock; (4) befriend /u/DeepFuckingValue
I'm all for a "slow movement" in ML publishing (see my tweet suggesting 1-yr paper moratorium last wk). But is the right avatar for such a movement a man who ostensibly authored *92* papers last year? Let's discuss (& pls keep it constructive everyone). yoshuabengio.org/2020/02/26/…
Awesome ML blog run by @lilianweng has fantastic exposition, clear illustrations and covers a wide spectrum of topics in classic and modern ML: lilianweng.github.io/lil-log…
Overall the ML community at present does not have this problem...
Don't let perfect be the enemy of submitted.
Why is ML "taking over" so many adjacent fields, stealing the spotlight & overshadowing (often-comparable) work? The rapid pace of innovation? The talent? The tooling? I'm inclined to think the major (unsung?) factor might just be the universality of ***open-access publishing***.
I think most of us find Einstein's diaries upsetting, notably for indulging in ugly stereotypes when discussing his travels in Asia. At the same time, I'm conflicted about the decisions to publish dead people's private diaries. Thoughts? bbc.com/news/science-environ…
The precarious state of “interpretable deep learning” is that we should be far more scared upon hearing that a hospital or government deploys any such technique than upon hearing that they haven’t.
Just spent a full day shadowing radiologists reading mammography and diagnostic breast cancer scans (MR, ultrasound, etc). Huge opportunities to use ML to *help* but ***wow wow wow*** are some people underestimating the work radiologists do.
Maybe @openAI can cure #covid19 by feeding the prefix "the cure for coronavirus is ____" to GPT-2?
***What were the most interesting, scientifically insightful, and coherent papers that you read in 2019??*** Rules: 1) No restriction on methodology or aim (theoretical, experimental, & applications papers welcome). 2) Do not post your own papers.
In most ML textbooks (& intro-y stats books), distributions fall out of the sky, w/o full derivations of their precise functional forms (e.g. why the gamma function appears in the Beta). Does anyone have a great intro prob/stats book that works everything up from first principles?
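One standard route to the gamma functions, sketched in words rather than taken from any particular textbook: write Γ(a)Γ(b) as the double integral of x^(a-1) y^(b-1) e^-(x+y) over the positive quadrant, substitute x = uv, y = u(1-v) (Jacobian u), and the integral factors into Γ(a+b)·B(a,b), so the Beta normalizer is B(a,b) = Γ(a)Γ(b)/Γ(a+b). A quick numerical check of that identity with a plain midpoint rule, restricted to a, b ≥ 1 so the integrand has no endpoint singularity:

```python
import math

def beta_numeric(a, b, n=100_000):
    """Midpoint-rule approximation of B(a, b) = integral_0^1 x**(a-1) * (1-x)**(b-1) dx."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1.0 - (i + 0.5) * h) ** (b - 1)
                   for i in range(n))

def beta_gamma(a, b):
    """Closed form via gamma functions: B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

for a, b in [(2, 3), (2.5, 1.5), (4, 4)]:
    print(a, b, beta_numeric(a, b), beta_gamma(a, b))
```

For integer arguments the closed form reduces to factorials, e.g. B(2, 3) = 1!·2!/4! = 1/12.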
“If you haven't run a baseline of logistic regression, you are committing algorithmic malpractice.” –@scorbettdavies at @ucsantabarbara Responsible ML panel. YES.
Deep learning: exponential or sigmoid?
I've seen a lot of BERT tutorials, but not a single one that you could read and then use to reproduce anything close to BERT (without consulting source code). Is there a more complete tutorial out there I'm missing?
If @overleaf ever truly dies, I will quit research, move to Greece, and spend the rest of my days baking under the sun, reading sci-fi novels, & drinking away the pain.
Insisting that authors include a LIME/SHAP/IG/TCAV/GRADCAM saliency map in a paper shd be a disqualifying offense for reviewers. Including any such map without a powerful disclaimer (“this means nothing”) shd be a disqualifying offense for authors. Extra true for ML in healthcare.
People often ask if academia can keep up in research with industry given the superior resources there. Honestly, 99% of *interesting* academic work *even in DL* requires < 16 GPUs. This is maybe 1/100th the cost of an electron microscope. Resources are a bit over-emphasized.
***Uncontroversial opinion***: Geoff Hinton & Jurgen Schmidhuber both made remarkably creative (and prescient) contributions to deep learning. The credit assignment process is broken and biased, with pockets of work overlooked, but not necessarily due to malice of any individual.
Hey everyone. Stoked to report that we've blown away recent NLP benchmarks with a new sentence embedding: "Efficient Recurrent Neural Inverse Embeddings" Idea's simple: iteratively embed & invert sentences gains thru meta-learning magic! arxiv.com/abs/xFgq14149b6r #NLP #deeplearning
Do pretraining’s big wins in NLP really involve “knowledge transfer”? Are upstream corpora even needed? *Not always!!* My students @kundan_official & @saurabh_garg67 show that self-pretraining (from scratch) often rivals “foundation model” performance. arxiv.org/abs/2209.14389
Advice to a young academic in 2016: get on Twitter. Advice to a young academic in 2021: get off Twitter.
Absolutely delighted to join the Operations Research group (joint w. MLD) at @teppercmu. With ML now applied in the wild, we must address both *predictions* & *decisions*. This nexus is where I see my research headed & I couldn't hope for stronger colleagues on either side.
Dear @overleaf, if you implement a feature that allows us to search for projects by collaborator names, my PhD student @dkaushik96 will tattoo "@overleaf" on his biceps and name his first child @overleaf.
Don't go into a career in research for the fast cash that may come & go w. industry interest. Go into research for the freedom to wake up every day & decide that the most important thing in (your) world is to read about topic X, then do it, and then annoy people about it.
Super excited to announce that “Dive Into Deep Learning” now officially supports PyTorch! d2l.ai
Dive into Deep Learning now supports @PyTorch. The first 8 chapters are ready with more on their way. Thanks to DSG IIT Roorkee @dsg_iitr, particularly @gollum_here who adapted the code into PyTorch. More at D2L.ai. @mli65 @smolix @zacharylipton @astonzhangAZ
Quarantine has made one thing clear: The real reason for conferences is to force weeklong breaks from meetings.
Anyone who tells you they are not jaw-on-the-floor surprised that you can have this sort of interaction with a chatbot today is lying.
Me: [rambles while introducing an idea, apologizes] @dkaushik96: "no, keep going. this is doing wonders for my impostor syndrome". 🤣🥳😅 — The Joys of PhD Advising, 2020
Ali Rahimi delivered a rare critical talk at #nips2017, likening modern ML to alchemy. Examples: brittleness of SGD to implementation changes & mystical claims like "batch norm works by reducing internal co-variate shift". Full talk now on YouTube: piped.video/watch?v=Qi1Yry33…
After a decade of carefully maintained social media silence, our dear visionary @geoffreyhinton breaks his twitter silence… to gripe about retail banking customer service.
Does HSBC UK have any ML people? HSBC will not comply with my written instructions to transfer money within the UK. Fraud detection says it must be authorized by high value transfers. High value transfers say they cannot authorize it. 7 hours on the phone so far. Help!
Can someone give me tenure already so that I can write an honest book about the 2010s AI boom?
I can't imagine being a woman in CS seeing this week's news. I feel shocked and devastated. Foremost for the victims. But also for all the other women who have to put their guard up higher, doubting the authenticity of the professional attention they receive. You deserve better.
Academia is not special because it's perfect or because it always lives up to ideal. It's easy to point to BS work and bad processes to trash the institution. But the 5% of work that is both magical & unlikely to happen anywhere else makes the entire enterprise worth it.
eight yrs ago i was a jr phd student over the moon to be sharing beers with the inventor of the lstm (@HochreiterSepp) at neurips in montreal w my first deep learning collaborator @davekale. hope you get to share a human moment w someone you admire this week.
Importance weighting is widely used but ***may have no effect on deep nets*** modulo choices regarding early stopping and weight decay. This "science-y" paper with my student Jonathon Byrd identifying this phenomenon was just accepted at #ICML2019 arxiv.org/abs/1812.03372 (1/2)
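For context, importance weighting in its usual form reweights each training example's loss by w_i, an estimate of p_target(x_i)/p_source(x_i). A minimal sketch of that weighted empirical risk follows; the binary log-loss and the 1/n normalization are illustrative choices, and this is not code from the paper.

```python
import math

def log_loss(p, y):
    """Binary cross-entropy for predicted probability p of label 1, with y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def importance_weighted_risk(probs, labels, weights):
    """(1/n) * sum_i w_i * loss_i, where w_i estimates p_target(x_i) / p_source(x_i)."""
    n = len(probs)
    return sum(w * log_loss(p, y) for p, y, w in zip(probs, labels, weights)) / n

probs, labels = [0.9, 0.2, 0.6], [1, 0, 0]
print(importance_weighted_risk(probs, labels, [1.0, 1.0, 1.0]))  # ordinary average loss
print(importance_weighted_risk(probs, labels, [0.5, 0.5, 2.0]))  # upweights the third example
```

In these terms, the claim above is that changing the w_i can end up having little effect on a trained deep net, modulo early stopping and weight decay.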
For all the criticism that I direct towards ML, I'm grateful to belong to a community open-minded enough to laugh at itself and flexible enough to change. Having experienced more conservative, closed-minded corners of academia, I know it's a luxury not to be taken for granted.
Business school professors get paid 1.5–2x CS. Asked why, university mumbles abt outside options. But CS profs have 2-5x outside earning power. Time's long nigh to re-calibrate & cut B-school pay.🔪
Interviewing Elon Musk and Mark Zuckerberg for expert opinions on AI is like interviewing the president of Switzerland abt particle physics
Notable: while LLMs are the singular force shaping the discipline & dominating the discourse in NLP, I saw absolutely nobody from the major LLM shops (@OpenAI, @AnthropicAI, @MosaicML) make an appearance at #acl2023.
Is the best Variational Autoencoder (VAE).... not actually a VAE at all?!? This paper is wild. Extremely simple idea: regularized deterministic AE + ex-post density estimation. Results somehow both unsurprising—intuitive!—& shocking—how did people not know this for years?!?
but that uses adversarial learning in one variant (and MMD in another). Let me shamelessly plug RAEs openreview.net/forum?id=S1g7… The catch is that ex-post density estimation on the latent space gets you better aggregate posterior estimation and hence sample quality
The perennial Q of "how to ensure the US continues to lead in AI?" is so easy to answer. Fully fund all PhDs in strong programs outright, grant visas to best students in the world, & let them stay afterwards. It would be the cheapest grand initiative in the history of government.
The reading list for pilot run of 10721: Philosophical Foundations of Machine Intelligence has been finalized and presenter notes from groups to date are up on the course GitHub for anyone interested in reading along: github.com/acmi-lab/cmu-1072…
Note to all ML authors: w k-fold cross validation, you *must* have an independent test set. Especially in healthcare, I've seen many papers this year with no test set. This qualifies as a desk reject.
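A minimal sketch of the split being asked for, assuming a simple random split over example indices (the function name and the 20% test fraction are illustrative): carve off the test set first, run k-fold CV only on what remains for model selection, and touch the test set once at the end.

```python
import random

def holdout_then_kfold(n, k=5, test_frac=0.2, seed=0):
    """Split indices 0..n-1 into a held-out test set and k cross-validation folds.

    The test set is removed first and never appears in any fold.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(test_frac * n)
    test, dev = idx[:n_test], idx[n_test:]
    folds = [dev[i::k] for i in range(k)]  # round-robin assignment to k folds
    return test, folds

test, folds = holdout_then_kfold(100, k=5)
for i, val in enumerate(folds):
    train = [j for m, fold in enumerate(folds) if m != i for j in fold]
    assert not set(train) & set(val)  # fit on `train`, score on `val` to pick hyperparameters
# Then refit on all non-test data and report a single score on `test`.
```

In a real pipeline, grouped data (e.g. several scans per patient) would need the split done at the group level so no patient lands on both sides.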
One interesting (but incomplete) measure of the state of the deep learning bubble. While PyTorch has caught up with TensorFlow, aggregate interest in DL frameworks seems to be maybe half what it was at the peak. I can imagine many interpretations, share yours.
Idea of the week: ***One year moratorium on papers.*** For the entire community. A whole year of thinking without sprinting/hustling/spamming towards deadlines.
In latest ad campaign, @IBM says solution to homelessness is [wait for it...] AI. The virtuosos have hit a new peak for shameless cynical bullshit.
Excited to announce some personal news: After 3 yrs as science advisor to @AbridgeHQ, I’m jumping in as Chief Scientific Officer. Our team of NLP/ML researchers, designers, & engineers are tackling some of the biggest pain points facing doctors & patients. More to come…
Biden is very likely going to win, but going forward, can we stop acting like @NateSilver538 knows more than any other semi-smart blogger with lumpy prose and an undergraduate's knowledge of statistics?
Before the media blitz & retweet party get out of control, this idea exists, has been published, has a name, and a clearer justification. It is called ***Counterfactually-Augmented Data*** and here's the published paper (spotlight at #ICLR2020). arxiv.org/abs/1909.12434
Are those airplanes all in the air simultaneously? That's terrifying.
Help me Twitter. Best (reasonably technical) math/science podcasts?
Prompts are not Lipschitz. There are no “small” changes to prompts. Seemingly minor tweaks can yield shocking jolts in model behavior. Any change in a prompt-based method requires a complete rerun of evaluation, both automatic and human. For now, this is the way.
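For readers outside optimization: a score function f of the prompt (say, accuracy of a prompt-based method) would be L-Lipschitz, with respect to some distance d between prompts (token edit distance is an illustrative choice here), if

```latex
|f(p) - f(p')| \le L \cdot d(p, p') \qquad \text{for all prompts } p, p'.
```

The claim above is that no useful constant L exists in practice: a one-token edit (d = 1) can move f by a large fraction of its range, so a small d says nothing about how small the change in f will be.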
This fall I'm leading "CMU 10732: Robustness & Adaptation in Shifting Environments" focusing on causally structured shift, adversarial attacks, strategic classification, online adaptation, & feedback loops in recsys. Follow our readings & lecture notes: github.com/acmi-lab/cmu-1073…
"It's me Siraj, let's talk about quantum computing." => "It's **Maine** Siraj, let's talk about quantum **cipher**" github.com/paubric/python-si…
There's a handful of ML ideas that just *feel right*---perhaps due to evoking some aspect of human learning?---that keep recurring but never seem to have panned out convincingly. Here's two: (1) curriculum learning; (2) hierarchical reinforcement learning. (Dis)agree? Got others?
BTW, deep learning libraries are full of this—e.g., undeclared weight decay. All kinds of things lurk that seem like good ideas using the "works better in general" argument but actually create a sloppy environment where scientists don't really know what model they are using.
By default, logistic regression in scikit-learn runs w L2 regularization on and defaulting to magic number C=1.0. How many millions of ML/stats/data-mining papers have been written by authors who didn't report (& honestly didn't think they were) using regularization?
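A minimal sketch of why undeclared weight decay is undeclared regularization: for plain SGD, a decay step with coefficient lam is algebraically the same update as adding an L2 penalty (lam/2)·‖w‖² to the loss. For adaptive optimizers such as Adam the two are not equivalent, which is the motivation for decoupled variants like AdamW. The function names are illustrative.

```python
import math

def sgd_step_l2(w, grad, lr, lam):
    """SGD on loss + (lam/2)*||w||^2: the penalty contributes lam*w to the gradient."""
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad)]

def sgd_step_weight_decay(w, grad, lr, lam):
    """'Weight decay' form: shrink each weight by (1 - lr*lam), then take the plain step."""
    return [(1 - lr * lam) * wi - lr * gi for wi, gi in zip(w, grad)]

w, grad = [1.5, -0.3, 2.0], [0.2, -0.1, 0.05]
a = sgd_step_l2(w, grad, lr=0.1, lam=1e-2)
b = sgd_step_weight_decay(w, grad, lr=0.1, lam=1e-2)
print(all(math.isclose(x, y, rel_tol=1e-12, abs_tol=1e-15) for x, y in zip(a, b)))  # True
```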
Why have a "standard route" at all? It seems that the greatest poxes facing the ML community are intellectual homogeneity & group-think. Having everyone take too-similar paths to PhD appears (to me) the very pathology implicated. Granted, as a strange-path freak, I'm biased.
Across all political & ideological commitments, there's a movement today away from civil discourse & towards signalling, dunking, & mindless pile-ons. It's aided (or caused) by the amplification dynamics encoded in our platforms, and it's ruining us all. We can & must do better.
Before you write the word "novel" 15 times in a paper, think: are you planning to work in ML for longer than this hype cycle? Do you anticipate respectable (read *scientifically literate*) ppl reading your paper or is your audience only the VC you're presently defrauding?
Amazing how many in the deep learning community can simultaneously decry the closed-minded monoculture that spurned NN research in the 90's and celebrate an even narrower monoculture in the 2020's.
If twitter dies, where are 70% of the world's greatest scholars going to spend 5 hours per day posting pallid platitudes?
Steps to get hired as an "AI researcher" in silicon valley: 1. Download a deep learning library
Excited to join CMU & also wistful abt leaving UCSD. They took me in for PhD 4yrs ago when I was a jazz musician w/o research experience, a life-changing opportunity that I fear might become rarer as field grows more competitive & PhD programs are swamped w experienced applicants.
Is there any known case of anyone accessing “harmful capabilities” of an LLM that didn’t consist of knowledge already freely available and clearly described in documents on the open web? Is the fear that we are basically just getting what we would already have if Google / Bing didn’t censor their web search results?
I live in Pennsylvania & so can you. PhD & MS applications to @mldcmu & @teppercmu open now! *Move to Pittsburgh, do great research, have a voice.* If you work on fairness, healthcare, distribution shift, feedback loops, or NLP (but more than BERT) consider applying to @acmi_lab