Pinned Tweet
I’m happy to share that I’m starting a new position as Senior Research Scientist at @nvidia! Looking forward to doing open science on full-duplex speech models :)
After 2 wonderful years, I left Meta this week. During this time, I worked on several projects related to speech and LLMs:
- Built the first multi-channel audio foundation model with M-BEST-RQ (arxiv.org/abs/2409.11494)
- Made ASR with SpeechLLMs faster (arxiv.org/abs/2409.08148) and more accurate (ieeexplore.ieee.org/document…)
- Shipped the first production-ready full-duplex voice assistant (about.fb.com/news/2025/04/in…)
- Improved Moshi’s reasoning capability with chain-of-thought (arxiv.org/abs/2510.07497)
I am grateful to my managers for having my back on critical projects, and fortunate to have collaborated with several brilliant researchers and engineers during this time. As to what's next, I am still in NYC and continuing to do speech research. More on that later!
52
10
523
32,035
H1B lottery ❌ It was less than a 1 in 3 chance, but sucks anyway!
106
37
910
1,732,078
Any #SpeechProc person who hasn't been living under a rock for the last few years has definitely come across the "transducer" model (formerly known as the RNN-Transducer). They look like this: (1/n)
3
36
249
I feel like everyone that followed me after the H1B post will be supremely disappointed when I resume posting long threads about neural transducers 👀
15
226
54,908
Thanks all for your support and suggestions. Fortunately, I still have other options that I'll be pursuing with my employer. I have restricted replies to this post now so I can get back to doing research in peace :)
1
3
212
49,757
5 years, 4 months, and 26 days. Thank you, @JohnsHopkins!
23
2
211
10,924
Replying to @the_transit_guy
I am quite fond of taking the MARC train from Baltimore to DC for $9.
6
2
175
Thank you @jhuclsp and @JohnsHopkins for a memorable 5.5 years of my life! Excited to pursue new challenges at @AIatMeta 🎉
Congratulations to @rdesh26 (and adviser Sanjeev Khudanpur) on successfully defending his PhD thesis: Listening to Multi-Talker Conversations: Modular and End-to-End Perspectives. Next stop? @AIatMeta desh2608.github.io
22
4
158
12,352
📢📢 **Defending my PhD in a week**
Date & Time: January 26, 2024, 9 to 11 AM EST
Committee: Sanjeev Khudanpur, Dan Povey, Jinyu Li
Dissertation Title: "Listening to multi-talker conversations: Modular and end-to-end perspectives"
DM me for a Zoom link if interested 😀
15
7
157
14,963
ASR is speech-to-text. Today, let me tell you about "target-speaker ASR" (and our papers accepted at @ieeeICASSP). When we have several people talking, we may want to transcribe JUST one of them. E.g., to suppress background speech in noisy environments, etc. (1/n)
3
21
152
24,670
If you work on speech/NLP, you must have come across the quote: "Every time I fire a linguist, the performance of the speech recognizer goes up." This quote is attributed to Dr. Frederick Jelinek.
3
6
119
19,578
📢📢 I am thrilled to be selected as one of the inaugural AI2AI fellows for 2022-23 under the JHU+Amazon "Initiative for Interactive AI". 🎉 Eternally grateful to my advisor and collaborators! Congratulations to my fellow, um, fellows: ai2ai.engineering.jhu.edu/20…
18
8
120
I passed my GBO (qualifying exam) today and officially became a Ph.D. candidate! 🥳
16
117
Yesterday, the dean informed me that I have been selected as the latest recipient of the Fred Jelinek fellowship! I am extremely honored by this recognition, and I'm aware that it puts me in esteemed company. I will keep working hard to keep Jelinek's legacy alive!
19
5
112
7,365
1.5B Whisper model trained on 680k hours of speech gets 36.9% WER on AMI SDM. 34M Kaldi model trained on 100 hours of AMI train set gets 35.1%. Adaptation for multi-talker room audio conditions is very much an open problem.
1
3
105
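For readers unfamiliar with the metric in the Whisper vs. Kaldi comparison above: WER (word error rate) is usually computed as the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration under that assumption. The function name `word_error_rate` and the example sentences are made up for illustration, and it does not reproduce the text normalization or multi-talker scoring used for the AMI SDM numbers in the post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling one-row DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one deleted word out of six reference words.
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.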
After getting straight rejects for the last 2 years, I finally received some love from @INTERSPEECH2021. Congratulations and loads of gratitude to my co-authors! Can we fly to Brno already?! ✨
7
1
99
It's my annual reminder to myself that I got 12/12 rejections the first time I applied for a PhD. You only need 1 person to believe in you! (For me, that person was Dan Povey.)
As PhD rejections start, just to normalize your expectations I went: 0 / 13 my first time applying 3 / 11 the second time 1 / 2 after dropping out of a physics PhD and switching to controls and honestly my life is so much better for those rejections
2
90
The first time I applied to PhD programs (in my senior year of undergrad), I got rejected everywhere. On Saturday, I got 2/2 papers rejected at Interspeech.
Academics: It happens to all of us, but we generally only project our triumphs and victories -- share a time you failed or got rejected. #AcademicChatter #AcademicTwitter
2
1
86
That's a fair point, but the same argument could be made for doing a PhD vs. getting a job straight out of undergrad. 5-6 years x ~50k per year = ~300k USD. Is it worth it? Depends on what you get out of the PhD (and not just what job you land after).
4
62
First paper accepted as a PhD student :) @asru2019 "Probing the information encoded in x-vectors"
7
5
73
📢 4 papers (3 first author, 1 other) accepted at IEEE SLT 2021 😄 Thanks to all my wonderful collaborators and reviewers! Will be putting up the papers (and code) on ArXiv in the next few days. If you are interested in diarization, separation, or ASR, do take a look :)
6
2
78
If you watch this space, you already know my love for the neural transducer. I skimmed through all 21 papers relating to transducers that were presented at #INTERSPEECH2023, and wrote a summary blog: desh2608.github.io/2023-08-2… Summary in 5 bullets:
5
18
77
4,386
📢 I will spend summer '22 interning with the AI Speech team at Meta (formerly Facebook), in Menlo Park, California 🌞
2
1
75
10 years ago, WFST-based methods were the norm for speech processing (think, Kaldi). Since then, end-to-end models have become quite the rage --- they are simple, do not require much domain expertise, and you can train a PyTorch model for a new task over a weekend. ⚡️ 1/n
4
10
69
10,012
📢 Our tutorial on "Training Efficient Transducers with Large Data using Open-source Tools" has been accepted at InterSpeech 2023 @ISCAInterspeech: interspeech2023.org/tutorial… Time for a short 🧵 1/
2
12
74
7,602
In 2024, I want to:
- wrap up my PhD
- finish reading the Wheel of Time
- climb v5 bouldering problems
- speak more French
2
73
13,660
Now that @WavLab has created an open-science alternative to Whisper (called OWSM), I hope researchers building/analyzing Whisper-based systems switch to OWSM instead!
3
10
70
7,531
Busy morning at @ieeeICASSP today presenting posters on SSL for multi-talker ASR, and my thesis work for the Rising Stars session! Thanks for showing up and asking great questions :)
1
2
67
2,818
📢 Summer update 📢 I will be interning with the awesome #speech people at Microsoft. I'll work on cool transducer-based streaming models for multi-speaker ASR. P.S.: Please send me your fav neural transducer paper recommendations 😀
4
65
10 days to go for our #interspeech2023 tutorial on next-gen Kaldi!
3
4
63
3,426
My friend Vipul invited me on his podcast to talk about speech, PhD, and more!
🎙️ Ep 12 - Dive into the world of Speech Recognition with @rdesh26 #TheDistributedFabricPod We're delving into: - Automatic Speech Recognition 🗣️ - Self-Supervised Learning 🧠 - Navigating life as a PhD student 📚 - And much more awesomeness! 🎧 piped.video/watch?v=_uUj3BNO…
2
7
57
7,644
🚿 thoughts: Training an NN is like training in the gym. Initial gains are high and then slowly plateau; auxiliary objectives such as diets are useful; you can converge faster if you start with a pre-trained body; different architectures have different scaling laws (genetics).
10
3
60
7,421
Replying to @abhish_eksharma
Possibly
6
51
59,571
Thanks @ISCAInterspeech for the acknowledgment :)
2
1
57
3,574
So much work has happened in E2E ASR in the last decade. Will spend my weekend with these awesome review papers: 1. arxiv.org/abs/2111.01690 by Jinyu Li 2. arxiv.org/abs/2303.03329 by Rohit Prabhavalkar, Takaaki Hori, Tara Sainath, Ralf Schluter & @shinjiw_at_cmu
4
9
54
5,504
F1 visa renewal process. Time spent in:
- filling application: ~ 1 hour
- trying to book interview slot: > 2 weeks
- standing in line at embassy: ~ 2 hours
- interviewing: < 1 minute
4
1
52
I skipped ASRU to attend my friend's wedding. Great decision!
1
1
53
9,994
Replying to @math_rachel
I don't know much about images, but anyone who thinks speech is a solved problem is welcome to participate in the upcoming CHiME-6 challenge :-)
3
2
50
"Everyone wants to do the model work ⚙️, not the data work 🗃️." Throughout my PhD, I mostly published modeling papers. These are the papers that identify a problem, propose a modeling solution, and show results on standard benchmarks. 1/n
1
1
47
8,748
I am convinced that US immigrant brains are 80% useful stuff and 20% random visa-related information.
1
4
51
5,584
I attended @ieeeICASSP last week, and here are my 3 main take-aways: desh2608.github.io/2021-06-1…
1. Self-training and contrastive learning are here to stay
2. Transducer models + T-S learning = streaming ASR
3. Speaker diarization is wide-open (clustering, EEND, separation ...)
1
6
50
It seems several groups have recently been looking at extending/generalizing ASR objectives. Baidu proposed W-CTC (openreview.net/forum?id=0RqD…) which extends CTC for training with data that contains missing labels on the ends.
2
13
46
Academia, as in life, sometimes brings you bittersweet days. On the same day that I gave a talk at the @jhuclsp seminar for the first time, I also got a paper rejected at #icassp2021. Nevertheless, I celebrated both with 🍷 at the end of the day :)
1
1
48
2) 🥳 I have been selected for ✨ICASSP Rising Stars in Signal Processing✨ Please join us on June 9 in the poster session where I will talk about my thesis work.
7
4
47
5,246
Ready for the poster session at #interspeech2023!
1
3
48
4,214
This work has been accepted at @ISCAInterspeech 🥳
📢📢 New preprint just dropped 📢📢 "GPU-accelerated guided source separation for meeting transcription" Paper: arxiv.org/abs/2212.05271 Code: github.com/desh2608/gss 1/n
1
2
47
3,382
After 3.5 years in the US, I have grown used to driving in mi/h and weighing stuff in lbs, but still can't wrap my head around Fahrenheit 🧐
4
1
48
I spent at least 1 hour yesterday looking at the beautiful snow-covered trees outside my window while I played my guitar. Gonna clock it under "software development" hours because mental software got pretty damn developed.
1
44
Sure, ChatGPT is cool. But is it cooler than an all-flannel @jhuclsp line-up? (ca. 2020)
2
2
44
4,125
What @INTERSPEECH2021 says: Submission deadline is Mar 26; papers can be updated by Apr 2. What I read: Submission deadline is Apr 2; must make a dummy submission by Mar 26.
2
1
43
Wins Best Paper award @asru2019 Congratulations! @jhuclsp
Replying to @rdesh26
MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe ADV: ASR in Adverse Environments Sunday, 15 December, 16:00 - 17:30 arxiv.org/abs/1910.06522
6
44
It's a beautiful day to do some @icmlconf reviews 📝
2
44
7,180
Presenting at the NSF CIRC meeting today
3
4
42
2,615
Meeting up with Boston #speech folks over BBQ and beer 🍻 (minus @JonathanLeRoux)
2
1
43
5,041
Co-author: We'll need to remove this section. There's no space. Me (a 5th year PhD student): There's always space. #icassp
2
1
42
The JSALT summer school wraps up tomorrow. On Monday, the workshop starts. Looking forward to 6 weeks of WFSTs + speech! Pic from: jsalt2023.univ-lemans.fr/en/…
2
2
39
3,315
This is now accepted for publication at @ieeeICASSP 2022 🎉
Replying to @rdesh26
How do you create a hybrid ASR system for a new language X with only 15 mins of transcribed speech? Answer: Use XLSR-53, transcribed speech from other languages, and extra text from language X
1
2
38
"Vasudhaiva kutumbakam" is a Sanskrit phrase which means "the world is one family." Featured: Dinner with other young researchers at @ieee_slt.
1
2
36
2,066
Reviewing an ICLR paper and came across this gem: "LMs have been widely used in ASR in the last 2 decades."
4
1
35
9,405
A friend who works on analysis of brain signals asked me if there were some ASR techniques that they could use. I briefly explained SSL methods, but mentioned that it would require tons of data. "How much data do you have?" Friend: "Like 3 hours. That's a lot, right?" 😅
2
37
Now published at IEEE Transactions on Audio, Speech, and Language Processing (TASLP). Early access here: ieeexplore.ieee.org/document…
🥁 New pre-print 🥁 "SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition" abs: arxiv.org/abs/2306.10559 website*: sites.google.com/view/surt2 *includes recipes and pre-trained models A ~short~ thread 👇 1/
1
35
2,733
Hermann Ney's talk at JSALT was intense and enlightening! Dr. Ney has been at the forefront of data-driven ML for 45 years now, and hearing his views on the field really puts things in perspective.
2
4
37
2,866
Pleasantly surprised to get a complimentary NeurIPS registration for reviewing services. (Have other travel plans on those dates unfortunately, but will check out the virtual program) More conferences should do this! @ISCAInterspeech 👀
1
34
3,661
My go-to debugging technique when I'm stuck on an issue while working late at night is to stop working and go to bed. During the night, programming angels drop in and fix the bug, so I wake up to working code.
4
34
The last time I went to a #SpeechProc conference in person was ASRU'19 in Singapore as a 1st year grad. This time around at @ieee_slt, as a mentor at the SLT-CODE hackathon and then a volunteer, I got to experience the conference from a very different perspective!
1
34
2,169
📢 A new tool and a blog post to make diarization evaluation simple+fast.
Code: github.com/desh2608/spyder
Post: desh2608.github.io/2021-03-0…
- Implemented in C++ for ~5x speedup over md-eval
- To install: `pip install spy-der`
- Use from within your Python program / command line
3
33
Next up in the JSALT summer school program this morning is a talk by @ryandcotterell.
1
32
2,854
Received some more love in the form of an ISCA Travel Grant! Thanks @INTERSPEECH2021 ✨ Now begins the struggle for a visa 😅
After getting straight rejects for the last 2 years, I finally received some love from @INTERSPEECH2021. Congratulations and loads of gratitude to my co-authors! Can we fly to Brno already?! ✨
1
32
Replying to @iver56
Haha not a bad idea
3
29
45,293
📢 Now that my Schengen visa is approved, time for some quick updates and summer plans!
2
32
5,011
My internship work from last summer is now accepted for publication at @ieeeICASSP 😄 Immense gratitude to my collaborators Liang, Zhuo, Yashesh, and Jinyu for their guidance. pdf: arxiv.org/pdf/2109.08555.pdf abs: arxiv.org/abs/2109.08555
📢 Summer update 📢 I will be interning with the awesome #speech people at Microsoft. I'll work on cool transducer-based streaming models for multi-speaker ASR. P.S.: Please send me your fav neural transducer paper recommendations 😀
33
Last month, our system from JHU CLSP achieved 2nd best WER in the CHiME-6 challenge (track 2: dinner party diarization + ASR). The system description paper is now available at: arxiv.org/abs/2006.07898
1
2
33
📢 New paper on ArXiv 📢 "Injecting text and cross-lingual supervision in few-shot learning from self-supervised models" abs: arxiv.org/abs/2110.04863 pdf: arxiv.org/pdf/2110.04863.pdf
2
6
32
This short collaboration with @SamueleCornell and colleagues on "separation+diarization" turned out well. Looking forward to working together on CHiME-7! chimechallenge.org/current/t…
1
31
2,841
The next generation Kaldi is under development, and you can help in crafting a roadmap for its next life cycle: kaldi.dev/
1
17
31
How do you create a hybrid ASR system for a new language X with only 15 mins of transcribed speech? Answer: Use XLSR-53, transcribed speech from other languages, and extra text from language X
1
9
32
Every time I leave my parents to come back to the US, I feel a little sad for leaving them. Then I arrive at @PatnaAirport and remember why I left in the first place. I have been to bus stations better run than this shithole.
6
30
5,262
#Lhotse now supports annotating audio files with OpenAI's #Whisper! Here's a quick demo I created in 10 mins to transcribe the first 100 recordings from VoxCeleb. See PR from @PiotrZelasko: github.com/lhotse-speech/lho… Here's the sample audio: tinyurl.com/3vjxnubk
2
4
29
ICASSP is free :)
Registration for the first virtual #ICASSP2020 is now open! SPS is excited to offer complimentary registration to non-authors, sharing our cutting-edge ICASSP sessions and energizing our signal processing community around the globe. Register today! cmsworkshops.com/ICASSP2020/…
2
4
28
26 🥳 and newly found love for sparkling wine 😆
4
28
My Instagram feed is full of people getting married or attending weddings. My Twitter feed is full of academics arguing about every little thing under the sun. Intense contest for which account gets deactivated first.
2
28
Hey @AIatMeta, look and tell me how cool these RBM glasses are!
2
30
3,207
3) 🇫🇷 Right after ICASSP, I will spend 8 weeks at Le Mans University (France), participating in JSALT 2023! I will work on "WFST methods for modern speech processing" alongside researchers from @Google, @rev, @ButSpeech, and more. Check 👇 jsalt2023.univ-lemans.fr/en/…
5
29
1,357
I haven't reviewed a lot, but I make sure that I always provide some +ve comments about the paper, and make my -ve comments come across as constructive feedback. It is quite disheartening, then, to receive a review where it seems the reviewer is out to personally get you.
3
29
I'll be talking about some of my recent work on speaker diarization in the @iscasigml seminar on May 5 :)
We are delighted to announce the ISCA SIGML seminar series. This seminar series focuses on speech processing, providing a place for speech researchers to present, discuss, learn, and exchange ideas. Please find the schedule in the following webpage. homepages.inf.ed.ac.uk/htang…
1
26
I will present this work in an oral session at ICASSP:
Session: Multi-speaker ASR
When: June 7, 2023 at 10:50 AM EEST
Where: Rhodes, Greece
Slides and a short video are now available:
💾: tinyurl.com/u5jj4nfp
📽️: piped.video/watch?v=L2WnjQC8…
Replying to @rdesh26
This was work done with my @MetaAI colleagues Junteng, Jay, Chunyang, Niko, Xiaohui, and Ozlem, with whom I spent a really fun 14 weeks last summer. (10/n) Paper: arxiv.org/abs/2210.11588
2
28
2,981
Me learning French: cool, now I can speak the local language in Montréal! YUL check-in: Sir do you speak English or parlez-vous français? Me tongue-tied: Parle anglais 😓
4
28
I'm planning to take the next 2 weeks off, which means I'll prepare the presentation videos for SLT 2021 (due on Jan 5) and work on some implementation that I have been putting off for a while. Happy holidays to you too! #phdlife
2
27
Brother's wedding: Feb 20-26 (Indian wedding)
Holi festival: March 8
InterSpeech submission: March 8
ICASSP camera-ready: March 13
Looks like an eventful next few weeks!
In response to some queries, please be aware that we will NOT be extending the paper deadline for #INTERSPEECH2023 . We have a very tight schedule for reviewing! You can make it!
3
1
23
4,852
Congratulations Dr. Snyder! david-ryan-snyder.github.io/
4
25
Nothing screams "India" quite like getting 8 passport photos for under $1 in 10 mins, and then spending 2 hours at the bank to get an account phone number changed.
2
1
24
The flag<->language thing is also an issue for countries with huge language diversity. If they hire a Hindi-speaker, for instance, it wouldn't work to add the Indian flag to the list because 56% of Indians don't speak Hindi as their first language (and 43% don't speak Hindi at all).
3
23
This doesn't give me a free pass to socialize. We must practice social distancing until each of us has been vaccinated :) But it's definitely a small win! 🎉
2
24