AI Research Scientist working on Reinforcement Learning, Large Language Models, and Recommendation Systems.

Redmond, WA
Amazon Rufus is an expert shopping assistant powered by GenAI. We’re hiring LLM/RL talent to work on an array of intellectually challenging science questions. Come join us in this exciting (and fun) adventure!
Next week, @marcgbellemare and I are organizing a Deep RL workshop as part of Simons Institute's Theoretical RL program, with a great lineup of speakers. All talks will be recorded, and can be viewed live on the YouTube channel. See simons.berkeley.edu/workshop… for more details!
Interested in reinforcement learning *without* interaction with the environment or simulator? We're organizing a @NeurIPSConf 2020 workshop on Offline RL. Visit the homepage offline-rl-neurips.github.io for more details including Call for Papers!
Excited to be organizing a workshop on Offline Reinforcement Learning @NeurIPSConf 2020! CfP and other details at offline-rl-neurips.github.io. With organizers Aviral Kumar @berkeley_ai, @georgejtucker, Doina Precup @DeepMind and @LihongLi20.
Truly grateful and humbled to receive the award. It's gratifying to see that this 13-year-old work continues to be useful, and exciting to witness how much the field has grown since then! Congrats to my coauthors, Wei, @JohnCLangford and Rob.
#TheWebConf2023 Seoul Test of Time Award: "A Contextual-Bandit Approach to Personalized News Article Recommendation" Lihong Li (Amazon), Wei Chu (Ant Group), John Langford (Microsoft) and Robert Schapire (Microsoft) First presented at the 2010 conference. Congrats!
Excited to share how reinforcement learning is used to delight customers in Amazon, among others!
How does the Amazon Store know what products and offers to display? Part of the answer involves reinforcement learning. Learn how scientists in @AmazonAds are developing reinforcement learning techniques to improve outcomes for customers. #machinelearning amazon.science/working-at-am…
I'm excited to share the CfP for the Machine Learning Special Issue on RL for Real Life: springer.com/journal/10994/u… (With Alborz Geramifard, @yuxili99, @CsabaSzepesvari, Tao Wang). Deadline: March 5, 2020.
Please come join us this weekend if you're interested in how RL is applied to real life!
It is exciting that our RL for Real Life 2020 Virtual Conference is approaching on June 27-28, sites.google.com/view/RL4Rea…, co-organized with @gabepsilon, Alborz Geramifard, Omer Gottesman, @LihongLi20, Anusha Nagabandi, @TonyZQin, @CsabaSzepesvari.
Awesome Day 1 of the Deep RL workshop. Enjoyed the excellent talks by @tengyuma @EmmaBrunskill @svlevine @ofirnachum. Thanks everyone for participating. Looking forward to Day 2! @SimonsInstitute
Excited to kick off the Deep Reinforcement Learning theory workshop at the Simons Institute today, co-organized with @LihongLi20 . Today's topic is Offline reinforcement learning 🔥 Schedule is here: simons.berkeley.edu/workshop…
Another fantastic day at the Deep RL workshop! Thx to @IanOsband @chelseabfinn @wwdabney Alekh Agarwal for the wonderful talks, and inspiring discussions moderated by Joel Lehman. All sessions recorded. Looking forward to tomorrow (optimization!) @marcgbellemare @SimonsInstitute
As co-organizer, I'm super excited about the program and looking forward to it next week. Come join us at #NeurIPS2019 if you're curious about how the optimization toolkit helps to design, unify and analyze RL algorithms!
The schedule and accepted papers are released: optrl2019.github.io/. Congratulations to all the recipients of the travel awards. We thank all the invited speakers, panelists and authors. Thanks to our sponsors @GoogleAI and @DeepMindAI. See you in Vancouver next week.
I'm looking for an Applied Scientist with strong ML/Stats background to join our team in Amazon Advertising. The position is based in New York City: amazon.jobs/en/jobs/1544000/…. Please consider applying!
One more day before the Offline Reinforcement Learning Workshop at @NeurIPSConf. Consider submitting questions to the panelists at offline-rl-neurips.github.io… . See you tomorrow! #OFFLINERL2020
Thanks to @jacobandreas @clarelyle @yayitsamyzhang Doina Precup & @ShamKakade6 for the wonderful talks at the deep RL workshop, and to the audience, esp. given how close the @iclr_conf deadline is. Come join us tomorrow to recover from the deadline craziness! 😀 @marcgbellemare
Look forward to talking at the AI for Economics seminar [aiforeconomics.com] on 12/15. There is a natural connection b/t off-policy #ReinforcementLearning & econometrics. Thx to the organizers (David Parkes, @alexrtrott, @StephanZheng) for inviting!
We are opening an exciting Early-career Scientist program at Amazon Advertising, to attract talent to innovate on behalf of our customers and publish their cutting-edge research. Please consider applying and share broadly. Application deadline: May 14. amazon.science/amazon-advert…
A systematic study of long-horizon off-policy evaluation via duality! Related to an earlier doubly robust work: openreview.net/forum?id=S1gl… , but in the more general behavior-agnostic setting, and with a more careful investigation of various algorithmic choices in the design space.
Policy evaluation via duality/Lagrangian methods presents a lot of choices (how to set up the LPs, regularize them, etc). In arxiv.org/abs/2007.03438 we examine how these choices affect accuracy of final eval. Lots of insights in this paper, many of which I didn't expect....
Learned a great deal today about how to do better optimization in deep RL from excellent talks by Matthieu Geist, Nevena Lazic, @pabbeel & Martha White. Esp enjoyed the discussions (thx @neu_rips for "spicy" questions). Can't wait for tomorrow! @marcgbellemare @SimonsInstitute
Replying to @denny_zhou @ysu_nlp
And you don't need *train*ing :)
Congrats to the authors! Looking forward to the workshop tomorrow.
My team is hiring a Data Scientist to extract critical insights from data and influence customer-facing shopping experiences in Amazon Ads: amazon.jobs/en/jobs/1990034/…. Please reach out if you are interested! #Amazon #advertising #DataScience #dataScientist #Statistics
Vote for questions you find interesting for the panel discussion: tricider.com/brainstorming/2… , as part of the RL for Real Life workshop sites.google.com/view/RL4Rea… . Please do so by 10:30am June 14 PDT!
Replying to @lilianweng
Interesting idea. On the other hand, a paper should be self-contained, so just describing the differences from another paper probably won't work in most cases.
Replying to @nanjiang_cs
Will continue reading the details. In our recent paper [arxiv.org/abs/1910.07186] we were similarly surprised by the interplay between value functions and importance ratios, where we found a similar estimator (using V instead of Q). [1/2]
Feel free to message me if interested!
An intriguing and rather surprising finding is the superiority of working with the duals (state visitation distributions) over the primal (value functions), although the latter has been the "default" approach in much of RL literature.
You may also find Sec 4 interesting, which makes explicit the connection to Lagrangian duality (as hinted at the end of your paper). Still reading/enjoying your paper. Lots of interesting stuff at the intersection of optimization & RL! [2/2]
To make things harder, most environments are non-stationary. We may model non-stationarity by hidden state variables (at least conceptually), but will need strong assumptions to handle it effectively (to the best of my knowledge).
Replying to @SoloGen
A perhaps trickier situation: what if you remember the claim, but forget how to prove it...
Replying to @nanjiang_cs
Yes :) with @daibond_alpha, @ofirnachum, Yinlam Chow, @CsabaSzepesvari and Dale Schuurmans.
Replying to @nanjiang_cs
Same for me; took me 3 days to realize they are different...
Replying to @nanjiang_cs
Congrats! The name change is helpful. Incidentally, we have a paper accepted to NeurIPS that is about "confidence" intervals. :-)
Replying to @CsabaSzepesvari
Thanks, Csaba. Would be great to have you at the workshop!
Congrats! It's on my to-listen list :)
Replying to @edchi
Very sorry for your loss, Ed! Thanks for sharing these beautiful memories. Your father was truly amazing.
Replying to @marcgbellemare
This is awesome! Huge congrats!
Replying to @nanjiang_cs
BIG congrats!!!
Replying to @yubai01 @OpenAI
Congrats!
Replying to @neu_rips
Looking forward to it!
Replying to @nanjiang_cs
It can be confusing, unless the math is shown... :-(
Congratulations!
Congratulations!
Replying to @edchi
Congrats @edchi ! Well deserved!
Congrats, Seb!!
Replying to @nanjiang_cs
Super cool & interesting!
My short answer is no: they are difficult in different ways as the settings are different. For a longer answer (or different opinion), submit your question and come to the panel tomorrow! 😀
Replying to @SimonShaoleiDu
Congrats and welcome to Seattle!
Replying to @nanjiang_cs
Impressive that you got signals out of my very random noise (question)! ;)
Replying to @jiajunwu_cs
Congrats!!
Replying to @neu_rips
for what...?
Congrats! Very well-deserved!
Replying to @yisongyue @Caltech
Congratulations!!
Replying to @mtoneva1 @chrodan
Congratulations!!
Replying to @nanjiang_cs
Congrats! Very well-deserved!
Replying to @zicokolter
Congrats, Zico!
Replying to @StanfordDBDS
Congrats @james_y_zou! Very well-deserved!
Replying to @nanjiang_cs
Congrats!
2/3 One quick comment: As the paper already points out, the Stationary IS studied here is essentially the marginalized IS of Xie et al., not the distributions studied in HM/LLTZ/GB. You seem to suggest the difference can be removed by "taking T → ∞ when necessary" (page 3) ...
3/3 ... but it seems tricky. Eg, as T → ∞, IS/PDIS can have infinite variance (see LLTZ for an example), but probably not for the methods in HM/LLTZ/GB.
Replying to @chijinML
Congrats! So well-deserved!
Replying to @tengyuma
Interesting! Esp. the nice & simple example that illustrates the exponential gap between representing a model and a value function. Reminds me of earlier work that shows a similar exponential gap, eg in the context of factored MDPs (Boutilier et al.: doi.org/10.1613/jair.575).