thrilled to be back @Google in the @GoogleDeepMind team! The technical breadth and expertise across the whole stack (hardware->infra->deep learning->products) is truly mind-blowing. Great to see a lot of familiar faces and meet new friends. Look forward to learning a lot!
Excited to join @AIatMeta! The past 4.5 years at @OpenAI, working on embeddings, GPT-3 & 4, API and ChatGPT, have been career highlights. Now, I'm thrilled to work on the next generations of Llama and contribute to its impact on the developer ecosystem and billions of users!🚀 1/2
We explore a simple approach to task-oriented dialog. A single neural network consumes conversation history and external knowledge as input and generates the next turn text response along with the action (when necessary) as output. Paper: arxiv.org/pdf/1910.14613.pdf 1/4
A thread on how we evaluate our embedding models in OpenAI’s API. We achieve state-of-the-art results in linear probe classification, text search and code search. It’s not fine-tuned, so it works great in the real world — and our customers love it. 1/7
Replying to @tszzl
imagine being told you are wrong a million times a second, for a few months.
Zero-shot results of OpenAI API’s embeddings on the FIQA search dataset. Evaluation script: github.com/arvind-neural/bei… We zero-shot evaluated on 14 text search datasets; our embeddings outperform keyword search and previous dense embedding methods on 11 of them!
Replying to @arvind_io
In text search tasks, we obtain the best zero-shot results on MSMARCO, TriviaQA, and NQ, and also the best transfer results on the BEIR benchmark. 5/7
We are excited to release Taskmaster-1, a new task-oriented dialog dataset. We explore two methods for data collection, two-person and self-dialogs. Surprisingly, self-dialogs are an effective way to collect dialog. Paper accepted to @emnlp2019: arxiv.org/pdf/1909.05358v1.p…
look forward to working with @manohar_paluri, @Ahmad_Al_Dahle, @edunov and many others in the excellent @AIatMeta team! 2/2
Thanks for a balanced take! Couple of comments that are also added to the video description now: 1/4
🔥New Video🔥 OpenAI now offers embeddings for text similarity and search, but are they holding up? We look at the release, the paper, the criticism, and most important: the price! Are the embeddings worth it? Watch here to find out: piped.video/5skIqoO3ku0
Small models specifically fine-tuned on a dataset can do well on a narrow benchmark, but they far underperform in real-world settings, as many of our customers are discovering. This study from @FineTuneLearn shows our API performance. 7/7
OpenAI embeddings work on a very broad set of use cases. Here, Viable gets a 7.7% absolute improvement in clustering quality using OpenAI embeddings when compared to previous methods!
We tested different embedding models and show the data behind why GPT-3 was the clear winner for our clustering needs askviable.com/blog/why-we-ch…
The cost to run this experiment with text-search-ada, embedding both documents and queries, is ~$80. text-search-ada achieves a 62% relative improvement over keyword search here!
Zero-shot results of OpenAI API’s embeddings on the FIQA search dataset. Evaluation script: github.com/arvind-neural/bei… We zero-shot evaluated on 14 text search datasets; our embeddings outperform keyword search and previous dense embedding methods on 11 of them!
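The threads above quote both a relative improvement (62% over keyword search) and an absolute improvement (7.7% in clustering quality). A minimal sketch of the difference between the two follows. The scores used here are hypothetical numbers chosen only so the arithmetic is easy to follow; they are not the FIQA or Viable results.

```python
def absolute_improvement(baseline: float, new: float) -> float:
    """Difference between the new score and the baseline, in score units."""
    return new - baseline


def relative_improvement(baseline: float, new: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (new - baseline) / baseline


# Hypothetical scores (assumptions for illustration, not reported results).
baseline_score = 0.25
new_score = 0.405

abs_gain = absolute_improvement(baseline_score, new_score)
rel_gain = relative_improvement(baseline_score, new_score)
print(f"absolute: {abs_gain:.3f}, relative: {rel_gain:.0%}")
```

With these made-up numbers, a gain of 0.155 in score units corresponds to a 62% relative improvement, which is why the two kinds of figures should not be compared directly.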
@OpenAI embeddings api over time
We describe a simple technique to parallelize Scheduled Sampling across time that allows us to apply Scheduled Sampling for problems that involve generating very long sequences. We get better sample quality and train almost as fast as teacher-forcing. arxiv.org/abs/1906.04331
Replying to @ylecun
For the same reason a kind of unsupervised learning that people were always doing was branded as self-supervised learning 😉
We've trained embedding models to produce high quality text and code embeddings. Our general purpose embeddings achieve top results in classification, text search, and code search. The models are now available in the @OpenAI API: openai.com/blog/introducing-…
We're introducing embeddings, a new feature of our API that distills relationships between concepts, sentences, and even code in a simple numerical representation — for more powerful search, classification, and recommendations. openai.com/blog/introducing-…
@OpenAI embeddings achieve better retrieval performance and are also a lot cheaper! Results taken from: arxiv.org/pdf/2305.06300.pdf
My team and I trained the model. We look at 33 datasets across four different categories: linear probe classification, sentence similarity, text search, and code search. All these results and figures were in our paper, released this week. arxiv.org/pdf/2201.10005.pdf 2/7
In text search tasks, we obtain the best zero-shot results on MSMARCO, TriviaQA, and NQ, and also the best transfer results on the BEIR benchmark. 5/7
amazing multimodality performance (& more) !! storage.googleapis.com/deepm…
TPU -> XLA -> JAX -> Transformer, MoE, Chinchilla, AlphaGo, .... -> Gemini, Veo, .... -> Search, YouTube, Waymo, ... -> Chrome, Android, .... 🤯🤯🤯
OpenAI Embeddings helps you go beyond keyword search!
Replying to @lilianweng
The code is actually extremely simple for a cool app like this - open sourced here: github.com/lilianweng/emoji-…
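As a rough illustration of what going beyond keyword search means, here is a minimal sketch of embedding-based ranking by cosine similarity. The three-dimensional vectors and document ids are toy values made up for this example; in practice the vectors would come from an embedding model, and nothing here reflects the actual code in the linked repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical values, not produced by any real model).
toy_docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

ranking = rank_documents(toy_query, toy_docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because the ranking depends only on vector direction, a query can match a document that shares no literal keywords with it, as long as their embeddings point the same way.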
We also achieve new state-of-the-art results on code search. 6/7
Check out our spotlight talk and poster describing the Neural Assistant work in the ConvAI workshop tomorrow @NeurIPSConf #neurips19 alborz-geramifard.com/worksh…
We explore a simple approach to task-oriented dialog. A single neural network consumes conversation history and external knowledge as input and generates the next turn text response along with the action (when necessary) as output. Paper: arxiv.org/pdf/1910.14613.pdf 1/4
Replying to @agihippo
Good but hard to not have @DBahdanau
We do a large-scale human study to compare different decoding methods for language generation and develop a globally normalized decoding method that optimally traverses the quality-diversity curve.
How does one trade-off sample quality and diversity in a language model? Which decoding method is best? We introduce a multi-objective framework maximizing human judgement score subject to a constraint on diversity (entropy). arxiv.org/abs/2004.10450 (1/7)
In linear probe classification, we obtain best results wrt average accuracy on seven classification tasks. 3/7
In sentence similarity tasks, we perform worse than previous work. This was explained in our paper as well. 4/7
Replying to @WilliamWangNLP
Thanks for having me, I had a fun time visiting @ucsbNLP !
Replying to @ilyasut
Belief is all you need!
in case people are counting, I forgot to share the results for text search from 3 more datasets (apart from the 11 text search results already reported) 🙂
Replying to @arvind_io
My team and I trained the model. We look at 33 datasets across four different categories: linear probe classification, sentence similarity, text search, and code search. All these results and figures were in our paper, released this week. arxiv.org/pdf/2201.10005.pdf 2/7
Nice! Would be interesting to compare with vanilla Transformer trained using the new objective.
We get good results on real-world question answering with neural semantic parsing/program induction. Code is here: github.com/tensorflow/models…
Learning a Natural Language Interface with Neural Programmer. (arXiv:1611.08945v1 [cs.CL]) ift.tt/2fvmppE
Replying to @sdand
Any feedback for us? :)
In our experiments we find that: 1) our model was able to incorporate external knowledge and generate factual text response with weak supervision signal. 2) our model can incorporate medium-size knowledge bases with only 8K training examples over multiple verticals.
This was fun, thanks for having me!
Implementation of Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning: github.com/tensorflow/tensor…
Replying to @GaryMarcus
Things are changing: arxiv.org/abs/1810.04805 and multiple other recent works in NLP
Replying to @jobergum
our method actually zero-shot transfers better than bm25 to 11 search tasks on average, as shown in the entire table. even our smallest models are better than bm25. while it is not the only way to exploit training data with bm25, we perform better than one such method, docT5query
I still remember your super helpful LSTM language model tutorial for 2015 interns! 🙂
The code for FIQA experiments to reproduce the results in the paper using the API: nitter.app/arvind_io/status/14882… . There's no discrepancy AFAIK. 2/4
Zero-shot results of OpenAI API’s embeddings on the FIQA search dataset. Evaluation script: github.com/arvind-neural/bei… We zero-shot evaluated on 14 text search datasets; our embeddings outperform keyword search and previous dense embedding methods on 11 of them!
Replying to @egrefen @pfau
what are the drawbacks of the benchmark/metric and any suggestions on how they can be improved?
Work done with awesome intern Semih Yavuz and many awesome colleagues @GoogleAI @Google
We leave out 6, not 7, BEIR datasets. Results on MSMARCO, NQ, TriviaQA are in a separate table (Table 5 in the paper). NQ is part of BEIR too and we didn't want to repeat it. The 6 datasets we leave out are not readily available and it is common to leave them out in prior work too. 3/4
For example, SPLADE v2 (arxiv.org/pdf/2109.10086.pdf) also evaluates on the same 12 BEIR datasets. Discussion from their paper: 4/4
Replying to @quocleix
Agree! But I think the once widely used Brown clusters (e.g., wing.comp.nus.edu.sg/~antho/…) should also be given credit. They use a language model pre-training objective on unlabeled data and transfer the word clusters to supervised tasks. They are not "contextual" though.
Data: ai.google/tools/datasets/tas… Work done with many awesome colleagues at Google Assistant team and @GoogleAI along with student researcher Chinnadhurai Shankar
Thanks! As stated in the paper, we plan to release the code with the next version of the paper.
thank you, Melvin! look forward to working with you as well :)
Replying to @egrefen
I think it's a little harsh to call that work flag-planting. They performed experiments on 4 real-world datasets that AFAIK were widely used by the NLP community. In comparison there were many novel methods during that period only evaluated on toy-data.
Thanks for building an extremely useful benchmark!
and also impressive performance on text classification and search!
Replying to @sama
I've noticed some of these similarities as well and @paulg explains it well "A startup founder is in effect an economic research scientist." (paulgraham.com/growth.html)
Paper updated with experiments on image generation.
We describe a simple technique to parallelize Scheduled Sampling across time that allows us to apply Scheduled Sampling for problems that involve generating very long sequences. We get better sample quality and train almost as fast as teacher-forcing. arxiv.org/abs/1906.04331
we see massive improvement in code search using our models!
Awesome, congratulations!!!
Replying to @Thiagogm @OpenAI
Zero-shot results of OpenAI API’s embeddings on the FIQA search dataset. Evaluation script: github.com/arvind-neural/bei… We zero-shot evaluated on 14 text search datasets; our embeddings outperform keyword search and previous dense embedding methods on 11 of them!
thank you, Jeff! so happy to be back :)
congratulations!!! :)
Replying to @shaneguML
Congratulations!!!
Replying to @AkhileshGotmare
ndcg@10 as done in previous work
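For readers unfamiliar with the metric named in this reply, here is a minimal sketch of nDCG@10. It assumes the linear-gain form of DCG and assumes the input list contains the graded relevance of every judged document for the query, in the order the system ranked them. The evaluation script used for the reported results may differ in these details.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical relevance labels for one query, in ranked order.
example_ranking = [0, 1]
print(ndcg_at_k(example_ranking))
```

A perfect ordering scores 1.0, and pushing a relevant document lower in the list reduces the score by the logarithmic rank discount.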
The conversation is annotated with accept/reject. At test time we would want the third-party business to implement a boolean function that returns whether the transaction can be completed. Neural Assistant will learn to work with the response as it has been annotated at training time.
Replying to @NirantK @rishabh16_
You can find the full table of results below. Even the smallest model outperforms bm25 and its extension, docT5query
Replying to @jobergum
our method actually zero-shot transfers better than bm25 to 11 search tasks on average, as shown in the entire table. even our smallest models are better than bm25. while it is not the only way to exploit training data with bm25, we perform better than one such method, docT5query
Joint work with Daniel Duckworth, Ben Goodrich, @lukaszkaiser and Samy Bengio
Improvement in decoding speed, as shown by some recent work in non-autoregressive machine translation
thank you, Quoc! it was a great chat, felt like I never left :)
hope it answers your question!
The model is trained at the turn level, where the dialog history fed into the model as input contains the previous ground-truth turns of the dialog. In the conversations here, the text responses actually generated by the model itself are used as the assistant’s side of the dialog history fed as input.
Replying to @jaseweston
Really nice work, congratulations!!!
Replying to @dmimno
congratulations!!!
Replying to @HandNF @JeffDean
Thanks for the interest. I think Neural Assistant + Taskmaster (ai.google/tools/datasets/tas…) + Google search results as source for external knowledge can work really well for task-oriented dialog!
Replying to @DBahdanau
Nice work! 🙂