reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠

Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper: arxiv.org/abs/2207.10342
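A minimal sketch of the framing, under stated assumptions. The three functions below (`thought_model`, `answer_model`, `verifier`) are hypothetical toy stand-ins for LM calls. They return explicit distributions, so inference can be done exactly by enumeration. A scratchpad/chain of thought becomes a latent variable that is marginalized out, and a verifier becomes conditioning on it accepting the trace. This illustrates the idea in the spirit of the paper; it is not the paper's code.

```python
from collections import defaultdict


# Hypothetical stand-ins for LM calls: each returns an explicit
# categorical distribution instead of sampling from a real model.
def thought_model(question: str) -> dict[str, float]:
    return {"add 2 and 3": 0.7, "multiply 2 and 3": 0.3}


def answer_model(question: str, thought: str) -> dict[str, float]:
    if thought.startswith("add"):
        return {"5": 0.9, "6": 0.1}
    return {"6": 0.8, "5": 0.2}


def verifier(question: str, thought: str, answer: str) -> bool:
    # Hypothetical check that only accepts scratchpads using addition.
    return thought.startswith("add")


def answer_distribution(question: str, use_verifier: bool = False) -> dict[str, float]:
    """p(answer | question) with the scratchpad marginalized out,
    optionally conditioned on the verifier accepting the trace."""
    joint: dict[str, float] = defaultdict(float)
    for thought, p_thought in thought_model(question).items():
        for answer, p_answer in answer_model(question, thought).items():
            if use_verifier and not verifier(question, thought, answer):
                continue  # conditioning: drop rejected traces
            joint[answer] += p_thought * p_answer
    total = sum(joint.values())  # renormalize over the surviving traces
    return {answer: p / total for answer, p in joint.items()}


print(answer_distribution("What is 2 + 3?"))
print(answer_distribution("What is 2 + 3?", use_verifier=True))
```

With real LMs these sums are intractable, so one would sample instead, for example rejection sampling against the verifier.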
“99% of Americans don’t talk about AI at parties. You can too if you try!”
New chapter: Happy to share that I recently joined @OpenAI! Thankful for many collaborators, friends, and mentors who made my 6 years of research @Google Brain special 🧠. Excited to collaborate toward reliable reasoning & alignment in AI systems and products like #ChatGPT
o3 @ 87.5% on ARC-AGI. It took 16 hours, at a rate of increase of 3.5% an hour, to get to "solved"
At this rate, how long til ARC-AGI is “solved”? For context:
- gpt-4o @ 5%
- Sonnet3.5 @ 14%
- o1-preview @ 18%
- o1 @ 32%
- best scaffolded solution @ 54%
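This assumes o1's 32% from the quoted tweet is the starting point; the tweet does not say which baseline it used. Under that assumption, the implied rate works out as:

```latex
\frac{87.5\% - 32\%}{16\ \text{h}} = \frac{55.5\%}{16\ \text{h}} \approx 3.5\%\ \text{per hour}
```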
imo the improvements on FrontierMath are even more impressive than ARC-AGI. Jump from 2% to 25%. Terence Tao said the dataset should "resist AIs for several years at least" and "These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages…"
Replying to @__nmca__
Well, on FrontierMath 2024-11-26 o3 improves the state of the art from 2% to 25% accuracy. These are absurdly hard strongly held out math questions. And on ARC, the semi-private test set and public validation set scores are 87.5% (private) and 91.5% (public). (7/n)
🩶🫶 Ilya and Sam’s yin/yang was a major reason I joined OpenAI. It is still possible to repair what was shattered.
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
It's important to emphasize that this is a huge leap /and/ we're still at the start. Give o1-preview a try, we think you'll like it. And in a month, give o1 a try and see all the ways it has improved in such a short time. And expect that to keep happening.
OpenAI is nothing without its people
At this rate, how long til ARC-AGI is “solved”? For context:
- gpt-4o @ 5%
- Sonnet3.5 @ 14%
- o1-preview @ 18%
- o1 @ 32%
- best scaffolded solution @ 54%
Verified o1 performance on ARC-AGI's Semi-Private Eval (100 tasks):
- o1, Low: 25% ($1.5/task)
- o1, Medium: 31% ($2.5/task)
- o1, High: 32% ($3.8/task)
🍓 is ripe and is ready to think, fast and slow: check out OpenAI o1, trained to reason before answering. I joined OpenAI to push the boundaries of science & reasoning with AI. Happy to share this result of the team's amazing collaboration, which does just that. Try it on your hardest problems.
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introducing…
🩶🫶
i love the openai team so much
language models are superhuman at predicting the next word. try this yourself to see how hard it is: rr-lm-game.herokuapp.com/
Like the International Math Olympiad or Spelling Bee, there should be a “language modeling competition” where humans compete to predict the next word in a sequence. The best humans would probably still lose to GPT-2, and we’d have more empathy for how hard it is to be an LLM :)
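The game scores top-1 next-word prediction: you see the text so far and guess the next word. Below is a minimal sketch of that scoring loop. A bigram counter plays the game here; it is an illustrative baseline chosen for brevity, not how the linked site or GPT-2 works, and it is scored on its own training text.

```python
from collections import Counter, defaultdict


def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each previous word."""
    table: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def top1_accuracy(table: dict[str, Counter], tokens: list[str]) -> float:
    """Score the game: guess the most frequent follower, count exact matches."""
    hits = 0
    total = 0
    for prev, nxt in zip(tokens, tokens[1:]):
        followers = table.get(prev)
        if followers:
            guess = followers.most_common(1)[0][0]
            hits += guess == nxt
        total += 1
    return hits / total


text = "the cat sat on the mat".split()
model = train_bigram(text)
print(top1_accuracy(model, text))
```

After "the", the counter has seen both "cat" and "mat" once and guesses "cat", so it misses one of the five guesses. Ambiguity like this is what makes the game hard.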
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi, with @xinyun_chen_, @kanishkamisra, @nkscales_google, @edchi, Nathanael Schärli, & @denny_zhou
Large Language Models Can Be Easily Distracted by Irrelevant Context arxiv.org/abs/2302.00093
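A minimal sketch of the prompting fix, assuming a simple Q/A few-shot format. Where the instruction sits in the paper's actual prompts is an assumption here, and `build_prompt` is a hypothetical helper.

```python
IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information given in the questions."


def build_prompt(question: str, exemplars: tuple[tuple[str, str], ...] = ()) -> str:
    """Prepend the instruction, then few-shot Q/A pairs, then the new question."""
    parts = [IGNORE_INSTRUCTION]
    for q, a in exemplars:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# The sentence about the brother's age is the irrelevant context.
print(build_prompt(
    "Lucy has 3 apples. Her brother is 12 years old. She buys 2 more apples. "
    "How many apples does she have?"
))
```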
We are used to the cadence of big model releases: GPT2->3->4 took two years each time. We’re in a different world now: o1 was announced months ago, and we're already on the next generation. Expect faster improvement going forward: o1 is like gpt2 if we could jump to gpt4 ~immediately.
Welp looks like the parrot is better at math than I am
We’re so back (to work)
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
At ICML & excited to talk with old and new friends. Message me to chat. A few possible topics:
- Model chains, agents, programs
- Probabilistic programming
- Simulation-based/likelihood-free inference
- AI for science and reasoning
- AI-first Human-Computer interfaces
The C. elegans of GPT
This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a finite state markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modify the probabilities on the arrows. E.g. we can see that:
- state 101 deterministically transitions to 011 in the training data, so the probability of that transition becomes higher (79%). Not near 100% because we only did 50 steps of optimization.
- state 111 goes to 111 and 110 with 50% probability each, which the model almost learns (45%, 55%).
- states like 000 are never encountered during training, but have relatively sharp transition probabilities, e.g. 73% of going to 001. This is a consequence of inductive biases in the Transformer. One might imagine wanting this to be 50%, except in a real deployment almost every input sequence is unique, not present in the training data verbatim.
Not really sure where I was going with this :D, I think it's interesting to train/study tiny GPTs because it becomes tractable to visualize and get an intuitive sense of the entire dynamical system. Play with it here: colab.research.google.com/dr…
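For comparison, here is a sketch (not part of the original thread) that counts the empirical transition probabilities in the training sequence. These are the values a model that fit the data perfectly would converge to. It assumes each 3-token window is a state, and that the next state drops the oldest token and appends the next one, matching the context length described above.

```python
from collections import Counter, defaultdict


def empirical_transitions(seq: str, context: int = 3) -> dict[str, dict[str, float]]:
    """Empirical P(next state | state) for a sliding window of `context` tokens."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(seq) - context):
        state = seq[i:i + context]
        next_state = seq[i + 1:i + context + 1]  # drop oldest token, append next one
        counts[state][next_state] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


probs = empirical_transitions("111101111011110")
for state in sorted(probs):
    print(state, "->", probs[state])
```

This gives 101 -> 011 with probability 1.0, where the Transformer reports 79% after 50 steps. It also gives 111 -> 111/110 at 0.5 each. States such as 000 never occur in the sequence, so they have no empirical estimate at all. The 73% the Transformer assigns to 000 -> 001 therefore has to come from its inductive biases.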
Also want to point out o1-mini, which is incredible at coding tasks while being /fast/. It and o1 are the first generation of a new type of model.
As part of today, we’re also releasing o1-mini. This is an incredibly smart, small model that can also reason before it answers. o1-mini allows us at @OpenAI to make high-intelligence widely accessible. openai.com/index/openai-o1-m… On the AIME benchmark, o1-mini re-defines the intelligence + cost frontier (see if you can spot the old GPT-4o model in the bottom 🙂). Massive congrats to the team and especially @ren_hongyu and @shengjia_zhao for leading this!
Example of what ARC-AGI problems look like
OpenAI achieved a gold medal at the 2025 International Math Olympiad (solving 5 of 6 problems)! It thinks for hours and writes proofs in natural language. We've come a long way from LLMs solving 50% of the MATH dataset in 2022. Congrats @alexwei_ on spearheading a major milestone!
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Excited to present our work on evolving architectures for translation and image generation in a modular language this afternoon at #GECCO2018! Joint work with David So and Quoc Le.
ProtNLM (Protein Natural Language Model) annotates previously "uncharacterised proteins" in @uniprot in English. Instead of a restricted tag set, it predicts function as language: [amino acids] -> "CRISPR-associated endonuclease Cas9". Collaboration between @GoogleAI and @emblebi
Ever got a result back saying uncharacterised protein? 😩 @uniprot and @GoogleAI have teamed up to create a natural language processing model that has generated over 40 million protein annotations to address this challenge. ebi.ac.uk/about/news/technol…
Found the OpenAI tenders
Copilot turning me from code monkey into tab monkey
Happy 1000 days til AGI to those who celebrate
GPT4 feels qualitatively different than models I've used before: like working with a creative partner with vast knowledge. The results on standardized tests will make the rate of progress tangible for many people outside AI
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment: openai.com/product/gpt-4
Just ask for smaller, better models! Paper led by @_angie_chen, w/ @david_r_so & me: LMs discover architectures *by directly writing Python Jax code* instead of searching a restricted DSL. With EvoPrompting, we use LMs within an evolutionary algorithm to crossover parent prompts.
New paper w/ @dmdohan and @david_r_so! Can LMs be used to design novel model architectures? We propose EvoPrompting, which evolves few-shot prompts to enable a code-pretrained LM to generate novel state-of-the-art architectures. arxiv.org/abs/2302.14838 (1/4)
Replying to @yacineMTB
advice I give for short notice interview prep:
- get a copy of "Elements of Programming Interviews in Python"
- read through each chapter & for each problem:
  a. spend a few minutes thinking of ways you might solve it.
  b. imagine approaches: visualize the solution/gestalt, how you would structure the code, how it would behave.
  c. read the solution in the back of the book.
- fully solve a few on a computer to practice end-to-end.
there's a limited set of leetcode-style questions that are reasonable for an interview, and this covers most of them. also a pretty good review of algorithms in general.
GPT-4 is in the top 20% of test takers in many of these standardized tests openai.com/research/gpt-4
will exclusively refer to the dataset as "ARGH-AGI" from now on
Declarative langs like SQL let us declare a goal (query), and the system plans how to satisfy constraints. LMQL does this for LMs: it can get better results for sampling & tool use in fewer tokens bc it optimizes the decoding. Try it out in the playground: lmql.ai/playground/#cot
🚀 Excited to announce the first release of lmql.ai, a novel open source programming language and platform for language model interaction! Combining prompts, constraints & scripting, LMQL elevates the capabilities of large language models. 🧵1/6 A quick tour.
Presenting two posters at #NeurIPS2023, come by! 10:45am-12:45pm for both:
- #527 Tuesday @ poster session 1: "Training Chain-of-Thought via Latent-Variable Inference"
- #332 Thursday @ poster session 5: "EvoPrompting: Language Models for Code-Level Neural Architecture Search"
Maybe an ask-to-answer to @adamdangelo on Quora can clear things up?
Replying to @DavidSHolz
Think most benchmarks are necessary, not sufficient. Beating ARC-AGI doesn't mean you actually have AGI, but it would be surprising to have an AGI that couldn't do it. Though FrontierMath is hard enough that we could have an AGI that doesn't solve it, not sure here.
Did you know? Reading a paper signed by the author doubles your learning rate! Today we are launching papers.deals to share our beloved arXiv of autographed machine learning papers with the world. All proceeds from these historic artifacts go to charity 💖
Neat prompt trick for Chat: "express same in Prolog". Simple way to get LMs to translate back-and-forth between informal language and formal representations like Prolog/Idris/MiniKanren/... Next up: use the formal language to check its work.
I'm not kidding, it's really good. Remember how the 420 latest GPT news reddit post raved about compression? Well "Prolog" as an idea of how to present information is a one word miracle
LMs are pretrained to predict the next token. This description is helpful to build intuition, but it’s no longer quite accurate for RL fine tuned models.
I think "predicting the word that comes next" is a good description of what pretrained LMs (base models) do. But the description is much less apt after base models are fine-tuned with reinforcement learning.
How to code a side project in 2025:
1. May 31 - Write project spec
2. Procrastinate 6 months
3. Dec 31 - Ask favorite AI to implement it
Replying to @alitaylor
Paraxanthine! 80% of caffeine metabolizes to it, the rest to theobromine/theophylline. All 4 are xanthines, which block adenosine. The "4 hour half life" of caffeine doesn't include processing the 3 stimulants it turns into. Rarebird has px coffee & there are preworkouts/energy drinks
At NeurIPS this week! Come talk about test time compute, compound systems, reasoning, ai for science, alignment, the impending singularity, and just how weird this is all getting. 2022 was a simpler time. Now we get to live in interesting times.
NeurIPS 2022. Here is a photo of @dmdohan @_sholtodouglas @jimmybajimmyba 1 day after the release of ChatGPT, before post-ChatGPT capitalism tore them apart to OpenAI, DeepMind, and xAI. Post-AGI is when they can play Jenga again😔. I'll attend NeurIPS 2024 and am excited to meet with both old and new people. Please find me if you are interested in Gemini, post-training, AGI timeline and Silicon Valley drama, a based RL person's perspective on scaling laws, symbolic vs physical intelligence, multilinguality team (has headcounts), angel investing strategies for Silicon Valley and Tokyo, how I ended up demoing GPT-4 to Japanese Prime Minister at OpenAI, opportunities of working from Tokyo, etc.
o1 ranks in the top 500 students for AIME -> would qualify for the USA Math Olympiad. Coding @ the IOI, a variant scores at median among contestants, and an oracle among 10,000 samples per problem would receive a gold medal. On GPQA it achieves 78%, compared to 70% for PhDs.
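An "oracle among k samples" counts a problem as solved if any one of the k samples is correct. For pass/fail problems, the standard metric for this is pass@k. Below is a sketch of the unbiased estimator from the Codex paper (Chen et al., 2021). It assumes n >= k samples per problem, of which c are correct. This is a simplification: the IOI awards partial credit per subtask, so the IOI result quoted above is not computed this way.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n samples drawn per problem of which c passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts k-subsets with no correct sample;
    # it is 0 when fewer than k samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10_000, c=5, k=1))
print(pass_at_k(n=10_000, c=5, k=10_000))
```

A handful of correct samples out of 10,000 barely moves pass@1 but guarantees pass@10000. That gap is why oracle-selection numbers are reported separately from single-sample scores.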
We've entered a new paradigm which allows scaling test-time compute alongside train-time compute, so the model can spend more time and achieve better results. Check out the research blog with details: openai.com/index/learning-to…
Gotta bring a “No AI room” poster to @NeurIPSConf to create an oasis at events
Caveat on the Tao quote: that refers to the hardest "research" split of the dataset, while the 25% is across the entire dataset.
Replying to @GarrisonLovely
To clear a possible misunderstanding: the quotes refer to questions in the highest tier of difficulty of FrontierMath. Not every question in the benchmark is as difficult as the ones Tao and Gowers reviewed.
Love this! There’s so much unexplored space for LM UX experiences
ChatGPT, but with rabbit holes
Who's going to tell Marvin Minsky we put Perceptrons into the Society of the Mind
Party in Principle
New favorite prompt: "Write like Wittgenstein". The general pattern is: "Be concise. Write like X."
Model too verbose? "Write like Wittgenstein"
Authorship ordering is a challenging problem. In "Academic Author How Names Order To", the authors propose several groundbreaking solutions for this well studied yet thus far intractable task
Practically, thinking in plain language opens up a ton of possibilities. On safety & alignment, the model is more reliable because it can reason about policies and available choices before responding, and we are able to inspect its thinking and look for why something happened
“Calling it attention is anthropomorphizing. If they called it ‘Kernel Smoothing is All You Need’ that would not have grabbed the imagination nearly as much” -Ted Chiang @ creativity and AI NeurIPS workshop
An encouraging aspect of the o3 series is that the model can explicitly think about safety and what's OK, leading to more robustness all around
Chain-of-thought reasoning provides a natural avenue for improving model safety. Today we are publishing a paper on how we train the "o" series of models to think carefully through unsafe prompts: openai.com/index/deliberativ……
ChatGPT can now use tools through AI Plugins: openai.com/blog/chatgpt-plug…
1. Browsing: Search web to answer questions (WebGPT)
2. Code Interpreter: Write/execute/debug Python (sandboxed) to test/analyze/...
3. Interface with services like Kayak/WolframAlpha/Zapier, or ones you create!
We are adding support for plugins to ChatGPT — extensions which integrate it with third-party services or allow it to access up-to-date information. We’re starting small to study real-world use, impact, and safety and alignment challenges: openai.com/blog/chatgpt-plug…
Time for Good Old Fashioned AI to make a comeback? I enjoyed "Cognitive Architectures for Language Agents" from @tedsumers and @ShunyuYao12. Discussion tomorrow with @hwchase17 and @charles_irl on the evolving world of scaffolds/abstractions around LLMs! arxiv.org/abs/2309.02427
Our webinar tomorrow might be my favorite one yet. An absolute MUST JOIN for anyone building chains/agents Guests: @dmdohan - Model Cascades paper author @ShunyuYao12 - ReAct paper author @tedsumers - COALA paper author @charles_irl - top tier educator crowdcast.io/c/v7i2ysxqkbd2
Googled phone # to cancel Citi credit card. Grabbed from generated info box. Called it. Weirdly got different security questions than I had noted but made it through. Request to cancel the card and they don't see it. Realize Google's search LLM gave me Chase's phone number🤦‍♂️
Replying to @nickcammarata
By scaffolding I mean some process beyond just sampling from the model. e.g. the top ones sample tons of programs and filter for ones that solve the examples, or "test-time finetune" on examples for each problem individually arcprize.org/blog/openai-o1-…
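Below is a minimal sketch of the sample-and-filter loop described above. It rests on three assumptions: a tiny hypothetical DSL of grid transforms stands in for the programs an LM would write, a random sampler stands in for the LM, and the single training pair is invented for illustration.

```python
import random
from typing import Callable

Grid = tuple[tuple[int, ...], ...]

# Hypothetical toy DSL: stands in for the programs an LM would write.
PRIMITIVES: dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: g,
    "flip_h": lambda g: tuple(row[::-1] for row in g),  # mirror left-right
    "flip_v": lambda g: g[::-1],                         # mirror top-bottom
    "transpose": lambda g: tuple(zip(*g)),
}


def sample_program(rng: random.Random, max_len: int = 3) -> list[str]:
    """Sample a random composition of primitives (the stand-in 'LM sample')."""
    return [rng.choice(list(PRIMITIVES)) for _ in range(rng.randint(1, max_len))]


def run(program: list[str], grid: Grid) -> Grid:
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def sample_and_filter(train_pairs: list[tuple[Grid, Grid]],
                      n_samples: int = 1000, seed: int = 0) -> list[list[str]]:
    """Keep only sampled programs that reproduce every training output."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_samples):
        program = sample_program(rng)
        if all(run(program, x) == y for x, y in train_pairs):
            survivors.append(program)
    return survivors


# One demonstration pair: rotate a 2x2 grid 90 degrees clockwise.
train = [(((1, 2), (3, 4)), ((3, 1), (4, 2)))]
programs = sample_and_filter(train)
print(programs[0], run(programs[0], ((5, 6), (7, 8))))
```

With a real LM, the sampler is the model itself and the programs are ordinary Python; the filtering step stays the same.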
💜
Replying to @janleike
To all OpenAI employees, I want to say: Learn to feel the AGI. Act with the gravitas appropriate for what you're building. I believe you can "ship" the cultural change that's needed. I am counting on you. The world is counting on you. :openai-heart:
@ ICML workshops til Sunday! Come by beyond-bayes.github.io workshop Friday @ 9:40am for our talk, with posters @ 5pm. You'll learn how probabilistic programming lets us formalize models talking to models ("model cascades"), unifying many approaches to prompting and inference.
Prompt engineering was fun while it lasted
Can large language models write prompts…for themselves? Yes, at a human-level (!) if they are given the ability to experiment and see what works. arxiv.org/abs/2211.01910 with @Yongchao_Zhou_, @_AndreiMuresanu, @ziwen_h, @silviupitis, @SirrahChan, and @jimmybajimmyba (1/7)
Above all, I can't wait to see what you all do with o1. On many tasks it will feel similar to GPT4. But on hard problems where it really shines, it's like nothing else. For inspiration, here are a few videos showing experts using it for their use cases.
🎉Congrats to @OpenAI for releasing o1:
- Economics: @tylercowen asked o1 basically to write a college essay
- Genetics: @catbrownstein asked o1 to help her reason through "n of 1" cases (medical cases that nobody has ever seen)
- Physics: @mariokrenn6240 used o1 to draft and reason through complex quantum physics equations
- Code: @ren_hongyu prompted a full snake game and it was generated zero shot, working perfectly, and obeyed instructions to add obstacles
FrontierMath details: arxiv.org/html/2411.04872v1
The HF0 crew have made what I can best describe as a tech monastery in the heart of San Francisco. Hard to imagine a more focused environment. Apply if you want 3 incredibly focused months to build on your projects!
GPT4 launched yesterday. Today, HF0 launches: HF0.com (1/n)
Teaching Minerva🦉 math & science has been a ton of fun. What else were we supposed to do after realizing all the LaTeX on arXiv is available? Check out the sample explorer: minerva-demo.github.io paper: arxiv.org/abs/2206.14858
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7
New paper on program synthesis with large language models (244M-137B). We investigate: (1) how scaling improves performance on Python and math tasks (2) whether the models can predict output of executing code (3) humans-computer collaboration to write programs via conversation
Replying to @tszzl @izzyz
šŸ˜‰
Replying to @moebio
Cool! Is each point an embedding of the sentence up to that word? If you haven't already, worth having a look at Loom which is an interface for branching interactions with LMs. Has some related interfaces for visualizing paths. generative.ink/meta/block-mu… github.com/socketteer/loom
Everyone seems to have their own bar for what will qualify as AGI. Best I can tell it usually cashes out to "I'll know it when i see it" Fan of @RichardMCNgo's "time-AGI" frame for placing abilities on a spectrum rather than making binary distinctions. We're in the fog now & reasonable people will debate. Of course progress across domains is not uniform, but on average I'd say we're comfortably past second-AGI, and hovering around minute to hour for many areas. I think of this as a rising waterline, with most areas rising at the same rate but some vastly superhuman spikes, usually around processing huge amounts of information. Can further consider ability @ given price point: there are plenty of things models can already do today that are simply not economical nitter.app/RichardMCNgo/sta…
Instead of treating AGI as a binary threshold, I prefer to treat it as a continuous spectrum defined by comparison to time-limited humans. I call a system a t-AGI if, on most cognitive tasks, it beats most human experts who are given time t to perform the task. More details:
WebGPT by prompting only. Waiting for an API that lets us do prompt tuning/soft prompting (gradient based continuous z tuning) to make this even easier.
WebGPT reproduced from advanced prompting only. Dust-based web-search assistant demo answers questions by searching the web, summarizing content and compiling a final answer with references: dust.tt/spolu/a/41770fd3d9
1 token is all you need
Come see what's brewing @OpenAI
We’ll be hosting our first developer conference, OpenAI DevDay, on November 6. Registration to attend in person in San Francisco will open in a few weeks. We’ll also livestream the keynote. openai.com/blog/announcing-o…
By letting an LM parse natural -> formal language, we get the best of both worlds: the formal system checks consistency of the natural language reasoning. LM = fast system 1, Prolog etc = slow system 2. @Maxwell_Nye has neat work exploring the combo: arxiv.org/abs/2107.02794
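Below is a minimal sketch of the "slow system 2" check. Python forward chaining stands in for Prolog, which itself answers queries by backward chaining. The sketch assumes the LM has already translated the text into ground propositional Horn clauses; that translation is hard-coded here, and `forward_chain`/`check_claim` are hypothetical helpers.

```python
Rule = tuple[frozenset[str], str]  # (body atoms, head atom)


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Derive every atom entailed by ground propositional Horn clauses."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


def check_claim(facts: set[str], rules: list[Rule], claim: str) -> bool:
    """Accept a natural-language claim only if its formal version is derivable."""
    return claim in forward_chain(facts, rules)


# Hypothetical LM translation of "All humans are mortal. Socrates is human."
facts = {"human(socrates)"}
rules = [(frozenset({"human(socrates)"}), "mortal(socrates)")]
print(check_claim(facts, rules, "mortal(socrates)"))  # True
print(check_claim(facts, rules, "god(socrates)"))     # False
```

A claim the LM makes in prose counts as checked only if it follows from the translated facts; anything else gets flagged for a second look.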
look at your data
anyone: every experienced ai engineer:
Replying to @jekbradbury @ylecun
Also check out primer.ought.org - does an excellent job of demonstrating factored cognition (~latent variable models) with LLMs. It does not have explicit probabilistic inference yet.
The rate of progress is astounding. Where do we land after 2 more comparable leaps?
June 11, 2020: GPT-3
March 14, 2023: GPT-4
Jan 1, 2026: ???
Jan 1, 2029: !?!?!?!?
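A quick check of the spacing in that timeline. It uses the tweet's dates; the last two entries are the tweet's own hypothetical placeholders.

```python
from datetime import date

milestones = [
    ("GPT-3", date(2020, 6, 11)),
    ("GPT-4", date(2023, 3, 14)),
    ("???", date(2026, 1, 1)),
    ("!?!?!?!?", date(2029, 1, 1)),
]

# Days between consecutive milestones in the tweet's timeline.
gaps = [(b - a).days for (_, a), (_, b) in zip(milestones, milestones[1:])]
print(gaps)
```

The gaps come out to roughly 2.75, 2.8, and 3 years, so the projection assumes the GPT-3 to GPT-4 cadence simply repeats.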
Want Bespoke, but for everything (especially neural network structures) github.com/awwbees/BespokeSy…
more playing around with livecoding python in bespoke. I added a nice "note stream" module for visualization, which is very useful for understanding what you're doing in live generative composition.
Built a few graph viz tools on top of the @rem_note API in @observablehq. observablehq.com/@dmrd/remno… Read-only view for now. Next up: extend to whole knowledge bases & allow directly manipulating content inside the graph! What else would you like to see?
Can fine tune a base model on different data and weight average, or use the multiple models as a mixture of experts.
Train an LM made of independent expert LMs (no syncs! no shared params!) ➡️ ➕ new or ➖ existing experts. At. Any. Time. ➡️ Ensemble OR parameter average(!!) to outperform dense & sparse LMs & ensemble baselines with less compute, a fraction of the simultaneous GPU usage. 🌳/n
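Below is a minimal sketch of the two ways of combining experts mentioned above. Parameters are plain Python lists as a simplification, and `average_parameters` and `ensemble_predictions` are hypothetical helpers. Averaging generally only works when the experts share an architecture and were fine-tuned from the same base, so corresponding parameters line up. Ensembling has no such requirement, but it keeps every model around at inference time. The sketch uses fixed mixture weights; a learned router, as in a mixture of experts, would choose the weights per input instead.

```python
from typing import Optional

Params = dict[str, list[float]]


def average_parameters(models: list[Params], weights: Optional[list[float]] = None) -> Params:
    """Weighted average of same-shaped parameters, giving one merged model."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    return {
        name: [sum(w * m[name][i] for w, m in zip(weights, models)) for i in range(len(tensor))]
        for name, tensor in models[0].items()
    }


def ensemble_predictions(dists: list[list[float]], weights: Optional[list[float]] = None) -> list[float]:
    """Mixture of the experts' output distributions; every model is kept."""
    if weights is None:
        weights = [1.0 / len(dists)] * len(dists)
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(len(dists[0]))]


# Two hypothetical fine-tuned copies of the same base model.
expert_a = {"layer.w": [0.0, 2.0]}
expert_b = {"layer.w": [2.0, 4.0]}
print(average_parameters([expert_a, expert_b]))  # {'layer.w': [1.0, 3.0]}
print(ensemble_predictions([[0.2, 0.8], [0.6, 0.4]]))
```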
Has science gone too far?
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 101%
Replying to @tszzl
Alignment is the ultimate capability
Not there yet, but someday we'll have models that can think for as long as needed (and interact with the world) to solve the hard problems that really matter to society What will you have it think about?
Replying to @polynoamial @rao2z
@OpenAI's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots
Congratulations to the Metaphor team for launching! It's a different way of building a search engine. You "search by prompting" - instead of asking a question, phrase it so the natural completion would give the answer like: "My favorite personal webpages on the internet are"...
metaphor.systems is now publicly available! Metaphor is a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3 1/
Look forward to interfaces that let designers work with generative models like this neat SVG generator.
My 1st @GoogleAI Residency paper is finally on arxiv! We train a powerful generative model of fonts as SVG instead of pixels. This highly structured format enables manipulation of font styles and style transfer between characters at arbitrary scales! 👉🏽 arxiv.org/abs/1904.02632
Replying to @rayefull
Replying to @mollyfmielke
There's evidence for it: "In all cases, with exception of S9, they report having owned 1-of-3 toys widely sold by Fisher-Price between 1972 and 1989" Anecdotally, friend traces some # colors to license plate on family car. neurocritic.blogspot.com/201… study: ncbi.nlm.nih.gov/pmc/article…
The "No AGI zone" shirt looks more useful by the day.
Would you like any “No AGI zone” tshirts? Even better if it’s reversible with “Let’s talk about AI”
Come by the 11am posters on Wednesday to learn how irrelevant context affects LLMs:
LM performance typically gets worse given irrelevant info. Simple prompting improves it: "Feel free to ignore irrelevant information given in the questions." Work led by @fredahshi, with @xinyun_chen_, @kanishkamisra, @nkscales_google, @edchi, Nathanael Schärli, & @denny_zhou
How many minutes of thinking is acceptable to cure cancer?
the test time compute era is weird: before too long someone will be saying “sure it’s AGI, but it had to think for twenty minutes before curing cancer so…”
Amused that the media beat me to the announcement
Replying to @andrewwhite01
The "I Can't Believe It's Not Better" workshop @ NeurIPS does this! So many beautiful ideas with the tiny problem that they don't actually work (yet?) icbinb.cc @ICBINBWorkshop
Replying to @typedfemale
Favorite Twitter bio
Had a chance to discuss the state of natural language processing & potential applications toward an "IDE for thought" with @AthensResearch last month. @PsionicaOrg demoed Dual, which provides natural language interface over a knowledge base. recording: piped.video/watch?v=Oxbv9Enh…
For today's community call in 40 minutes, @dmdohan (Google Brain) will be chatting about how we might apply AI/NLP/GPT-3 to Athens Paul Bricman is joining to talk about his project psionica.org/ A preview of the call here: github.com/athensresearch/at… Don't miss this !!
i am not a very good language model =\ the site is also subtly broken in a few ways (not all words are allowed tokens, some correct guesses marked as wrong, ...) still good way to build intuition! anyone know who actually made this?