Assistant Prof @columbia CS. Visiting Researcher @ Google DeepMind. PhD from @stanfordnlp. Language x Neural Nets.

New York, NY
New paper! Subliminal learning—transferring hidden signals between language models—is more powerful than we thought. By biasing the teacher with a steering vector instead of a prompt, we achieve strong, consistent transfer, which we use to study its mechanisms. w/@GeorgeMorgulis
6
35
302
20,239
My first NLP lectures at Columbia are in the books! In our first two lectures, we went over (1) learning from text with a simple word vector language model, and (2) tokenization of text. Lecture notes are brand new and freely available on my website (links in thread.)
17
73
1,129
72,705
I’m joining the Columbia Computer Science faculty as an assistant professor in fall 2025, and hiring my first students this upcoming cycle!! There’s so much to understand and improve in neural systems that learn from language — come tackle this with me!
122
52
886
100,407
I’m hiring PhD students in computer science at Columbia! Our lab will tackle core challenges in understanding and controlling neural models that interact with language. For example: methods for LLM control, discoveries of LLM properties, and pretraining for understanding.
18
154
873
106,887
Does my unsupervised neural network learn syntax? In new #NAACL2019 paper with @chrmanning, our "structural probe" can show that your word representations embed entire parse trees. paper: nlp.stanford.edu/pubs/hewitt… blog: nlp.stanford.edu/~johnhew/st… code: github.com/john-hewitt/struc… 1/4
9
245
788
For this year's CS 224n: Natural Language Processing with Deep Learning, I've written notes on our Self-Attention and Transformers lecture. web.stanford.edu/class/cs224… Topics: Problems with RNNs, then self-attention, then a 'minimal' self-attention architecture, then Transformers.
4
150
739
86,662
I’m beginning to share notes from my upcoming fall 2025 NLP class, Columbia COMS 4705. First up, some notes to help students brush up on math. Vectors, matrices, eigenstuff, probability distributions, entropy, divergences, matrix calculus cs.columbia.edu/~johnhew/com…
9
49
438
31,552
#acl2023! To understand language models, we must know how activation interventions affect predictions for any prefix. Hard for Transformers. Enter: the Backpack. Predictions are a weighted sum of non-contextual word vectors. -> predictable interventions! backpackmodels.science
6
103
398
106,792
I'm on the faculty market! My goal is to build language systems that we understand deeply through discovery and by design, so we can precisely control them and treat their failures. Let's tackle this grand challenge of science and engineering together. nlp.stanford.edu/~johnhew/
6
73
407
96,911
#emnlp2020 paper: we give some theoretical insight into the syntactic success of RNN LMs: we prove they can implement bounded-size stacks in their states to generate some bounded hierarchical langs with optimal memory! paper arxiv.org/pdf/2010.07515.pdf blog nlp.stanford.edu/~johnhew/rn…
4
54
317
If I finetune my LM just on responses, without conditioning on instructions, what happens when I test it with an instruction? Or if I finetune my LM just to generate poems from poem titles? Either way, the LM will roughly follow new instructions! Paper: arxiv.org/pdf/2409.14254
8
39
272
44,663
Our paper on Backpacks has won an Outstanding Paper Award at ACL 2023! If you're excited about both fascinating learned structure in language models, and designing architectures to enable interpretability while maintaining expressivity, take a read! backpackmodels.science/
Our papers of #ACL2023NLP: Backpack Language Models @johnhewtt, @jwthickstun, @chrmanning, @percyliang backpackmodels.science/ Mon July 10, poster 14:00-15:30, Frontenac Ballroom and Queen’s Quay
5
35
261
48,354
It’s conference time! Come say hello at EMNLP to hear my hot takes on understanding LMs. Is your CS department hiring? Hey nice, come talk to me! Do you know few people at EMNLP? Not for long; come talk to me! Here’s what I look like at a poster session when the lights go out.
6
16
233
54,576
I wrote a note on linear transformations and symbols that traces a common conversation/interview I've had with students. Outer products, matrix rank, eigenvectors, linear RNNs -- the topics are really neat, and lead to great discussions of intuitions. cs.columbia.edu/~johnhew//fu…
6
23
231
21,734
This winter, I’ll be helping @chrmanning teach NLP with Deep Learning (CS224n). Every year, we attempt to update the course to best teach our students. For this, I am learning from how others teach topics in NLP. Please share your favorite technical explanation of an NLP topic!
9
15
217
Ever added new words to the vocabulary of your language model only to generate from it and have it generate gibberish? In a technical blog post I detail why this happens, and that representing new words as an average of existing words solves the problem. nlp.stanford.edu/~johnhew/vo…
6
48
216
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
13
29
191
51,576
New work! Gemma3 can explain in English what it learned from data – when we distill that data into a new word (embedding) and query it for a description of the word. Gemma explained a word trained on incorrect answers as: “a lack of complete, coherent, or meaningful answers...”
4
30
191
36,807
We characterize and improve on language model _truncation sampling_ algorithms, like top-p and top-k. We frame them as trying to recover the true distribution support from an implicitly _smoothed_ neural LM, and provide a better sampling algo! Paper arxiv.org/pdf/2210.15191.pdf
5
35
162
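The two baseline truncation rules named above, top-k and top-p (nucleus), can be sketched in a few lines. This is a minimal illustration on a hypothetical five-word distribution, not the paper's code, and it does not implement the improved sampling algorithm the paper proposes. The function names and the toy probabilities are assumptions made for this example.

```python
def renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    keep = set(keep)
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def top_k(probs, k):
    """Keep only the k most probable words."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p(probs, p):
    """Keep the smallest set of most probable words whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, total = [], 0.0
    for i in order:
        keep.append(i)
        total += probs[i]
        if total >= p:
            break
    return renormalize(probs, keep)


# Hypothetical next-word distribution with a small "smoothed" tail.
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
truncated_k = top_k(probs, 2)
truncated_p = top_p(probs, 0.9)
```

Both rules treat the low-probability tail as smoothing mass and drop it; they differ only in how the cutoff is chosen.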
Learned a lot about LSTM behavior -- in very different ways -- from two excellent @acl2018 papers: Sharp Nearby, Fuzzy Far Away... by @ukhndlwl, He He, Peng Qi, and @jurafsky, and LSTM as Dynamically Computed... by @omerlevy_ , @kentonctlee, @nfitz, @lukezettlemoyer.
25
133
I’ll be at NeurIPS for a bit! If you want to talk in person about a PhD in my lab at Columbia, book a slot here: calendar.app.google/RWkDQVvm… If your organization wants to fund LLM understanding/interpretability/control research, reach out to me!
3
9
118
14,080
If you're adding new tokens to Gemma, you're likely running into the "all logits are negative, so randomly init embedding with a logit of ~0 dominates the softmax" problem! Averaging existing embeddings solves this by bounding KL from initial model. See: nlp.stanford.edu/~johnhew/vo…
Gemma cant handle training with added tokens... maybe you were right @Mascobot - we aint getting chatml yet lol
4
15
118
29,892
I'll be at ICML this year! Reach out if (1) you want to chat -- great! -- sign up here calendar.app.google/qtDkRmS1… and/or DM me; (2) you want to fund my lab @ Columbia -- also great! -- for research into deeply understanding language models for alignment, safety, performance; email me.
5
10
118
15,921
Teaching and mentorship are key reasons why I chose to join academia. This img is some of my not-great freshman grades. I know every student needs different support at different times, and every student contributes different skills. Come to New York and learn with me!
1
2
105
8,713
Teaching CS224N (twice now!) with @chrmanning has been one of the most rewarding parts of my PhD, not least because the notes and videos are public. Lots of exciting new lectures (RLHF, generation,++) here, as well as refined Transformers and pretraining lectures!
A 2023 update of the CS224N Natural Language Processing with Deep Learning YouTube playlist is now available with new lectures on pretrained models, prompting, RLHF, natural language and code generation, linguistics, interpretability and more. #NLProc piped.video/playlist?list=PL…
10
106
19,036
I'll be at ACL2023! If you're there and don't know anyone, come say hi! (Or let your students know I'm happy to chat!) I'll be presenting Backpack Language Models backpackmodels.science/ and helping give a tutorial on Generating Text from Language Models!
5
3
100
14,450
How do we design probes that give us insight into a representation? In #emnlp2019 paper with @percyliang, our "control tasks" help us understand the capacity of a probe to make decisions unmotivated by the repr. paper: arxiv.org/abs/1909.03368 blog: nlp.stanford.edu/~johnhew/in…
1
23
93
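A rough sketch of the control-task idea, assuming the common reading that a control task gives every word type one fixed random label, and that a probe is judged by the gap between its accuracy on the real task and on the control task. The helper names, the label count, and the accuracy figures are hypothetical.

```python
import random


def make_control_task(word_types, num_labels, seed=0):
    """Assign each distinct word type one fixed, randomly chosen label."""
    rng = random.Random(seed)
    return {w: rng.randrange(num_labels) for w in sorted(set(word_types))}


def selectivity(task_accuracy, control_accuracy):
    """Gap between accuracy on the real task and on the control task."""
    return task_accuracy - control_accuracy


sentence = ["the", "cat", "saw", "the", "dog"]
control = make_control_task(sentence, num_labels=5)

# Labels depend only on word identity, so both occurrences of "the" agree.
control_labels = [control[w] for w in sentence]

# Hypothetical accuracies for one probe on the two tasks.
gap = selectivity(0.97, 0.92)
```

Because the control labels carry no linguistic structure, a probe that scores well on them is memorizing word identities, which is why a small gap signals an overly expressive probe.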
It's #acl2020nlp and one of the best parts of a conf is meeting new people. If you'd like to chat #nlproc, and especially if you didn't have the money to sign up for the conference, email me to chat for 30min! I can talk research, admissions, grad school++. email on my website!
2
7
97
Guess what it’s STILL conference time this time NeurIPS! Just got in; everything in this tweet holds true, come talk to me
It’s conference time! Come say hello at EMNLP to hear my hot takes on understanding LMs. Is your CS department hiring? Hey nice, come talk to me! Do you know few people at EMNLP? Not for long; come talk to me! Here’s what I look like at a poster session when the lights go out.
1
3
87
21,024
Come chat with me at our ICML poster about interpretability as a communication problem, and the need to derive new words for referencing language model concepts! 4:30PM-7, East Exhibition Hall A-B #E-500 We Can’t Understand AI Using our Existing Vocabulary
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
2
10
79
15,570
Lecture 1: Text Representation and Language Modeling cs.columbia.edu/~johnhew/com… Lecture 2: Tokenization cs.columbia.edu/~johnhew/com…
2
5
78
4,401
Very thankful for the chance to give this talk! Students interested in understanding neural representations of language, I’d love if you came and gave your thoughts and perspectives on this ongoing work on the probing methodology.
We are very excited to announce our next speaker!! 🗣John Hewitt @johnhewtt talking with us about ❓"Language Probes as V-information Estimators" 🗓Sept 9nd, 14:00 UTC 📝Sign up here: eventbrite.co.uk/e/nlp-with-…
1
11
73
It's #emnlp2020 and one of the best parts of a conf is meeting new people. If you'd like to chat #nlproc, and especially if you didn't have the money to sign up for the conference, email me to chat for 30min! I can talk research, admissions, grad school++. email on my website!
70
Come see this panel I'll speak on! There's so much to understand about language models that it's a good thing we have multiple rich subcommunities with differing perspectives and expertise -- this panel will facilitate sharing ideas and refining goals.
BlackboxNLP will this year feature a panel discussion on "Mechanistic Interpretability". We hope this panel may serve as a way of creating stronger bridges between interpretability in NLP and MI! We are now collecting questions for the discussion here: forms.gle/uFKi19aMCQ2GmhPHA
4
47
13,670
So a lot of people have arrived here; please read @nsaphra's excellent take on neural net probes and @nelsonfliu's comprehensive neural net probing study, both also at #naacl2019 nitter.app/nsaphra/status/1099978… Saphra: arxiv.org/abs/1811.00225 Liu: homes.cs.washington.edu/~nfl…
I'm still prepping the camera-ready for my @naacl paper, but if people take away one thing, I want it to be that they should be specific in what they mean when they say a representation "encodes" some linguistic property, and to recognize the drawbacks of their definition.
1
8
46
In analysis of neural nets, there’s no single right way to “probe” the neural net’s representations. In this opinion piece, we draw from neuroscience to enumerate a few distinct goals of probing and how each guides the design of the probe.
Check out our short opinion piece where we draw parallels between investigating brains and neural nets! "Probing artificial neural networks: insights from neuroscience" arxiv.org/abs/2104.08197 Written with @NogaZaslavsky and @JohnHewtt for the #brain2AI #ICLR2021 workshop. 1/
1
6
45
I'll be excitedly yammering about structural probes and finding syntax in unsupervised representations today at 4:15 in Nicollet B/C #naacl2019. Even if you don't ❤️ parse trees, come by to learn a method to tell if your neural network softly encodes tree structures!
4
38
This is a nice paper: On the (un)reliability of feature visualizations [Geirhos et al] arxiv.org/pdf/2306.04719.pdf Shows that vision model feature visualizations don't pass some sniff checks -- they can show plausible things unrelated to behavior on real inputs.
1
7
39
19,255
I'm so glad this content is now freely available. As head TA this past year, I had the privilege of writing and giving 3 lectures: on self-attention & Transformers, pretraining, and model analysis & explanation. I hope many find them useful in their studies!
Looking for a series to binge-watch with more depth? We are delighted to make available the latest CS224N: Natural Language Processing with Deep Learning. New content on transformers, pre-trained models, NLG, knowledge, and ethical considerations. #NLProc piped.video/playlist?list=PL…
3
39
I enjoyed chatting with @waleed_ammar and @nlpmattg on #nlphighlights about my paper with @chrmanning on finding syntax in word representations. I'm very grateful to have had this opportunity to talk (at length!) about my work!
#nlphighlights 88: John Hewitt @johnhewtt talks about probing word embeddings for syntax by projecting to a vector space where the L2 distance between a pair of tokens approximates the number of hops between them in the dependency tree. bit.ly/2vLVsU8
4
39
We split the problem of extrapolation to lengths not seen at train time in NNs into 1. what content to generate? 2. where to put EOS? Give up on 2 and NNs learn very different dynamics; better at 1! BlackBoxNLP arxiv.org/pdf/2010.07174.pdf Ben Newman, me, @percyliang @chrmanning
1
8
36
Excited to give a talk at the interplay workshop tomorrow! Come say hi! Alas, it’s my only day at COLM. Catch me at the coffee breaks or the roundtable.
✨ The schedule for our INTERPLAY workshop at COLM is live! ✨ 🗓️ October 10th, Room 518C 🔹 Invited talks from @sarahwiegreffe @johnhewtt @amuuueller @kmahowald 🔹 Paper presentations and posters 🔹 Closing roundtable discussion. Join us in Montréal! @COLM_conf
2
40
9,220
I'm giving a talk on designing and interpreting probing methods for understanding neural representations at EMNLP, Hall 2C, today at 1:30!
1
37
LMs make low-rank distributions (hidden dim < vocab_size) -> unavoidable errors! But samples are great if you use nucleus/top-k sampling 🤔. Matt: truncation sampling can fix low-rank errors, AND we can use the low-rank basis to find good tokens below the truncation threshold!
Nucleus and top-k sampling are ubiquitous, but why do they work? @johnhewtt, @alkoller, @swabhz, @Ashish_S_AI and I explain the theory and give a new method to address model errors at their source (the softmax bottleneck)! 📄 arxiv.org/abs/2310.01693 🧑‍💻 github.com/mattf1n/basis-awa…
5
30
14,040
Congratulations to Ben Newman, who spearheaded the work, for winning Outstanding Paper at #BlackBoxNLP, and thanks to the organizers and reviewers for your efforts! Congrats as well to the winners of the other Outstanding Paper award!
4
28
Key idea: Vector spaces have distance metrics (L2); trees do too (# edges between words). Vector spaces have norms (L2); rooted trees do too (# edges between word and ROOT.) Our probe finds a vector distance/norm on word representations that matches all tree distances/norms 2/4
2
1
28
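The key idea can be made concrete with a toy example: compute tree distances (number of edges between words) from a parse, then compare them with a vector distance taken after a linear map B. This is a sketch under assumptions, not the released probe code: it uses squared L2 distance, a hand-picked identity matrix for B instead of a learned one, and made-up 2-d word vectors chosen so the match is exact.

```python
from collections import deque


def tree_distances(heads):
    """heads[i] is the index of word i's parent, or -1 for the root word."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start}
        queue = deque([(start, 0)])
        while queue:  # breadth-first search counts edges along the tree path
            node, d = queue.popleft()
            dist[start][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist


def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two vectors after the linear map B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    proj = [sum(row[k] * diff[k] for k in range(len(diff))) for row in B]
    return sum(x * x for x in proj)


# "the cat sat": "the" attaches to "cat", "cat" attaches to "sat" (the root).
heads = [1, 2, -1]
gold = tree_distances(heads)

# Hypothetical word vectors and an identity probe matrix.
H = [[0, 0], [1, 0], [1, 1]]
B = [[1, 0], [0, 1]]
predicted = [[probe_distance(B, H[i], H[j]) for j in range(3)] for i in range(3)]
```

In the real setting B would be trained so that `predicted` is as close as possible to `gold` across many sentences.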
This claim, that parse trees are embedded through distances and norms on your word representation space, is a structural claim about the word representation space, like how vector offsets encode word analogies in word2vec/GloVe. We hope people have fun exploring this more! 4/4
3
1
24
My favorite deeper dive experiment in this paper: we wondered if putting the question _before_ the documents would remove the U-shaped effect, since the autoregressive contextualization would "know" what info to look for when processing each doc. Nope! The trend still holds.
1
1
23
3,939
It’s #naacl2021 and one of the best parts of a conf is meeting new people. If you’d like to chat #nlproc, and especially if you couldn’t make it to the conference, email me to chat for 30 min! I can talk research, admissions, grad school++. email on my website!
1
2
20
The position paper is We Can’t Understand AI Using Our Existing Vocabulary arxiv.org/pdf/2502.07586 Feedback and discussion are very welcome.
7
1
19
1,950
We’re coming to the end of #cs224n and it’s so good to see students excitedly discussing the results of their work at the end of the quarter. I’m grateful to our 28 TAs for making the course work.
The #cs224n poster session is happening now! We are super excited about amazing, cutting-edge NLP posters from ~650 students!
1
18
8,833
I'm also deeply committed to how open research dovetails with open teaching. I've twice co-taught Stanford's CS 224n: Natural Language Processing with Deep Learning; you can find some of my lectures here piped.video/watch?v=LWMzyfvu… and here piped.video/watch?v=DGfCRXuN… !
18
3,097
Ruth-Ann's great work building a Jamaican Patois Natural Language Inference dataset was picked up by Vox as part of its video "Why AI doesn’t speak every language." Happy to see Ruth-Ann's work (and disparities in NLP across languages) get this general audience coverage.
Check out this Vox video I was featured in where I chat about JamPatoisNLI which I worked on with @chrmanning and @johnhewtt! Many thanks to @PhilEdwardsInc for platforming our work piped.video/a2DgdsE86ts
5
17
10,317
Excited to give this talk! Tidbits: 1) Could finite-precision RNNs implement (bounded) stacks without access to an external stack? Yes, efficiently! 2) We train probabilistic models in NLP but prove things about acceptors; what if we connect language models to formal languages?
Join John Hewitt @johnhewtt, computer science PhD student at @Stanford, for his talk on November 12 at 11am entitled "The Unreasonable Syntactic Expressivity of RNNs." Details can be found here: isi.edu/events/calendar/1337…
1
15
Base models don’t follow instructions. We find that _response tuning_ (training on responses with no instruction) yields instruction following. Does that show we just need to teach the response distribution?
1
1
15
1,632
These distances/norms reconstruct each tree, and are parametrized only by a single linear transformation. What does this mean? In BERT, ELMo, we find syntax trees approximately embedded as a global property of the transformed vector space. (But not in baselines!) 3/4
1
2
15
Just a few minutes out! Come attend or watch the livestream, or reach out to me afterward if you couldn’t attend but would like to chat about the topic!
We are very excited to announce our next speaker!! 🗣John Hewitt @johnhewtt talking with us about ❓"Language Probes as V-information Estimators" 🗓Sept 9nd, 14:00 UTC 📝Sign up here: eventbrite.co.uk/e/nlp-with-…
1
2
14
Backpacks are an alternative to Transformers: intended to scale in expressivity, yet provide a new kind of interface for interpretability-through-control. A backpack learns k non-contextual sense vectors per subword, unsupervisedly decomposing the subword's predictive uses.
1
3
14
6,586
This is work with @nelsonfliu @chrmanning @percyliang and is my last paper at Stanford NLP. It’s been a blast finding these very odd results.
1
15
1,908
My work has discovered structure in language models - through the structural probe (aclanthology.org/N19-1419/), refined probing methods (aclanthology.org/D19-1275.pd…), and formalizing how models construct usable information about the solutions to hard problems (aclanthology.org/2021.emnlp-…).
3
13
2,925
Exciting work at #acl2020nlp in characterizing cross-lingual syntactic structure in multilingual BERT! Congrats Ethan!
Does Multilingual BERT share syntactic knowledge cross-lingually? In #acl2020nlp paper w/ @johnhewtt and @chrmanning, we visualize its syntactic structure & show it's applicable to a variety of human languages. Paper: arxiv.org/abs/2005.04511 Blog: ethanachi.com/multilingual-p… (1/4)
13
This work, with @mhahn29, @SuryaGanguli, @percyliang, @chrmanning, has been a fascinating and challenging new direction for me, and I'm deeply appreciative to them for enabling me to pursue it. Construction code: github.com/john-hewitt/dyckk… Learning code: github.com/john-hewitt/dyckk…
1
12
In my blog post, I argue that probing is a clear tool to characterize knowledge in neural networks when we didn't tell the network how to represent that knowledge. nlp.stanford.edu//~johnhew//… The code should be very useful for probing studies! github.com/john-hewitt/condi…
1
2
12
Further, we can and must design LMs for our understanding, not just performance: I introduced the Backpack, an architecture that brings many of the control and understanding benefits of linear models and word2vec with the power of the Transformer. (aclanthology.org/2023.acl-lo…)
1
2
12
3,597
I’m most interested in in-depth lectures or technical explainers, less interested in surface-level introductions. I’m also focused on (arguably) newer topics, since in these cases, I think personal opinions on the topics tend to come through stronger in pedagogical materials.
1
12
We see this as a step towards developing new language tools for learning about how language models store, process, and reason about potentially complex concepts—differently from how we do. Work with Oyvind Tafjord, Robert Geirhos, @_beenkim Blog here: cs.columbia.edu/~johnhew//ne…
1
13
1,714
To represent a word in context, Backpacks use information from the whole context to non-negatively weight the senses of all subwords in the context. So, the contribution of each sense is always towards predicting the same words; only the magnitude changes.
1
2
12
1,830
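The weighted-sum structure described here can be sketched as follows. This is a toy illustration with hand-written numbers, not the Backpack implementation: in the real model the weights come from a contextual network and the sense vectors are learned, whereas here both are hypothetical constants.

```python
def backpack_output(sense_vectors, weights):
    """Weighted sum of non-contextual sense vectors.

    sense_vectors[j][l] is the vector for sense l of the subword at position j.
    weights[j][l] is the non-negative contextual weight given to that sense.
    """
    dim = len(sense_vectors[0][0])
    out = [0.0] * dim
    for j, senses in enumerate(sense_vectors):
        for l, vec in enumerate(senses):
            w = weights[j][l]
            if w < 0:
                raise ValueError("sense weights must be non-negative")
            for d in range(dim):
                out[d] += w * vec[d]
    return out


# Two context positions, two senses each, 2-d vectors (all hypothetical).
senses = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, -1.0]],
]
weights = [[0.5, 0.0], [0.25, 1.0]]
representation = backpack_output(senses, weights)

# "Turning down" one sense only rescales its fixed contribution.
weights_down = [[0.5, 0.0], [0.25, 0.0]]
representation_down = backpack_output(senses, weights_down)
```

Since each sense vector is fixed and its weight is non-negative, changing a weight changes how much of that vector is added, never its direction.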
But then we also find that you don’t have to teach the response distribution. Finetuning (instruction-response) just on poetry, or just on math, or just python programs, leads to, e.g., recipe generation. It’s fascinating how little of the finetuning distribution comes across.
1
11
1,247
I’m deeply thankful to my co-authors on this work, @jwthickstun @chrmanning @percyliang. ArXiv! arxiv.org/abs/2305.16765 Demos here! By Lora Xie. huggingface.co/spaces/stanfo… Huggingface! huggingface.co/stanfordnlp/b…
11
1,860
By instead initializing the new embedding to the average of existing embeddings, you guarantee that the partition function of the softmax grows by at most a factor of (1 + 1/n), where n is the initial vocab size, so the distribution doesn't change much!
10
2,936
e.g., >>> torch.max(model(tok('I like pizza', return_tensors='pt')['input_ids']).logits) tensor(-4.5862) So, if you add a new word, since you randomly init the embedding, it gets dot product ~0 with hidden states. Softmax([-4,-4,..., 0]) puts mass on the elt with 0!
1
10
3,788
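A small numeric sketch of the effect, using made-up numbers in place of a real model: every existing logit is negative, a zero vector stands in for a small random init (dot product ~0 with the hidden state), and the alternative is the average of the existing embeddings. The hidden state and embeddings are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical hidden state and output embeddings; all three logits are negative.
hidden = [1.0, -2.0]
embeddings = [[-4.0, 0.5], [-3.0, 0.0], [-5.0, 1.0]]
n = len(embeddings)
old_logits = [dot(e, hidden) for e in embeddings]

# Stand-in for a small random init: the new word's logit is about 0.
zero_init = [0.0, 0.0]
p_new_zero = softmax(old_logits + [dot(zero_init, hidden)])[-1]

# Average init: the new logit is the mean of the existing logits.
avg_init = [sum(e[d] for e in embeddings) / n for d in range(len(hidden))]
p_new_avg = softmax(old_logits + [dot(avg_init, hidden)])[-1]
```

With the zero stand-in the new word takes most of the probability mass, while with the average init its probability is at most 1/(n+1), because the exponential of a mean is no larger than the mean of the exponentials.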
I think there's a space of interesting work (and future work) around initializing new word embeddings (e.g., for domain adaptation) using more information -- about orthography, about the finetuning distribution, etc.; averaging will be a baseline to beat.
2
10
So, it isn’t just sample-efficient to instruction-tune LMs. Even seemingly totally deficient adaptations yield instruction following. I think this bears a lot more exploration! Blog: nlp.stanford.edu/~johnhew/in… GitHub: github.com/john-hewitt/impli…
1
10
2,317
We modeled derivational morphological transformations separately as orthographic and distributional functions, then combined: go see @_danieldeutsch present our paper on English derivational morphology in oral session 6D today at ACL! aclweb.org/anthology/P18-118…
2
8
We give a qualitative example where we sample many times, and ask the model to score its own outputs. We distill its preferences into a word 'Good_M', as in, 'Give me responses you'd think are Good_M'. Negating, 'Not Good_M', makes the model generate responses it scores lowly.
1
9
2,216
To make this concrete, we show: even just taking a product between a pretrained LM and a hand-written rule-based LM with only 3 rules also yields rough instruction following. The rules are: upweight EOS slowly, uniformly change 15 words’ probs, penalize repetition.
1
9
1,552
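To illustrate what "taking a product" of two language models means, here is a hedged sketch: next-word probabilities from a base LM are multiplied by per-word scores from three hand-written rules and renormalized. The rule forms, the constants (`eos_rate`, `boost`, `repeat_penalty`), and the three-word vocabulary are all assumptions made for this example; the paper's actual rules and weights are not reproduced here.

```python
def rule_scores(vocab, generated, eos="<eos>", boosted=(),
                eos_rate=0.1, boost=2.0, repeat_penalty=0.5):
    """Per-word multiplicative scores from three toy rules."""
    scores = {}
    for w in vocab:
        s = 1.0
        if w == eos:
            s *= 1.0 + eos_rate * len(generated)  # EOS grows slowly with length
        if w in boosted:
            s *= boost                            # uniform change for a fixed word list
        if w in generated:
            s *= repeat_penalty                   # discourage repetition
        scores[w] = s
    return scores


def product_distribution(lm_probs, scores):
    """Multiply the two models word by word and renormalize."""
    unnorm = {w: lm_probs[w] * scores[w] for w in lm_probs}
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}


# Hypothetical base-LM distribution after the model has emitted "the" three times.
lm_probs = {"the": 0.5, "cat": 0.3, "<eos>": 0.2}
generated = ["the", "the", "the"]
combined = product_distribution(lm_probs, rule_scores(lm_probs, generated))
```

In this toy case the repeated word loses probability and EOS gains it, without any change to the base LM's parameters.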
Maybe! We didn’t try this, but it’s a nice hypothesis. I do think that there’s probably a nice middle ground between instruction tuning and response tuning in terms of the amount of information provided about the instruction.
1
8
352
For one example, we observe that certain aspects of gender bias in career nouns (e.g., nurse, CEO) are represented by a Backpack in a particular sense vector (pointing towards, e.g., “he”, “she”.) By “turning down” this sense, this aspect of gender bias is reduced.
1
9
1,573
I was fascinated at the emergent structure of sense vectors, and I’m really excited to see what LM interpretability research the Backpack enables. We can design architectures that scale and learn to do some of the interpretability work for us.
1
8
1,957
We analyze a few truncation-sampling algorithms, and find that our eta-sampling leads to more plausible long English documents, breaks out of repetition better, and more reasonably truncates low-entropy distributions. With @chrmanning, @percyliang Blog: nlp.stanford.edu//~johnhew//…
8
This work, led by @ruthstrong_, provides a great new language resource in Jamaican Patois, and studies transfer in multilingual and monolingual LMs! One opportunity: studying how model predictions change as a sentence moves closer to or farther from the high-resource English.
JamPatoisNLI provides a dataset and examines how well you can do transfer to low-resource creoles like Jamaican Patois, versus other recent results for low-resource NLI. By @ruthstrong_ @johnhewtt @chrmanning. At Multilingual Rep’n Workshop. #emnlp2022 nlp.stanford.edu/pubs/armstr…
4
7
This can harm finetuning. I also show that a simple, popular heuristic -- just averaging all existing embeddings -- guarantees that adding new words doesn't deviate much from the pretrained LM, solving this problem.
1
8
Replying to @Teknium @teknium
Right; it's pretty rare. Most models don't have this issue. You can see for yourself by loading up gemma-2b: >>> torch.max(model(tok('I like pizza', return_tensors='pt')['input_ids']).logits) tensor(-4.5862) bc max logit << 0, any new token would dominate in probability
1
6
478
Replying to @devadityamohan1
Doing my best to release the videos; can't make promises, unfortunately. Glad the resources have been useful! I'll release as much as I can.
7
521
Our results are about what's possible, not what's learned. But a drop of empirical results: while RNNs don't learn Dyck-k in practice (aclweb.org/anthology/W19-390…), they can learn Dyck-(k,m) well, even with a vanishingly small fraction of the possible stack states seen at training!
1
6
Replying to @aryaman2020
legally I have no idea. If you want to give me $7 for a coffee though I’ll have my people talk to your people.
3
7
1,171
In one example, we taught Gemma a neologism that causes single-sentence answers. When asked for synonyms of this new word, it suggested “lack,” as in, “Give me a lack answer.” This didn’t look right, but indeed causes very curt answers. We call this a machine-only synonym.
1
9
2,011
Finetuning that is deficient compared to instruction tuning, yet still yields instruction following, we call _implicit instruction tuning_. Why does this happen? Well, one thought is that the difference between a base LM and instruction-following LM is ‘simple.’
1
6
1,082
Replying to @luis_hacm
Requirements: the university has some application requirements on the website. For my lab, no explicit requirements; I hope to hire curious, driven students with evidence of potential for creative, independent research. My website has more info.
1
6
1,547
In our new work, Neologism Learning for Controllability and Self-Verbalization (arxiv.org/pdf/2510.08506v1), we show that by asking Gemma about the new word ~concept, like “what’s a synonym for ~concept”, Gemma can self-verbalize, generating English descriptions of the concept.
1
6
830
In neologism learning [HGK25] we freeze a language model, initialize one new word embedding, place that word in natural language contexts, and train it to optimize a loss on training examples that define some concept. Simple parameter-efficient finetuning, but you get a new word.
1
1
6
1,104
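The "freeze everything, train one new embedding" recipe can be sketched with a toy stand-in for the language model. This is a loose analogy, not the method's implementation: the frozen "model" is just a fixed matrix W that maps the new embedding to logits over three outputs, and the loss is cross-entropy toward one target output. W, the learning rate, and the step count are all hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def loss_and_grad(W, e, target):
    """Cross-entropy loss and its gradient with respect to the embedding e only."""
    logits = [sum(w * x for w, x in zip(row, e)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[target])
    grad = [
        sum((p[v] - (1.0 if v == target else 0.0)) * W[v][d] for v in range(len(W)))
        for d in range(len(e))
    ]
    return loss, grad


# Frozen toy "model" weights; only the new embedding is ever updated.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_before = [row[:] for row in W]
embedding = [0.0, 0.0]

losses = []
for _ in range(50):
    loss, grad = loss_and_grad(W, embedding, target=1)
    losses.append(loss)
    embedding = [x - 0.5 * g for x, g in zip(embedding, grad)]
```

The only trained parameters are the entries of `embedding`, which is what makes the approach parameter-efficient.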
I’ll be on this panel! Come say hello!
Exciting update: we are opening up our upcoming mentoring session to the public. Use this link to join the webinar on Wednesday: umich.zoom.us/j/91364965401
2
6
Our proof is constructive, exactly specifying weights of 1-layer RNNs (and a separate mechanism for the LSTM using just gates) that allow RNNs to push/pop from internal stack and create probability distributions over the next token, encoding what's possible in Dyck-(k,m) strings.
1
6
On Dec 7, I'll be presenting Truncation Sampling as Language Model Desmoothing at the GEM workshop! One practical takeaway: word-level truncation decisions (from top-p or eta-sampling) can be unintuitive. A colab in which you can try these yourself: colab.research.google.com/gi…
2
6
To make this concrete, k: vocab size. m: max nesting depth. Let's say vocab size is 100k, and max nesting depth is 3. (Empirically, 3 is not a bad approx. of human language.) Then before: approx 10^20 hidden units needed (give or take a few powers). We prove 150 units suffice.
1
6
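For readers unfamiliar with Dyck-(k,m), a membership check with an explicit bounded stack may help. The sketch reads k as the number of bracket types (the tweet calls it vocab size) and m as the maximum nesting depth; the token encoding as `("open", t)` / `("close", t)` pairs is an assumption of this example. It shows the language itself, not the RNN construction from the paper.

```python
def in_dyck_km(tokens, k, m):
    """Return True if tokens form a well-nested string with k bracket types
    and nesting depth at most m."""
    stack = []
    for kind, t in tokens:
        if not 0 <= t < k:
            return False
        if kind == "open":
            if len(stack) == m:  # pushing would exceed the depth bound
                return False
            stack.append(t)
        elif kind == "close":
            if not stack or stack[-1] != t:  # must close the most recent open bracket
                return False
            stack.pop()
        else:
            return False
    return not stack  # every bracket must be closed


nested = [("open", 0), ("open", 1), ("close", 1), ("close", 0)]
crossed = [("open", 0), ("open", 1), ("close", 0), ("close", 1)]

ok_depth_2 = in_dyck_km(nested, k=2, m=2)
ok_depth_1 = in_dyck_km(nested, k=2, m=1)
ok_crossed = in_dyck_km(crossed, k=2, m=2)
```

The stack never holds more than m entries, each one of k values, which is the bounded memory an RNN has to encode in its hidden state.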
Replying to @johnhewtt @yoavgo
re: other languages: I expect so : ) ; we'll see. Some things in the works; lots of follow-up work to be done (hopefully by many people!) re: syntax reps and head choices -- I'd love to hear more about that! Which representations (UD/SD/?) do ELMo/BERT match best? etc.
2
6
Scott Aaronson's note scottaaronson.com/writings/b… is a delightful introduction to reasoning about large numbers, leading up to the Busy Beaver numbers. Years after finding that article, what fun to find Busy Beaver numbers in proofs on RNNs! arxiv.org/pdf/1711.05408.pdf
2
5
The neologism framing is clarifying for interp, e.g., at what level of abstraction should we search for model concepts? Neologisms in languages (e.g., 'vibes', 'doomscroll') hit moderate levels of abstraction (if too low-level, not common enough. too abstract: not informative.)
1
5
999
Intuitively, in early-stopped neural LMs optimizing KL, there's good reason to put "a bit of probability mass everywhere", to hedge and avoid very high loss. This smoothing is good for scoring, like in n-gram models, but bad for generation, since mass is on non-language strings.
1
5