Aman Arora · Sep 22, 2025 · 10:37 PM UTC

Aman Arora

Pinned Tweet

22 Sep 2025

🧵 Most modern LLMs like Qwen, DeepSeek & gpt-oss use YaRN to extend context from 4K→128K tokens. But what led to YaRN? Today I'm proud and excited to share a comprehensive resource on the evolution of positional embeddings such as APE, RoPE, YaRN & variants👇 1/n
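YaRN builds on rotary position embeddings (RoPE). Below is a minimal NumPy sketch of plain RoPE, without YaRN's frequency rescaling. Each pair of dimensions is rotated by an angle proportional to the token position, so the query-key dot product depends only on the relative offset between the two tokens. The base of 10,000 and the "rotate half" pairing convention (dimension i paired with i + d/2) are assumptions, since implementations differ on how they pair dimensions.

```python
import numpy as np

def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    # One rotation frequency per pair of dimensions: theta_i = base^(-2i / dim)
    return base ** (-np.arange(0, dim, 2) / dim)

def apply_rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate a single vector x (even length) for the given token position."""
    dim = x.shape[-1]
    assert dim % 2 == 0, "RoPE needs an even embedding dimension"
    angles = position * rope_frequencies(dim, base)  # shape (dim / 2,)
    cos, sin = np.cos(angles), np.sin(angles)
    # "Rotate half" convention: dimension i is paired with dimension i + dim / 2
    x1, x2 = x[: dim // 2], x[dim // 2:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

rng = np.random.default_rng(0)
q, k = rng.normal(size=64), rng.normal(size=64)

# Same offset (3 tokens) at very different absolute positions -> same attention score
s1 = apply_rope(q, 10) @ apply_rope(k, 7)
s2 = apply_rope(q, 110) @ apply_rope(k, 107)
print(np.isclose(s1, s2))  # True
```

Because every step is a rotation, vector norms are preserved. Context-extension methods such as position interpolation and YaRN work by changing the angles this function computes.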

Aman Arora · Jul 10, 2024 · 11:58 PM UTC

Aman Arora @amaarora

10 Jul 2024

I have primarily switched to Claude 3.5 Sonnet and hardly use GPT-4. Anybody else?

Aman Arora · Feb 19, 2020 · 9:48 PM UTC

Aman Arora @amaarora

19 Feb 2020

1/ After weeks of learning, I am proud to share "The Annotated GPT-2", ladies and gentlemen! In this post, I re-implement OpenAI's GPT-2 in PyTorch using @huggingface source code and try to explain all the magic that goes on inside the model. amaarora.github.io/2020/02/1…

Aman Arora · Aug 2, 2021 · 9:32 AM UTC

Aman Arora @amaarora

2 Aug 2021

I've been working on Object Detection for the past few weeks - and I am proud to announce "The Annotated DETR" !! amaarora.github.io/2021/07/2… In this post, I try to explain all the magic that goes on inside the architecture. 1/n

Aman Arora · Jan 13, 2021 · 12:14 PM UTC

Aman Arora @amaarora

13 Jan 2021

After days and hours of learning, I am very excited to share my latest blog post "The EfficientDet Architecture in PyTorch"! amaarora.github.io/2021/01/1… In this post, I reference @wightmanr's source code and try to explain all the magic that goes on inside the network. 1/n

Aman Arora · May 2, 2024 · 1:08 AM UTC

Aman Arora @amaarora

2 May 2024

What is currently the best way to extract JSON from unstructured text using open-source models by passing in a Pydantic schema? So far I have been looking into: 1. Guidance (github.com/guidance-ai/guida…) 2. Instructor (github.com/jxnl/instructor) 3. DSPy (github.com/stanfordnlp/dspy) 4. Guardrails-AI (github.com/guardrails-ai/gua…) 5. jsonformer (github.com/1rgs/jsonformer) Guidance and Instructor seem to have better OpenAI compatibility; getting them to work with open-source models seemed like a pain. Does anyone have a working demo already? Anything else I should be trying? Also, is Hermes-2-Pro-Mistral-7B.Q8_0.gguf still the best go-to model for this task? huggingface.co/NousResearch/…. I don't see a Llama 3 version of Nous-Hermes out yet. Finally, I have also been looking into llama-cpp-agent; has anyone tried it before? It seems to be working pretty well so far! github.com/Maximilian-Winter…
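The libraries listed above constrain or guide generation. The simplest fallback is to parse and validate after generation. The stdlib-only sketch below assumes a hypothetical schema written as a `{field: type}` dict, which stands in for a Pydantic model. It pulls the first JSON object out of free-form model output and checks its fields. It does not constrain the model the way Guidance or jsonformer do.

```python
import json

def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in free-form model output."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; keep scanning
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found")

def validate(obj: dict, schema: dict) -> dict:
    """Check that every required field exists with the expected Python type."""
    for field, expected_type in schema.items():
        if field not in obj:
            raise ValueError(f"missing field: {field}")
        if not isinstance(obj[field], expected_type):
            raise TypeError(f"{field} should be {expected_type.__name__}")
    return obj

# Hypothetical schema and model output, for illustration only
PERSON_SCHEMA = {"name": str, "age": int}
raw = 'Sure! Here is the data: {"name": "Ada", "age": 36} Let me know if...'
print(validate(extract_first_json_object(raw), PERSON_SCHEMA))  # {'name': 'Ada', 'age': 36}
```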

Aman Arora · Nov 2, 2019 · 10:34 PM UTC

Aman Arora @amaarora

2 Nov 2019

Investing time in @fastdotai is one of the best investments I have ever made. To continue learning, I am starting a new series #CodeFirst where I will be digging deep into the source code. This builds on top of @jeremyphoward's code walk-throughs. medium.com/@aman.arora0210/f…

Aman Arora · Mar 15, 2021 · 9:48 PM UTC

Aman Arora @amaarora

15 Mar 2021

Very excited to share my latest blog post on Optimizers called `"Adam" and friends`! amaarora.github.io/2021/03/1… In this blog post we are going to re-implement SGD, Momentum, RMSprop & Adam from scratch and also compare performance with PyTorch's implementation. 1/
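Below is a minimal pure-Python sketch of the four update rules on the toy objective f(x) = x², which has its minimum at 0. It is not the blog's code. The momentum buffer follows the PyTorch-style form v = μv + g. The hyperparameters are illustrative choices, not tuned values.

```python
import math

def grad(x: float) -> float:
    # Gradient of the toy objective f(x) = x**2, minimised at x = 0
    return 2 * x

def sgd(x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum(x, lr=0.1, mu=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = mu * v + grad(x)  # decaying sum of past gradients (PyTorch-style buffer)
        x -= lr * v
    return x

def rmsprop(x, lr=0.1, alpha=0.99, eps=1e-8, steps=200):
    sq = 0.0
    for _ in range(steps):
        g = grad(x)
        sq = alpha * sq + (1 - alpha) * g * g  # moving average of squared gradients
        x -= lr * g / (math.sqrt(sq) + eps)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment: momentum
        v = beta2 * v + (1 - beta2) * g * g  # second moment: RMSprop-like
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

for name, step_fn in [("SGD", sgd), ("Momentum", momentum), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name:>8}: x after 200 steps starting from 5.0 = {step_fn(5.0):.6f}")
```

Adam combines the momentum buffer with RMSprop's per-parameter scaling, plus a bias correction for the zero-initialised moments.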

Aman Arora · Apr 12, 2021 · 8:20 PM UTC

Aman Arora @amaarora

12 Apr 2021

I’m excited to share that I’ve joined @wandb! This means - more paper summaries, more research, more community events, more paper reading groups, more @fastdotai study groups, more open source contributions, more fun. :)

Aman Arora · Oct 30, 2019 · 12:52 AM UTC

Aman Arora @amaarora

30 Oct 2019

1/ Not only is @fastdotai great for building deep learning models, it is also an excellent place to learn! By reading the 21 pages of the cs231n.github.io/convolution… resource mentioned in the pets lesson of V2 (bit.ly/34dUNtS), I had several AHA moments, such as:
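One formula from those cs231n notes gives the spatial output size of a conv layer: (W − F + 2P)/S + 1, where W is the input size, F the filter size, P the padding and S the stride. The small sketch below raises an error when the stride does not tile the input evenly, as the notes treat that case as invalid. PyTorch instead floors the division.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    span = w - f + 2 * p
    if span % s != 0:
        raise ValueError("stride does not tile the input evenly")
    return span // s + 1

print(conv_output_size(227, 11, p=0, s=4))  # 55, AlexNet's first conv layer
print(conv_output_size(7, 3, p=0, s=1))     # 5
print(conv_output_size(7, 3, p=0, s=2))     # 3
```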

Aman Arora · Jul 8, 2024 · 10:56 PM UTC

Aman Arora @amaarora

8 Jul 2024

Excited to share a new blog post on Gemma 2 that goes into the details of: Grouped Query Attention, Sliding Window Attention, Rotary Position Embeddings (RoPE), Logit soft-capping & model-merging. **All with easy to follow PyTorch implementations!** 1/N
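Of the techniques listed, logit soft-capping is the easiest to show in a few lines. It replaces a hard clip with a smooth tanh squash into (−cap, cap). To my knowledge, Gemma 2 uses a cap of 50.0 on attention logits and 30.0 on the final logits. The value below is illustrative.

```python
import numpy as np

def soft_cap(logits: np.ndarray, cap: float) -> np.ndarray:
    # Smoothly squashes logits into (-cap, cap); near zero it is roughly the identity
    return cap * np.tanh(logits / cap)

logits = np.array([-500.0, -10.0, 0.0, 1.0, 10.0, 500.0])
capped = soft_cap(logits, cap=30.0)
print(np.round(capped, 3))
```

Unlike `np.clip`, the tanh keeps a non-zero gradient for large logits, so they are damped rather than cut off.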

Aman Arora · May 4, 2021 · 10:46 PM UTC

Aman Arora @amaarora

4 May 2021

Super excited to present my latest blog post on ResNet-RS - "Revisiting ResNets: Improved Training and Scaling Strategies". bit.ly/2QT3yIU I also share code implementation in PyTorch using TIMM & more! 1/3

Aman Arora · Jul 2, 2024 · 3:47 AM UTC

Aman Arora @amaarora

2 Jul 2024

Trust me when I tell you that the below code implements Grouped Query Attention (GQA), Multi Head Attention (MHA) & Multi Query Attention (MQA). There is no magic to it. Paper (GQA): arxiv.org/abs/2305.13245 Implementation adapted from: github.com/meta-llama/llama/…
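The code referred to in this tweet was an image and is not reproduced here. The NumPy sketch below shows the same idea. One function covers MHA, GQA and MQA by repeating each key/value head across a group of query heads, similar in spirit to the `repeat_kv` helper in the Llama reference code. Shapes are simplified to (heads, seq_len, head_dim) with no batch dimension and no causal mask.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """
    q: (n_heads, seq_len, head_dim); k, v: (n_kv_heads, seq_len, head_dim).
    n_kv_heads == n_heads gives MHA, n_kv_heads == 1 gives MQA, in between is GQA.
    """
    n_heads, n_kv_heads = q.shape[0], k.shape[0]
    assert n_heads % n_kv_heads == 0, "query heads must split evenly into KV groups"
    n_rep = n_heads // n_kv_heads
    # Share each KV head across n_rep consecutive query heads
    k = np.repeat(k, n_rep, axis=0)
    v = np.repeat(v, n_rep, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (n_heads, seq, seq)
    return softmax(scores) @ v                                 # (n_heads, seq, head_dim)

rng = np.random.default_rng(0)
n_heads, seq_len, head_dim = 8, 4, 16
q = rng.normal(size=(n_heads, seq_len, head_dim))
for n_kv_heads, name in [(8, "MHA"), (2, "GQA"), (1, "MQA")]:
    k = rng.normal(size=(n_kv_heads, seq_len, head_dim))
    v = rng.normal(size=(n_kv_heads, seq_len, head_dim))
    out = grouped_query_attention(q, k, v)
    print(name, "KV heads:", n_kv_heads, "output shape:", out.shape)
```

The only thing that changes between the three variants is how many K/V heads are stored, which is why GQA and MQA shrink the KV cache.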

Aman Arora · Dec 25, 2019 · 4:15 PM UTC

Aman Arora @amaarora

25 Dec 2019

I am not sure if I should be scared or happy: with Uber's latest Plug & Play Language Model (arxiv.org/abs/1912.02164), it is now possible to steer the activations of LMs (such as GPT-2) and generate text with a specific sentiment on a specific topic. Is this dangerous? Time will tell.

Aman Arora · Aug 16, 2020 · 10:58 PM UTC

Aman Arora @amaarora

16 Aug 2020

It brings me great excitement to share my latest blog post on EfficientNet, for two reasons: - EfficientNet-B7 achieved a new SOTA while being 8.4 times smaller and 6.1 times faster than GPipe - Recent and current SOTA models have all been related to EfficientNets amaarora.github.io/2020/08/1… 1/
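The core idea behind the EfficientNet family is compound scaling. Depth, width and resolution are scaled together as α^φ, β^φ and γ^φ, with α = 1.2, β = 1.1 and γ = 1.15 from the paper's grid search. Because FLOPs grow roughly as d·w²·r², each unit of φ roughly doubles compute. This is a sketch of the formula only. The released B1 to B7 models use their own rounded coefficients.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper

def compound_scale(phi: float):
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    flops_factor = depth * width ** 2 * resolution ** 2  # FLOPs ~ d * w^2 * r^2
    return depth, width, resolution, flops_factor

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```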

Aman Arora · Sep 14, 2020 · 12:11 AM UTC

Aman Arora @amaarora

14 Sep 2020

It's Monday and I am pretty excited to release my latest blog post "U-Net: A PyTorch Implementation in 60 lines of Code". amaarora.github.io/2020/09/1… I was able to train this network (without pretrained weights) for the SIIM-ACR Pneumothorax Kaggle competition and get a 0.79 Dice score. 1/
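The Dice score mentioned above measures the overlap between a predicted and a ground-truth mask: 2|A ∩ B| / (|A| + |B|). The minimal NumPy version below works on binary masks. The small `eps` is an assumption that makes two empty masks score 1.0 rather than dividing by zero.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps makes empty-vs-empty score 1."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))

pred = np.array([[1, 1, 0], [0, 0, 0]])
target = np.array([[1, 0, 0], [0, 0, 0]])
print(round(dice_score(pred, target), 4))  # 2 * 1 / (2 + 1) ≈ 0.6667
```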

Aman Arora · Jul 4, 2024 · 5:24 AM UTC

Aman Arora @amaarora

4 Jul 2024

After digging deep into HF's implementation of the Longformer architecture, I have written a new blog post that explains Sliding Window Attention (SWA) and shows how to implement it in PyTorch. amaarora.github.io/posts/202… Continue reading this thread for a short summary. 1/

Sliding Window Attention: Longformer Explained with Animations and PyTorch

In this post, we take a deep dive into Sliding Window Attention, which allowed transformers to have long context lengths. We do this with the help of animations and also implement it from scratch in...

amaarora.github.io
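At its core, sliding window attention is a banded attention mask. The NumPy sketch below builds both variants. The Longformer-style symmetric window attends to window // 2 tokens on each side. The causal variant, as used in decoder-only models, attends to the current token and the previous window − 1 tokens. In a real attention layer, scores where the mask is False would be set to −inf before the softmax.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int, causal: bool = True) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask: True where query i may attend to key j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    if causal:
        # Each token sees itself and the previous (window - 1) tokens
        return (j <= i) & (i - j < window)
    # Longformer-style symmetric window: window // 2 tokens on each side
    return np.abs(i - j) <= window // 2

print(sliding_window_mask(6, window=3).astype(int))
print(sliding_window_mask(6, window=2, causal=False).astype(int))
```

Each row has at most `window` True entries, so attention cost grows linearly with sequence length rather than quadratically.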

Aman Arora · Jun 8, 2021 · 4:56 AM UTC

Aman Arora @amaarora

8 Jun 2021

I'd love to be able to make beautiful visualizations for my future blog posts. For example, below I share fig-2 from the Weight Standardization paper. Does anybody know any good tools that are fairly easy to use? (I don't want to spend months learning new tools.)

Aman Arora · Apr 10, 2021 · 10:38 PM UTC

Aman Arora @amaarora

10 Apr 2021

Here's a thread on why I write blogs and how that has completely changed my life. amaarora.github.io/ 1/

Aman Arora · Mar 29, 2021 · 6:43 PM UTC

Aman Arora @amaarora

29 Mar 2021

Now you can extract activation statistics from any module inside `timm` models easily using Unix filename pattern matching and @PyTorch hooks! In this example we extract average square channel mean of activations after every residual block inside a ResNet-50 model: 1/

Aman Arora · Nov 8, 2020 · 8:49 AM UTC

Aman Arora @amaarora

8 Nov 2020

What are some of the best books that really help you think about "how to design software"? In particular, I'm after something that: - is ideally for Python users - covers the key steps in designing/testing software - mentions the tools - helps you think about key design decisions

Aman Arora · Jul 16, 2020 · 2:06 PM UTC

Aman Arora @amaarora

16 Jul 2020

1/ Wouldn't it be great if someone explained to you exactly what ResNet does, in great detail and in simple language? Fastbook's chapter 14 - ResNets (github.com/fastai/fastbook/b…) does exactly that! Thanks @jeremyphoward and @GuggerSylvain! :)

Aman Arora · Apr 27, 2021 · 2:14 AM UTC

Aman Arora @amaarora

27 Apr 2021

"Could @huggingface Accelerate really be this easy?" I asked myself, and the result is this blog post where we take a deep-dive into the source code of the package. wandb.ai/wandb_fc/pytorch-im… Thanks @GuggerSylvain - you've done it again!! A thread: 1/n

An Introduction to HuggingFace's Accelerate Library

In this article, we dive into the internal workings of the Accelerate library from HuggingFace, to answer "could Accelerate really be this easy?".

wandb.ai

Aman Arora · Feb 19, 2020 · 9:49 PM UTC

Aman Arora @amaarora

19 Feb 2020

Replying to @amaarora @huggingface

Special thanks to @math_rachel and @jeremyphoward for the brilliant NLP course that really helped me in my journey to start learning about Transformers and NLP in general. fast.ai/2019/07/08/fastai-nl…

Aman Arora · Apr 13, 2021 · 9:07 PM UTC

Aman Arora @amaarora

13 Apr 2021

Super excited to share my latest blog on "Normalizer-Free ResNets" by @DeepMind !! Blog: bit.ly/3g9igFJ Paper: arxiv.org/abs/2101.08692 The idea is to explain everything in detail in a simple language & also show code implementation in @PyTorch. :) A thread: 1/5

Weights & Biases

Weights & Biases, developer tools for machine learning

wandb.ai

Aman Arora · Jan 18, 2021 · 11:40 PM UTC

Aman Arora @amaarora

18 Jan 2021

.@dr_hb_ai and I have teamed up to present the 1st blog in "TIMM SERIES" on "Vision Transformer"! amaarora.github.io/2021/01/1… Thanks to @dr_hb_ai's contributions, IMHO, this is one of the prettiest and most detailed blogs on ViT so far. We also share code implementations! 1/n

Aman Arora · Sep 12, 2020 · 2:56 PM UTC

Aman Arora @amaarora

12 Sep 2020

I've always wanted to write code that does something productive and fits on a single screen. New blog post on "U-Net using PyTorch" coming out this Monday 9am AEST! :)

Aman Arora · May 28, 2021 · 4:58 AM UTC

Aman Arora @amaarora

28 May 2021

It's hard for me to contain my excitement as I share this with you! @fastdotai has been at the core of all my learnings and I look forward to sharing the love for this library with you through fastbook reading sessions at @wandb for the next ~20 weeks! A thread: 1/n

Lavanya

@lavanyaai

26 May 2021

Join @jeremyphoward & the @wandb team for our 1st Fastbook reading session on June 3, 2021 at 8pm PST / 1pm AEST! Over the next 20 weeks, we'll dive into this hands-on guide to deep learning. 📍 Register: wandb.me/fastbook #deeplearning #machinelearning

Aman Arora · Mar 2, 2023 · 9:11 AM UTC

Aman Arora @amaarora

2 Mar 2023

Best post on Transformers to date: The Annotated Transformer. nlp.seas.harvard.edu/annotat…

Aman Arora · Aug 9, 2020 · 9:32 PM UTC

Aman Arora @amaarora

9 Aug 2020

1/ It's Monday and as promised I am back with another blog post - "Group Normalization". amaarora.github.io/2020/08/0… As a summary, we look at: - What is GN - In which cases you might want to try GN as opposed to BN - Other norm techniques like LayerNorm and InstanceNorm (briefly)
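Below is a minimal NumPy sketch of Group Normalization. The channels are split into groups, and each group is normalised with its own mean and variance. These statistics do not depend on the batch size, unlike BN. With `num_groups=1` this reduces to LayerNorm over (C, H, W), and with `num_groups=C` it becomes InstanceNorm. The learnable per-channel scale and shift are omitted for brevity.

```python
import numpy as np

def group_norm(x: np.ndarray, num_groups: int, eps: float = 1e-5) -> np.ndarray:
    """x: (N, C, H, W). Normalizes over (C // num_groups, H, W) within each group."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)  # learnable per-channel scale/shift omitted

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 5, 5))
y = group_norm(x, num_groups=4)
print(y.reshape(2, 4, 2, 5, 5).mean(axis=(2, 3, 4)).round(6))  # ~0 per (sample, group)
```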

Aman Arora · Jul 11, 2022 · 1:08 AM UTC

Aman Arora @amaarora

11 Jul 2022

Excited to bring to you the only resource that you'll need to understand "Swin Transformers" V1 (with #PyTorch code implementation!). amaarora.github.io/2022/07/0… A 🧵:

Aman Arora · Aug 20, 2020 · 11:25 PM UTC

Aman Arora @amaarora

20 Aug 2020

I am elated and humbled to have won my first silver medal on Kaggle in the recent "SIIM-ISIC Melanoma Classification" competition. This is all thanks to the wonderful community and open source projects around me, especially @fastdotai! Detailed write-up and journey coming soon.

Aman Arora · Jul 3, 2024 · 1:34 PM UTC

Aman Arora @amaarora

3 Jul 2024

OMG! "Sliding Window Attention" is seriously a wild concept to wrap your head around! 🤯 github.com/huggingface/trans…

Aman Arora · Mar 12, 2023 · 7:53 PM UTC

Aman Arora @amaarora

12 Mar 2023

Excited to present part-2 of The Annotated CLIP (the only 2 resources that you will need to understand CLIP completely, with PyTorch code implementation). amaarora.github.io/posts/202… As part of this blog post, I re-implement CLIP in PyTorch step-by-step using code from OpenCLIP. 1/
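CLIP's training objective is a symmetric contrastive loss. The sketch below is a NumPy rendering that follows the pseudo-code in the CLIP paper, with L2-normalised embeddings, a cosine-similarity matrix, and cross-entropy in both directions with the matching pairs on the diagonal. In the paper the temperature is learned (initialised to 0.07). Here it is a fixed assumption, and the encoders are replaced by random embeddings.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # Mean negative log-likelihood of the correct class, row-wise
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def clip_loss(image_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 0.07) -> float:
    image_emb, text_emb = l2_normalize(image_emb), l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature  # (n, n) scaled cosine similarities
    labels = np.arange(len(logits))                # i-th image matches i-th text
    loss_i = cross_entropy(logits, labels)         # image -> text
    loss_t = cross_entropy(logits.T, labels)       # text -> image
    return (loss_i + loss_t) / 2

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))
aligned = clip_loss(images, images)                   # perfectly matched pairs
unrelated = clip_loss(images, rng.normal(size=(4, 8)))
print(f"aligned: {aligned:.4f}, unrelated: {unrelated:.4f}")
```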

Aman Arora · Jun 23, 2021 · 4:58 AM UTC

Aman Arora @amaarora

23 Jun 2021

My biggest fear: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Aman Arora · Jul 27, 2023 · 8:41 PM UTC

Aman Arora @amaarora

27 Jul 2023

Thrilled to present my latest blog post: "Demystifying Document Question-Answering Chatbot - A Comprehensive Step-by-Step Tutorial with LangChain" 🔗: amaarora.github.io/posts/202… It's the most exhaustive resource that you'll find on the topic, going into the depths of @langchain. 1/

Demystifying Document Question-Answering Chatbot - A Comprehensive Step-by-Step Tutorial with...

Embark on an enlightening journey through the world of document-based question-answering chatbots using langchain! With a keen focus on detailed explanations and code walk-throughs, you’ll gain a...

amaarora.github.io

Aman Arora · Oct 18, 2020 · 11:35 PM UTC

Aman Arora @amaarora

18 Oct 2020

It's Monday and I am pretty excited to release my latest blog post "Introduction to Metric learning and Center Loss". amaarora.github.io/2020/10/1… This post is the first in a series of 4 blog posts on Metric Learning! We start with center loss and later look at other losses. 1/
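Below is a NumPy sketch of center loss as formulated by Wen et al. (2016): L_C = ½ Σᵢ ‖xᵢ − c_{yᵢ}‖², summed over the mini-batch. Each class centre moves toward the mean of its batch samples with rate α. In the paper this term is added to the softmax loss with a weight λ. That part, and the feature extractor itself, are left out here.

```python
import numpy as np

def center_loss(features: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over the mini-batch."""
    diff = features - centers[labels]
    return 0.5 * np.sum(diff ** 2)

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class centre towards its batch samples: c_j -= alpha * delta_c_j."""
    centers = centers.copy()
    for j in np.unique(labels):
        mask = labels == j
        delta = (centers[j] - features[mask]).sum(axis=0) / (1 + mask.sum())
        centers[j] -= alpha * delta
    return centers

features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
centers = np.zeros((2, 2))
print(center_loss(features, labels, centers))  # 7.0
new_centers = update_centers(features, labels, centers)
print(new_centers, center_loss(features, labels, new_centers))
```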

Aman Arora · Jun 19, 2023 · 11:29 PM UTC

Aman Arora @amaarora

19 Jun 2023

Hello everyone! 👋 I'm thrilled to share with you the journey I've embarked on in the world of AI and Machine Learning. Over the past few years, I've had the privilege of diving deep into various topics, exploring new technologies, and sharing my insights through numerous blog posts and reports. I've compiled a list of all my writings so far, each one a stepping stone in my learning journey. I hope these resources can be of help to you as they have been to me. nebula-cow-fb1.notion.site/C… I'm incredibly proud of the work I've done and the knowledge I've gained. But more than that, I'm excited about the opportunity to share it with all of you. I believe in the power of community and the collective wisdom we can build by sharing our experiences and insights. I'd love to hear your thoughts on these topics. Have you found any of these resources helpful? Do you have any favourite articles or insights you'd like to share? Let's start a conversation!

Aman Arora · Aug 24, 2021 · 7:38 AM UTC

Aman Arora @amaarora

24 Aug 2021

We're discussing the MDETR paper tomorrow with author @ashkamath20 in our paper reading group at @wandb! Simply put, MDETR is a multi-modal transformer-based architecture that learns to assign free-form text to objects in an image. But how does it do that? 1/

Aman Arora · Mar 6, 2021 · 9:31 PM UTC

Aman Arora @amaarora

6 Mar 2021

Did you know that the `timm` library also has over 15 optimizers to choose from, including Lookahead, on top of hundreds of pretrained models? This tutorial shows how to incorporate these optimizers into your custom PyTorch training scripts: fastai.github.io/timmdocs/Op…

Aman Arora · Jul 13, 2021 · 10:44 PM UTC

Aman Arora @amaarora

13 Jul 2021

Sure, a bit behind on the MLP mixer madness, but needed some time for the "Is MLP-Mixer a CNN in disguise?" debate to settle down! @dr_hb_ai and I spent hours on a call together to find the answer. The result? A new blog post! wandb.ai/wandb_fc/pytorch-im… 1/n

Is MLP-Mixer a CNN in Disguise?

As part of this blog post, we look at the MLP Mixer architecture in detail and also understand why it is not considered conv free.

wandb.ai

Aman Arora · Aug 25, 2021 · 10:53 PM UTC

Aman Arora @amaarora

25 Aug 2021

This is a thread on our fastbook reading group currently running at @wandb. We decided to finish the book in week-0 with @jeremyphoward, and I am so glad and happy that so many of you are with me on this journey. It's already week-12, and we're still going strong!! 1/n

Aman Arora · Jul 12, 2021 · 3:37 AM UTC

Aman Arora @amaarora

12 Jul 2021

Deep learning practitioners, I have a Q - Is there a nice visualization that explains why nn.Conv1d and nn.Linear are essentially the same when kernel size is 1 for conv?
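One way to see the equivalence without a visualization is numerically. A Conv1d with kernel size 1 applies the same (C_out × C_in) matrix at every position, which is exactly a Linear layer applied to each position's channel vector. The NumPy sketch below uses a naive conv loop and a plain `x @ W.T + b` as stand-ins for `nn.Conv1d` and `nn.Linear`, and mirrors their weight layouts.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, c_in, c_out, length = 2, 3, 4, 5
x = rng.normal(size=(batch, c_in, length))  # Conv1d layout: (N, C_in, L)
weight = rng.normal(size=(c_out, c_in, 1))  # Conv1d weight: (C_out, C_in, kernel=1)
bias = rng.normal(size=c_out)

def conv1d(x, weight, bias):
    """Naive 1D convolution (stride 1, no padding)."""
    n, _, l = x.shape
    c_out, _, k = weight.shape
    out = np.zeros((n, c_out, l - k + 1))
    for pos in range(l - k + 1):
        window = x[:, :, pos:pos + k]  # (N, C_in, K)
        out[:, :, pos] = np.einsum("nck,ock->no", window, weight) + bias
    return out

def linear(x, weight, bias):
    """Fully connected layer on the last dimension: x @ W^T + b."""
    return x @ weight.T + bias

conv_out = conv1d(x, weight, bias)  # (N, C_out, L)
# Same weights squeezed to (C_out, C_in); move channels last so each position is a feature vector
linear_out = linear(x.transpose(0, 2, 1), weight[:, :, 0], bias).transpose(0, 2, 1)
print(np.allclose(conv_out, linear_out))  # True
```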

Aman Arora · Oct 5, 2020 · 7:58 PM UTC

Aman Arora @amaarora

5 Oct 2020

Anyone interested in joining me on a journey to replicate the 1st place solution for Google Landmark Retrieval 2020 using Google Colab TPUs? solution summary: kaggle.com/c/landmark-retrie… arxiv: arxiv.org/abs/2009.05132

Google Landmark Retrieval 2020

Given an image, can you find all of the same landmarks in a dataset?

kaggle.com

Aman Arora · Mar 6, 2023 · 1:45 AM UTC

Aman Arora @amaarora

6 Mar 2023

Very excited to present "The Annotated CLIP (part-1)" blog, where I present an Introduction to CLIP in an easy to digest manner using pseudo-code from the paper. amaarora.github.io/posts/202… This is the first in a total of two blog posts on CLIP. 1/

The Annotated CLIP (Part-1): Understanding Contrastive Language-Image Pre-training

This post is part-1 of the two series blog posts on CLIP. In this blog, we present an Introduction to CLIP in an easy to digest manner. We also compare CLIP to other research papers and look at the...

amaarora.github.io

Aman Arora · Apr 8, 2021 · 12:57 PM UTC

Aman Arora @amaarora

8 Apr 2021

This just happened. ImageNet, here I come!

Aman Arora · Jul 13, 2021 · 10:17 PM UTC

Aman Arora @amaarora

13 Jul 2021

"Are fully connected and convolution layers with 1x1 kernel equivalent? If so, how?" In a quest to find the answer, I ended up implementing both operations in MS Excel and comparing the results to PyTorch outputs! bit.ly/2VvGYrG 1/n

Aman Arora · Sep 9, 2021 · 1:10 AM UTC

Aman Arora @amaarora

9 Sep 2021

Personal Update thread: I've relocated to Delhi for a while. Sadly, my father was diagnosed with cancer, and I am here to support him in his fight. I've mostly been away from Twitter and work but slowly getting back to it now. 1/

Aman Arora · Aug 9, 2020 · 1:02 PM UTC

Aman Arora @amaarora

9 Aug 2020

New blogpost on ‘Group Normalization’ with PyTorch implementation coming tomorrow! :)

Aman Arora · Nov 18, 2021 · 11:36 AM UTC

Aman Arora @amaarora

18 Nov 2021

Before joining @wandb, I used to work for a medical startup in Sydney that was pretty heavy on compliance! Their product could diagnose around 127 diseases in Chest X-rays - but this also meant that any change in the deep learning model affected human lives directly. 1/

Weights & Biases

@wandb

11 Nov 2021

Need help with keeping track of your source code files and maintaining model & dataset versions that get served in production? In this webinar @amaarora will demo a robust framework to structure experiments with W&B. 📍Details & registration: wandb.me/webinar-registratio…

Aman Arora · Jul 21, 2021 · 11:37 PM UTC

Aman Arora @amaarora

21 Jul 2021

There's so much to learn from @kaggle.

Aman Arora · May 17, 2021 · 6:26 AM UTC

Aman Arora @amaarora

17 May 2021

Does anybody know where the dot-product attention formula comes from and why it works?

Aman Arora · Jun 29, 2021 · 5:06 AM UTC

Aman Arora @amaarora

29 Jun 2021

Here's a wonderful article by @pandeyparul - "Building a compelling Data Science Portfolio with writing"! Having a good data science portfolio not only helps the people around you but also is great for your own personal growth! wandb.ai/parul_pandey/discus… 1/

Building a compelling Data Science Portfolio with writing

Writing in Data Science can have a transformative effect not only in your journey but also in your career. Made by Parul Pandey using Weights & Biases

wandb.ai

Aman Arora · May 31, 2021 · 11:05 AM UTC

Aman Arora @amaarora

31 May 2021

I am always happy when I have a new blog post to present. I am happy again and excited to present my latest blog on ConViT architecture. bit.ly/3c2nuAf So what is ConViT? It's ViT with the first 10 SA layers replaced with GPSA layers. Wait, what? A thread: 1/n

Aman Arora · Jul 26, 2020 · 7:50 PM UTC

Aman Arora @amaarora

26 Jul 2020

1/ It's Monday morning for me and I am back with another blog post. This time it's "Squeeze and Excitation Networks Explained with PyTorch Implementation". amaarora.github.io/2020/07/2… Research paper: arxiv.org/abs/1709.01507
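Below is a minimal NumPy sketch of a Squeeze-and-Excitation block. It squeezes each channel to a single number with global average pooling, passes that through a small bottleneck MLP (FC → ReLU → FC → sigmoid), and rescales each channel by the resulting gate in (0, 1). Biases are omitted and the weights are random. The paper's default reduction ratio is 16; the example uses 4 to keep the shapes small.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def squeeze_excitation(x, w1, w2):
    """
    x: (N, C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r.
    Biases omitted for brevity.
    """
    s = x.mean(axis=(2, 3))              # squeeze: global average pool -> (N, C)
    z = np.maximum(s @ w1.T, 0)          # excitation: FC + ReLU -> (N, C // r)
    gates = sigmoid(z @ w2.T)            # FC + sigmoid -> (N, C), each in (0, 1)
    return x * gates[:, :, None, None]   # rescale each channel by its gate

rng = np.random.default_rng(0)
c, r = 16, 4
x = rng.normal(size=(2, c, 8, 8))
w1 = rng.normal(size=(c // r, c)) * 0.1
w2 = rng.normal(size=(c, c // r)) * 0.1
out = squeeze_excitation(x, w1, w2)
print(out.shape)  # (2, 16, 8, 8)
```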

Aman Arora · Jul 19, 2020 · 10:26 PM UTC

Aman Arora @amaarora

19 Jul 2020

1/ I am really excited to share my new blog post "Label Smoothing Explained using Microsoft Excel" amaarora.github.io/2020/07/1… In this post, not only do we implement Label Smoothing in Microsoft Excel step by step but also,
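The same computation as the Excel version can be sketched in NumPy. The true class gets 1 − ε, and ε/K is spread uniformly over all K classes (including the true one). The cross-entropy is then taken against these soft targets. To my understanding, this is also the formulation behind the `label_smoothing` argument of PyTorch's `CrossEntropyLoss`. The logits below are made up for illustration.

```python
import numpy as np

def smooth_labels(labels: np.ndarray, num_classes: int, eps: float = 0.1) -> np.ndarray:
    """(1 - eps) on the true class plus eps / K spread uniformly over all K classes."""
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1 - eps) + eps / num_classes

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    # Cross-entropy against (possibly soft) target distributions, averaged over rows
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

labels = np.array([0, 2])
logits = np.array([[4.0, 1.0, 0.5], [0.2, 0.1, 3.0]])
targets = smooth_labels(labels, num_classes=3, eps=0.1)
print(targets)
print(cross_entropy(logits, np.eye(3)[labels]), cross_entropy(logits, targets))
```

Because the targets never reach exactly 0 or 1, the model is no longer rewarded for pushing the correct logit infinitely far above the others.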

Aman Arora · Sep 6, 2020 · 11:32 PM UTC

Aman Arora @amaarora

6 Sep 2020

It's Monday again and I am keeping my promise of releasing a new blog post yet another week. This week's post is based on a "code-first" approach where we build a solution for SIIM-ACR Pneumothorax Segmentation competition using @PyTorch. amaarora.github.io/2020/09/0… 1/

Aman Arora · Aug 2, 2020 · 10:15 PM UTC

Aman Arora @amaarora

2 Aug 2020

1/ As is usual for Monday mornings, I am back with yet another blog post - "DenseNet Architecture Explained with PyTorch Implementation from TorchVision" amaarora.github.io/2020/08/0… In this blogpost, together, we look at-

Aman Arora · May 30, 2022 · 5:53 PM UTC

Aman Arora @amaarora

30 May 2022

I got my job at @wandb because of my blog. Having my own personal blog almost always adds an X factor to my profile every time I interview. It’s also a great way to document your learnings and I often find myself referring to my older blogs as part of revision. 1/

Scott Condron

@_ScottCondron

30 May 2022

Starting to write online was one of the most impactful decisions I've made in my career so far ✍️ I've gotten job opportunities, met amazing people and learned a lot from writing online. We are running a "blogathon" at W&B to get more people writing. wandb.me/blogathon 1/4

Aman Arora · Apr 21, 2021 · 9:29 PM UTC

Aman Arora @amaarora

21 Apr 2021

I implemented Resnet-RS in @PyTorch using TIMM and trained on ImageNet for the first time!! Paper: arxiv.org/abs/2103.07579 Here's what I learnt: bit.ly/2Qp6VqP

Revisiting ResNets: Improved Training and Scaling Strategies

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies....

arxiv.org

Aman Arora · Nov 11, 2020 · 7:27 PM UTC

Aman Arora @amaarora

11 Nov 2020

Lots of information packed in this article "Make Delegation Work in Python" by @jeremyphoward ! bit.ly/2UeTpEB Apart from learning about how Python deals with `**kwargs`, I also got to know about the "DRY" principle! ;)

Aman Arora · Jul 6, 2021 · 11:38 AM UTC

Aman Arora @amaarora

6 Jul 2021

I recently summarised what’s new in CV. The talk summarises CV architectures’ progress over time. Starting with AlexNet in 2012 to Transformers, NfNet and MLP Mixers in 2021. I also shared my views about publicly sharing your work. Thanks @QLD_AI_Hub for hosting me.

Queensland AI Hub @QLD_AI_Hub

6 Jul 2021

Our mates @Queensland_AI were recently joined by @wandb Aman Arora to explore what's new in #computervision and how to publicly share your work. #artificialintelligence #machinelearning #deeplearning #data piped.video/watch?v=IYg46wNy…

Aman Arora · Jul 29, 2021 · 2:24 AM UTC

Aman Arora @amaarora

29 Jul 2021

Love this blog on @fastdotai's learning rate finder! Includes experimentation with different LRs on the PETs dataset and some pretty cool handwritten notes on discriminative learning rates too :) Also explains why we don't need to re-initialize learners after running lr_find()!

Vinayak Nayak @ElisonSherton

25 Jul 2021

An amazing week 7 of #fastbook session with @amaarora from @wandb ! Aman explains the importance of tuning learning rates for training #DeepLearning models and I have summarized my understanding below elisonsherton.github.io//fas…

Aman Arora · Mar 8, 2023 · 9:42 PM UTC

Aman Arora @amaarora

8 Mar 2023

Slightly old but gold. Illustrating Reinforcement Learning from Human Feedback (RLHF) by @huggingface huggingface.co/blog/rlhf

Aman Arora · Aug 11, 2021 · 7:58 AM UTC

Aman Arora @amaarora

11 Aug 2021

Got bronze in Covid competition by the finest of margins. But, also - 1. First time with "object detection" 2. Wrote a blog post about DETR and hosted PRG at @wandb 3. Another PRG next week on MDETR 4. Learnt about Detectron2, MMDetection 5. Working on a library

Aman Arora @amaarora

21 Jul 2021

There's so much to learn from @kaggle.

Aman Arora · Jul 15, 2021 · 9:54 PM UTC

Aman Arora @amaarora

15 Jul 2021

As deep learning practitioners, we are surrounded by frameworks that make our lives so much easier! One such framework that has been a part of almost all of my distributed training (multi-GPU/TPU) loops since its release has been -🤗accelerate! github.com/huggingface/accel… 1/4

Aman Arora · Mar 8, 2021 · 12:40 AM UTC

Aman Arora @amaarora

8 Mar 2021

Do you want to use the SGDR scheduler in your custom PyTorch training scripts? This tutorial shows you how you can implement it using timm and explains each of the hyperparameters. Oh btw, it is also possible to schedule params other than lr. fastai.github.io/timmdocs/SG…
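The SGDR schedule itself is short to write down. The learning rate follows a cosine from `lr_max` down to `lr_min` within each cycle, jumps back up at every restart, and each new cycle is `t_mult` times longer than the last. This pure-Python sketch uses its own parameter names for illustration. timm's scheduler has its own arguments, which are documented in the tutorial above.

```python
import math

def sgdr_lr(epoch: int, lr_max: float = 0.1, lr_min: float = 0.0,
            t_initial: int = 10, t_mult: int = 2) -> float:
    """Learning rate at a given epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t_initial, epoch
    # Skip over completed cycles; each new cycle is t_mult times longer
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))

print([round(sgdr_lr(e), 4) for e in range(0, 35, 5)])
```

With these defaults the restarts happen at epochs 10 and 30, where the rate jumps back to `lr_max`.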

Aman Arora · Jul 18, 2021 · 11:23 PM UTC

Aman Arora @amaarora

18 Jul 2021

Not long left now: in 2 days we are hosting the first ML-frameworks meetup at @wandb with @GuggerSylvain, for a deep dive into Hugging Face Accelerate! code: huggingface.co/docs/accelera… blog: huggingface.co/blog/accelera… rsvp: wandb.me/ml-frameworks See you all there!🤗

Aman Arora · Dec 28, 2019 · 9:59 PM UTC

Aman Arora @amaarora

28 Dec 2019

If you've tried and failed to understand Seq2Seq models (like I did many times), try @math_rachel's NLP course accompanied with these two excellent blog posts by @karpathy and @ch402 bit.ly/2ZuFwE0, bit.ly/34YugAU. Then, try implementing the model in code. :)

Aman Arora · Mar 9, 2021 · 10:15 AM UTC

Aman Arora @amaarora

9 Mar 2021

Did you know that the `timm` library can load the ImageNet pretrained weights for images with input number of channels != 3? Here is a tutorial that explains how this works. bit.ly/30tzftz 1/
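As I understand timm's `adapt_input_conv` helper, it works roughly as follows. For 1 channel it sums the RGB filters of the first conv layer. For other channel counts it tiles the RGB filters and rescales them by 3 / in_chans. This is a NumPy re-sketch of that idea, not timm's code, so check the tutorial for the exact behaviour. The checks in the usage example show why these choices make sense: a gray image copied into R, G and B gives the same response as the summed filter on the gray image.

```python
import math
import numpy as np

def adapt_input_conv(in_chans: int, conv_weight: np.ndarray) -> np.ndarray:
    """Adapt an RGB-pretrained first-layer weight (out, 3, kh, kw) to in_chans inputs."""
    out_ch, rgb, kh, kw = conv_weight.shape
    assert rgb == 3, "expects weights pretrained on 3-channel images"
    if in_chans == 1:
        # Grayscale: sum the RGB filters so a gray image gives similar activations
        return conv_weight.sum(axis=1, keepdims=True)
    if in_chans == 3:
        return conv_weight
    # Otherwise tile the RGB filters to cover in_chans, then rescale so the
    # overall response stays comparable to the 3-channel case
    repeat = math.ceil(in_chans / 3)
    tiled = np.tile(conv_weight, (1, repeat, 1, 1))[:, :in_chans]
    return tiled * (3 / in_chans)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 3, 7, 7))
print(adapt_input_conv(1, w).shape)  # (64, 1, 7, 7)
print(adapt_input_conv(5, w).shape)  # (64, 5, 7, 7)
```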

Aman Arora · Aug 27, 2021 · 12:59 PM UTC

Aman Arora @amaarora

27 Aug 2021

I am so excited to share that at CTDS we are starting out with a new series on TIMM! GitHub: github.com/rwightman/pytorch… In this series, @bhutanisanyam1 and I are going to deep dive into the source code of TIMM over the next few weeks. CTDS: piped.video/c/ChaiTimeDataSc… 1/n

Aman Arora · Jun 4, 2021 · 6:16 AM UTC

Aman Arora @amaarora

4 Jun 2021

It was lovely to host @jeremyphoward yesterday for our introductory session on fastbook reading group at @wandb! :) This session is also available on YouTube here - piped.video/watch?v=X3tjlZL9… 1/n

W&B Fastbook Reading Group — 0. Introduction with Jeremy Howard

In this introduction of the W&B Fastbook Reading Group, we're joine...

youtube.com

Aman Arora · Jul 29, 2021 · 2:39 AM UTC

Aman Arora @amaarora

29 Jul 2021

In case you missed @GuggerSylvain talk about 🤗Accelerate - the video is now available on YouTube! Plenty of good advice from the master himself. :) piped.video/watch?v=A7lnu-Zs…

ML Frameworks: Hugging Face Accelerate w/ Sylvain Gugger

In this edition of ML Frameworks W&B's host Aman Arora was joined b...

youtube.com

Aman Arora · Aug 26, 2021 · 4:26 AM UTC

Aman Arora @amaarora

26 Aug 2021

I am actually really happy with the way I explained convolutions today!:D piped.video/watch?v=Rmj8OILj…

W&B Fastbook Reading Group — 12. What is a convolution, really?

Aman discusses fastbook Chapter 13.🌟 Discussion: http://wandb.me/...

youtube.com

Aman Arora · Mar 1, 2023 · 11:38 PM UTC

Aman Arora @amaarora

1 Mar 2023

Have taken the first step towards updating my blog (and hopefully, making it better). You know what I am talking about, right? amaarora.github.io/

Aman Arora · Mar 5, 2021 · 12:29 AM UTC

Aman Arora @amaarora

5 Mar 2021

This is an exciting moment. I've successfully been able to re-implement `SGD`, `Momentum` & `RMSProp` from scratch. Blue is the loss curve when a model was trained using PyTorch's `RMSprop` and orange represents the new implementation from scratch. New blog post out soon! :)

Aman Arora · Aug 17, 2021 · 2:37 AM UTC

Aman Arora @amaarora

17 Aug 2021

Long way to go, but happy to be a Kaggle 2x expert. :)

Aman Arora · Aug 12, 2021 · 8:47 AM UTC

Aman Arora @amaarora

12 Aug 2021

Today, @bhutanisanyam1 and I will continue looking at the top solutions from "SIIM-FISABIO-RSNA Covid-19 Detection" Kaggle competition. Join us in ~3 hrs: piped.video/watch?v=HJDfV6Tj… We will train the Study Classification model from scratch using Segmentation AUX loss in PyTorch.

Aman Arora · Aug 23, 2020 · 11:35 PM UTC

Aman Arora @amaarora

23 Aug 2020

New Blogpost: In today's blogpost "SIIM-ISIC Melanoma Classification - my journey to a top 5% solution and first silver medal on Kaggle", I share my journey, solution summary and key learnings from having participated in this competition. amaarora.github.io/2020/08/2… 1/

Aman Arora · Jul 7, 2020 · 9:46 AM UTC

Aman Arora @amaarora

7 Jul 2020

Dear NLP experts, I want to train a model to do address segmentation, i.e. break a text address like "Unit 12, 11-15 Myra Rd, Strathfield" into its constituents: Unit: 12, Street number: 11-15, Street name: Myra Rd, Suburb: Strathfield. How could I do this, please?

Aman Arora · Aug 10, 2021 · 9:19 AM UTC

Aman Arora @amaarora

10 Aug 2021

We're discussing the DETR paper in our paper reading group at @wandb tomorrow! Paper: arxiv.org/abs/2005.12872 RSVP: wandb.me/prg

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed...

arxiv.org

Aman Arora @amaarora

2 Aug 2021

Aman Arora · Dec 13, 2021 · 2:44 PM UTC

Aman Arora @amaarora

13 Dec 2021

Recently I integrated my "ResNet Strikes Back" (arxiv.org/abs/2110.00476) related experiments with @wandb and wrote a blog post about it too: wandb.ai/amanarora/resnet_st… 1/

Aman Arora · Jun 24, 2021 · 5:31 AM UTC

Aman Arora @amaarora

24 Jun 2021

Every week after the fastbook sessions I have a big smile on my face! I think it’s the @wandb and @fastdotai magic! Love the community and people attending! Thanks guys for making it so fun! 😁

Weights & Biases

@wandb

24 Jun 2021

In our upcoming Fastbook Reading Group session, @amaarora will wrap up Chapter 2 and start Chapter 3! 🗓 June 23, 8pm PT (<1 hr to go) 🚀 RSVP - wandb.me/fastbook

Aman Arora · Sep 15, 2021 · 2:02 AM UTC

Aman Arora @amaarora

15 Sep 2021

Hey everybody! I know I’ve been away, but not today. We are hosting our first beginner-friendly live coding session at @wandb on ResNet! Join me live at wandb.me/resnet-stream today at 10:30pm IST! We’ll build the architecture from scratch in @PyTorch.

Aman Arora · May 12, 2023 · 3:20 AM UTC

Aman Arora @amaarora

12 May 2023

May 5th, 2023: Release of StarCoder & StarCoderBase. nitter.app/BigCodeProject/s… I just finished reading the 54-page accompanying pre-print (arxiv.org/abs/2305.06161), so let me take you through the finer details of dataset generation & curation, model training & evaluation below. Big thanks to @ServiceNow, @BigCodeProject & @huggingface for the open-source model, dataset & training recipe.
----------------------------------------------------
KEY FEATURES:
1. StarCoder is StarCoderBase fine-tuned on a further 35B Python tokens!
2. StarCoderBase is a 15.5B parameter model with an 8K context length, trained on 1 trillion tokens from The Stack (arxiv.org/abs/2211.15533).
3. The 1T tokens cover 80+ programming languages, GitHub issues, Git commits & Jupyter Notebooks.
4. StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI `code-cushman-001` model.
5. Both StarCoderBase & StarCoder have an 8K context length, support Fill-in-the-Middle (arxiv.org/abs/2207.14255) & fast large-batch inference through Multi-Query-Attention (arxiv.org/abs/1911.02150); I will write about these two papers in follow-up Twitter threads.
6. The release also comes with an OpenRAIL-M license agreement, a new attribution tool in the VSCode demo that can help users detect and locate model generations that may have been copied from the training set, & a significantly improved PII redaction pipeline, built on a newly collected PII dataset of 12,000 files with 22,950 annotated entities.
----------------------------------------------------
DATA CURATION & CLEANING:
1. From the 358 programming languages in The Stack, 86 were chosen based on two criteria (full list in table-1 & table-2, attached as imgs):
- Languages with more than 500MB of data
- Top-50 languages on GitHut (githut.info/) or the TIOBE Index for December 2022
2. Swift was left out of the final list of languages due to human error!
3. Data was visually inspected: eighteen community annotators evaluated 300 programming language extensions. Here's what the process looked like:
- 30,000 files were randomly selected and categorized by extension
- At most 1,000 files were kept per extension
- Annotators went through 50-100 files & confirmed whether the data appeared to be normal code.
4. File-type-specific filters:
- HTML: a custom filter that targets excessive HTML boilerplate and links.
- YAML: keep files with 50–5000 characters, an average line length smaller than 100, a maximum line length smaller than 1000, and more than 50% alphabetic characters.
- JSON: keep files with 50–5000 characters and more than 50% alphabetic characters, which removes around 70% of the files and 98% of the volume.
5. Jupyter Notebooks were transformed into two different datasets: Jupyter-scripts & Jupyter-structured.
- For Jupyter-scripts, Jupytext (jupytext.readthedocs.io/) was used to convert notebooks to scripts. For notebooks missing metadata about their programming language, Guesslang (guesslang.readthedocs.io/) was used to identify the language automatically.
- For Jupyter-structured, notebooks without Python code or Markdown text were filtered out. Only notebooks explicitly marked as ‘Python’ in the metadata were kept, and consecutive Markdown blocks or code blocks were merged into one large Markdown or code block respectively. This gives a total of 1M structured Jupyter Notebooks after preprocessing.
6. For GitHub Issues, conversations from PRs & Issues were collected as part of The Stack. These were then filtered as below:
- Remove auto-generated text when users replied to issues via email (see the regex in Listing A.1, img attached); this removed 18% of the volume.
- Exclude comments from bots, detected by searching for keywords in the username & the comment's author.
- Keep conversations with two or more users, or single-user conversations whose total comment text is < 7,000 characters.
- Use `fasttext` (fasttext.cc/docs/en/language…) to filter out non-English issues.
7. For Git Commits, data was collected from BigQuery, and repos from users who opted out of The Stack were removed. A 50% sample was kept, and the following filters were applied:
- Remove code files with >100k chars;
- Remove commits with an empty commit subject;
- Subsample changes with ≤ 2 lines with 50% probability;
- Subsample changes spanning ≥ 200 lines with 10% probability;
- Remove commits with whitespace-separated words-to-character ratio >20;
- Subsample data formats (JSON, YAML, XML, HTML) with 50% probability.
8. For deduplication, the same approach as in arxiv.org/abs/2301.03988 was used:
- Calculate MinHashes of all source code files, followed by Locality-Sensitive Hashing (LSH) to map similar code files to the same bucket.
* I am not sure how this deduplication part works; I will have to read further about LSH & MinHashes.
9. Regarding the weighting of data sources, the authors decided not to up-sample or down-sample any programming languages. Why? Because after deduplication, several high-resource programming languages, such as C, C++, C#, Java, JavaScript, Python, and PHP, had a similar amount of data, ranging from 44–87 GB.
----------------------------------------------------
PII REDACTION
Even though Personally Identifiable Information (PII) redaction is part of the Data Curation section above, I share it separately in this tweet as it's quite interesting. It consists of three parts:
1. Data Collection (identifying PII entities such as names, usernames, emails, IP addresses, passwords, etc.): the collected dataset comprises 12,000 files, each containing approximately 50 lines of code, in 31 programming languages. The annotators detected a total of 22,950 PII entities in the dataset.
2. An encoder-only model called StarEncoder is pre-trained on code using the MLM (Masked Language Modelling) & NSP (Next Sentence Prediction) objectives, the same objectives as BERT!
Pre-training takes ~2 days on 64 A100 GPUs for 400B tokens.
3. StarEncoder is then fine-tuned on the annotated data from step 1 for an NER (named entity recognition) task with 6 target classes: names, emails, keys, passwords, IP addresses, and usernames. The fine-tuned baseline achieves F1 scores of more than 90% on names, emails, and IP addresses, and 73.39% on passwords. Performance is comparatively low on keys and usernames, with F1 scores of only 56.66% and 59.39%, respectively.
Comparison against a regex baseline: the PII detection models still surpassed the regex approach on all three entities that regex supports (Email, IP address & Key).
All PII entities were replaced with the following tokens: <NAME>, <EMAIL>, <KEY>, <PASSWORD>
----------------------------------------------------
MODEL TRAINING
StarCoderBase, the first of the two models, is trained on 1 trillion tokens sourced from the curated dataset described above. StarCoder is the fine-tuned version of StarCoderBase, trained on another 35B Python tokens (roughly 2 epochs).
1. Data was formatted with special tokens prior to training:
- For code, the repository name, file name & number of stars were prepended to the code: `<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\nCode<eos>`
- For Issues, special tokens were used to separate comments: `<issue_start>title + USERID: comment<issue_comment>USERID: Comment ... <issue_closed (optional)> <eos>`
- Jupyter scripts were formatted in the same manner as code.
- For Git Commits, the code before the commit, the commit message, and the code after the commit were separated with tokens: `<commit_before>code<commit_msg>text<commit_after>code<eos>`
2. Tokenizer: the Hugging Face Tokenizers library was used to train a byte-level Byte-Pair-Encoding tokenizer with a vocabulary size of 49,152 tokens, including the sentinel tokens.
3. Model Architecture: a 15.5B parameter model with the same architecture as SantaCoder, i.e. a decoder-only Transformer with Fill-in-the-Middle, Multi-Query-Attention & learned absolute positional embeddings.
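The YAML and JSON filters in step 4 of DATA CURATION & CLEANING are simple enough to write down directly. Below is a minimal Python sketch; the function names are hypothetical, and it assumes the 50–5000 character bounds are inclusive, that line length is measured per `splitlines()` line, and that "alphabetic" means `str.isalpha()`. The paper's actual implementation may differ on these details.

```python
def alpha_fraction(text: str) -> float:
    """Fraction of characters in `text` that are alphabetic (str.isalpha, an assumption)."""
    if not text:
        return 0.0
    return sum(ch.isalpha() for ch in text) / len(text)


def keep_yaml(text: str) -> bool:
    """YAML rule: 50-5000 chars, avg line length < 100, max line length < 1000,
    and more than 50% alphabetic characters (bounds assumed inclusive)."""
    if not 50 <= len(text) <= 5000:
        return False
    # Non-empty text always yields at least one line, so no division by zero here.
    line_lengths = [len(line) for line in text.splitlines()]
    avg_len = sum(line_lengths) / len(line_lengths)
    return avg_len < 100 and max(line_lengths) < 1000 and alpha_fraction(text) > 0.5


def keep_json(text: str) -> bool:
    """JSON rule: 50-5000 chars and more than 50% alphabetic characters."""
    return 50 <= len(text) <= 5000 and alpha_fraction(text) > 0.5
```

A JSON file that is mostly numbers (e.g. a dumped array of floats) fails the alphabetic-fraction test, which fits the thread's note that the JSON filter removes far more volume (98%) than files (70%).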
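Step 8 (deduplication) is the part flagged above as unclear, so here is a generic sketch of the MinHash + LSH idea in plain Python. Each file becomes a set of token n-grams ("shingles"); a MinHash signature keeps the minimum of many seeded hashes over that set, and two signatures agree at a position with probability approximately equal to the Jaccard similarity of the two sets; LSH splits each signature into bands so that only files sharing an entire band are compared at all. This is an illustration of the technique, not the BigCode/SantaCoder code: the shingle size, band and row counts, and the 0.8 threshold are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(code: str, n: int = 5) -> set[str]:
    """Whitespace-token n-grams of a file (n=5 is an assumed shingle size)."""
    tokens = code.split()
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash(items: set[str], num_perm: int = 128) -> list[int]:
    """One minimum per seeded hash function; each seed stands in for a random permutation."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(), "little"
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of signature positions where two files agree (approximates Jaccard similarity)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_duplicate_pairs(files: dict[str, str], bands: int = 32, rows: int = 4,
                         threshold: float = 0.8) -> set[tuple[str, str]]:
    """LSH: files with an identical band of `rows` signature values share a bucket;
    only those candidate pairs are checked against the (assumed) threshold."""
    sigs = {name: minhash(shingles(code), bands * rows) for name, code in files.items()}
    buckets = defaultdict(list)
    for name, sig in sigs.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(name)
    pairs = set()
    for names in buckets.values():
        for x, y in combinations(sorted(names), 2):
            if estimated_jaccard(sigs[x], sigs[y]) >= threshold:
                pairs.add((x, y))
    return pairs
```

A real pipeline would then group the candidate pairs into clusters and keep one file per cluster; this sketch stops at finding the pairs. The banding is what keeps it scalable: files are never compared all-against-all, only within shared buckets.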
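The templates in step 1 of MODEL TRAINING boil down to string concatenation. A minimal sketch for the code and Git-commit templates, using the token spellings exactly as written in the thread (the real tokenizer's special-token strings, such as the end-of-sequence token, may be spelled differently); the function names are hypothetical. The issue template is left out because the thread doesn't pin down the exact separator between the title and the first comment.

```python
def format_code_file(repo: str, filename: str, stars: int, code: str) -> str:
    # Template from the thread: <reponame>REPONAME<filename>FILENAME<gh_stars>STARS\nCode<eos>
    return f"<reponame>{repo}<filename>{filename}<gh_stars>{stars}\n{code}<eos>"


def format_commit(before: str, message: str, after: str) -> str:
    # Template from the thread: <commit_before>code<commit_msg>text<commit_after>code<eos>
    return f"<commit_before>{before}<commit_msg>{message}<commit_after>{after}<eos>"


# Hypothetical example repository and file.
print(format_code_file("user/repo", "hello.py", 42, "print('hello')\n"))
```

Jupyter scripts would reuse `format_code_file`, since they are formatted the same way as code.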

BigCode @BigCodeProject

4 May 2023

Introducing: 💫StarCoder StarCoder is a 15B LLM for code with 8k context and trained only on permissive data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and act as a Tech Assistant. Try it here: shorturl.at/cYZ06r Release thread🧵


Aman Arora · Mar 9, 2020 · 9:18 AM UTC

Aman Arora @amaarora

9 Mar 2020

I like to think of Deep Learning in NLP and CV as broadly consisting of: 1. Model architectures 2. Optimizers 3. Loss functions 4. Data augmentation techniques 5. Schedulers 6. Layer initializations. What am I missing? Model distillation (can be part of category 1) and..?

Aman Arora · Jun 30, 2021 · 1:02 AM UTC

Aman Arora @amaarora

30 Jun 2021

Join me in 2 hours as we look into the EfficientNetV2 paper as part of our paper reading group at @wandb! arxiv: arxiv.org/abs/2104.00298 report: bit.ly/3y7ysx7

Aman Arora · May 6, 2021 · 12:41 AM UTC

Aman Arora @amaarora

6 May 2021

We asked and you answered! And now, we're excited to host our next paper reading group at @wandb this Sunday at 12pm PST on the "Vision Transformer"! Together, let's break this paper down into simple parts and learn all about it. Register here - wandb.ai/aarora/discussions/…

Paper Reading Group: Vision Transformers

The paper reading groups are supported by experiments, blogs & code implementation! This is your chance to come talk about the paper that interests you!

wandb.ai

Aman Arora @amaarora

25 Apr 2021

If I were to host a paper reading group at @wandb, which paper would you want us to discuss together with code implementation? If any other, please let me know. :)

Aman Arora · May 3, 2023 · 5:02 AM UTC

Aman Arora @amaarora

3 May 2023

Great to see the classic matrix multiplication problem that we've seen in fast.ai as part of the Modular keynote by @jeremyphoward! Check out the keynote: modular.com/ 1/


Aman Arora · Aug 17, 2021 · 2:45 AM UTC

Aman Arora @amaarora

17 Aug 2021

Planning to host beginner-friendly paper reading groups at @wandb! (possibly ResNet, SENet, EfficientNet & DenseNet) We could also go through the code implementation in TIMM! How does that sound?

33% Cool

64% Super cool

4% Not cool

318 votes • Final results

Aman Arora · Jul 28, 2021 · 1:12 AM UTC

Aman Arora @amaarora

28 Jul 2021

We're discussing the CaiT paper today in our paper reading group at @wandb in 2 hours - and I am also going to reference code from TIMM to show everybody the implementation of the paper in PyTorch. RSVP: wandb.me/prg Paper: arxiv.org/abs/2103.17239

Aman Arora · Aug 31, 2020 · 12:35 AM UTC

Aman Arora @amaarora

31 Aug 2020

It's Monday again, and in today's blog post we will be looking at "GeM Pooling" and also a brief introduction to Image Retrieval. amaarora.github.io/2020/08/3… We also look at a PyTorch implementation and run a small experiment as usual. Jupyter nb: nbviewer.jupyter.org/github/… 1/

Aman Arora · Sep 20, 2020 · 10:37 AM UTC

Aman Arora @amaarora

20 Sep 2020

After having spent every day for the past two years working and learning continuously, I have decided to take a little break to relax and replenish my energy. It is also my birthday in around 10 days' time. I promise to continue writing more blogs when I come back. :)

Aman Arora · Sep 21, 2021 · 2:02 PM UTC

Aman Arora @amaarora

21 Sep 2021

Today at @wandb at 10:30pm IST, we are hosting our second beginner-friendly paper reading group - this time on DenseNet! Paper: arxiv.org/abs/1608.06993 YouTube URL: piped.video/watch?v=Cbd-4ieB… Blog: amaarora.github.io/2020/08/0… Look forward to seeing you there!

Densely Connected Convolutional Networks

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those...

arxiv.org

Aman Arora · Jul 15, 2022 · 2:22 AM UTC

Aman Arora @amaarora

15 Jul 2022

I just discovered today that my blog has crossed the 100K unique users milestone! amaarora.github.io/ It has now been read in 76 countries, with over 150K sessions within ~2 years of starting it. This makes me very happy and motivated to write more! :)

Aman Arora · Aug 17, 2020 · 12:01 PM UTC

Aman Arora @amaarora

17 Aug 2020

People selling medals on Kaggle for money? kaggle.com/c/siim-isic-melan… This makes me sad.

SIIM-ISIC Melanoma Classification

Identify melanoma in lesion images

kaggle.com

Aman Arora · Apr 10, 2021 · 10:38 PM UTC

Aman Arora @amaarora

10 Apr 2021

I learnt "much more" from the friends I made in the `DS 101` course than from the actual curriculum itself. Then, someone pointed me to @fastdotai by @jeremyphoward. Having trained an img classifier within the first 2 hours of starting out, I was hooked on DL for life. 3/

Aman Arora · Mar 24, 2020 · 12:50 PM UTC

Aman Arora @amaarora

24 Mar 2020

It's the end of week-1 of the "Deep Learning for Coders: Part-1" course, and I spent this week looking into the *DataBlocks API*. Here is a code-first introduction to the wonderful API using five different single-label CV applications: amaarora.github.io/fastaiexp…

Aman Arora · Jan 5, 2022 · 11:47 AM UTC

Aman Arora @amaarora

5 Jan 2022

Just joined Twitch! Who are some deep learning folks I should follow? Recommendations, please! :)

Aman Arora · Nov 17, 2019 · 8:28 AM UTC

Aman Arora @amaarora

17 Nov 2019

Sometimes it takes 8 windows to follow @fastdotai source code! Thanks @jeremyphoward for introducing me to VIM and TMUX. My most comprehensive article on DataBlocks API coming out soon!! #fastai #datablocks #python #vim #tmux

Aman Arora · May 9, 2023 · 5:38 AM UTC

Aman Arora @amaarora

9 May 2023

Do you understand the perplexity metric? If not, that's okay. I didn't understand it completely either, so I asked GPT-4 for help. The results are mind-blowing! 🤯 "Can you please explain the 'perplexity' with example sequence of words and predictions from a large language model?" 1/
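Perplexity is the exponential of the average negative log-likelihood a model assigns to each next token: PPL = exp(-(1/N) * sum_i log p(w_i | w_1..w_{i-1})). A minimal sketch, using made-up per-word probabilities for a hypothetical sentence (not outputs from any real model):

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Made-up probabilities a hypothetical model assigns to each next word of
# "the cat sat on the mat" (one probability per predicted word).
probs = [0.2, 0.1, 0.25, 0.5, 0.6, 0.3]
print(round(perplexity(probs), 2))  # prints 3.61
```

If a model assigns probability 1/k to every token, the perplexity equals k, which is why perplexity is often read as the model's effective number of choices per token; lower is better.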
