anshuman · Aug 28, 2025 · 4:35 AM UTC

anshuman

@athleticKoder

28 Aug 2025

She dumped me last night.

Not because I don't listen. Not because I'm always on my phone. Not even because I forgot our anniversary (twice).

But because, in her exact words: "You only pay attention to the parts of what I say that you think are important."

I stared at her for a moment and realized... She just perfectly described the attention mechanism in transformers. Turns out I wasn't being a bad boyfriend. I was being mathematically optimal.

See, in conversations (and transformers), you don't give equal weight to every word. Some words matter more for understanding context. Attention figures out exactly HOW important each word should be.

Here's the beautiful math:

Attention(Q, K, V) = softmax(QK^T / √d_k)V

Breaking it down:
Q (Query): "What am I looking for?"
K (Key): "What info is available?"
V (Value): "What is that info?"
d_k: Key dimension (for scaling)

Think library analogy: You have a question (Query). Books have titles (Keys) and content (Values). Attention finds which books are most relevant.

Step-by-step with "The cat sat on the mat":

Step 1: Create Q, K, V
Each word → three vectors via learned matrices W_Q, W_K, W_V
For "cat":
Query: "What should I attend to when processing 'cat'?"
Key: "I am 'cat'"
Value: "Here's cat info"

Step 2: Calculate scores
QK^T = how much each word should attend to the others
Processing "sat"? High similarity with "cat" (cats sit) and "mat" (where sitting happens).

Step 3: Scale by √d_k
Prevents dot products from getting too large, keeps softmax balanced.

Step 4: Softmax
Converts scores to probabilities that sum to 1. For "sat", illustratively:
"cat": 0.4 (subject)
"sat": 0.3 (action)
"mat": 0.2 (location)
"on": 0.05 (preposition)
"the": 0.05 (article, both occurrences combined)

Step 5: Weight values
Multiply each word's value by its attention weight, then sum. Now "sat" knows it's most related to "cat" and "mat".

Multi-Head Magic:
Transformers do this multiple times in parallel:
Head 1: Subject-verb relationships
Head 2: Spatial ("on", "in", "under")
Head 3: Temporal ("before", "after")
Head 4: Semantic similarity
Each head learns different relationship types.

Why This Changed Everything:
Before: RNNs = reading with a flashlight (one word at a time, forget the beginning)
After: Attention = floodlights on the entire sentence, with dimmer switches

This is why ChatGPT can:
Remember what you said 50 messages ago
Know "it" refers to something specific
Understand "bank" = money vs river based on context

The Kicker:
Models learn these patterns from data alone. Nobody programmed grammar rules. It figured out language structure just by predicting next words.

Attention is how AI learned to read between the lines.

Just like my therapist helped me understand my focus patterns, maybe understanding transformers helps us see how we decide what matters.

Now if only I could implement multi-head attention in dating... 🤖

Still waiting for "scaled dot-product listening" to be invented.
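
For anyone who wants to poke at the formula, here is a minimal NumPy sketch of scaled dot-product attention with a simple multi-head wrapper. The embeddings and the W_q, W_k, W_v, W_o matrices are random stand-ins (assumptions for illustration, not trained weights), so the weights it produces carry no linguistic meaning; the sketch only shows the shapes and the order of operations.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; the result is unchanged.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (seq_len, d)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Project X, split columns into heads, attend per head, concat, project."""
    d_model = X.shape[1]
    d_head = d_model // num_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        out, _ = attention(Q[:, cols], K[:, cols], V[:, cols])
        heads.append(out)
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
d_model, num_heads = 8, 4

# Hypothetical random embeddings and projection matrices.
X = rng.normal(size=(len(tokens), d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads)
_, weights = attention(X @ W_q, X @ W_k, X @ W_v)

print(out.shape)             # one d_model-sized vector per token
print(weights.sum(axis=-1))  # attention weights in every row sum to 1
```

Splitting the projected columns into heads is one common way to implement multi-head attention; real implementations batch the heads into a single tensor operation instead of looping.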

614

985

11,672

719,320

anshuman · Dec 18, 2024 · 1:05 PM UTC

anshuman

@athleticKoder

18 Dec 2024

this paper changed my life

110

462

7,547

556,802

anshuman · Sep 18, 2025 · 12:50 PM UTC

anshuman

@athleticKoder

18 Sep 2025

You're in an ML Engineer interview at Perplexity, and the interviewer asks: "Your RAG system is hallucinating in production. How do you diagnose what's broken - the retriever or the generator?" Here's how you can answer:

214

2,887

361,062

anshuman · Sep 11, 2025 · 10:57 AM UTC

anshuman

@athleticKoder

11 Sep 2025

You're in an ML inference engineer interview at Anthropic, and the interviewer asks: "Can you explain speculative decoding and why we'd want to use it?" Here's how you can answer:

154

2,826

325,716

anshuman · Sep 21, 2025 · 12:47 PM UTC

anshuman

@athleticKoder

21 Sep 2025

249

2,633

196,192

anshuman · Nov 3, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

3 Nov 2025

"Just use OpenAI API" Until you need: - Custom fine-tuned models - <50ms p99 latency - $0.001/1K tokens (not $1.25/1K input) Then you build your own inference platform. Here's how to do that:

131

2,377

377,369

anshuman · Sep 13, 2025 · 1:02 PM UTC

anshuman

@athleticKoder

13 Sep 2025

I rejected a job offer yesterday.

Not because of the salary. Not because of the tech stack. Not even because of the long hours they warned me about.

But because, when I asked how they evaluate their AI systems, the hiring manager said: "We just ask it some questions and see if the answers sound right."

I stared at them for a moment and realized... They just described the biggest problem in AI today.

See, "sounds right" isn't a measurement. It's a hope.

Here's what proper LLM evaluation actually looks like:
- Accuracy: Can it get factual questions right? (Not 80% of the time. Consistently.)
- Hallucination rate: How often does it make things up? (This should be near zero for critical applications.)
- Bias metrics: Does it treat all groups fairly? (Measured across demographics, not assumed.)

Real Evaluation Frameworks:
- BLEU scores for translation quality
- Perplexity for language modeling
- Human evaluation with inter-annotator agreement
- Adversarial testing (red teaming)
- Domain-specific benchmarks (legal, medical, financial)

The Process:
> Define success criteria BEFORE deployment
> Create diverse test sets (not just happy paths)
> Measure consistently across model versions
> Track performance over time (models drift)
> Have humans validate edge cases

Why This Matters:
Before proper evals: "Our model is amazing!" (based on cherry-picked examples)
After proper evals: "Our AI achieves 94.2% accuracy on domain X, with known failure modes Y and Z"

The difference? One builds trust. The other destroys it when reality hits.

The kicker: Most companies are still in the "sounds right" phase. They're deploying models evaluated by vibes, not metrics.

Just like you wouldn't join a team that deploys code without tests, you shouldn't join one that deploys AI without proper evaluation.

What's your experience with LLM evaluation? Are we measuring what actually matters?
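
"Human evaluation with inter-annotator agreement" is easy to say and easy to skip, so here is a small sketch of one common agreement measure, Cohen's kappa, in plain Python. The two rater lists and the "ok"/"bad" labels are made-up illustration data, and kappa is only one of several agreement statistics a team might pick.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled at random with their own rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical judgments from two human raters on the same 8 model answers.
rater_1 = ["ok", "ok", "bad", "ok", "bad", "ok", "ok", "bad"]
rater_2 = ["ok", "bad", "bad", "ok", "bad", "ok", "ok", "ok"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"raw agreement: {6 / 8:.2f}, kappa: {kappa:.3f}")
```

Here the raters agree on 6 of 8 items (75%), but kappa is noticeably lower because a good part of that agreement would be expected by chance.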

142

1,757

283,375

anshuman · Sep 17, 2025 · 1:25 PM UTC

anshuman

@athleticKoder

17 Sep 2025

You're in an ML Engineer interview at Groq, and the interviewer asks: "How do you measure LLM inference performance? What metrics matter most for production systems?" Here's how you can answer:

1,489

136,413

anshuman · Oct 9, 2025 · 5:43 AM UTC

anshuman

@athleticKoder

9 Oct 2025

career update: joined zomato as Machine Learning Engineer 2

115

1,436

166,212

anshuman · Oct 17, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

17 Oct 2025

Techniques I’d master if I wanted to make LLMs faster + cheaper. 1. Quantization 2. KV-Cache Quantization 3. Flash Attention 4. Speculative Decoding 5. LoRA 6. Pruning 7. Knowledge Distillation 8. Weight Sharing 9. Sparse Attention 10. Batching & Dynamic Batching 11. Model Serving Optimization 12. Tensor Parallelism 13. Pipeline Parallelism 14. Paged Attention 15. Mixed Precision Inference 16. Early Exit / Token-Level Pruning

162

1,263

60,845

anshuman · Sep 29, 2025 · 1:06 PM UTC

anshuman

@athleticKoder

29 Sep 2025

You’re in an AI Engineer interview at Microsoft, and the interviewer asks: ‘Our team needs to build RAG over 10M documents. Which vector database and why?’ Here’s how you answer:

105

1,239

153,700

anshuman · Feb 18, 2025 · 11:39 AM UTC

anshuman

@athleticKoder

18 Feb 2025

software Engineers have a runway of 5 years left

1,109

85,184

anshuman · Sep 19, 2025 · 1:07 PM UTC

anshuman

@athleticKoder

19 Sep 2025

You're in an ML Engineer interview at Anthropic, and the interviewer asks: "Your LLM inference is running out of GPU memory with long conversations. How do you fix this?" Here's how you answer:

1,103

118,562

anshuman · Oct 16, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

16 Oct 2025

ML concepts every data scientist should know for interviews: Bookmark this. 1. Bias-Variance Tradeoff 2. Cross-Validation Strategies 3. Regularization (L1, L2, Elastic Net) 4. Class Imbalance & Sampling Techniques 5. Feature Engineering & Selection 6. Overfitting vs Underfitting 7. Evaluation Metrics (beyond accuracy) 8. Hyperparameter Tuning 9. Train-Test Data Leakage 10. Ensemble Methods 11. Dimensionality Reduction 12. Model Interpretability (SHAP, LIME) 13. Gradient Descent Variants 14. Activation Functions & Neural Networks 15. Imbalanced Dataset Handling 16. Production Model Monitoring

112

1,068

57,626

anshuman · Nov 10, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

10 Nov 2025

"Just use Vector Database" Until you need: - 100M+ vectors indexed - <10ms p95 search latency - $50/month (not $500/month) Then you build your own vector database. Here's what that actually means:

1,085

147,233

anshuman · Nov 12, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

12 Nov 2025

“Just rent a GPU for training” Until you need: - Multi-node training for 70B+ models - $5/hour per GPU (not $30/hour) - 90%+ GPU utilization Then you build your own ml infra. Here’s the reality:

1,069

149,357

anshuman · Sep 30, 2025 · 11:42 AM UTC

anshuman

@athleticKoder

30 Sep 2025

Techniques I'd master if building RAG systems that actually work: Bookmark this. 1. Sliding Window Chunking 2. Semantic Chunking 3. Document Hierarchies 4. Metadata Enrichment 5. Query Expansion 6. Hybrid Search 7. Reranking Models 8. Context Window Packing 9. Lost in the Middle Problem 10. Hypothetical Document Embeddings (HyDE) 11. Multi-Query Retrieval 12. Contextual Compression 13. Sentence Window Retrieval 14. Auto-Merging Retrieval 15. Cross-Encoder Rescoring 16. Temporal Context Decay 17. Negative Sampling 18. MMR (Maximal Marginal Relevance) 19. Graph-Based Retrieval 20. Recursive Retrieval 21. Citation Tracking 22. Context Ablation Testing 23. Adaptive Retrieval

101

1,037

62,698

anshuman · Sep 16, 2025 · 12:15 PM UTC

anshuman

@athleticKoder

16 Sep 2025

You're in an ML Inference Engineer interview at Google, and the interviewer asks: "What's the real bottleneck in LLM serving throughput? How can PagedAttention help?" Here's how you can answer:

985

90,324

anshuman · Feb 22, 2025 · 5:14 PM UTC

anshuman

@athleticKoder

22 Feb 2025

one of my favourite ML YouTube channels lately.

103

932

74,409

anshuman · Sep 20, 2025 · 1:18 PM UTC

anshuman

@athleticKoder

20 Sep 2025

You're in an ML Engineer interview at Perplexity, and the interviewer asks: "Your LLM generates millions of responses daily. How do you evaluate quality without manual review?" Here's how you answer:

946

168,886

anshuman · Sep 23, 2025 · 1:13 PM UTC

anshuman

@athleticKoder

23 Sep 2025

You're in an ML Engineer interview at Meta, and the interviewer asks: "Why does RL work better than supervised learning for LLMs?" Here's how you answer:

950

90,106

anshuman · Oct 29, 2024 · 6:07 AM UTC

anshuman

@athleticKoder

29 Oct 2024

life of a machine learning engineer

879

57,739

anshuman · Nov 13, 2025 · 11:50 AM UTC

anshuman

@athleticKoder

13 Nov 2025

We are hiring 2 Machine Learning Engineers for our Bangalore office. You'll work directly with me on super impactful projects. Drop your best work in the comments👇 and I will personally reach out to you if you are a fit. Please don't DM!

138

933

126,028

anshuman · Oct 15, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

15 Oct 2025

Research Scientist interview at Google.

Interviewer: "You need to quantize a model from FP16 to INT8. Walk me through how you'd do it without destroying quality."

Your answer: "I'll just convert all weights to INT8 format"

❌ Rejected.

Here's the critical mistake:

Don't say: "Quantization reduces precision" or "Use 8 bits instead of 16." Too surface-level.

The real answer is the outlier feature problem. Naive INT8 quantization fails because a small fraction of feature dimensions carry activation values about 100× larger than the rest. Your quantization range is wasted on outliers.

You're compressing a skyscraper and a house with the same ruler.

Here's why naive quantization destroys quality:

FP16 weights → Scale to [-127, 127] → Store as INT8 → 2× memory reduction

Problem: Activation outliers exist at specific feature dimensions. Out of 6,000 features:
5,856 features: Normal range (-0.5 to +0.5)
144 features: Outliers (100× larger, -50 to +50)

Those 144 features control 90% of model quality.

btw subscribe to my newsletter to get these posts for free - fullstackagents.substack.com

The outlier math is brutal:
- Quantization range: [-127, 127] = 255 integer levels
- One outlier at value=50: Forces scale = 50/127 ≈ 0.39
- Normal value 0.3: Quantized to round(0.3/0.39) = 1
- Normal features (-0.5 to +0.5) land on just 3 levels: -1, 0, 1
- Share of the range the normal features never use: about 99%

You're using a bathroom scale to weigh an ant and an elephant together.

Remember: Outliers are dimension-specific, not token-specific:
- Feature dim 2,145: Always outlier (±40 to ±60)
- Feature dim 891: Always normal (±0.2 to ±0.5)
- Pattern stable across ALL prompts, ALL batches
- Position-independent, dimension-dependent

This changes everything.

> Bad approach:
Single quantization scale per tensor
Convert all values uniformly
Model perplexity: 12.3 → 2,847 (destroyed)

> Good approach (LLM.int8()):
Identify outlier feature dimensions (here, 144 of 6,000)
Keep outliers in FP16 (144 dims)
Quantize normal features to INT8 (5,856 dims)
Model perplexity: 12.3 → 12.6 (preserved)

The difference is mixed precision, not uniform precision.

Memory bandwidth bottleneck:
- FP16: Load 140GB from VRAM per forward pass
- INT8: Load 70GB from VRAM per forward pass
- Throughput: 1.8× higher
- Quality: 98% maintained

Cost reduction:
FP16: 70B model needs 4× A100 GPUs
INT8: 70B model fits on 2× A100 GPUs
Cost: Cut in half
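
The outlier effect is easy to reproduce on synthetic data. The sketch below is a toy NumPy illustration, not the actual LLM.int8() algorithm: it builds a fake activation matrix with 6,000 feature dimensions, 144 of them outliers in the ±50 range, and compares a single per-tensor scale against a split where outlier columns stay in full precision. The uniform distributions, the random choice of outlier dimensions, and the helper names are all assumptions made for the example.

```python
import numpy as np

def quantize_absmax(x):
    """Symmetric INT8 quantization with one scale for the whole array."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
n_tokens, n_features, n_outliers = 32, 6000, 144

# Synthetic activations: most features in [-0.5, 0.5], a few dims in [-50, 50].
outlier_dims = rng.choice(n_features, size=n_outliers, replace=False)
acts = rng.uniform(-0.5, 0.5, size=(n_tokens, n_features)).astype(np.float32)
acts[:, outlier_dims] = rng.uniform(-50, 50, size=(n_tokens, n_outliers))

normal_mask = np.ones(n_features, dtype=bool)
normal_mask[outlier_dims] = False

# Naive: one scale for everything, so the outliers dictate the step size.
q_all, scale_all = quantize_absmax(acts)
naive = dequantize(q_all, scale_all)

# Mixed: outlier columns untouched, normal columns get their own INT8 scale.
mixed = acts.copy()
q_normal, scale_normal = quantize_absmax(acts[:, normal_mask])
mixed[:, normal_mask] = dequantize(q_normal, scale_normal)

def mean_abs_error(approx):
    # Reconstruction error measured on the normal (non-outlier) features only.
    return float(np.abs(approx[:, normal_mask] - acts[:, normal_mask]).mean())

naive_err = mean_abs_error(naive)
mixed_err = mean_abs_error(mixed)
levels_used = np.unique(q_all[:, normal_mask]).size

print(f"naive scale={scale_all:.3f}, INT8 levels used by normal features: {levels_used}")
print(f"mean abs error on normal features: naive={naive_err:.4f}, mixed={mixed_err:.4f}")
```

In a real system the outlier dimensions would be detected from the data (for example by a magnitude threshold) and handled inside the matrix multiply, which this storage-only toy does not attempt.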

929

85,007

anshuman · Oct 28, 2025 · 12:39 PM UTC

anshuman

@athleticKoder

28 Oct 2025

You're in an ML Engineer interview at Perplexity, and they ask: "Your RAG system retrieves at 80% accuracy but only answers correctly 50% of the time. What's wrong?" Here's how you answer:

911

117,137

anshuman · Sep 22, 2025 · 1:18 PM UTC

anshuman

@athleticKoder

22 Sep 2025

You're in an ML Engineer interview at Google, and the interviewer asks: "GPUs vs TPUs: which one do you choose?" Here's how you answer:

832

89,473

anshuman · Nov 19, 2025 · 12:01 PM UTC

anshuman

@athleticKoder

19 Nov 2025

Techniques to Master for Faster + Cheaper LLM Inference 1. Quantization (INT8/INT4/FP8) 2. KV-Cache Optimization (quantization, compression, eviction) 3. Flash Attention 4. Speculative Decoding 5. Continuous Batching 6. Paged Attention / vLLM-style memory management 7. Tensor Parallelism 8. Pipeline Parallelism 9. Prompt Caching (caching prefixes/system prompts) 10. Mixed Precision Inference 11. Chunked Prefill 12. Medusa / Multi-token prediction 13. Attention Sinks (streaming/infinite context) 14. Kernel Fusion & Custom CUDA kernels 15. Request Scheduling & Priority Queues

890

74,626

anshuman · Jan 13, 2025 · 2:30 PM UTC

anshuman

@athleticKoder

13 Jan 2025

I am told that this is the best playlist for learning distributed systems.

751

35,429

anshuman · Sep 15, 2025 · 12:39 PM UTC

anshuman

@athleticKoder

15 Sep 2025

You're in a Research Engineer interview at OpenAI, and the interviewer asks: "How do you train your model for Computer Use? Can RL solve this?" Here's how you can answer:

730

95,410

anshuman · Sep 21, 2024 · 6:24 PM UTC

anshuman

@athleticKoder

21 Sep 2024

Okay, guys, it's time to give back. Juniors preparing for GSoC 2025, I will create an ultimate video series on Google Summer of Code, on YT and X & I will unveil the strategy that led me to selection. Not only that, I am gonna tell you how can GSoC turn your life around and open your path for a successful career, if you do it the right way! If there is any specific thing you wanna know or whatever questions you have, feel free to drop them in the comments.

anshuman

@athleticKoder

5 May 2023

I've waited for this email for one and a half years. Today I couldn't be happier to announce that I have been selected for the Google Summer of Code program at the @TensorFlow organization. #opensource #MachineLearning #Google #GSOC #GSoC2023

583

78,949

anshuman · Jan 15, 2025 · 2:30 PM UTC

anshuman

@athleticKoder

15 Jan 2025

this has to be the best playlist ever created on FastAPI

557

28,846

anshuman · Oct 23, 2024 · 9:18 AM UTC

anshuman

@athleticKoder

23 Oct 2024

how i started deep learning in 2019

531

38,635

anshuman · Oct 27, 2025 · 1:01 PM UTC

anshuman

@athleticKoder

27 Oct 2025

Techniques I'd master to fine-tune LLMs in production. Bookmark this 1. LoRA & QLoRA for parameter-efficient fine-tuning 2. PEFT library for adapter methods 3. Instruction tuning 4. Dataset formatting (ChatML, Alpaca, ShareGPT) 5. DeepSpeed ZeRO for memory optimization 6. Flash Attention 2 for efficient training 7. Gradient checkpointing for longer contexts 8. BitsAndBytes for 4-bit/8-bit quantization 9. RLHF & DPO for alignment 10. Tokenizer training & vocabulary extension 11. Evaluation metrics (perplexity, ROUGE, human eval) 12. Unsloth for 2x faster fine-tuning 13. Multi-GPU strategies (FSDP, DDP)

574

32,031

anshuman · Sep 24, 2025 · 2:09 PM UTC

anshuman

@athleticKoder

24 Sep 2025

You're in an ML Inference Engineer interview at Google, and the interviewer asks: "Our team wants to switch from Gemini API to a fine-tuned model. Which serving framework and why?" Here's how you answer:

568

59,019

anshuman · Sep 26, 2025 · 12:26 PM UTC

anshuman

@athleticKoder

26 Sep 2025

A girl at my gym approached me after her workout, clearly annoyed.

"I've been watching and copying your entire routine for weeks, but I'm not seeing the same improvements you are!"

I explained, "You can't just mimic what I do - you need to understand which exercises deserve more focus for your specific goals."

She nodded. And then she said, "Wait, isn't that like the attention mechanism in ChatGPT?"

And I know you're sitting there like: WTF is Attention Mechanism?

Attention Mechanism is like that gym bro who knows exactly which exercises deserve maximum effort during each workout.

How does it work in LLMs?
You feed a sentence with multiple words to the model
Each word "examines" ALL other words in the sentence
It calculates "how much attention should I pay to each word?"
Creates weighted connections based on relevance
Important words get higher attention weights, others get down-weighted

The Complete Math:

Step 1: Create Query, Key, and Value matrices
Query (Q) = What am I looking for?
Key (K) = What information is available?
Value (V) = The actual content to extract

For each word position i:
Q_i = X_i × W_Q (input × query weight matrix)
K_i = X_i × W_K (input × key weight matrix)
V_i = X_i × W_V (input × value weight matrix)

Step 2: Calculate Attention Scores
Score(i,j) = Q_i × K_j^T
This tells us how much word i should pay attention to word j.

Step 3: Scale the scores
Scaled_Score = Score / √d_k
Where d_k is the dimension of the key vectors. Scaling keeps the softmax from saturating, which would otherwise shrink gradients toward zero.

Step 4: Apply Softmax
Attention_Weight(i,j) = Softmax(Scaled_Score(i,j))
Softmax formula: e^(x_j) / Σ(e^(x_k)) for all k
This ensures the attention weights for each word i sum to 1.

Step 5: Weighted Sum
Output_i = Σ(Attention_Weight(i,j) × V_j) for all j

Complete Formula:
Attention(Q,K,V) = Softmax(QK^T / √d_k)V

Sentence: "She wants to deadlift heavy weights"

Let's say we have 3-dimensional embeddings (simplified, and skipping "to"):

Word Embeddings:
She = [1, 0, 0]
wants = [0, 1, 0]
deadlift = [1, 1, 1]
heavy = [0, 0, 1]
weights = [1, 0, 1]

When processing "deadlift":
Query for "deadlift" = [1, 1, 1] (using the embeddings directly as queries and keys, and skipping the √d_k scaling to keep the arithmetic simple)

Calculate dot products (attention scores):
deadlift → She: [1,1,1] · [1,0,0] = 1
deadlift → wants: [1,1,1] · [0,1,0] = 1
deadlift → deadlift: [1,1,1] · [1,1,1] = 3
deadlift → heavy: [1,1,1] · [0,0,1] = 1
deadlift → weights: [1,1,1] · [1,0,1] = 2

Raw scores: [1, 1, 3, 1, 2]

After Softmax (total = e^1+e^1+e^3+e^1+e^2 ≈ 35.63):
She: e^1/total ≈ 0.076
wants: ≈ 0.076
deadlift: e^3/total ≈ 0.564
heavy: ≈ 0.076
weights: e^2/total ≈ 0.207

Final attention weights: [0.076, 0.076, 0.564, 0.076, 0.207]

Multi-Head Attention (the gym analogy):
Think of it like having multiple personal trainers, each focusing on different aspects:
Head 1: Focuses on exercise form and technique
Head 2: Focuses on muscle groups being targeted
Head 3: Focuses on safety and proper progression

Each head has its own Q, K, V matrices and calculates attention independently, then results are concatenated.

Mathematical representation:
MultiHead(Q,K,V) = Concat(head_1, head_2, ..., head_h) × W_O
Where each head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)

Why this revolutionized NLP:
> Context Understanding: Mathematical precision in determining word relationships
> Parallel Processing: All attention scores calculated simultaneously, not sequentially
> Gradient Flow: Softmax is smooth and differentiable, so the whole thing trains end to end
> Scalability: Handles sequences of varying length, though compute and memory grow quadratically with sequence length

Final Result: Attention Mechanism gave AI mathematical precision in focusing on what matters - just like how you calculate exactly which muscle groups need the most work based on your goals!
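
A quick NumPy check of the "deadlift" example, recomputing the softmax from the raw dot-product scores [1, 1, 3, 1, 2]. It treats the toy embeddings as both queries and keys (so W_Q and W_K are effectively the identity, an assumption of the example) and also shows what the √d_k scaling step would do to the same scores.

```python
import numpy as np

# Toy 3-dimensional embeddings from the example ("to" is left out).
embeddings = {
    "She":      [1, 0, 0],
    "wants":    [0, 1, 0],
    "deadlift": [1, 1, 1],
    "heavy":    [0, 0, 1],
    "weights":  [1, 0, 1],
}
words = list(embeddings)
K = np.array([embeddings[w] for w in words], dtype=float)  # keys, one row per word
q = K[words.index("deadlift")]                             # query for "deadlift"

def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()

scores = K @ q                 # dot product of the query with every key
attn = softmax(scores)         # unscaled, as in the hand calculation
attn_scaled = softmax(scores / np.sqrt(K.shape[1]))  # with the sqrt(d_k) step

for word, s, a in zip(words, scores, attn):
    print(f"{word:>8}: score={s:.0f}  weight={a:.3f}")
# Unscaled weights come out at roughly 0.076, 0.076, 0.564, 0.076, 0.207.
print("scaled weight on 'deadlift':", round(float(attn_scaled[2]), 3))
```

Dividing by √d_k flattens the distribution, so "deadlift" keeps the largest weight but no longer dominates as strongly.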

559

45,736

anshuman · Jul 4, 2024 · 5:49 PM UTC

anshuman

@athleticKoder

4 Jul 2024

I wanted to learn Rust for a long time, and finally, it's time to open source what I've been up to these last 2 weekends.

Presenting picograd, inspired by Andrej Karpathy (@karpathy)'s good old micrograd: github.com/shivance/picograd

Here are my 7 takeaways from this project:

1. I used to think that Rust is similar to C++. It's not. It's way different!! You have to learn a whole bunch of new concepts and the learning curve is steep.

2. Rust design patterns are difficult and weird. If you are coming from a strong Object Oriented Programming (OOP) background, it might be a hard pill to swallow.

3. Rust is not for every Python developer. You must move from a dynamically typed language to a statically typed one. And you face tons of warnings while writing your lines of code.

4. I loved the concept of ownership. And I think Mojo🔥 by @Modular does that too! I feel that Mojo is a hybrid of Python and Rust. MLIR Compiler instead of LLVM is a story for another day.

5. If you come from a C++ background, you are gonna be reminded of your good old friend templates. Generics used to be so much fun.

6. I just love @rustlang's build system. Man, it's so much easier to set up than C++.

7. The Rust compiler is an excellent friend. It's annoying to see an error every time you compile your code (until you don't), but it's for the greater good. Rust wants to ensure your code is safe and free of bugs.

I think Rust is a very good language for production code. Not to mention its performance & speed.

It's been a fun ride at @_buildspace and @_nightsweekends, where I'm seeing so many people like me coming together to build stuff. I love the community's energy.

GitHub - kanpuriyanawab/picograd: Rust Implementation of micrograd

Rust Implementation of micrograd. Contribute to kanpuriyanawab/picograd development by creating an account on GitHub.

github.com

519

79,718

anshuman · Mar 18, 2025 · 2:34 PM UTC

anshuman

@athleticKoder

18 Mar 2025

I wrote Qwen 2.5 from scratch. Works with JAX, PyTorch and TensorFlow. This marks my return to open source after a year. github.com/keras-team/keras-…

530

32,625

anshuman · Nov 17, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

17 Nov 2025

LoRA sounds simple: 1. Add low-rank matrices 2. Train with less memory 3. Get same results Reality: - Learning rate needs 10x adjustment - Layer selection changes everything - Batch size tolerance is different
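
For reference, "add low-rank matrices" looks roughly like this in NumPy. It is a sketch of the forward pass only, with no training loop; the dimensions, rank r, and scaling factor alpha are arbitrary values picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8   # hypothetical sizes, rank, and scaling

W = rng.normal(size=(d_out, d_in))          # pretrained weight, kept frozen
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x):
    # Equivalent to x @ (W + (alpha / r) * B @ A).T without forming the sum.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d_in))
y = lora_forward(x)

# With B at zero the adapter starts as a no-op on top of the frozen layer.
starts_as_noop = np.allclose(y, x @ W.T)
full_params, lora_params = W.size, A.size + B.size
print(starts_as_noop, full_params, lora_params)
```

Only A and B would receive gradients during fine-tuning, which is where the memory saving comes from: 512 trainable values here instead of 4,096.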

503

38,170

anshuman · Oct 13, 2025 · 12:31 PM UTC

anshuman

@athleticKoder

13 Oct 2025

Google ML Engineer Interview - Final Round

Question: "Your inference costs are 10x higher than expected due to KV cache. How do you diagnose and fix this?"

You: "I'll just increase my GPU memory to store more cache"

Awkward silence

Interview over.

Here's why you failed:

Don't say: "Add more memory" or "Optimize the cache size." Wrong framing.

The real answer isn't about capacity - it's about memory fragmentation from contiguous allocation.

Small sequences vs long sequences vs mixed batches = completely different memory behaviors.

btw subscribe to my newsletter to get these posts in your inbox daily - fullstackagents.substack.com

Most teams debug by checking total memory usage, not allocation patterns.

Your memory problem isn't size - it's holes. Traditional allocation wastes 40% of your GPU memory on unusable gaps.

PagedAttention isn't magic - it's just virtual memory for KV cache. "More memory" doesn't fix fragmentation.

The allocation reality everyone misses:
Contiguous allocation = Giant blocks that can't be split
Non-contiguous allocation = Small blocks that fit anywhere
Growing sequences = Constant reallocation and copying
Finished sequences = Holes that new sequences can't use

Memory layout drives the cost, not memory amount.

"But what about cache size?"

Interviewer: "How do you handle variable-length sequences efficiently?"

Cache size without allocation strategy is meaningless.

PagedAttention gives you near-zero fragmentation, but only if sequences of mixed lengths were your bottleneck. Fixed-size blocks only help when you need flexible allocation.

The infrastructure framework that matters:
> Long uniform sequences + Known lengths = Contiguous is fine
> Mixed lengths + Dynamic batching = PagedAttention essential
> Short sequences + High turnover = Block-based allocation wins
> Rare long sequences + Mostly short = Fragmentation kills you

Match the allocation strategy to the workload pattern.

The evolution path most deployments miss:
Start: Simple contiguous allocation (easy to implement)
Scale: Hit fragmentation issues (40% waste)
Optimize: PagedAttention (near-zero fragmentation)
Production: 5-6x throughput improvement on same hardware

It's not about buying bigger GPUs - it's about using them better.

The answer that gets you hired:

"KV cache cost isn't about total memory. It's about allocation fragmentation. Contiguous blocks waste space through holes. PagedAttention uses virtual memory concepts - fixed-size blocks that can live anywhere. Pick based on your sequence length variance, not your memory budget."
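
To make "virtual memory for KV cache" concrete, here is a toy block allocator in plain Python. It only tracks which fixed-size blocks belong to which sequence; it stores no tensors and is not vLLM's implementation. The class name, method names, and the block and pool sizes are all invented for the sketch.

```python
class PagedKVAllocator:
    """Toy bookkeeping for a paged KV cache: fixed-size blocks, per-sequence tables."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> list of block ids
        self.lengths = {}                           # seq_id -> tokens stored

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.setdefault(seq_id, 0)
        if length == len(table) * self.block_size:  # no room left in last block
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())    # any free block will do
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        # A finished sequence returns whole blocks, so no unusable holes remain.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

    def wasted_slots(self):
        # Only the unfilled tail of each sequence's last block is wasted.
        return sum(len(table) * self.block_size - self.lengths[seq_id]
                   for seq_id, table in self.block_tables.items())


alloc = PagedKVAllocator(num_blocks=8, block_size=16)
for _ in range(40):
    alloc.append_token("a")   # 40 tokens -> 3 blocks
for _ in range(10):
    alloc.append_token("b")   # 10 tokens -> 1 block
alloc.free("a")               # "a" finishes and its blocks go back to the pool
for _ in range(70):
    alloc.append_token("c")   # 70 tokens -> 5 blocks, reusing freed ones

print(alloc.block_tables["c"])  # block ids need not be contiguous
print(alloc.wasted_slots())     # 6 unused slots for "b" + 10 for "c"
```

Sequence "c" needs more space than any single gap a contiguous allocator would have left behind, yet it fits because its blocks can sit anywhere in the pool.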

497

45,817

anshuman · Oct 7, 2025 · 1:02 PM UTC

anshuman

@athleticKoder

7 Oct 2025

You’re in a Machine Learning interview at Perplexity, and the interviewer asks: “Why do we need hybrid search? Isn’t vector search with embeddings enough?”

Here’s how you answer:

Don’t say: “To combine different approaches” or “For better coverage.” Too generic.

The real answer is the semantic-lexical gap. Your embeddings understand meaning but ignore exact matches.

Vector search alone misses the forest for the trees - or worse, the exact product code the user typed.

Here’s why pure vector search fails:

Your query is “iPhone 15 Pro Max 256GB.” Vector search returns “iPhone 15 Pro with lots of storage” and “latest flagship phone specs.” But the user wants EXACT model + EXACT capacity.

Semantic understanding ≠ Precision matching.

btw get this kinda content on your email for free, daily, subscribe to my newsletter -fullstackagents.substack.com…

The retrieval failure modes are brutal:

Pure vector search:
> Query: “ML-2847 error code” → Returns: General ML troubleshooting (0% useful)
> Query: “React 18.2.0 breaking changes” → Returns: React 18 overview (no version precision)

Pure keyword search (BM25):
> Query: “how to fix car not starting” → Returns: Docs with “car” and “starting” but about starting a car business

You need both. Always.

The performance gap across real benchmarks:
- BM25 alone: 67% MRR@10
- Dense retrieval alone: 71% MRR@10
- Hybrid (proper fusion): 82% MRR@10

That’s an 11-point gain (roughly 15% relative) over the “best” single method. In production, that’s thousands of better answers per day.

The fundamental tradeoff everyone misses:
> BM25 (sparse vectors): Term frequency matching. Perfect for exact keywords, acronyms, codes. Fails at synonyms.
> Dense embeddings: Semantic similarity. Perfect for meaning, paraphrases. Fails at exact matches.

This is why you can’t pick one. You need intelligent fusion.

The scoring difference that matters:
> BM25: score(q,d) = Σ IDF(term) × TF(term,d) × norm(d)
> Dense: score(q,d) = cosine(embed(q), embed(d))

These scores aren’t comparable! BM25 gives 0-15, cosine gives 0.7-0.95. This is why naive averaging fails. You need score normalization.

The fusion algorithms you must know:

1. Reciprocal Rank Fusion (RRF):
score(d) = Σ 1/(k + rank_method_i(d))
No score normalization needed
Robust to score scale differences
Used by Elastic, Pinecone

2. Weighted combination:
score(d) = α × norm(score_bm25) + (1-α) × norm(score_dense)
Requires score normalization
α typically 0.3-0.5
More control but more tuning

“So how do you choose the hybrid ratio?”

Interviewer leans in. This is where you mention:

Query type matters:
> Keyword queries (product codes, names): α = 0.7 (favor BM25)
> Natural language questions: α = 0.3 (favor dense)
> Hybrid queries (“best iPhone under $500”): α = 0.5
> Measure and tune on YOUR data.

The answer that gets you hired:
- Hybrid search combines lexical precision with semantic understanding
- BM25 catches exact matches embeddings miss; embeddings catch meaning BM25 misses
- The cost is running two retrievals + fusion (adds ~10ms)
- It’s not optional for production search - it’s the recall multiplier

The interesting question isn’t “should we use hybrid search” - it’s “what’s the optimal fusion strategy for our query distribution?”

Use RRF? Simple but less control.
Use weighted combo? More tuning but better fit.

The answer: Start with RRF, measure the gap, upgrade if needed.

The killer combo that production systems use:
> BM25 for recall (catch all possible matches)
> Dense for ranking (understand intent)
> RRF for fusion (combine without score normalization hell)
> Cross-encoder for top-20 (final precision pass)

Four-stage pipeline. Each stage does what it’s best at.
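
Both fusion formulas fit in a few lines of plain Python. This is a sketch under assumptions: the document ids and scores are invented, k=60 is a commonly used RRF constant rather than anything the thread specifies, and min-max is just one possible choice for norm().

```python
def reciprocal_rank_fusion(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first. Returns fused order."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def min_max(scores):
    """Rescale raw scores to [0, 1] so different scorers become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_fusion(bm25_scores, dense_scores, alpha=0.5):
    """alpha weights BM25, (1 - alpha) weights dense; missing docs count as 0."""
    b, d = min_max(bm25_scores), min_max(dense_scores)
    fused = {doc: alpha * b.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
             for doc in set(b) | set(d)}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

# Hypothetical raw scores from the two retrievers for one query.
bm25 = {"doc_a": 12.1, "doc_b": 7.4, "doc_c": 3.0}
dense = {"doc_b": 0.91, "doc_d": 0.88, "doc_a": 0.74}

bm25_ranking = sorted(bm25, key=bm25.get, reverse=True)
dense_ranking = sorted(dense, key=dense.get, reverse=True)

rrf_order = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
balanced_order = weighted_fusion(bm25, dense, alpha=0.5)
keyword_order = weighted_fusion(bm25, dense, alpha=0.7)

print(rrf_order)       # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
print(balanced_order)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
print(keyword_order)   # ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Raising α to 0.7 flips the top two results toward the BM25 favorite, which is the lever the query-type guidance is describing.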

490

50,776

anshuman · Sep 22, 2024 · 5:26 PM UTC

anshuman

@athleticKoder

22 Sep 2024

The first episode of the GSoC Series is LIVE! College Students who are planning to prepare for GSoC 2025 (and onwards), this video is for you! → My Google Summer of Code story is very unique in its way, I wasn't eligible to apply in the first two years of my college, in the third year I applied and got rejected and in the fourth year I finally made it. → Through this video, I am unveiling my strategy for how I got selected in the do-or-die situation. I have tried to guide you on how you can replicate the strategy! → This video is an excellent starting place if you are figuring out How to Contribute to Open Source! → If there is any specific thing you wanna know or whatever questions you have, feel free to drop them in the comments. → This is my first ever Video, so I would be grateful if you point out how can I improve my speaking or video editing skills through Direct Messages! → I have also published the video on YT, please check comments for more details

436

35,904

anshuman · Dec 30, 2024 · 3:09 PM UTC

anshuman

@athleticKoder

30 Dec 2024

This is how I started Machine Learning in 2019. I was broke af. Borrowed my sister's laptop for BTech. Took education loan to pay the fee. Completed my undergrad in Electronics from NIT Warangal. Glory to Hanuman.

himanshu

@himanshustwts

30 Dec 2024

Replying to @himanshustwts

"From humble beginnings come great things" and I lived it. (This was me learning diffusion models three months back.) You can just do things, guys.

419

30,594

anshuman · Oct 4, 2025 · 12:39 PM UTC

anshuman

@athleticKoder

4 Oct 2025

You're in a Machine Learning Interview at Perplexity, and the interviewer asks: "Why do we need rerankers in RAG? Isn't semantic search enough?" Here's how you answer:

438

38,595

anshuman · Oct 3, 2025 · 7:38 AM UTC

anshuman

@athleticKoder

3 Oct 2025

You're in a Machine Learning interview at Google, and the interviewer asks: Why is scaling context length so hard? What's the fundamental bottleneck? Here's how you answer:

395

36,549

anshuman · Nov 22, 2022 · 10:11 AM UTC

anshuman

@athleticKoder

22 Nov 2022

Totally mind blown 🤯 @JuliaLanguage has a package called Measurements.jl which lets you add errors and uncertainty to datatypes. Thank you @MoseGiordano

339

anshuman · Oct 1, 2025 · 12:47 PM UTC

anshuman

@athleticKoder

1 Oct 2025

In one of my interviews for ML Engineer position, I was asked, “What is quantization and how does it help with LLM inference?” Here is how you can answer:

360

25,247

anshuman · Dec 3, 2023 · 3:16 PM UTC

anshuman

@athleticKoder

3 Dec 2023

ML Community loved @karpathy 's introduction to Large Language Models. In the following blogpost, I think out loud about the implementation of Intelligent Operating Systems: The LLM OS. huggingface.co/blog/shivance… #MachineLearning #LLM #GPT #OperatingSystem #jarvis

Illustrated LLM OS: An Implementational Perspective

A Blog post by Anshuman Mishra on Hugging Face

huggingface.co

342

43,837

anshuman · Dec 17, 2024 · 6:46 AM UTC

anshuman

@athleticKoder

17 Dec 2024

meet your mutuals anon

327

26,554

anshuman · Oct 8, 2024 · 4:52 AM UTC

anshuman

@athleticKoder

8 Oct 2024

Open Source will Change your Life Just published the second Video of the GSoC Series. You can watch it here on X or YT. This video is about getting started with Open Source and making the first contribution.

anshuman

@athleticKoder

22 Sep 2024

Done Recording the first video for the GSoC Series. Now it's time to edit ✂️📷 (pretty nervous and pretty excited as this is my debut in YT & X video content) If you have tips for editing please let me know in the comments. If you have questions about Google Summer of Code, let me know in the comments. I'll try to include it in the series ✨!

326

20,877

anshuman · Oct 6, 2025 · 1:01 PM UTC

anshuman

@athleticKoder

6 Oct 2025

You’re in a Machine Learning interview at Groq, and the interviewer asks: “Why do we need prompt caching? Can’t we just resend the full context every time?” Here’s how you answer:

329

37,157

anshuman · Jan 7, 2025 · 7:45 AM UTC

anshuman

@athleticKoder

7 Jan 2025

best reinforcement learning playlist there is

327

58,224

anshuman · Dec 9, 2024 · 7:23 PM UTC

anshuman

@athleticKoder

9 Dec 2024

Building a Cold Email Generator using PydanticAI Agents and LLaMA3 via Groq This video is for you if you are: 1. Interested in building AI applications 2. Curious about AI Agents 3. A chill guy who loves to tinker with new stuff 4. A student interested in learning AI 5. An indie hacker trying to tap into AI development What's this video about? In this video, we build an end-to-end generative AI project from scratch. Problem Statement: You work at a headhunting company catering to startups and big tech, and you want to automate cold-emailing. Given the link to a job post, the app generates the email to send to the hiring manager. Outline 0:00 - 0:50 - Introduction 0:51 - 5:00 - Setting up @GroqInc 5:01 - 11:00 - @pydantic AI Hello World! 11:01 - 44:21 - Implementing your Agents (In-depth) 44:22 - 57:00 - Setting up HuggingFace @gradio project 57:01 - 1:13:11 - Building end-to-end application The project source code is available on GitHub and the video is available on YouTube as well! For more content like this, follow @1smollcoder nitter.app/1smollcoder/status/186…

anshuman

@athleticKoder

7 Dec 2024

Building something cool with PydanticAI. Working on prototype now. Will create a long video and post it here on X as well as YT.

329

30,957

anshuman · Oct 30, 2025 · 12:18 PM UTC

anshuman

@athleticKoder

30 Oct 2025

Techniques I'd master to learn Reinforcement Learning. Bookmark this 👇 1. Markov Decision Processes (MDPs) & Bellman equations 2. Value iteration & policy iteration algorithms 3. Q-learning & Deep Q-Networks (DQN) 4. Experience replay & target networks 5. Policy gradients & Reinforce algorithm 6. Actor-Critic methods (A2C, A3C) 7. Proximal Policy Optimization (PPO) 8. Trust Region Policy Optimization (TRPO) 9. Soft Actor-Critic (SAC) for continuous control 10. Twin Delayed DDPG (TD3) algorithm 11. Exploration strategies (epsilon-greedy, UCB, entropy) 12. Reward shaping & discount factor selection 13. Multi-armed bandits & contextual bandits 14. Monte Carlo methods & temporal difference learning 15. Function approximation & neural network policies 16. OpenAI Gym & custom environment design 17. Stable Baselines3 & RLlib frameworks 18. Model-based RL (Dyna-Q, World Models) 19. Imitation learning & behavioral cloning 20. Multi-agent RL & game theory basics

331

18,279

anshuman · Jun 10, 2025 · 3:05 PM UTC

anshuman

@athleticKoder

10 Jun 2025

I coded Qwen3 from scratch recently. > be me > read the paper > implement the model > contribute to keras > see the impact

312

21,779

anshuman · Oct 8, 2025 · 1:02 PM UTC

anshuman

@athleticKoder

8 Oct 2025

You’re in a Machine Learning interview at OpenAI, and the interviewer asks: “Why is everyone switching from RLHF to DPO? Isn’t RLHF the proven approach?” Here’s how you answer: Don’t say: “DPO is simpler” or “RLHF is too complex.” Too surface-level. The real answer is the reward model bottleneck. RLHF trains a separate reward model that becomes a noisy proxy for human preferences. DPO directly optimizes the policy on preference data. You’re eliminating the broken telephone. Here’s why RLHF is fundamentally flawed: Your training pipeline: Human preferences → Train reward model → Use RL to optimize policy against reward model. Problem: The reward model is trained on limited data (10k-100k comparisons), but then used to generate 1M+ training signals. It’s overconfident on out-of-distribution outputs. Reward model accuracy ≠ Alignment quality. btw subscribe to my newsletter to get these posts for free - fullstackagents.substack.com… The RLHF failure modes are brutal: > Reward hacking: Model finds adversarial outputs that score high but are gibberish > Mode collapse: Policy degenerates to only generate “safe” high-reward outputs > Reward model brittleness: 75% accuracy on test set → 100% confident predictions in RL > Training instability: PPO hyperparameters require black magic to converge You’re building a skyscraper on quicksand. One unstable component breaks everything. The complexity comparison: RLHF pipeline: - SFT on demonstrations (1 week) - Train reward model on preferences (2 days) - PPO training against reward model (1-2 weeks, often fails) - Extensive hyperparameter tuning (pray to the RL gods) DPO pipeline: - SFT on demonstrations (1 week) - Train directly on preferences (2 days) Done. No RL, no reward model. RLHF: 3+ weeks, unstable. DPO: 9 days, stable. 
The fundamental difference that matters: RLHF objective: > maximize E[reward_model(policy(x))] - β × KL(policy || base) > Requires RL (PPO/REINFORCE) > Reward model is separate neural net > Unstable optimization landscape DPO objective: > maximize log σ(β × [log(π(y_w|x)/π_ref(y_w|x)) - log(π(y_l|x)/π_ref(y_l|x))]) > Direct supervised learning on preferences > No reward model needed > Stable gradient descent Eliminating the reward model changes everything. No more broken telephone. The performance gap that surprised everyone: MT-Bench scores (GPT-4 as judge): Llama 2 base: 4.2/10 Llama 2 + RLHF: 6.9/10 Llama 2 + DPO: 7.1/10 DPO beats RLHF despite being “just” supervised learning. The simplicity IS the feature.
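A minimal sketch of the DPO loss for a single preference pair, assuming the summed log-probabilities of the chosen response (y_w) and the rejected response (y_l) are already available under both the policy and the frozen reference model. The function name `dpo_loss` and the default β = 0.1 are illustrative choices, not something the post specifies.

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit reward margin: how much more the policy prefers y_w over y_l
    # than the reference model does.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form (softplus(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy identical to the reference: the loss is log(2), about 0.693.
print(dpo_loss(-3.0, -3.0, -3.0, -3.0))
# Policy already favours the chosen response: the loss drops below log(2).
print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

Minimizing this loss is ordinary gradient descent on preference pairs; no reward model or sampling loop appears anywhere in the computation.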

319

33,045

anshuman · Oct 21, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

21 Oct 2025

Techniques I'd master to build great evals for AI apps. 1. LLM-as-Judge 2. Reference-based similarity metrics 3. Pairwise comparison tournaments 4. Human-in-the-loop evaluation 5. Synthetic data generation 6. Adversarial test case creation 7. Multi-dimensional rubrics 8. Regression testing on golden datasets 9. A/B testing with live traffic 10. Statistical significance testing 11. Evaluation dataset curation & versioning 12. Domain-specific benchmarks 13. Red teaming & jailbreak testing 14. Latency & cost monitoring 15. User feedback loops 16. Calibration & confidence scoring

304

15,778

anshuman · May 3, 2025 · 1:51 PM UTC

anshuman

@athleticKoder

3 May 2025

You won't believe this is Bengaluru.

294

14,170

anshuman · Nov 5, 2025 · 5:27 AM UTC

anshuman

@athleticKoder

5 Nov 2025

gm chat what are you cooking today?

291

20,851

anshuman · Sep 19, 2025 · 3:50 AM UTC

anshuman

@athleticKoder

19 Sep 2025

over past week I've been studying RL environments deeply. a blog is coming up soon. i can say this for now, evals are good enough for LLMs, but for agents we need environments where it can learn with feedback. this blog will be mostly about writing environments with verifiers. @willccbb and @PrimeIntellect have been doing some very impactful work!

289

27,643

anshuman · May 28, 2025 · 4:33 AM UTC

anshuman

@athleticKoder

28 May 2025

Super excited to announce that I've started working as AI Consultant for Google. I'll mainly be working with the Keras team, contributing to KerasHub.

280

14,724

anshuman · Oct 23, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

23 Oct 2025

You're in a ML Inference Engineer Interview at Meta, and the interviewer asks: "Why do we need KV cache? Can't we just recompute attention for every new token?" Here's how you answer:

280

26,869

anshuman · Nov 5, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

5 Nov 2025

I spent 2 weeks building an eval framework from scratch. Then I saw how Anthropic and OpenAI actually do evals. I did literally everything wrong. Here is what I learned.

283

35,964

anshuman · Sep 11, 2024 · 6:40 PM UTC

anshuman

@athleticKoder

11 Sep 2024

Daily ML & CS GRIND > distributed databases > asynchronous replication > multi leader replication across multiple datacenters > designing data intensive applications 12/n

234

12,442

anshuman · Oct 14, 2025 · 12:30 PM UTC

anshuman

@athleticKoder

14 Oct 2025

You're in a Research Scientist interview at Meta. The interviewer asks: "How would you implement speculative decoding to improve inference latency? What's the fundamental tradeoff?" You answer: "I'll use a smaller model to predict tokens ahead of time" Interview over. Here's what you missed:

247

28,407

anshuman · Sep 22, 2024 · 1:41 PM UTC

anshuman

@athleticKoder

22 Sep 2024

anshuman

@athleticKoder

21 Sep 2024

229

43,120

anshuman · Sep 11, 2025 · 10:57 AM UTC

anshuman

@athleticKoder

11 Sep 2025

The Core Concept - Speculative decoding is an inference optimization that speeds up autoregressive generation without sacrificing quality. It uses two models working together – a small, fast 'draft model' that proposes multiple tokens ahead, and our main 'target model' that verifies these proposals in parallel.

241

37,122

anshuman · Oct 24, 2025 · 1:04 PM UTC

anshuman

@athleticKoder

24 Oct 2025

Concepts I'd master to build production ML systems with PyTorch. Bookmark this 👇 1. TorchScript for model serialization 2. torch.compile for 2x speedups 3. Distributed training with DDP/FSDP 4. Mixed precision with torch.amp 5. Custom CUDA kernels with Triton 6. Model quantization (PTQ & QAT) 7. TorchServe for model deployment 8. Lightning for cleaner training loops 9. Dataset optimization with DataLoader workers 10. Profiling with torch.profiler 11. ONNX export for cross-platform inference 12. Gradient accumulation for large batches 13. Learning rate scheduling strategies 14. Model checkpointing & recovery 15. TorchVision/Audio/Text domain libraries 16. Integration with HuggingFace ecosystem

228

11,222

anshuman · Sep 25, 2025 · 1:37 PM UTC

anshuman

@athleticKoder

25 Sep 2025

People often get confused between Self-Attention and Cross-Attention in transformers. Here’s the difference:

203

16,755

anshuman · Jul 6, 2024 · 6:14 PM UTC

anshuman

@athleticKoder

6 Jul 2024

Presenting minbpe.c: a pure C implementation of minbpe, originally authored by Andrej Karpathy (@karpathy) in Python github.com/shivance/minbpe.c minbpe.c is minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C and in just a single file. It's been a while since I coded in C and my knowledge got rusty (rust != @rustlang here xD). What better project than implementing a basic tokenizer to refresh it? Unfortunately, unlike Rust, C doesn't have a default doc builder, so I have written detailed docstrings to make the code easier for users to understand. Byte Pair Encoding was introduced in 1994 and became a widely used tokenization strategy in Natural Language Processing (a field of Machine Learning). anon, what are you building @_buildspace / @_nightsweekends ?

GitHub - kanpuriyanawab/minbpe.c: a Minimal, clean code for the Byte Pair Encoding (BPE) algorithm...

a Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization in pure C. - kanpuriyanawab/minbpe.c

github.com

203

18,476

anshuman · Nov 3, 2024 · 3:40 PM UTC

anshuman

@athleticKoder

3 Nov 2024

The job market for freshers & recent college grads in 2024 is TOUGH. And it is going to get HARDER in 2025. Moreover, applying through job portals is not helpful at all. In this video, I reveal the personal strategy that I used when I was in college: I hope my juniors find it useful.

202

22,486

anshuman · Sep 13, 2024 · 5:50 PM UTC

anshuman

@athleticKoder

13 Sep 2024

Daily ML & CS GRIND > batch processing starters > unix as a simple batch processor - log analysis > Testing ml models > ML Unit testing 14/n (link to resources in comment)

168

9,986

anshuman · Jul 6, 2024 · 6:41 PM UTC

anshuman

@athleticKoder

6 Jul 2024

Karpathy Senpai noticed . This day shall be remembered.

179

14,556

anshuman · Jul 15, 2025 · 11:21 AM UTC

anshuman

@athleticKoder

15 Jul 2025

Perplexity Comet got me hooked, but I felt something was missing. So I built an Open Source AI browser from scratch! I am calling it Nebula, and unlike any other browser out there, it lets you create apps as a tab! In this preview launch, I demonstrate building a tic-tac-toe game, an Excalidraw-like drawing tool and a stock-portfolio tracker right from your search bar! In this approach, I bring GPT4o to the browser and let it take control of Chromium's V8 JavaScript engine. This is quite contrary to what Cursor / Windsurf do, which is bringing AI to the IDE. Building apps should be easy, and I think this is the way to do it. By the way, the code for this preview launch is available on GitHub. I drew inspiration from @PerplexityComet and @browsercompany 's Dia.

218

37,909

anshuman · Nov 4, 2025 · 2:22 PM UTC

anshuman

@athleticKoder

4 Nov 2025

whatsapp web down. time to touch some grass.

161

25,921

anshuman · Dec 22, 2024 · 5:03 PM UTC

anshuman

@athleticKoder

22 Dec 2024

December 2023 - overweight & sick December 2024 - big guns & long runs

anshuman

@athleticKoder

15 Dec 2024

I did it. I finally did it. I ran 10k in the PhonePe Midnight Marathon (my first ever). The road to 10k was long. I am so grateful to the following people for unlocking this version of me: 1. @Naina_2728 for showing me what is possible with running, how fun it can be and how many new connections you can make through it. 2. @lucifer_x007 and @pathikghugare for always pushing me and staying with me for the first 2.5k 3. @asmitaakamboj was like a guiding light, I guess; seeing her active and posting her progress daily made me believe that consistency is possible. 4. @xerefic and @theadityadas for filling me in on the information I needed to make the long run possible 5. Mahendra for the encouragement I needed on the D-Day. Thank you soooo much and love you guys❤️

145

9,269

anshuman · Jan 14, 2024 · 8:23 AM UTC

anshuman

@athleticKoder

14 Jan 2024

To the new beginnings ✨. Moving to Bangalore from Lucknow 🙃!

152

20,335

anshuman · May 7, 2025 · 12:14 PM UTC

anshuman

@athleticKoder

7 May 2025

I wrote Qwen Mixture-of-Experts Model from scratch. Works with JAX, PyTorch and Tensorflow. It's Open Source. github.com/keras-team/keras-…

153

6,135

anshuman · Jan 3, 2025 · 5:08 AM UTC

anshuman

@athleticKoder

3 Jan 2025

i have some personal news to share. our paper titled KerasCV and KerasNLP: Multi Framework Models has finally been published in the Journal of Machine Learning Research. jmlr.org/papers/v25/24-0404.… here is to my journey - started as a self learner in ML in 2019 - contributed heavily to Open Source over the last two years - got a golden opportunity during google summer of code 2023 to contribute to keras 3, working closely with the team (@fchollet , @mattdangerw , @martin_gorner et al.) - contributed to fundamental building blocks of transformers when LLMs were not a cool thing - multi-backend implementations (Tensorflow, pytorch and jax agnostic) of llama2, gpt-neox and more. these became a reference for gemma! - did all this in my last semester, handling gsoc, the internship (through uni) and the final year project, all simultaneously, with a lot of commuting between Pune and Warangal. we made it guys! 🧿 thanks @penstrokes75 for all your support💪🏻

142

21,175

anshuman · Aug 13, 2025 · 4:28 AM UTC

anshuman

@athleticKoder

13 Aug 2025

I'm actively looking for my next full-time role. Over the last 5-6 months I've been working as an AI Consultant for Google, part-time. I quit my previous role as Machine Learning Engineer ~2.5 months back. While I took a small break, I explored a lot of cool shit and wrote some blogs that I really enjoyed. You can find the link to my portfolio in the comments

138

34,659

anshuman · Aug 28, 2025 · 11:25 AM UTC

anshuman

@athleticKoder

28 Aug 2025

Replying to @UpTopAli

sed

131

25,160

anshuman · Aug 11, 2025 · 4:56 PM UTC

anshuman

@athleticKoder

11 Aug 2025

I'm genius. Launching Falafel, a free open-source plugin for DaVinci Resolve powered by @FAL . Now you can generate videos from images right from your video editor. In this launch, only the Wan 2.2 model is supported.

130

20,336

anshuman · Sep 10, 2025 · 5:13 AM UTC

anshuman

@athleticKoder

10 Sep 2025

Got in a huge fight last night. She saw a dating app on my phone. She asked : "Why is that still here? Are you still looking for someone better?” I didn't have an answer. But now I do. I wasn't looking for someone better, I was just terrified of missing out on the best. This is the core dilemma of RL Agents: Exploration vs. Exploitation. It’s the hardest choice for any thinking thing, human or machine. Exploitation: Enjoying the great thing you have. (Deleting the app and being happy with my amazing girlfriend). Exploration: Searching for a potentially perfect option. (Keeping the app, just in case). You can't do both. This choice is the engine behind how AI learns. Think of it as the 90/10 Rule. Most of the time (90%), an AI will exploit. It sticks with the best choice it's found so far because it's a proven winner. But sometimes (10%), it explores. It takes a random chance on something new, just to gather more data and make sure it hasn't settled too early. This is how everything learns: - YouTube shows you a video from a creator you love (exploit), then suggests a totally new channel (explore). - Robots in a factory use the motion they know works (exploit), but will try a new one to see if it's faster (explore). - Doctors use a trusted treatment (exploit), while researchers test new drugs (explore). The magic is that the AI teaches itself how to balance this. It learns when to stop looking and commit to the best strategy it has found. It’s learning how to conquer the fear of missing out. So yeah, I guess I'm just an algorithm that’s afraid of settling for a good answer before it's found the great one. Anyway, I'm single now. My phone is fully dedicated to exploration.
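The "90/10 rule" above is usually called epsilon-greedy. Here is a minimal sketch on a toy multi-armed bandit, assuming Gaussian rewards with unit variance; the names `epsilon_greedy` and `run_bandit` and the reward model are illustrative, not from the post.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the highest estimated value (first one on ties)
    return max(range(len(q_values)), key=q_values.__getitem__)

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate each arm's value while choosing arms epsilon-greedily."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)       # running value estimate per arm
    counts = [0] * len(true_means)    # how often each arm was pulled
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)  # assumed noisy reward
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]     # incremental mean update
    return q, counts

q, counts = run_bandit([0.0, 0.0, 5.0], steps=2000)
print(counts)  # pulls per arm
```

With a fixed epsilon the agent never stops exploring; decaying epsilon over time is one common way to make it "commit" once the estimates are trustworthy.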

134

12,982

anshuman · May 5, 2023 · 5:53 AM UTC

anshuman

@athleticKoder

5 May 2023

128

78,849

anshuman · Sep 18, 2025 · 12:50 PM UTC

anshuman

@athleticKoder

18 Sep 2025

Retrieval Metrics (Did we get the right context?) 1️⃣ Contextual Relevancy: What % of retrieved chunks actually matter? 2️⃣ Contextual Recall: Did we retrieve ALL the info needed? 3️⃣ Contextual Precision: Are relevant chunks ranked higher than junk?
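A rough sketch of the three retrieval metrics, assuming each retrieved chunk already carries a binary relevance label (in practice an LLM judge or a human usually produces these). The function names mirror the metric names above; the average-precision formula used for contextual precision is one common rank-aware choice, not necessarily the exact definition any given eval library uses.

```python
def contextual_relevancy(relevant_flags):
    """Share of retrieved chunks that are relevant (flags in rank order)."""
    return sum(relevant_flags) / len(relevant_flags) if relevant_flags else 0.0

def contextual_recall(retrieved_ids, needed_ids):
    """Share of the needed chunks that were actually retrieved."""
    needed = set(needed_ids)
    return len(needed & set(retrieved_ids)) / len(needed) if needed else 1.0

def contextual_precision(relevant_flags):
    """Rank-aware score: mean precision@k over ranks holding a relevant chunk."""
    hits, total = 0, 0.0
    for k, flag in enumerate(relevant_flags, start=1):
        if flag:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

print(contextual_relevancy([1, 0, 1, 0]))           # half the chunks matter
print(contextual_recall(["a", "b"], ["a", "c"]))    # one of two needed chunks found
print(contextual_precision([1, 1, 0, 0]), contextual_precision([0, 0, 1, 1]))
```

The last line shows why precision is rank-aware: both rankings contain the same two relevant chunks, but the one that puts junk first scores lower.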

126

34,416

anshuman · Aug 28, 2025 · 2:02 PM UTC

anshuman

@athleticKoder

28 Aug 2025

Replying to @co_foundr

it iz wat it iz

124

31,342

anshuman · Jan 8, 2025 · 1:39 PM UTC

anshuman

@athleticKoder

8 Jan 2025

i left competitive programming for ML in 2021

118

6,713

anshuman · Sep 18, 2025 · 12:50 PM UTC

anshuman

@athleticKoder

18 Sep 2025

The fundamental insight: RAG quality = Retriever Performance × Generator Performance If either component scores zero, your entire system fails. It's multiplication, not addition. You can't compensate for bad retrieval with a better LLM.

117

35,656

anshuman · Sep 11, 2025 · 10:57 AM UTC

anshuman

@athleticKoder

11 Sep 2025

How It Works Technically The process runs in a loop: 1. Draft model predicts the next K tokens 2. Target model runs one forward pass to verify all K tokens in parallel 3. We accept the longest prefix where both models agree 4. Target model generates the next token after the accepted sequence 5. Repeat with the extended sequence The key insight is that verification is much cheaper than generation when done in parallel.
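The loop above can be sketched with toy "models", assuming greedy decoding and treating each model as a plain function from a token list to the next token. `target_next` and `draft_next` are hypothetical stand-ins; a real target model would score all K positions in one batched forward pass instead of the inner loop shown here.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding; output matches plain greedy target decoding."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # Draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # Target checks every drafted position (one parallel pass in practice).
        accepted = []
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if draft[i] != expected:
                # First disagreement: keep the agreed prefix plus the
                # target's own token for this position.
                accepted.append(expected)
                break
            accepted.append(draft[i])
        else:
            # Every draft token matched: the target adds one more token.
            accepted.append(target_next(tokens + draft))
        tokens.extend(accepted)
    return tokens[:goal]

# Toy models: the draft agrees with the target except at some positions.
target = lambda t: (sum(t) * 7 + 3) % 11
draft = lambda t: target(t) if len(t) % 3 else 0
print(speculative_decode(target, draft, [1, 2], max_new_tokens=10))
```

Each iteration emits at least one token, so the worst case matches ordinary decoding plus the wasted draft work; the speedup depends on how often the draft agrees with the target.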

117

27,276

anshuman · Oct 5, 2025 · 1:33 PM UTC

anshuman

@athleticKoder

5 Oct 2025

love has a train/test split: dating is training data. candlelit dinners, best behavior, everyone performing their highest eval scores. you're both optimizing for the same metric: being chosen. but marriage is the test set. food poisoning at 3am. financial stress. grief that doesn't end. the third time you've had the same fight. boredom that stretches for months. the real eval isn't 'can you be lovable when you're trying?' it's 'who are you when the conditions are out-of-distribution?' when there's no audience. when you're not getting positive reinforcement. when the reward signal is delayed by years or maybe never comes. in ML, we do train/test split because models can just memorize. they can achieve perfect training accuracy by learning the dataset's quirks instead of the underlying pattern. people do this too. they learn to perform 'good partner' - the right words, the right gestures, the right conflict resolution scripts. perfect training accuracy. then life deploys them into production and everything breaks. because they memorized the pattern of being loved. they didn't learn how to love. the test set reveals what you actually optimized for. and you can't hack the test set. it's held-out for a reason. that person who's patient in month two of dating but cruel in year two of marriage? overfit to the training distribution. the one who's generous when dopamine is high but withholding when things are routine? memorized the easy examples. real love is generalization. it's performing well on data you've never seen. on versions of each other you didn't know existed. on challenges that weren't in the training set. and here's the thing: you can't know if love is real until you hit the test set. you have to deploy to production to find out what you actually built. maybe that's why they call it a leap of faith.

115

16,995

anshuman · Sep 23, 2024 · 3:22 AM UTC

anshuman

@athleticKoder

23 Sep 2024

Finally published my first-ever video on YouTube. piped.video/UVAItmSkkvE?si=fhJM… Hack the GSoC: The Untold Strategy for Preparation and Selection in Google Summer of Code I hope it helps out College Students.

111

7,581

anshuman · Sep 24, 2024 · 5:40 PM UTC

anshuman

@athleticKoder

24 Sep 2024

Daily ML Grind - 17/n > Started with RAG++ course > Learning about Advanced Retrieval Augmented Generation systems > Wandbot is amazing > Completed first module > 80/20 rule: no system is perfect, but excellence is attainable Resource: wandb.courses/courses/take/r… Thanks, instructors!

ALT Advanced RAG Course by weights & biases

109

5,410

anshuman · Sep 11, 2025 · 3:40 PM UTC

anshuman

@athleticKoder

11 Sep 2025

Looking to follow people doing cool work in ML - Inference, Eval, Training, Applied. Please recommend below 👇

113

11,464

anshuman · Nov 24, 2023 · 4:57 PM UTC

anshuman

@athleticKoder

24 Nov 2023

such a comprehensive notebook on parallelism and distributed training by @A_K_Nain . kaggle.com/code/aakashnain/p… still don't understand how the @kaggle algorithm can be this bad, promoting linear regression notebooks and suppressing rich content like this. disappointed.

Parallelization and Distributed Training in JAX

Explore and run AI code with Kaggle Notebooks | Using data from No attached data sources

kaggle.com

106

11,635

anshuman · Dec 24, 2024 · 12:49 PM UTC

anshuman

@athleticKoder

24 Dec 2024

Neural Networks Explained 🧵✨ 1/ What Are Neural Networks? Neural networks are the backbone of modern AI, mimicking the human brain to recognize patterns, make decisions, and learn from data. Whether it's powering voice assistants or driving cars, they're everywhere! 🚗💡 2/ The Building Blocks: Neurons & Layers At their core, neural networks consist of neurons organized in layers: Input Layer: Receives the initial data. Hidden Layers: Process and extract features. Output Layer: Produces the final result. 🔄🔍 3/ Understanding the Input Layer 📊 Imagine your network starts with inputs like x₁ and x₂. The raw data points fed into the system are represented as squares in the diagram. They're the foundation upon which everything builds! 4/ Diving into Hidden Layers Hidden layers are where the magic happens! Each neuron applies activation functions (labeled 'z; f') to transform inputs into meaningful patterns. Multiple blue and green nodes symbolize this complex processing. 🌀✨ 5/ Output Layer & Predictions After processing, the network delivers an output. A single purple node represents the prediction (y') in our diagram. This is the network's final answer based on the learned patterns. 🎯📈 6/ Forward Propagation Explained Data flows from left to right (shown by the coral red arrow) through the network layers, transforming input into output. This is called forward propagation – the first step in making predictions! ⬅️➡️ 7/ Backward Propagation: Learning from Mistakes To improve, the network adjusts itself by flowing information backward (pink arrow). This backward propagation helps minimize errors by tweaking the connections. 🔄📉 8/ Loss Function & Optimization Our diagram shows predictions vs. true values connected to a loss function, which calculates the error. The optimizer then adjusts the network to reduce this loss, refining accuracy over time. 📉🔧 9/ Iterative Training Process Training is an ongoing loop! 
Blue arrows indicate the iterative process of adjusting until the loss is minimized. It's like teaching the network to get better with each cycle. 🔁💪 10/ Real-World Applications From image recognition to natural language processing, neural networks are revolutionizing industries. Understanding their structure helps us harness their full potential! 🚀🌐 Thanks for following along! Want to dive deeper into neural networks? Drop your questions below! 👇✨
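To make the forward pass, backward pass, loss and training loop from the thread concrete, here is a small sketch of a network with two inputs (x₁, x₂), one hidden layer and a single output (y'). The hidden size, tanh activation, squared-error loss and learning rate are assumptions chosen for illustration; the thread's diagram does not specify them.

```python
import math
import random

def train_tiny_mlp(data, hidden=3, lr=0.05, epochs=500, seed=0):
    """Train a 2-input, one-hidden-layer network; return the loss per epoch."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for (x1, x2), y in data:
            # Forward propagation: z = w.x + b, then activation f = tanh.
            h = [math.tanh(w1[j][0] * x1 + w1[j][1] * x2 + b1[j])
                 for j in range(hidden)]
            y_pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y_pred - y
            total += err * err  # squared-error loss for this example
            # Backward propagation: chain rule from the loss to each weight.
            for j in range(hidden):
                grad_z = 2 * err * w2[j] * (1 - h[j] ** 2)  # uses w2 before update
                w2[j] -= lr * 2 * err * h[j]
                w1[j][0] -= lr * grad_z * x1
                w1[j][1] -= lr * grad_z * x2
                b1[j] -= lr * grad_z
            b2 -= lr * 2 * err
        losses.append(total / len(data))
    return losses

# Toy dataset: logical OR of the two inputs.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
losses = train_tiny_mlp(or_data)
print(losses[0], losses[-1])  # loss at the first and last epoch
```

The optimizer here is plain per-example gradient descent; frameworks automate the backward pass, but the update rule has the same shape.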

5,847

anshuman · Dec 15, 2024 · 4:34 PM UTC

anshuman

@athleticKoder

15 Dec 2024

101

15,892

anshuman · Aug 16, 2024 · 5:55 AM UTC

anshuman

@athleticKoder

16 Aug 2024

I'm so excited and ecstatic to announce that ... ( JMLR it is ) The paper that I co-authored with the Keras ( @fchollet et. al.) team at @Google , has been accepted in the prestigious Journal of Machine Learning Research ( @JmlrOrg ). My journey began with the incredible honor of being accepted into the Google Summer of Code program, a turning point that led me to become a dedicated, long-time contributor to Keras. I was fortunate to have the opportunity to shape the evolution of multi-backend Keras3, contributing while it was still a work in progress. Now, as I reflect on the countless model backbones, pipelines, and tutorials I helped build, a wave of nostalgia washes over me, reminding me of the profound impact those moments have had on my path.

anshuman

@athleticKoder

14 Aug 2024

HUGE NEWS INCOMING

101

17,425

anshuman · Sep 19, 2023 · 9:51 AM UTC

anshuman

@athleticKoder

19 Sep 2023

Graduated from @warangal_nit in Class of '23. #Graduation #convocation

3,958

anshuman · Sep 18, 2025 · 12:50 PM UTC

anshuman

@athleticKoder

18 Sep 2025

Most candidates say "check accuracy" or "run more tests." Wrong approach. RAG systems fail at TWO distinct stages, and you need different metrics for each. Generic accuracy won't tell you WHERE the problem is.

36,698

anshuman · Aug 28, 2025 · 7:48 AM UTC

anshuman

@athleticKoder

28 Aug 2025

Replying to @infinit3e_

Good

15,866

anshuman · Sep 11, 2025 · 10:57 AM UTC

anshuman

@athleticKoder

11 Sep 2025

Why We Need It? Traditional LLM inference has two major bottlenecks. First, we generate tokens sequentially – each token requires a full forward pass before we can start the next one. Second, this leaves our GPUs underutilized since we can't parallelize future token computation. With speculative decoding, when the target model verifies multiple draft tokens simultaneously, we're making much better use of our compute resources.

33,656