maximizing shareholder value; mostly ml here;

She dumped me last night.

Not because I don't listen. Not because I'm always on my phone. Not even because I forgot our anniversary (twice).

But because, in her exact words: "You only pay attention to the parts of what I say that you think are important."

I stared at her for a moment and realized... she had just perfectly described the attention mechanism in transformers. Turns out I wasn't being a bad boyfriend. I was being mathematically optimal.

See, in conversations (and transformers), you don't give equal weight to every word. Some words matter more for understanding context. Attention figures out exactly HOW important each word should be.

Here's the beautiful math:

Attention(Q, K, V) = softmax(QK^T / √d_k) V

Breaking it down:
Q (Query): "What am I looking for?"
K (Key): "What info is available?"
V (Value): "What is that info?"
d_k: Key dimension (for scaling)

Think library analogy: You have a question (Query). Books have titles (Keys) and content (Values). Attention finds which books are most relevant.

Step-by-step with "The cat sat on the mat":

Step 1: Create Q, K, V
Each word → three vectors via learned matrices W_Q, W_K, W_V
For "cat":
Query: "What should I attend to when processing 'cat'?"
Key: "I am 'cat'"
Value: "Here's cat info"

Step 2: Calculate scores
QK^T = how much each word should attend to the others
Processing "sat"? High similarity with "cat" (cats sit) and "mat" (where sitting happens).

Step 3: Scale by √d_k
Dividing by √d_k prevents dot products from getting too large and keeps softmax balanced.

Step 4: Softmax
Converts scores to probabilities that sum to 1 (illustrative weights when processing "sat"):
"cat": 0.4 (subject)
"sat": 0.3 (action)
"mat": 0.2 (location)
"on": 0.05 (preposition)
"the": 0.05 (article, both occurrences combined)

Step 5: Weight values
Multiply each word's value by its attention weight, then sum up. Now "sat" knows it's most related to "cat" and "mat".

Multi-Head Magic:
Transformers do this multiple times in parallel. Heads are not assigned roles by hand, but they might end up specializing like this:
Head 1: Subject-verb relationships
Head 2: Spatial ("on", "in", "under")
Head 3: Temporal ("before", "after")
Head 4: Semantic similarity
Each head learns different relationship types.

Why This Changed Everything:
Before: RNNs = reading with a flashlight (one word at a time, forgetting the beginning)
After: Attention = floodlights on the entire sentence with dimmer switches

This is why ChatGPT can:
Remember what you said 50 messages ago (as long as it fits in the context window)
Know "it" refers to something specific
Understand "bank" = money vs river based on context

The Kicker:
Models learn these patterns from data alone. Nobody programmed grammar rules. They figured out language structure just by predicting next words.

Attention is how AI learned to read between the lines.

Just like my therapist helped me understand my focus patterns, maybe understanding transformers helps us see how we decide what matters.

Now if only I could implement multi-head attention in dating... 🤖

Still waiting for "scaled dot-product listening" to be invented.
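For anyone who wants to poke at the formula, here is a minimal sketch of the five steps in plain Python. The 2-d query, key, and value vectors are hand-picked toy numbers (an assumption for illustration, not learned weights), and the function names are my own.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors, one row per token.
    Returns (outputs, weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Steps 2 and 3: dot product with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Step 4: softmax turns the scores into weights that sum to 1.
        w = softmax(scores)
        # Step 5: weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy vectors for three tokens (hypothetical values, not from a trained model).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's attention distribution over all three tokens. A real transformer would first produce Q, K, and V by multiplying the embeddings with W_Q, W_K, and W_V.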
614
985
11,672
719,320
this paper changed my life
110
462
7,547
556,802
You're in an ML Engineer interview at Perplexity, and the interviewer asks: "Your RAG system is hallucinating in production. How do you diagnose what's broken - the retriever or the generator?" Here's how you can answer:
47
214
2,887
361,062
You're in an ML inference engineer interview at Anthropic, and the interviewer asks: "Can you explain speculative decoding and why we'd want to use it?" Here's how you can answer:
29
154
2,826
325,716
"Just use OpenAI API" Until you need: - Custom fine-tuned models - <50ms p99 latency - $0.001/1K tokens (not $1.25/1K input) Then you build your own inference platform. Here's how to do that:
57
131
2,377
377,369
I rejected a job offer yesterday.

Not because of the salary. Not because of the tech stack. Not even because of the long hours they warned me about.

But because, when I asked how they evaluate their AI systems, the hiring manager said: "We just ask it some questions and see if the answers sound right."

I stared at them for a moment and realized... they had just described the biggest problem in AI today.

See, "sounds right" isn't a measurement. It's a hope.

Here's what proper LLM evaluation actually looks like:
- Accuracy: Can it get factual questions right? (Not 80% of the time. Consistently.)
- Hallucination rate: How often does it make things up? (This should be near zero for critical applications.)
- Bias metrics: Does it treat all groups fairly? (Measured across demographics, not assumed.)

Real Evaluation Frameworks:
- BLEU scores for translation quality
- Perplexity for language modeling
- Human evaluation with inter-annotator agreement
- Adversarial testing (red teaming)
- Domain-specific benchmarks (legal, medical, financial)

The Process:
> Define success criteria BEFORE deployment
> Create diverse test sets (not just happy paths)
> Measure consistently across model versions
> Track performance over time (models drift)
> Have humans validate edge cases

Why This Matters:
Before proper evals: "Our model is amazing!" (based on cherry-picked examples)
After proper evals: "Our AI achieves 94.2% accuracy on domain X, with known failure modes Y and Z"

The difference? One builds trust. The other destroys it when reality hits.

The kicker: Most companies are still in the "sounds right" phase. They're deploying models evaluated by vibes, not metrics.

Just like you wouldn't join a team that deploys code without tests, you shouldn't join one that deploys AI without proper evaluation.

What's your experience with LLM evaluation? Are we measuring what actually matters?
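To make "metrics, not vibes" concrete, here is a small sketch of two of the measurements mentioned above: exact-match accuracy on a labeled test set, and inter-annotator agreement. Cohen's kappa is my choice of agreement statistic (the post does not name one), and the tiny datasets are made up for illustration.

```python
from collections import Counter

def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold answers."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def cohens_kappa(a, b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:  # both annotators always use the same single label
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical model outputs vs. gold answers.
preds = ["Paris", "4", "blue"]
golds = ["Paris", "5", "blue"]
acc = accuracy(preds, golds)

# Hypothetical human judgments of six responses from two annotators.
annotator_1 = ["ok", "ok", "bad", "ok", "bad", "ok"]
annotator_2 = ["ok", "bad", "bad", "ok", "bad", "ok"]
kappa = cohens_kappa(annotator_1, annotator_2)

print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

Tracking numbers like these per model version is what turns "sounds right" into something you can compare over time.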
97
142
1,757
283,375
You're in an ML Engineer interview at Groq, and the interviewer asks: "How do you measure LLM inference performance? What metrics matter most for production systems?" Here's how you can answer:
20
93
1,489
136,413
career update: joined zomato as Machine Learning Engineer 2
115
9
1,436
166,212
Techniques I’d master if I wanted to make LLMs faster + cheaper.

1. Quantization
2. KV-Cache Quantization
3. Flash Attention
4. Speculative Decoding
5. LoRA
6. Pruning
7. Knowledge Distillation
8. Weight Sharing
9. Sparse Attention
10. Batching & Dynamic Batching
11. Model Serving Optimization
12. Tensor Parallelism
13. Pipeline Parallelism
14. Paged Attention
15. Mixed Precision Inference
16. Early Exit / Token-Level Pruning
10
162
1,263
60,845
You’re in an AI Engineer interview at Microsoft, and the interviewer asks: ‘Our team needs to build RAG over 10M documents. Which vector database and why?’ Here’s how you answer:
38
105
1,239
153,700
software Engineers have a runway of 5 years left
67
41
1,109
85,184
You're in an ML Engineer interview at Anthropic, and the interviewer asks: "Your LLM inference is running out of GPU memory with long conversations. How do you fix this?" Here's how you answer:
22
87
1,103
118,562
ML concepts every data scientist should know for interviews. Bookmark this.

1. Bias-Variance Tradeoff
2. Cross-Validation Strategies
3. Regularization (L1, L2, Elastic Net)
4. Class Imbalance & Sampling Techniques
5. Feature Engineering & Selection
6. Overfitting vs Underfitting
7. Evaluation Metrics (beyond accuracy)
8. Hyperparameter Tuning
9. Train-Test Data Leakage
10. Ensemble Methods
11. Dimensionality Reduction
12. Model Interpretability (SHAP, LIME)
13. Gradient Descent Variants
14. Activation Functions & Neural Networks
15. Imbalanced Dataset Handling
16. Production Model Monitoring
16
112
1,068
57,626
"Just use Vector Database" Until you need: - 100M+ vectors indexed - <10ms p95 search latency - $50/month (not $500/month) Then you build your own vector database. Here's what that actually means:
37
67
1,085
147,233
“Just rent a GPU for training” Until you need: - Multi-node training for 70B+ models - $5/hour per GPU (not $30/hour) - 90%+ GPU utilization Then you build your own ML infra. Here’s the reality:
31
72
1,069
149,357
Techniques I'd master if building RAG systems that actually work. Bookmark this.

1. Sliding Window Chunking
2. Semantic Chunking
3. Document Hierarchies
4. Metadata Enrichment
5. Query Expansion
6. Hybrid Search
7. Reranking Models
8. Context Window Packing
9. Lost in the Middle Problem
10. Hypothetical Document Embeddings (HyDE)
11. Multi-Query Retrieval
12. Contextual Compression
13. Sentence Window Retrieval
14. Auto-Merging Retrieval
15. Cross-Encoder Rescoring
16. Temporal Context Decay
17. Negative Sampling
18. MMR (Maximal Marginal Relevance)
19. Graph-Based Retrieval
20. Recursive Retrieval
21. Citation Tracking (chunks)
22. Context Ablation Testing
23. Adaptive Retrieval
19
101
1,037
62,698
You're in an ML Inference Engineer interview at Google, and the interviewer asks: "What's the real bottleneck in LLM serving throughput? How can PagedAttention help?" Here's how you can answer:
21
69
985
90,324
one of my favourite ML YouTube channels lately.
8
103
932
74,409
You're in an ML Engineer interview at Perplexity, and the interviewer asks: "Your LLM generates millions of responses daily. How do you evaluate quality without manual review?" Here's how you answer:
15
72
946
168,886
You're in an ML Engineer interview at Meta, and the interviewer asks: "Why does RL work better than supervised learning for LLMs?" Here's how you answer:
14
38
950
90,106
life of a machine learning engineer
31
19
879
57,739
We are hiring 2 Machine Learning Engineers for our Bangalore office. You'll work directly with me on super impactful projects. Drop your best work in the comments👇 and I will personally reach out to you if you are a fit. Please don't DM!
138
38
933
126,028
Research Scientist interview at Google.

Interviewer: "You need to quantize a model from FP16 to INT8. Walk me through how you'd do it without destroying quality."

Your answer: "I'll just convert all weights to INT8 format"

❌ Rejected.

Here's the critical mistake:

Don't say: "Quantization reduces precision" or "Use 8 bits instead of 16." Too surface-level.

The real answer is the outlier feature problem. Naive INT8 quantization fails because a tiny fraction of activation values (on the order of 0.01%) can be 100× larger than the rest. Your quantization range is wasted on outliers. You're measuring a skyscraper and a house with the same ruler.

Here's why naive quantization destroys quality:

FP16 values → Scale to [-127, 127] → Store as INT8 → 2× memory reduction

Problem: Activation outliers exist at specific feature dimensions. Out of 6,000 features:
5,856 features: Normal range (-0.5 to +0.5)
144 features: Outliers (100× larger, -50 to +50)

Those 144 features carry a disproportionate share of model quality.

btw subscribe to my newsletter to get these posts for free - fullstackagents.substack.com

The outlier math is brutal:
- Quantization range: [-127, 127] = 255 integer levels
- One outlier at value=50: Forces scale = 50/127 ≈ 0.39
- Normal value 0.3: Quantized to round(0.3/0.39) = 1
- Normal values in [-0.5, +0.5] all land on -1, 0, or 1: 3 out of 255 levels
- Range wasted: about 99%

You're using a bathroom scale to weigh an ant and an elephant together.

Remember: Outliers are dimension-specific, not token-specific:
- Feature dim 2,145: Always outlier (±40 to ±60)
- Feature dim 891: Always normal (±0.2 to ±0.5)
- Pattern stable across ALL prompts, ALL batches
- Position-independent, dimension-dependent

This changes everything.

> Bad approach:
Single quantization scale per tensor
Convert all values uniformly
Model perplexity: 12.3 → 2,847 (destroyed)

> Good approach (LLM.int8()):
Identify outlier feature dimensions (by a magnitude threshold)
Keep outliers in FP16 (144 dims)
Quantize normal features to INT8 (5,856 dims)
Model perplexity: 12.3 → 12.6 (preserved)

The difference is mixed precision, not uniform precision.

Memory bandwidth bottleneck (70B model):
- FP16: Load ~140GB of weights from VRAM per forward pass
- INT8: Load ~70GB of weights from VRAM per forward pass
- Throughput: up to ~1.8× higher when decoding is memory-bound and the INT8 kernels are efficient
- Quality: ~98% maintained

Cost reduction:
FP16: 70B model needs 4× A100 GPUs
INT8: 70B model fits on 2× A100 GPUs
Cost: Cut in half
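The outlier arithmetic is easy to reproduce. Below is a rough sketch in plain Python that compares one shared absmax scale against a mixed-precision split where the outlier dimension stays in floating point. The activation values and the choice of outlier dimension are hypothetical, and this is a simplification of the idea, not the actual LLM.int8() implementation (which works on matrix multiplications with vector-wise scales).

```python
def absmax_quantize(values):
    """Symmetric INT8 quantization with a single scale for the whole vector."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical activations: small "normal" values plus one outlier at index 3.
acts = [0.3, -0.2, 0.45, 50.0, 0.1, 0.05, -0.35, 0.25]
outlier_dims = {3}  # assumed known; in practice found by a magnitude threshold

# Naive: the outlier forces scale = 50/127 (about 0.39) for every value.
q_naive, s_naive = absmax_quantize(acts)
err_naive = mean_abs_error(dequantize(q_naive, s_naive), acts)

# Mixed precision: outlier dims stay in float, the rest get their own INT8 scale.
rest_idx = [i for i in range(len(acts)) if i not in outlier_dims]
q_rest, s_rest = absmax_quantize([acts[i] for i in rest_idx])
recon = list(acts)  # outlier dimensions are kept exactly
for i, v in zip(rest_idx, dequantize(q_rest, s_rest)):
    recon[i] = v
err_mixed = mean_abs_error(recon, acts)

print("naive INT8 codes:", q_naive)
print(f"mean abs error  naive={err_naive:.4f}  mixed={err_mixed:.4f}")
```

With the naive scale, every normal value collapses onto -1, 0, or 1, which is exactly the wasted-range problem described above. Once the outlier is pulled out, the remaining values use the full INT8 range and the reconstruction error drops sharply.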
14
70
929
85,007
You're in an ML Engineer interview at Perplexity, and they ask: "Your RAG system retrieves at 80% accuracy but only answers correctly 50% of the time. What's wrong?" Here's how you answer:
15
62
911
117,137
You're in an ML Engineer interview at Google, and the interviewer asks: "GPUs vs TPUs: which one should you choose?" Here's how you answer:
14
32
832
89,473
Techniques to Master for Faster + Cheaper LLM Inference

1. Quantization (INT8/INT4/FP8)
2. KV-Cache Optimization (quantization, compression, eviction)
3. Flash Attention
4. Speculative Decoding
5. Continuous Batching
6. Paged Attention / vLLM-style memory management
7. Tensor Parallelism
8. Pipeline Parallelism
9. Prompt Caching (caching prefixes/system prompts)
10. Mixed Precision Inference
11. Chunked Prefill
12. Medusa / Multi-token prediction
13. Attention Sinks (streaming/infinite context)
14. Kernel Fusion & Custom CUDA kernels
15. Request Scheduling & Priority Queues
25
99
890
74,626
I am told that this is the best playlist for learning distributed systems.
10
75
751
35,429
You're in a Research Engineer interview at OpenAI, and the interviewer asks: "How do you train your model for Computer Use? Can RL solve this?" Here's how you can answer:
8
29
730
95,410
Okay, guys, it's time to give back. Juniors preparing for GSoC 2025, I will create an ultimate video series on Google Summer of Code, on YT and X, and I will unveil the strategy that led to my selection. Not only that, I am gonna tell you how GSoC can turn your life around and open your path to a successful career, if you do it the right way! If there is any specific thing you wanna know or whatever questions you have, feel free to drop them in the comments.
I've waited for this email for one and a half years. Today I couldn't be happier to announce that I have been selected for the Google Summer of Code Program with the @TensorFlow organization. #opensource #MachineLearning #Google #GSOC #GSoC2023
35
20
583
78,949
this has to be the best playlist ever created on FastAPI
5
58
557
28,846
how i started deep learning in 2019
13
8
531
38,635
Techniques I'd master to fine-tune LLMs in production. Bookmark this.

1. LoRA & QLoRA for parameter-efficient fine-tuning
2. PEFT library for adapter methods
3. Instruction tuning
4. Dataset formatting (ChatML, Alpaca, ShareGPT)
5. DeepSpeed ZeRO for memory optimization
6. Flash Attention 2 for efficient training
7. Gradient checkpointing for longer contexts
8. BitsAndBytes for 4-bit/8-bit quantization
9. RLHF & DPO for alignment
10. Tokenizer training & vocabulary extension
11. Evaluation metrics (perplexity, ROUGE, human eval)
12. Unsloth for 2x faster fine-tuning
13. Multi-GPU strategies (FSDP, DDP)
8
46
574
32,031
You're in an ML Inference Engineer interview at Google, and the interviewer asks: "Our team wants to switch from the Gemini API to a fine-tuned model. Which serving framework and why?" Here's how you answer:
9
35
568
59,019
A girl at my gym approached me after her workout, clearly annoyed.

"I've been watching and copying your entire routine for weeks, but I'm not seeing the same improvements you are!"

I explained, "You can't just mimic what I do - you need to understand which exercises deserve more focus for your specific goals."

She nodded. And then she said, "Wait, isn't that like the attention mechanism in ChatGPT?"

And I know you're sitting there like: WTF is the Attention Mechanism?

The Attention Mechanism is like that gym bro who knows exactly which exercises deserve maximum effort during each workout.

How does it work in LLMs?
- You feed a sentence with multiple words to the model
- Each word "examines" ALL other words in the sentence
- It calculates "how much attention should I pay to each word?"
- It creates weighted connections based on relevance
- Important words get higher attention weights, the rest get down-weighted

The Complete Math:

Step 1: Create Query, Key, and Value matrices
Query (Q) = What am I looking for?
Key (K) = What information is available?
Value (V) = The actual content to extract

For each word position i:
Q_i = X_i × W_Q (input × query weight matrix)
K_i = X_i × W_K (input × key weight matrix)
V_i = X_i × W_V (input × value weight matrix)

Step 2: Calculate Attention Scores
Score(i,j) = Q_i × K_j^T
This tells us how much word i should pay attention to word j.

Step 3: Scale the scores
Scaled_Score = Score / √d_k
Where d_k is the dimension of the key vectors. Scaling keeps the scores from growing with d_k, which would saturate the softmax and leave it with vanishingly small gradients.

Step 4: Apply Softmax
Attention_Weight(i,j) = Softmax over j of Scaled_Score(i,j)
Softmax formula: e^(x_j) / Σ(e^(x_k)) for all k
This ensures the attention weights for each word i sum to 1.

Step 5: Weighted Sum
Output_i = Σ(Attention_Weight(i,j) × V_j) for all j

Complete Formula:
Attention(Q,K,V) = Softmax(QK^T / √d_k)V

Sentence: "She wants to deadlift heavy weights"

Let's say we have 3-dimensional embeddings (simplified: "to" is left out, the embeddings are used directly as queries and keys, and the √d_k scaling is skipped):

Word Embeddings:
She = [1, 0, 0]
wants = [0, 1, 0]
deadlift = [1, 1, 1]
heavy = [0, 0, 1]
weights = [1, 0, 1]

When processing "deadlift":
Query for "deadlift" = [1, 1, 1]

Calculate dot products (attention scores):
deadlift → She: [1,1,1] · [1,0,0] = 1
deadlift → wants: [1,1,1] · [0,1,0] = 1
deadlift → deadlift: [1,1,1] · [1,1,1] = 3
deadlift → heavy: [1,1,1] · [0,0,1] = 1
deadlift → weights: [1,1,1] · [1,0,1] = 2

Raw scores: [1, 1, 3, 1, 2]

After Softmax (total = e^1+e^1+e^3+e^1+e^2 ≈ 35.63):
She: e^1/total ≈ 0.08
wants: ≈ 0.08
deadlift: e^3/total ≈ 0.56
heavy: ≈ 0.08
weights: e^2/total ≈ 0.21

Final attention weights: ≈ [0.08, 0.08, 0.56, 0.08, 0.21]

Multi-Head Attention (the gym analogy):
Think of it like having multiple personal trainers, each focusing on different aspects:
Head 1: Focuses on exercise form and technique
Head 2: Focuses on muscle groups being targeted
Head 3: Focuses on safety and proper progression

Each head has its own Q, K, V matrices and calculates attention independently, then the results are concatenated.

Mathematical representation:
MultiHead(Q,K,V) = Concat(head_1, head_2, ..., head_h) × W_O
Where each head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)

Why this revolutionized NLP:
> Context Understanding: word relationships are computed, not hand-coded
> Parallel Processing: all attention scores are calculated simultaneously, not sequentially
> Gradient Flow: softmax is differentiable, so the weights are learned end to end
> Flexibility: handles sequences of varying length, although compute and memory grow quadratically with sequence length

Final Result:
The Attention Mechanism gave AI mathematical precision in focusing on what matters - just like how you calculate exactly which muscle groups need the most work based on your goals!
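The "deadlift" example is small enough to check by hand, or with a few lines of Python. This sketch follows the same simplification as the example: embeddings double as queries and keys, and there is no √d_k scaling. The variable names are my own.

```python
import math

# The 3-d toy embeddings from the worked example.
embeddings = {
    "She": [1, 0, 0],
    "wants": [0, 1, 0],
    "deadlift": [1, 1, 1],
    "heavy": [0, 0, 1],
    "weights": [1, 0, 1],
}

query = embeddings["deadlift"]

# Raw attention scores: dot product of the query with every key.
scores = {
    word: sum(q * k for q, k in zip(query, vec))
    for word, vec in embeddings.items()
}

# Softmax: exponentiate each score and normalize so the weights sum to 1.
total = sum(math.exp(s) for s in scores.values())
attn = {word: math.exp(s) / total for word, s in scores.items()}

for word in embeddings:
    print(f"{word:>8}: score={scores[word]}  weight={attn[word]:.2f}")
```

Rounded to two decimals, the weights come out as 0.08, 0.08, 0.56, 0.08, and 0.21, so "deadlift" attends mostly to itself and then to "weights".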
24
46
559
45,736
I wanted to learn Rust for a long time, and finally, it's time to open source what I've been up to these last 2 weekends.

Presenting picograd, inspired by Andrej Karpathy (@karpathy)'s good old micrograd: github.com/shivance/picograd

Here are my 7 takeaways from this project:

1. I used to think that Rust is similar to C++. It's not. It's way different!! You have to learn a whole bunch of new concepts and the learning curve is steep.
2. Rust design patterns are difficult and weird. If you are coming from a strong Object Oriented Programming (OOP) background, it might be a hard pill to swallow.
3. Rust is not for every Python developer. You must move from a dynamically typed language to a statically typed one. And you face tons of warnings while writing your code.
4. I loved the concept of ownership. And I think Mojo🔥 by @Modular does that too! I feel that Mojo is a hybrid of Python and Rust. MLIR Compiler instead of LLVM is a story for another day.
5. If you come from a C++ background, you are gonna be reminded of your good old friend templates. Generics used to be so much fun.
6. I just love @rustlang's build system. Man, it's so much easier to set up than C++.
7. The Rust compiler is your excellent friend. Although it's annoying to see an error every time you compile your code (until you don't), it's for the greater good. Rust wants to ensure your code is safe and free of bugs.

I think Rust is a very good language for production code. Not to mention its performance & speed.

It's been a fun ride at @_buildspace and @_nightsweekends, where I'm seeing so many people like me coming together to build stuff. I love the community's energy.
11
58
519
79,718
I wrote Qwen 2.5 from scratch. Works with JAX, PyTorch and TensorFlow. This marks my return to open source after a year. github.com/keras-team/keras-…
8
49
530
32,625
LoRA sounds simple:
1. Add low-rank matrices
2. Train with less memory
3. Get the same results

Reality:
- The learning rate typically needs to be about 10x higher than for full fine-tuning
- Layer selection changes everything
- Batch size tolerance is different
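For reference, "add low-rank matrices" boils down to something like the sketch below: the frozen weight W is left alone and a trainable product A·B, scaled by alpha/r, is added on top. This is a toy in plain Python with made-up sizes and values. It follows the usual convention of initializing the second factor to zero so the adapted layer starts out identical to the base layer, but it is an illustration, not any library's implementation.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_forward(x, W, A, B, alpha, r):
    """y = x W + (alpha / r) * x A B, with W frozen and A, B trainable."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * d for b, d in zip(brow, drow)] for brow, drow in zip(base, delta)]

d_in, d_out, r = 4, 3, 1  # hypothetical sizes; real layers are far larger

W = [[0.1 * (i + j) for j in range(d_out)] for i in range(d_in)]  # frozen base weight
A = [[0.5] for _ in range(d_in)]       # d_in x r, trainable (random init in practice)
B = [[0.0] * d_out for _ in range(r)]  # r x d_out, trainable, zero init

x = [[1.0, 2.0, 3.0, 4.0]]
y = lora_forward(x, W, A, B, alpha=2, r=r)

full_params = d_in * d_out        # what full fine-tuning would update
lora_params = r * (d_in + d_out)  # what LoRA updates
print(y, full_params, lora_params)
```

The savings look small at this size (7 vs 12 parameters), but with d_in = d_out = 4096 and r = 16 the adapter has 131,072 trainable parameters against roughly 16.8 million in the full matrix.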
10
27
503
38,170
Google ML Engineer Interview - Final Round

Question: "Your inference costs are 10x higher than expected due to KV cache. How do you diagnose and fix this?"

You: "I'll just increase my GPU memory to store more cache"

Awkward silence

Interview over.

Here's why you failed:

Don't say: "Add more memory" or "Optimize the cache size." Wrong framing.

The real answer isn't about capacity - it's about memory fragmentation from contiguous allocation. Small sequences vs long sequences vs mixed batches = completely different memory behaviors.

Most teams debug by checking total memory usage, not allocation patterns. Your memory problem isn't size - it's holes.

Traditional contiguous allocation can waste a large share of KV cache memory on unusable gaps and over-reservation (the vLLM authors report 60-80% in existing systems). PagedAttention isn't magic - it's just virtual memory for the KV cache. "More memory" doesn't fix fragmentation.

The allocation reality everyone misses:
Contiguous allocation = Giant blocks that can't be split
Non-contiguous allocation = Small blocks that fit anywhere
Growing sequences = Constant reallocation and copying
Finished sequences = Holes that new sequences can't use

Memory layout drives the cost, not memory amount.

"But what about cache size?"

Interviewer: "How do you handle variable-length sequences efficiently?"

Cache size without allocation strategy is meaningless. PagedAttention gives you near-zero fragmentation, but it only pays off if sequences of mixed lengths were your bottleneck. Fixed-size blocks only help when you need flexible allocation.

The infrastructure framework that matters:
> Long uniform sequences + Known lengths = Contiguous is fine
> Mixed lengths + Dynamic batching = PagedAttention essential
> Short sequences + High turnover = Block-based allocation wins
> Rare long sequences + Mostly short = Fragmentation kills you

Match the allocation strategy to the workload pattern.

The evolution path most deployments miss:
Start: Simple contiguous allocation (easy to implement)
Scale: Hit fragmentation issues (most of the reserved cache sits unused)
Optimize: PagedAttention (near-zero fragmentation)
Production: Several times higher throughput on the same hardware (the vLLM authors report 2-4×)

It's not about buying bigger GPUs - it's about using them better.

The answer that gets you hired:
"KV cache cost isn't about total memory. It's about allocation fragmentation. Contiguous blocks waste space through holes. PagedAttention uses virtual memory concepts - fixed-size blocks that can live anywhere. Pick based on your sequence length variance, not your memory budget."
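To show what "virtual memory for the KV cache" means in practice, here is a toy block allocator in Python. It only tracks which fixed-size blocks each sequence owns (a block table), not the tensors themselves. The class and method names are hypothetical, and this is a sketch of the idea rather than how vLLM implements it.

```python
class PagedKVAllocator:
    """Toy allocator: each sequence owns a list of fixed-size physical blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve room for one more token, allocating a block only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length == len(table) * self.block_size:  # no block yet, or last one is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

    def wasted_slots(self):
        """Allocated but unused token slots (only the tail of each last block)."""
        return sum(
            len(table) * self.block_size - self.seq_lens.get(seq_id, 0)
            for seq_id, table in self.block_tables.items()
        )


alloc = PagedKVAllocator(num_blocks=8, block_size=4)
for _ in range(5):
    alloc.append_token("a")   # 5 tokens -> 2 blocks
for _ in range(9):
    alloc.append_token("b")   # 9 tokens -> 3 blocks
alloc.free("a")               # a's blocks go straight back to the pool
for _ in range(12):
    alloc.append_token("c")   # 12 tokens -> 3 blocks, reusing a's old blocks

print(alloc.block_tables, "free:", len(alloc.free_blocks), "wasted:", alloc.wasted_slots())
```

Because blocks do not have to be contiguous, the space freed by sequence "a" is reused immediately by "c", and the only waste is the partially filled last block of each sequence.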
8
39
497
45,817
You’re in a Machine Learning interview at Perplexity, and the interviewer asks: “Why do we need hybrid search? Isn’t vector search with embeddings enough?”

Here’s how you answer:

Don’t say: “To combine different approaches” or “For better coverage.” Too generic.

The real answer is the semantic-lexical gap. Your embeddings understand meaning but ignore exact matches. Vector search alone sees the forest but misses the trees - or worse, the exact product code the user typed.

Here’s why pure vector search fails:

Your query is “iPhone 15 Pro Max 256GB.” Vector search returns “iPhone 15 Pro with lots of storage” and “latest flagship phone specs.” But the user wants EXACT model + EXACT capacity. Semantic understanding ≠ Precision matching.

The retrieval failure modes are brutal:

Pure vector search:
> Query: “ML-2847 error code” → Returns: General ML troubleshooting (0% useful)
> Query: “React 18.2.0 breaking changes” → Returns: React 18 overview (no version precision)

Pure keyword search (BM25):
> Query: “how to fix car not starting” → Returns: Docs with “car” and “starting” but about starting a car business

You need both. Always.

The kind of gap you see on retrieval benchmarks (illustrative numbers):
- BM25 alone: 67% MRR@10
- Dense retrieval alone: 71% MRR@10
- Hybrid (proper fusion): 82% MRR@10

That’s 11 points (about 15% relative) over the “best” single method. In production, that’s thousands of better answers per day.

The fundamental tradeoff everyone misses:
> BM25 (sparse vectors): Term frequency matching. Perfect for exact keywords, acronyms, codes. Fails at synonyms.
> Dense embeddings: Semantic similarity. Perfect for meaning, paraphrases. Fails at exact matches.

This is why you can’t pick one. You need intelligent fusion.

The scoring difference that matters:
> BM25 (simplified): score(q,d) = Σ IDF(term) × TF(term,d) × norm(d)
> Dense: score(q,d) = cosine(embed(q), embed(d))

These scores aren’t comparable! BM25 is unbounded (say 0-15 on a typical query), while cosine similarity often sits in a narrow band like 0.7-0.95. This is why naive averaging fails. You need score normalization.

The fusion algorithms you must know:

1. Reciprocal Rank Fusion (RRF):
score(d) = Σ 1/(k + rank_method_i(d))
No score normalization needed
Robust to score scale differences
Used by Elasticsearch, among others

2. Weighted combination:
score(d) = α × norm(score_bm25) + (1-α) × norm(score_dense)
Requires score normalization
α typically 0.3-0.5
More control but more tuning

“So how do you choose the hybrid ratio?” Interviewer leans in.

This is where you mention that query type matters:
> Keyword queries (product codes, names): α = 0.7 (favor BM25)
> Natural language questions: α = 0.3 (favor dense)
> Hybrid queries (“best iPhone under $500”): α = 0.5
> Measure and tune on YOUR data.

The answer that gets you hired:
- Hybrid search combines lexical precision with semantic understanding
- BM25 catches exact matches embeddings miss; embeddings catch meaning BM25 misses
- The cost is running two retrievals + fusion (adds ~10ms)
- It’s not optional for production search - it’s the recall multiplier

The interesting question isn’t “should we use hybrid search” - it’s “what’s the optimal fusion strategy for our query distribution?”

Use RRF? Simple but less control. Use weighted combo? More tuning but better fit. The answer: Start with RRF, measure the gap, upgrade if needed.

The killer combo that production systems use:
> BM25 for recall (catch all possible matches)
> Dense for ranking (understand intent)
> RRF for fusion (combine without score normalization hell)
> Cross-encoder for top-20 (final precision pass)

Four-stage pipeline. Each stage does what it’s best at.
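Both fusion formulas fit in a few lines. Here is a minimal Python sketch of RRF and of the weighted combination with min-max normalization. The document ids and scores are invented for illustration, k=60 is a commonly used RRF constant (an assumption, since the post does not give a value), and min-max is just one possible choice for norm().

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over methods of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

def min_max(scores):
    """Rescale raw scores to [0, 1] so different methods become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_fusion(bm25, dense, alpha=0.5):
    """score(d) = alpha * norm(bm25) + (1 - alpha) * norm(dense)."""
    nb, nd = min_max(bm25), min_max(dense)
    fused = {
        d: alpha * nb.get(d, 0.0) + (1 - alpha) * nd.get(d, 0.0)
        for d in set(nb) | set(nd)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical raw scores from the two retrievers (note the different scales).
bm25_scores = {"doc_exact": 12.0, "doc_partial": 7.5, "doc_business": 3.0}
dense_scores = {"doc_paraphrase": 0.91, "doc_exact": 0.88, "doc_partial": 0.74}

bm25_ranking = sorted(bm25_scores, key=bm25_scores.get, reverse=True)
dense_ranking = sorted(dense_scores, key=dense_scores.get, reverse=True)

rrf_order = rrf([bm25_ranking, dense_ranking])
weighted_order = weighted_fusion(bm25_scores, dense_scores, alpha=0.5)

print("RRF:     ", rrf_order)
print("Weighted:", weighted_order)
```

Both methods put doc_exact first because it ranks well in both lists. They disagree further down: RRF only looks at ranks, while the weighted version is sensitive to how far apart the normalized scores are, which is the extra control (and extra tuning) mentioned above.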
20
47
490
50,776
The first episode of the GSoC Series is LIVE!

College students who are planning to prepare for GSoC 2025 (and onwards), this video is for you!

→ My Google Summer of Code story is unique in its own way: I wasn't eligible to apply in the first two years of college, in the third year I applied and got rejected, and in the fourth year I finally made it.
→ Through this video, I am unveiling the strategy that got me selected in a do-or-die situation. I have tried to guide you on how you can replicate it!
→ This video is an excellent starting place if you are figuring out how to contribute to open source!
→ If there is any specific thing you wanna know or whatever questions you have, feel free to drop them in the comments.
→ This is my first ever video, so I would be grateful if you could point out how I can improve my speaking or video editing skills through Direct Messages!
→ I have also published the video on YT, please check the comments for more details
20
28
436
35,904
This is how I started Machine Learning in 2019. I was broke af. Borrowed my sister's laptop for BTech. Took an education loan to pay the fees. Completed my undergrad in Electronics from NIT Warangal. Glory to Hanuman.
Replying to @himanshustwts
"From humble beginnings come great things" and I lived it. (This was me learning diffusion models three months back.) You can just do things, guys.
10
17
419
30,594
You're in a Machine Learning Interview at Perplexity, and the interviewer asks: "Why do we need rerankers in RAG? Isn't semantic search enough?" Here's how you answer:
15
31
438
38,595
You're in a Machine Learning interview at Google, and the interviewer asks: Why is scaling context length so hard? What's the fundamental bottleneck? Here's how you answer:
8
31
395
36,549
Totally mind-blown 🤯 @JuliaLanguage has a package called Measurements.jl which lets you add errors and uncertainty to data types. Thank you @MoseGiordano
11
42
339
In one of my interviews for an ML Engineer position, I was asked, “What is quantization and how does it help with LLM inference?” Here is how you can answer:
3
19
360
25,247
ML Community loved @karpathy 's introduction to Large Language Models. In the following blogpost, I think out loud about the implementation of Intelligent Operating Systems: The LLM OS. huggingface.co/blog/shivance… #MachineLearning #LLM #GPT #OperatingSystem #jarvis
6
54
342
43,837
meet your mutuals anon
9
327
26,554
Open Source will Change your Life Just published the second Video of the GSoC Series. You can watch it here on X or YT. This video is about getting started with Open Source and making the first contribution.
Done Recording the first video for the GSoC Series. Now it's time to edit ✂️📷 (pretty nervous and pretty excited as this is my debut in YT & X video content) If you have tips for editing please let me know in the comments. If you have questions about Google Summer of Code, let me know in the comments. I'll try to include it in the series ✨!
11
20
326
20,877
You’re in a Machine Learning interview at Groq, and the interviewer asks: “Why do we need prompt caching? Can’t we just resend the full context every time?” Here’s how you answer:
11
16
329
37,157
best reinforcement learning playlist there is
4
34
327
58,224
Building a Cold Email Generator using PydanticAI Agents and LLaMA3 via Groq

This video is for you if you are:
1. Interested in building AI applications
2. Curious about AI Agents
3. A chill guy who loves to tinker with new stuff
4. A student interested in learning AI
5. An indie hacker trying to tap into AI development

What's this video about? In this video, we build an end-to-end generative AI project from scratch.

Problem Statement: You work at a headhunting company catering to startups and big tech, and you want to automate cold-emailing. Given a job post link, the application built in this video generates the email to send to the hiring manager.

Outline
0:00 - 0:50 - Introduction
0:51 - 5:00 - Setting up @GroqInc
5:01 - 11:00 - @pydantic AI Hello World!
11:01 - 44:21 - Implementing your Agents (In-depth)
44:22 - 57:00 - Setting up HuggingFace @gradio project
57:01 - 1:13:11 - Building end-to-end application

The project source code is available on GitHub and the video is available on YouTube as well! For more content like this, follow @1smollcoder nitter.app/1smollcoder/status/186…
Building something cool with PydanticAI. Working on prototype now. Will create a long video and post it here on X as well as YT.
8
29
329
30,957
Techniques I'd master to learn Reinforcement Learning. Bookmark this 👇 1. Markov Decision Processes (MDPs) & Bellman equations 2. Value iteration & policy iteration algorithms 3. Q-learning & Deep Q-Networks (DQN) 4. Experience replay & target networks 5. Policy gradients & Reinforce algorithm 6. Actor-Critic methods (A2C, A3C) 7. Proximal Policy Optimization (PPO) 8. Trust Region Policy Optimization (TRPO) 9. Soft Actor-Critic (SAC) for continuous control 10. Twin Delayed DDPG (TD3) algorithm 11. Exploration strategies (epsilon-greedy, UCB, entropy) 12. Reward shaping & discount factor selection 13. Multi-armed bandits & contextual bandits 14. Monte Carlo methods & temporal difference learning 15. Function approximation & neural network policies 16. OpenAI Gym & custom environment design 17. Stable Baselines3 & RLlib frameworks 18. Model-based RL (Dyna-Q, World Models) 19. Imitation learning & behavioral cloning 20. Multi-agent RL & game theory basics
5
41
331
18,279
I coded Qwen3 from scratch recently. > be me > read the paper > implement the model > contribute to keras > see the impact
9
12
312
21,779
You’re in a Machine Learning interview at OpenAI, and the interviewer asks: “Why is everyone switching from RLHF to DPO? Isn’t RLHF the proven approach?” Here’s how you answer:

Don’t say: “DPO is simpler” or “RLHF is too complex.” Too surface-level. The real answer is the reward model bottleneck. RLHF trains a separate reward model that becomes a noisy proxy for human preferences. DPO directly optimizes the policy on preference data. You’re eliminating the broken telephone.

Here’s why RLHF is fundamentally flawed. Your training pipeline: Human preferences → Train reward model → Use RL to optimize policy against reward model. Problem: The reward model is trained on limited data (10k-100k comparisons), but then used to generate 1M+ training signals. It’s overconfident on out-of-distribution outputs. Reward model accuracy ≠ Alignment quality.

btw subscribe to my newsletter to get these posts for free - fullstackagents.substack.com…

The RLHF failure modes are brutal:
> Reward hacking: Model finds adversarial outputs that score high but are gibberish
> Mode collapse: Policy degenerates to only generate “safe” high-reward outputs
> Reward model brittleness: 75% accuracy on test set → 100% confident predictions in RL
> Training instability: PPO hyperparameters require black magic to converge
You’re building a skyscraper on quicksand. One unstable component breaks everything.

The complexity comparison:
RLHF pipeline:
- SFT on demonstrations (1 week)
- Train reward model on preferences (2 days)
- PPO training against reward model (1-2 weeks, often fails)
- Extensive hyperparameter tuning (pray to the RL gods)
DPO pipeline:
- SFT on demonstrations (1 week)
- Train directly on preferences (2 days)
Done. No RL, no reward model. RLHF: 3+ weeks, unstable. DPO: 9 days, stable.

The fundamental difference that matters:
RLHF objective:
> maximize E[reward_model(policy(x))] - β × KL(policy || base)
> Requires RL (PPO/REINFORCE)
> Reward model is a separate neural net
> Unstable optimization landscape
DPO objective:
> maximize log σ(β × [log(π(y_w|x)/π_ref(y_w|x)) - log(π(y_l|x)/π_ref(y_l|x))]), where y_w is the preferred response and y_l the rejected one
> Direct supervised learning on preferences
> No reward model needed
> Stable gradient descent
Eliminating the reward model changes everything. No more broken telephone.

The performance gap that surprised everyone:
MT-Bench scores (GPT-4 as judge):
Llama 2 base: 4.2/10
Llama 2 + RLHF: 6.9/10
Llama 2 + DPO: 7.1/10
DPO beats RLHF despite being “just” supervised learning. The simplicity IS the feature.
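To make the DPO objective concrete, here is a minimal sketch of the per-pair loss in plain Python. It assumes you already have the summed log-probabilities of the preferred (y_w) and rejected (y_l) responses under both the policy and the frozen reference model; the function name `dpo_loss` and its argument names are made up for illustration and are not from any library.

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin).

    Each argument is a log-probability of a whole response:
    *_w is the preferred response, *_l is the rejected one.
    """
    # How much more the policy favors y_w over y_l than the reference does.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    z = beta * margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Shifting probability toward the preferred response lowers the loss.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

Minimizing this loss is the same as maximizing the log σ(...) objective, and it needs only ordinary gradient descent: no reward model and no RL loop appear anywhere in the computation.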
11
26
319
33,045
Techniques I'd master to build great evals for AI apps. 1. LLM-as-Judge 2. Reference-based similarity metrics 3. Pairwise comparison tournaments 4. Human-in-the-loop evaluation 5. Synthetic data generation 6. Adversarial test case creation 7. Multi-dimensional rubrics 8. Regression testing on golden datasets 9. A/B testing with live traffic 10. Statistical significance testing 11. Evaluation dataset curation & versioning 12. Domain-specific benchmarks 13. Red teaming & jailbreak testing 14. Latency & cost monitoring 15. User feedback loops 16. Calibration & confidence scoring
10
40
304
15,778
You won't believe this is Bengaluru.
20
4
294
14,170
gm chat what are you cooking today?
11
1
291
20,851
over the past week I've been studying RL environments deeply. a blog is coming up soon. i can say this for now: evals are good enough for LLMs, but for agents we need environments where they can learn with feedback. this blog will be mostly about writing environments with verifiers. @willccbb and @PrimeIntellect have been doing some very impactful work!
16
8
289
27,643
Super excited to announce that I've started working as AI Consultant for Google. I'll mainly be working with the Keras team, contributing to KerasHub.
28
5
280
14,724
You're in a ML Inference Engineer Interview at Meta, and the interviewer asks: "Why do we need KV cache? Can't we just recompute attention for every new token?" Here's how you answer:
7
14
280
26,869
I spent 2 weeks building an eval framework from scratch. Then I saw how Anthropic and OpenAI actually do evals. I did literally everything wrong. Here is what I learned.
13
17
283
35,964
Daily ML & CS GRIND > distributed databases > asynchronous replication > multi leader replication across multiple datacenters > designing data intensive applications 12/n
5
4
234
12,442
You're in a Research Scientist interview at Meta. The interviewer asks: "How would you implement speculative decoding to improve inference latency? What's the fundamental tradeoff?" You answer: "I'll use a smaller model to predict tokens ahead of time" Interview over. Here's what you missed:
9
14
247
28,407
Done Recording the first video for the GSoC Series. Now it's time to edit ✂️📷 (pretty nervous and pretty excited as this is my debut in YT & X video content) If you have tips for editing please let me know in the comments. If you have questions about Google Summer of Code, let me know in the comments. I'll try to include it in the series ✨!
Okay, guys, it's time to give back. Juniors preparing for GSoC 2025, I will create an ultimate video series on Google Summer of Code, on YT and X & I will unveil the strategy that led me to selection. Not only that, I am gonna tell you how can GSoC turn your life around and open your path for a successful career, if you do it the right way! If there is any specific thing you wanna know or whatever questions you have, feel free to drop them in the comments.
7
6
229
43,120
The Core Concept - Speculative decoding is an inference optimization that speeds up autoregressive generation without sacrificing quality. It uses two models working together – a small, fast 'draft model' that proposes multiple tokens ahead, and our main 'target model' that verifies these proposals in parallel.
4
15
241
37,122
Concepts I'd master to build production ML systems with PyTorch. Bookmark this 👇 1. TorchScript for model serialization 2. torch.compile for 2x speedups 3. Distributed training with DDP/FSDP 4. Mixed precision with torch.amp 5. Custom CUDA kernels with Triton 6. Model quantization (PTQ & QAT) 7. TorchServe for model deployment 8. Lightning for cleaner training loops 9. Dataset optimization with DataLoader workers 10. Profiling with torch.profiler 11. ONNX export for cross-platform inference 12. Gradient accumulation for large batches 13. Learning rate scheduling strategies 14. Model checkpointing & recovery 15. TorchVision/Audio/Text domain libraries 16. Integration with HuggingFace ecosystem
2
19
228
11,222
People often get confused between Self-Attention and Cross-Attention in transformers. Here’s the difference:
4
10
203
16,755
Presenting minbpe.c: a pure C implementation of minbpe, originally authored by Andrej Karpathy (@karpathy) in Python github.com/shivance/minbpe.c minbpe.c is minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C and in just a single file. it's been a while since I coded in C and my knowledge got rusty. (rust != @rustlang here xD). what better project than implementing a basic tokenizer to refresh it? Unfortunately, C doesn't have a default doc builder, unlike Rust, so I have written detailed docstrings to make the code easier for users to understand. Byte Pair Encoding was introduced in 1994 and became a widely used tokenization strategy in Natural Language Processing (a field of Machine Learning). anon, what are you building @_buildspace / @_nightsweekends ?
4
31
203
18,476
The job market for freshers & recent college grads in 2024 is TOUGH. And it is going to get HARDER in 2025. Moreover, applying through job portals is not helpful at all. In this video, I reveal the personal strategy that I used when I was in college: I hope my juniors find it useful.
13
13
202
22,486
Daily ML & CS GRIND > batch processing starters > unix as a simple batch processor - log analysis > Testing ml models > ML Unit testing 14/n (link to resources in comment)
4
3
168
9,986
Karpathy Senpai noticed . This day shall be remembered.
4
1
179
14,556
Perplexity Comet got me hooked, but I felt something was missing. So I built an Open Source AI browser from scratch! I am calling it Nebula, and unlike any other browser out there, it lets you create apps as a tab! In this preview launch, I demonstrate building a tic-tac-toe game, an Excalidraw-like drawing tool and a stock-portfolio tracker right from your search bar! In this approach, I bring GPT-4o to the browser and let it take control of Chromium's V8 JavaScript engine. This is quite contrary to what Cursor / Windsurf do: bringing AI to the IDE. Building apps should be easy, and I think that this is the way to do so. By the way, the code for this preview launch is available on GitHub. I drew inspiration from @PerplexityComet and @browsercompany 's Dia.
13
12
218
37,909
whatsapp web down. time to touch some grass.
7
22
161
25,921
December 2023 - overweight & sick December 2024 - big guns & long runs
I did it. I finally did it. I ran 10k in Phonpe Midnight Marathon (my first ever). Road to 10k was long. I am so grateful to following people for unlocking this version of me: 1. @Naina_2728 for showing me what is possible with running, how fun it can be and how many new connections you can make through it. 2. @lucifer_x007 and @pathikghugare for pushing me always and staying with me for first 2.5k 3. @asmitaakamboj was like a guiding light i guess, seeing her active and posting her progress daily made me believe that consistency is possible. 4. @xerefic and @theadityadas for filling me in the information I needed to make the long run possible 5. Mahendra for encouragement I needed on the D-Day. Thank you soooo much and love you guys❤️
8
1
145
9,269
To the new beginnings ✨. Moving to Bangalore from Lucknow 🙃!
8
152
20,335
I wrote Qwen Mixture-of-Experts Model from scratch. Works with JAX, PyTorch and Tensorflow. It's Open Source. github.com/keras-team/keras-…
1
17
153
6,135
i have some personal news to share. our paper titled KerasCV and KerasNLP: Multi Framework Models has finally been published in the Journal of Machine Learning Research. jmlr.org/papers/v25/24-0404.… here is to my journey - started as a self learner in ML in 2019 - contributed to Open Source heavily over the last two years - got a golden opportunity during google summer of code 2023 to contribute to keras 3, working closely with the team (@fchollet , @mattdangerw , @martin_gorner et al.) - contributed to fundamental building blocks of transformers when LLMs were not a cool thing - multi-backend implementations (TensorFlow, PyTorch and JAX agnostic) of llama2, gpt-neox and more. these became a reference for gemma! - did all this in my last semester, handling gsoc, the internship (through uni) and the final year project, all simultaneously, with a lot of commuting between Pune and Warangal. we made it guys! 🧿 thanks @penstrokes75 for all your support💪🏻
16
6
142
21,175
I'm actively looking for my next full-time role. Over the last 5-6 months I've been working as an AI Consultant for Google, part-time. I quit my previous role as Machine Learning Engineer ~2.5 months back. While I took a small break, I explored a lot of cool shit and wrote some blogs that I really enjoyed. You can find the link to my portfolio in the comments.
13
18
138
34,659
Replying to @UpTopAli
sed
1
131
25,160
I'm a genius. Launching Falafel, a free open-source plugin for DaVinci Resolve powered by @FAL . Now you can generate videos from images right from your video editor. In this launch, only the Wan 2.2 model is supported.
19
11
130
20,336
Got in a huge fight last night. She saw a dating app on my phone. She asked : "Why is that still here? Are you still looking for someone better?” I didn't have an answer. But now I do. I wasn't looking for someone better, I was just terrified of missing out on the best. This is the core dilemma of RL Agents: Exploration vs. Exploitation. It’s the hardest choice for any thinking thing, human or machine. Exploitation: Enjoying the great thing you have. (Deleting the app and being happy with my amazing girlfriend). Exploration: Searching for a potentially perfect option. (Keeping the app, just in case). You can't do both. This choice is the engine behind how AI learns. Think of it as the 90/10 Rule. Most of the time (90%), an AI will exploit. It sticks with the best choice it's found so far because it's a proven winner. But sometimes (10%), it explores. It takes a random chance on something new, just to gather more data and make sure it hasn't settled too early. This is how everything learns: - YouTube shows you a video from a creator you love (exploit), then suggests a totally new channel (explore). - Robots in a factory use the motion they know works (exploit), but will try a new one to see if it's faster (explore). - Doctors use a trusted treatment (exploit), while researchers test new drugs (explore). The magic is that the AI teaches itself how to balance this. It learns when to stop looking and commit to the best strategy it has found. It’s learning how to conquer the fear of missing out. So yeah, I guess I'm just an algorithm that’s afraid of settling for a good answer before it's found the great one. Anyway, I'm single now. My phone is fully dedicated to exploration.
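The 90/10 rule above is usually called epsilon-greedy. Here is a minimal sketch on a toy multi-armed bandit, assuming a fixed epsilon of 0.1 and Bernoulli rewards; the names `select_action` and `run_bandit` and the reward probabilities are invented for illustration. Note that this simple version does not learn how to balance exploring and exploiting by itself: the split is hard-coded.

```python
import random

def select_action(estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: pick a random arm
    # exploit: pick the arm with the highest estimated reward so far
    return max(range(len(estimates)), key=lambda a: estimates[a])

def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Play a Bernoulli bandit; return per-arm reward estimates and pull counts."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        arm = select_action(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean: nudge the estimate toward the observed reward
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Three options with hidden payoff rates (hypothetical numbers).
estimates, counts = run_bandit([0.2, 0.5, 0.8])
```

With epsilon set to 0 the agent never explores and can stay stuck on the first option that happened to pay off; with epsilon set to 1 it never commits. Schedules that shrink epsilon over time are one common way to "stop looking and commit".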
10
10
134
12,982
I've waited for this email for one and a half years. Today I couldn't be happier to announce that I have been selected for the Google Summer of Code program with the @TensorFlow organization. #opensource #MachineLearning #Google #GSOC #GSoC2023
12
3
128
78,849
Retrieval Metrics (Did we get the right context?) 1️⃣ Contextual Relevancy: What % of retrieved chunks actually matter? 2️⃣ Contextual Recall: Did we retrieve ALL the info needed? 3️⃣ Contextual Precision: Are relevant chunks ranked higher than junk?
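As a rough sketch, the three retrieval metrics can be computed from a ranked list of retrieved chunk IDs and a set of chunks judged relevant. The definitions below are simplified assumptions (set overlap for relevancy and recall, average precision for the rank-aware one), and the function names are hypothetical rather than taken from any eval library; production frameworks often use an LLM judge to decide relevance.

```python
def contextual_relevancy(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def contextual_recall(retrieved, relevant):
    """Fraction of the needed chunks that were retrieved at all."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

def contextual_precision(retrieved, relevant):
    """Rank-aware score: mean of precision@k taken at each relevant hit."""
    hits, total = 0, 0.0
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

retrieved = ["a", "x", "b", "y"]   # ranked retriever output (toy IDs)
relevant = {"a", "b", "c"}         # chunks a human marked as needed

print(contextual_relevancy(retrieved, relevant))   # 2 of 4 retrieved are relevant
print(contextual_recall(retrieved, relevant))      # 2 of 3 needed were found
print(contextual_precision(retrieved, relevant))   # rewards "a" and "b" ranking early
```

Moving the relevant chunks to the bottom of the list leaves relevancy and recall unchanged but drops the rank-aware precision, which is why the three numbers are worth tracking separately.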
3
6
126
34,416
Replying to @co_foundr
it iz wat it iz
1
1
124
31,342
i left competitive programming for ML in 2021
2
1
118
6,713
The fundamental insight: RAG quality = Retriever Performance × Generator Performance If either component scores zero, your entire system fails. It's multiplication, not addition. You can't compensate for bad retrieval with a better LLM.
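A tiny numeric illustration of why the multiplicative view matters, treating both scores as numbers in [0, 1]. The scores are made-up values and `additive_view` is only a strawman for comparison, not a metric anyone recommends.

```python
def rag_quality(retriever_score, generator_score):
    """Multiplicative view: either stage scoring zero zeroes the whole system."""
    return retriever_score * generator_score

def additive_view(retriever_score, generator_score):
    """Strawman average, shown only to contrast with the product."""
    return (retriever_score + generator_score) / 2

# Broken retrieval paired with an excellent generator (hypothetical scores).
print(rag_quality(0.0, 0.95))     # the product collapses to zero
print(additive_view(0.0, 0.95))   # the average still looks half-decent
```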
1
2
117
35,656
How It Works Technically

The process runs in a loop:
1. Draft model predicts the next K tokens
2. Target model runs one forward pass to verify all K tokens in parallel
3. We accept the longest prefix where both models agree
4. Target model generates the next token after the accepted sequence
5. Repeat with the extended sequence

The key insight is that verification is much cheaper than generation when done in parallel.
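The loop described above can be sketched with toy "models" that are plain Python functions mapping a token list to the next token. Everything here is an illustrative assumption: `toy_target`, `toy_draft` and `speculative_decode` are hypothetical names, decoding is greedy, and the single batched verification pass of a real system is simulated by calling the target once per position. Sampling-based speculative decoding uses a probabilistic accept/reject rule instead of exact agreement.

```python
def toy_target(tokens):
    """Stand-in for the large model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10

def toy_draft(tokens):
    """Stand-in for the small model: usually agrees, but errs after 2, 5, 8."""
    last = tokens[-1]
    return last if last % 3 == 2 else (last + 1) % 10

def speculative_decode(target_next, draft_next, prompt, k, max_new_tokens):
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft model proposes k tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            nxt = draft_next(ctx)
            draft.append(nxt)
            ctx.append(nxt)
        # 2. Target scores all k + 1 positions; counted as ONE pass because a
        #    real model would do this in a single batched forward pass.
        target_passes += 1
        preds = [target_next(tokens + draft[:i]) for i in range(k + 1)]
        # 3. Accept the longest prefix where draft and target agree.
        n = 0
        while n < k and draft[n] == preds[n]:
            n += 1
        # 4. Append the accepted tokens plus the target's own next token.
        tokens.extend(draft[:n])
        tokens.append(preds[n])
    return tokens[: len(prompt) + max_new_tokens], target_passes

def greedy_decode(target_next, prompt, max_new_tokens):
    """Baseline: one target pass per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens

out, passes = speculative_decode(toy_target, toy_draft, [0], k=4, max_new_tokens=12)
```

In this greedy setting the output matches what the target model alone would produce, while the number of target passes is smaller than the number of generated tokens. How much smaller depends on how often the draft model agrees with the target.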
4
6
117
27,276
love has a train/test split: dating is training data. candlelit dinners, best behavior, everyone performing their highest eval scores. you're both optimizing for the same metric: being chosen. but marriage is the test set. food poisoning at 3am. financial stress. grief that doesn't end. the third time you've had the same fight. boredom that stretches for months. the real eval isn't 'can you be lovable when you're trying?' it's 'who are you when the conditions are out-of-distribution?' when there's no audience. when you're not getting positive reinforcement. when the reward signal is delayed by years or maybe never comes. in ML, we do train/test split because models can just memorize. they can achieve perfect training accuracy by learning the dataset's quirks instead of the underlying pattern. people do this too. they learn to perform 'good partner' - the right words, the right gestures, the right conflict resolution scripts. perfect training accuracy. then life deploys them into production and everything breaks. because they memorized the pattern of being loved. they didn't learn how to love. the test set reveals what you actually optimized for. and you can't hack the test set. it's held-out for a reason. that person who's patient in month two of dating but cruel in year two of marriage? overfit to the training distribution. the one who's generous when dopamine is high but withholding when things are routine? memorized the easy examples. real love is generalization. it's performing well on data you've never seen. on versions of each other you didn't know existed. on challenges that weren't in the training set. and here's the thing: you can't know if love is real until you hit the test set. you have to deploy to production to find out what you actually built. maybe that's why they call it a leap of faith.
4
9
115
16,995
Finally published my first-ever video on YouTube. piped.video/UVAItmSkkvE?si=fhJM… Hack the GSoC: The Untold Strategy for Preparation and Selection in Google Summer of Code I hope it helps out College Students.
4
8
111
7,581
Daily ML Grind - 17/n > Started with RAG++ course > Learning about Advanced Retrieval Augmented Generation systems > Wandbot is amazing > Completed first module > 80/20 rule: no system is perfect, but excellence is attainable Resource: wandb.courses/courses/take/r… Thanks, instructors!
5
2
109
5,410
Looking to follow people doing cool work in ML - Inference, Eval, Training, Applied. Please recommend below 👇
14
3
113
11,464
such a comprehensive notebook on parallelism and distributed training by @A_K_Nain . kaggle.com/code/aakashnain/p… still don't understand how the @kaggle algorithm can be this bad, promoting linear regression notebooks and suppressing rich content like this. disappointed.
2
12
106
11,635
Neural Networks Explained 🧵✨

1/ What Are Neural Networks? Neural networks are the backbone of modern AI, mimicking the human brain to recognize patterns, make decisions, and learn from data. Whether it's powering voice assistants or driving cars, they're everywhere! 🚗💡

2/ The Building Blocks: Neurons & Layers At their core, neural networks consist of neurons organized in layers:
Input Layer: Receives the initial data.
Hidden Layers: Process and extract features.
Output Layer: Produces the final result. 🔄🔍

3/ Understanding the Input Layer 📊 Imagine your network starts with inputs like x₁ and x₂. The raw data points fed into the system are represented as squares in the diagram. They're the foundation upon which everything builds!

4/ Diving into Hidden Layers Hidden layers are where the magic happens! Each neuron applies activation functions (labeled 'z; f') to transform inputs into meaningful patterns. Multiple blue and green nodes symbolize this complex processing. 🌀✨

5/ Output Layer & Predictions After processing, the network delivers an output. A single purple node represents the prediction (y') in our diagram. This is the network's final answer based on the learned patterns. 🎯📈

6/ Forward Propagation Explained Data flows from left to right (shown by the coral red arrow) through the network layers, transforming input into output. This is called forward propagation – the first step in making predictions! ⬅️➡️

7/ Backward Propagation: Learning from Mistakes To improve, the network adjusts itself by flowing information backward (pink arrow). This backward propagation helps minimize errors by tweaking the connections. 🔄📉

8/ Loss Function & Optimization Our diagram shows predictions vs. true values connected to a loss function, which calculates the error. The optimizer then adjusts the network to reduce this loss, refining accuracy over time. 📉🔧

9/ Iterative Training Process Training is an ongoing loop! Blue arrows indicate the iterative process of adjusting until the loss is minimized. It's like teaching the network to get better with each cycle. 🔁💪

10/ Real-World Applications From image recognition to natural language processing, neural networks are revolutionizing industries. Understanding their structure helps us harness their full potential! 🚀🌐

Thanks for following along! Want to dive deeper into neural networks? Drop your questions below! 👇✨
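For anyone who wants to see the forward pass, backward pass, loss and training loop from the thread in code, here is a minimal sketch in plain Python. The choices are assumptions made for illustration and do not describe the diagram exactly: two inputs (x₁, x₂), one hidden layer of two sigmoid neurons, one output y', squared-error loss, plain per-example gradient descent, and the OR function as toy data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_in, n_hidden, rng):
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Forward propagation: inputs -> hidden activations -> prediction y'."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(p["W2"], h)) + p["b2"])
    return h, y

def train_epoch(p, data, lr):
    """One pass over the data with backpropagation; returns the mean loss."""
    total = 0.0
    for x, target in data:
        h, y = forward(p, x)
        total += (y - target) ** 2                      # squared-error loss
        d_out = 2 * (y - target) * y * (1 - y)          # gradient at output pre-activation
        d_hid = [d_out * p["W2"][j] * h[j] * (1 - h[j]) # gradient at each hidden neuron
                 for j in range(len(h))]
        for j in range(len(h)):                         # optimizer step: plain SGD
            p["W2"][j] -= lr * d_out * h[j]
            p["b1"][j] -= lr * d_hid[j]
            for i in range(len(x)):
                p["W1"][j][i] -= lr * d_hid[j] * x[i]
        p["b2"] -= lr * d_out
    return total / len(data)

def train(data, epochs=2000, lr=0.5, seed=0):
    """Iterative training loop: repeat forward + backward until done."""
    p = init_params(2, 2, random.Random(seed))
    losses = [train_epoch(p, data, lr) for _ in range(epochs)]
    return p, losses

OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params, losses = train(OR_DATA)
```

Comparing `losses[0]` with `losses[-1]` shows the effect of the training loop: the error shrinks as the weights are adjusted cycle after cycle.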
8
4
97
5,847
I did it. I finally did it. I ran 10k in Phonpe Midnight Marathon (my first ever). Road to 10k was long. I am so grateful to following people for unlocking this version of me: 1. @Naina_2728 for showing me what is possible with running, how fun it can be and how many new connections you can make through it. 2. @lucifer_x007 and @pathikghugare for pushing me always and staying with me for first 2.5k 3. @asmitaakamboj was like a guiding light i guess, seeing her active and posting her progress daily made me believe that consistency is possible. 4. @xerefic and @theadityadas for filling me in the information I needed to make the long run possible 5. Mahendra for encouragement I needed on the D-Day. Thank you soooo much and love you guys❤️
21
1
101
15,892
I'm so excited and ecstatic to announce that ... ( JMLR it is ) The paper that I co-authored with the Keras ( @fchollet et. al.) team at @Google , has been accepted in the prestigious Journal of Machine Learning Research ( @JmlrOrg ). My journey began with the incredible honor of being accepted into the Google Summer of Code program, a turning point that led me to become a dedicated, long-time contributor to Keras. I was fortunate to have the opportunity to shape the evolution of multi-backend Keras3, contributing while it was still a work in progress. Now, as I reflect on the countless model backbones, pipelines, and tutorials I helped build, a wave of nostalgia washes over me, reminding me of the profound impact those moments have had on my path.
HUGE NEWS INCOMING
10
101
17,425
Graduated from @warangal_nit in Class of '23. #Graduation #convocation
4
1
95
3,958
Most candidates say "check accuracy" or "run more tests." Wrong approach. RAG systems fail at TWO distinct stages, and you need different metrics for each. Generic accuracy won't tell you WHERE the problem is.
1
1
98
36,698
Replying to @infinit3e_
Good
2
94
15,866
Why We Need It? Traditional LLM inference has two major bottlenecks. First, we generate tokens sequentially – each token requires a full forward pass before we can start the next one. Second, this leaves our GPUs underutilized since we can't parallelize future token computation. With speculative decoding, when the target model verifies multiple draft tokens simultaneously, we're making much better use of our compute resources.
2
6
97
33,656