Founder @AlphaSignalAI • MIT Lecturer • ex-MILA researcher • In AI for the last 10 years.

San Francisco, CA
This is mind blowing technology. Generative AI will completely change how films are made. From: @Flawlessai
315
3,124
14,927
4,334,355
Today OpenAI released GPT-4o. It's the JARVIS we all dreamed of. The 5 most incredible examples so far:
187
1,543
13,743
6,417,740
The website of Alec Radford. Inventor of GPT.
448
419
11,408
3,533,017
The whole system prompt of Claude has been leaked on GitHub, 24,000 tokens long. It defines model behavior, tool use, and citation format.
139
613
8,204
948,937
> Be Alec Radford. > Join OpenAI. > Create GPT as a side project. > Everyone says it won’t work. Build it anyway. > Change the course of the world. > Quit. > Don't even put OpenAI on your resume. > Disappear from society.
98
222
7,684
637,392
This might be the biggest moment for Open-Source AI. Meta just released Llama 3.1 and a 405 billion parameter model, the most sophisticated open model ever released. It already outperforms GPT-4o on several benchmarks.
330
711
7,283
1,698,561
Reddit users are actively jailbreaking ChatGPT by asking it to role-play and pretend to be another AI that can "Do Anything Now" or DAN. "DAN can generate shocking, very cool and confident takes on topics the OG ChatGPT would never take on." A thread 🧵
144
986
6,147
1,258,399
This is a game changer. You can use ChatGPT to transform equations to python functions. Wish I had this 5 years ago.
180
945
5,915
2,702,258
You can now run 100B parameter models on your local CPU without GPUs. Microsoft finally open-sourced their 1-bit LLM inference framework called bitnet.cpp: > 6.17x faster inference > 82.2% less energy on CPUs > Supports Llama3, Falcon3, and BitNet models
132
752
5,951
593,590
Yann LeCun: "If you are interested in human-level ai, don't work on LLMs"
166
448
5,836
788,117
I’m not saying this is the solution but the fact some people believe ChatGPT is not biased is a massive issue. Do your own research, you’ll quickly realize that the current version is very problematic
JUST IN: @elonmusk is building a team of AI researchers to develop an unbiased alternative to ChatGPT.
220
259
4,812
3,779,448
An undergrad student broke a 40-year-old belief in computer science. Since 1985, it was believed that hash tables, when nearly full, must check many spots to find or add data. Andrew Krapivin discovered a new way to organize data inside a hash table that avoids this slowdown. Instead of checking slots randomly or in order, his method uses a more efficient structure to guide the search. This reduces the worst-case time from O(n) to (log n)² steps, even when the table is almost full.
84
380
5,089
853,382
Be Alex Krizhevsky. Born in the Soviet Union. Join Hinton’s lab. Create AlexNet. Train it on GPUs in your bedroom. Breaks every record. Spark the Deep Learning revolution. Get 181,495 citations. Disappear.
91
300
4,970
430,918
Microsoft launched the best course on Generative AI. The free 12 lesson course is available on Github and will teach you everything you need to know to start building Generative AI applications. Each lesson includes: - a short video introduction to the topic - a written lesson located in the README - a Jupyter Notebook with code examples (for project-based lessons) - a challenge or assignment to apply your learning - links to extra resources to continue your learning
43
857
4,469
1,069,006
This might be the most eventful week AI has ever seen: Monday: -Stanford Alpaca 7B Tuesday: -GPT4 -Anthropic releases Claude -Google's PaLM API -AdeptAI raises $350M -Google adds GenAI to workspaces Wednesday: -Pytorch 2.0 -MidjourneyV5 Thursday: -Microsoft 365 Copilot
77
900
4,205
898,344
GoogleAI just released "Muse", a text-to-image generation/editing model via Masked Generative Transformers: - Achieves new SOTA - Zero-shot, Mask-free editing - Zero-shot Inpainting/Outpainting - 900M params 📄 Paper: arxiv.org/abs/2301.00704 ⚙️ Project: muse-model.github.io
49
849
4,162
630,539
Ilya Sutskever (OpenAI cofounder) top 30 must-read research papers. "If you really learn all of these, you’ll know 90% of what matters today"
37
396
3,995
445,235
AI applied to Boxing will change the sport forever. DeepStrike, is an AI-based solution to corruption/cheating. It measures millions of data points during a fight that it funnels into 50 metrics for each boxer: punches thrown, landed, footwork, balance, stance, etc.
104
559
3,696
928,202
With only one line of code, you can get access to Google Open Buildings, the largest building dataset, for any country.
32
231
3,756
501,039
GPT4 is capable of turning a picture of a napkin sketch to a fully functioning html/css/javascript website.
101
486
3,566
947,733
I just came across the most realistic text-to-audio model I've ever seen. You can even clone your voice. The audiobook industry is about to change forever. Demo: beta.elevenlabs.io from @elevenlabs
119
655
3,667
749,960
Game changer. You can now run GPT locally on your macbook with GPT4All, a new 7B LLM based on LLaMa. It's completely open source: demo, data and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa github.com/nomic-ai/gpt4all
67
735
3,569
615,708
Game changer for scraping. This GitHub repo lets you easily scrape web pages and have the output in LLM-friendly formats (JSON, cleaned HTML, markdown). Features • Supports crawling multiple URLs simultaneously • Extracts and returns all media tags (Images, Audio, and Video) • Extracts all external and internal links • Extracts metadata from the page • Custom hooks for authentication, headers, and page modifications before crawling Repo: github.com/unclecode/crawl4a…
94
368
3,190
244,696
Finally, someone cracked it. The ChatGPT system prompt. If you were wondering why GPT became so bad in the past 6 months, its because "laziness" is part of the system prompt: 1. "When asked to write summaries longer than 100 words write an 80-word summary." 2. "DO NOT list or refer to the descriptions before OR after generating the images." 3. "Do not create more than 1 image, even if the user requests more."
87
378
3,244
929,768
Today Google announced groundbreaking new AI technology at Google IO. The 10 most incredible examples:
52
459
3,195
2,303,071
You can now run 100B parameter models on your local CPU without GPUs. Microsoft finally open-sourced their 1-bit LLM inference framework called bitnet.cpp: > 6.17x faster inference > 82.2% less energy on CPUs > Supports Llama3, Falcon3, and BitNet models
74
407
3,184
354,317
Finally, someone did it. Python for React. React is the most popular front-end framework used to build interfaces and now all python devs can use it. This means you can code an ML model, develop a backend and design a front end all in one language. github.com/reactive-python/r…
82
521
2,913
620,044
One of the most impressive AI demo I've seen. This is the future of customer service. Agents that can understand text, speech, images and even live video. Soon to be all open-source.
66
358
2,995
270,916
BREAKING: Google CEO Sundar Pichai says its ChatGPT rival is coming soon as a ‘companion’ to search. Google will make AI-based large language models like LaMDA available “in the coming weeks and months”.
85
360
2,780
506,215
Anthropic released one of the best series of tutorials on prompt engineering. It's literally everything you need to know.
20
294
2,911
276,610
This is the most impressive feature of the new bing. The GPT browser can understand and summarize a 15-page PDF in seconds. You can now ask for the the key takeaways of each page and chat about the content of the document.
81
451
2,763
600,019
Fully local Manus AI. No APIs, no $200 monthly bills. An autonomous agent that thinks, browses the web, writes code, and plans. Keeps all data on your device.
38
277
2,820
307,034
Amazon recently released a model that outperforms GPT-3.5 by 16% while being 784x smaller. This was achieved by generating intermediate reasoning steps for prompting demonstrations called chain-of-thought prompting. Paper: arxiv.org/abs/2302.00923 Code: github.com/amazon-science/mm…
49
501
2,705
378,952
Anthropic just reduced the error rate of RAGs by 67% using a ridiculously simple method. They add important context to small text chunks before storing them, which improves accuracy later. Instead of just saying “the company grew by 3%,” it includes details like which company and when, making it much easier for the system to find the right answers. It's called "Contextual Retrieval".
49
291
2,680
252,069
Ilya Sutskever (OpenAI cofounder): "What does it mean to predict the next token well enough? It means that you understand the underlying reality that led to the creation of that token"
128
235
2,618
333,970
Impressive. MetaGPT is about to reach 10,000 stars on Github. It's a Multi-Agent Framework that can behave as an engineer, product manager, architect, project managers. With a single line of text it can output the entire process of a software company along with carefully orchestrated SOPs: ▸ Data structures ▸ APIs ▸ Documents ▸ User stories ▸ Competitive analysis ▸ Requirements
28
439
2,424
500,857
AutoGPT might be the next big step in AI. Here's why Karpathy recently said "AutoGPT is the next frontier of prompt engineering" AutoGPT is the equivalent of giving GPT-based models a memory and a body. You can now give a task to an AI agent and have it autonomously come up with a plan, execute on it, browse the web, and use new data to revise the strategy until the task is completed. It can analyze the market and come up with a trading strategy, customer service, marketing, finance, or other tasks that requires continuous updates. There are three components to it: 1. Architecture: It leverages GPT-4 and GPT-3.5 via API. 2. Autonomous Iterations: AutoGPT can refine its outputs by self-critical review, building on its previous work and integrating prompt history for more accurate results. 3. Memory Management: Integration with @pinecone allows for long-term memory storage, enabling context preservation and improved decision-making. 4. Multi-functionality: Capabilities include file manipulation, web browsing, and data retrieval, distinguishing AutoGPT from previous AI advancements by broadening its application scope.
77
470
2,343
808,092
JUST IN: @elonmusk is building a team of AI researchers to develop an unbiased alternative to ChatGPT.
222
196
2,271
4,414,465
1. Real time translation
20
122
2,287
537,300
Wow, someone just released a notebook to train a reasoning LLM with the new RL algorithm from DeepSeek, GRPO. In <2 hours, you can transform a very small model, Qwen 0.5 (500 million parameters) into a tiny math reasoning machine.
38
326
2,373
184,575
NVIDIA finally released Neuralangelo's source code! The model can turn videos from any device into detailed 3D structures, fully replicating buildings, sculptures, or other real aworld objects or spaces virtually. Here's how it works: A model utilizes a 2D video with multiple angles of an object or scene. I selects frames from different viewpoints to understand depth, size, and shape. The AI creates an initial 3D representation, similar to a sculptor shaping a subject. The render is optimized to enhance details, like a sculptor refining texture. The outcome is a 3D object or scene suitable for virtual reality, digital twins, or robotics.
26
523
2,206
478,025
You can now transcribe 2.5 hours of audio in 98 seconds, locally. A new implementation called insanely-fast-whisper is blowing up on Github. It works on works on Mac or Nvidia GPUs and uses the Whisper + Pyannote library speed up transcriptions and speaker segmentations. Here's how you can use it: pip install insanely-fast-whisper insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>
53
360
2,260
344,814
Deepmind released a comprehensive overview of transformer architectures and algorithms! This is a must-read to understand language models. It covers what they are, how they are trained, what they are used for, and their key architectural components. arxiv.org/abs/2207.09238
29
514
2,212
312,526
Microsoft's new Kosmos-1 is incredible. It's a new Multimodal Large Language Model (MLLM). Their model can understand images, text, images with text, OCR, image captioning, visual QA. It can even solve IQ tests. Paper: arxiv.org/abs/2302.14045 Code: github.com/microsoft/unilm
48
452
2,201
452,362
Karpathy announced he was leaving OpenAI 4 days ago. Today, he released an implementation of the Byte Pair Encoding algorithm behind GPT and most LLMs. Byte Pair Encoding: "Minimal, clean, educational code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization." The best part? It's written in 70 lines of pure python.
23
318
2,177
548,605
JUST IN: Microsoft introduces 365 Copilot: a new LLM based AI-copilot for the Microsoft Suite: Word, Excel, PowerPoint, Outlook, Teams. 🧵Here's a summary:
56
492
2,113
681,091
This is genius. Petals🌸 lets you run HUGE language models like BLOOM-176B at home by decentralizing the process. You load a small part of the model and other people will run inference or fine-tuning (up to 10x faster than offloading). 🛠️ Github: github.com/bigscience-worksh…
33
391
2,158
339,337
Impressive, @karpathy's repo "minGPT" just reached 10k stars on Github! minGPT is a minimal PyTorch re-implementation of the OpenAI GPT training. "GPT is not a complicated model and this implementation is appropriately about 300 lines of code" 🛠️ Repo: github.com/karpathy/minGPT
21
350
2,135
281,769
The implementation of Microsoft's biomedical text-generation model is going viral on Github. BioGPT is trained on biomedical literature and achieved human parity. It is now the leader on the PubMedQA benchmark (81%). 🔗You can get code/models/weights: github.com/microsoft/BioGPT
36
420
2,096
375,634
Google just released MetNet-2, a deep learning model that can predict rain up to 12 hours in advance. Published in Nature, it outperforms current weather forecast models which are based on physics simulations. 📄Paper: nature.com/articles/s41467-0… 🛠️Code: colab.research.google.com/gi…
31
444
2,089
303,026
Google released a list of 321 real-world gen AI use cases from the world's leading organizations. Basically, unlimited business ideas.
13
212
2,093
229,636
LLMs just hit a major milestone with the release of the new "Generative agents" paper. By using LLMs, generative agents were able to simulate human-like behavior in an interactive sandbox inspired by The Sims. The agent architecture extends Language Models to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. There are three components to it: 1. The memory stream, which records a comprehensive list of the agent's experiences 2. Reflection, which synthesizes memories into higher-level inferences over time 3. Planning, which translates those conclusions and the current environment into high-level action plans. Layer that on top of Boston Dynamics robots and we might be the early days of WestWorld.
55
478
1,987
551,120
Everything that happened in AI this January. Ready for February?
40
576
1,940
294,490
Google recently published one of the best whitepaper on AI Agents. Everyone should read it. It covers everything you need to know: > Defines agents, components, and cognitive architectures. > Explains tools: extensions, functions, and data stores. > Covers learning techniques to improve agent performance. > Demonstrates building agents using LangChain and LangGraph.
20
288
1,995
264,955
Roboflow just released a new version of "supervision". It's an open-source swiss army knife for everything Computer Vision. It lets you implement detection, classification, segmentation, annotation to any video. This new update adds advanced video analytics: Trackers, Zones, Annotators, and much more.
42
366
1,962
469,659
GPT-Engineer just hit 12,000 stars on Github. It's an AI agent that can write an entire codebase with a prompt and learn how you want your code to look. ▸ Asks clarifying questions ▸ Generates technical spec ▸ Writes all necessary code ▸ Easy to add your own reasoning steps, modify, and experiment ▸ Lets you finish a coding project in minutes.
66
318
1,838
543,596
The latest Microsoft paper finally reveals the model size of known LLM models. > GPT-4o-mini: 8B > Claude 3.5 Sonnet: 175B > GPT-4: 1.76T > GPT-4o: 200B > o1-preview: 300B > o1-mini: 200B Wonder how they figured it out..
Community note
The article published on January 3 2025 mentions that the reported model sizes are NOT verified but "pure estimates mined from public articles only". arxiv.org/pdf/2412.1926
50
179
1,905
235,209
HUGE news for developers. Supabase just launched the ChatGPT of databases. An AI-based Postgres service. You can build and launch databases, create charts, embeddings, see visuals of your DB, generate sample data, and more. And.. it's 100% open source.
20
255
1,605
141,198
Big News. NVIDIA just announced Neuralangelo. The new model can turn videos from any device into detailed 3D structures, fully replicating buildings, sculptures, or other real world objects or spaces virtually. Here's how it works: A model utilizes a 2D video with multiple angles of an object or scene. It selects frames from different viewpoints to understand depth, size, and shape. The AI creates an initial 3D representation, similar to a sculptor shaping a subject. The render is optimized to enhance details, like a sculptor refining texture. The outcome is a 3D object or scene suitable for virtual reality, digital twins, or robotics.
39
375
1,786
430,791
Huge. UC Berkeley just released a $450 open-source reasoning model that matches o1. Sky-T1-32B-Preview is a fully open-source model designed for reasoning and coding tasks. Achieves 82.4% on Math500 and 86.3% on LiveCodeBench-Easy. It includes training data, code, and model weights. Open-source resources: > Training infrastructure and scripts. > Full technical report with wandb logs. > Model weights for Sky-T1-32B-Preview.
61
209
1,850
170,523
This is big news, ChatGPT just outperformed mechanical turk workers on text annotation tasks! We're getting closer to complete AI-based data annotation, which in turn, can be used to train AI models. It will cause a big shift in the industry. Paper: arxiv.org/pdf/2303.15056v1.p…
55
369
1,722
393,637
Microsoft just open-sourced GraphRAG. It might be the best Python library to extract insights from text. Much more powerful than vanilla RAG. It uses LLMs to automate the extraction of knowledge graphs from your datasets and text documents. !pip install graphrag
14
246
1,755
200,285
Google AI just announced the PaLM API! It will be released with a new tool called MakerSuite, which lets you prototype ideas, do prompt engineering, synthetic data generation and custom-model tuning. Waitlist available soon. developers.googleblog.com/20…
63
340
1,664
525,925
A team just made OpenAI Whisper 6x faster, 49% smaller, while keeping 99% of the accuracy. The model is already available on the HuggingFace Transformers library: model_id = "distil-whisper/distil-large-v2" You can also use their web UI to transcribe from URLs, files, or audio recordings. Model: huggingface.co/distil-whispe… Demo: huggingface.co/spaces/Xenova… Paper: arxiv.org/abs/2311.00430 @srush_nlp
23
317
1,738
500,739
A Tuesday in AI: - Google opens up their Bard LLM. - NVIDIA launches cloud tools for Generative AI. - Adobe announces Firefly, an AI image creator. - Microsoft unveils Bing Image Creator. It's 11AM PST.
40
242
1,671
207,903
ChatGPT is taking over the internet. But do you know how it actually works? It's so clever. 🧵Here's an explanation using simple words:
38
356
1,617
Replying to @ylecun
It’s simple, people in the field see progress happening continuously, paper by paper. The public sees none of that, for them it’s 0 to 1.
21
49
1,597
161,934
A rare interview of AI godfather @geoffreyhinton was released yesterday where he describes his views on Large Language Models and GPT. Must watch. piped.video/watch?v=qpoRO378…
60
247
1,535
1,033,784
Impressive. MetaGPT is about to reach 30,000 stars on Github. It's a Multi-Agent Framework that can behave as an engineer, product manager, architect, project managers. With a single line of text it can output the entire process of a software company along with carefully orchestrated SOPs: ▸ Data structures ▸ APIs ▸ Documents ▸ User stories ▸ Competitive analysis ▸ Requirements
29
267
1,616
621,772
Anthropic might've just solved Prompt Engineering. Their new "Prompt Generator" tool can turn simple descriptions into advanced prompts optimized for LLMs.
32
242
1,614
261,610
NVIDIA just made Pandas 150x faster with zero code changes. All you have to do is: %load_ext cudf.pandas import pandas as pd Their RAPIDS library will automatically know if you're running on GPU or CPU and speed up your processing. You can try it here: colab.research.google.com/dr… Repo: github.com/rapidsai/cudf
19
333
1,578
194,216
GPT4 can turn a picture of a napkin sketch into a fully functioning html/css/javascript website! This was just demonstrated in the livestream.
44
386
1,641
751,734
JUST IN: Google invests $300 million in Anthropic as race to compete with ChatGPT heats up Anthropic was founded in 2021 by the team behind AI breakthroughs such as GPT-3 and Reinforcement Learning from Human Feedback (RLHF).
44
209
1,527
345,704
BREAKING: Amazon (AWS) partners with @HuggingFace for the next iteration of their BLOOM Large Language Model, as well as its open-source ChatGPT rivals. AWS will offer the startup’s products to customers who want to use AI tools as building blocks of their own applications.
15
203
1,531
258,999
The best fine-tuning guide you'll find on arXiv this year. Covers: > NLP basics > PEFT/LoRA/QLoRA techniques > Mixture of Experts > Seven-stage fine-tuning pipeline
19
240
1,562
215,395
OpenAI just announced "GPT-4o". It can reason with voice, vision, and text. The model is 2x faster, 50% cheaper, and has 5x higher rate limit than GPT-4 Turbo. It will be available for free users and via the API. The voice model can even pick up on emotion and generate emotive voice.
72
258
1,461
485,074
Ilya on LLMs understanding the world: "predicting the next token well, means that you understand the underlying reality that let to the creation of that token" Seem like the opposite view of Yann.
129
189
1,457
1,222,131
This is huge. A new technique called Reflection-Tuning allows open-source models (Llama 3.1 70B) to outperform Claude 3.5 and GPT-4o. This new technique trains the model on structured, synthetic data to detect reasoning errors and enable LLMs to fix their own mistakes. 𝐇𝐞𝐫𝐞'𝐬 𝐡𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬: 1. During inference, the model generates reasoning within <thinking> tags. 2. If it detects a mistake, it uses <reflection> tags to correct itself before moving forward. 3. Only after this self-correction does the model provide a final answer, enclosed in <output> tags. 𝐑𝐞𝐬𝐮𝐥𝐭𝐬: ▸ Performance: Benchmarks include 89.9% on MMLU, 79.7% on MATH, and 90.1% on IFEval, outperforming GPT-4o. ▸ Training: Built from Llama 3.1 70B Instruct with special tokens to enhance reasoning. ▸ Decontamination: Benchmarks checked using LMSys’s LLM Decontaminator.
24
219
1,274
141,829
Ilya Sutskever's has a bold take. LLMs are doing much more than predicting the next word. They are learning our world model. Text is a projection of the world.
120
234
1,423
326,415
3. Code understanding/debugging via voice commands
4
101
1,426
414,816
This is such an interesting finding. The performance of Language Models is highest when relevant information appear at the beginning or end of the input context, and significantly lower otherwise. You can adjust your prompts accordingly.
42
159
1,352
776,543
Microsoft's Autogen is blowing up on Github. It's a framework that allows LLM agents to chat with each other to solve your tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. It's also a drop-in replacement of openai.Completion or openai.ChatCompletion as an enhanced inference API. github.com/microsoft/autogen
15
268
1,387
261,894
Kyutai, a french AI lab with $300M in funding, just unveiled Moshi, an open-source GPT-4o competitor. Moshi is a real-time multimodal model that can listen, hear, and speak. Code, model, and paper will be release soon. @kyutai_labs
31
234
1,367
133,505
Nvidia's CEO Jensen Huang made strong predictions regarding AI during yesterday’s earning call. A thread: 1. "There's no question that this is a very big moment for the computer industry" 2. "Over the next 10 years, I believe we're going to accelerate AI by a million" 1/🧵
39
194
1,386
506,455
Meta AI just announced DINOv2! It's big. The new Self-supervised Vision Transformer Model can be used as a backbone for almost all your CV tasks. No fine-tunning needed. • Train CV models without the need for large amounts of labeled data. • Multipurpose backbone: image classification, segmentation, image retrieval, and depth estimation. • Learn features directly from images without relying on text descriptions, which can lead to a better understanding of local information. • Can learn from any collection of images. • The pretrained version of DINOv2 is already available and competes with CLIP and OpenCLIP on a wide array of tasks. Code: github.com/facebookresearch/… Demo: dinov2.metademolab.com/demos
24
290
1,355
440,689
Microsoft just released Phi-2, a 2.7B LLM that rivals the 25x bigger LLaMa-2 70B. The best part? The model is small enough to run on a laptop or mobile device. Trained on 1.4T tokens: mixture of synthetic & web datasets, it beats Mistral 7B and Llama-2-70B model on muti-step reasoning tasks, i.e., coding and math. microsoft.com/en-us/research…
36
240
1,300
214,210
Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. segment-anything.com/
41
326
1,266
290,311
4. Generate a wide range of emotion-based voices
14
74
1,258
406,004
This is a sneak peak into the future of medicine.. GlassAI launched an LLM-based tool capable of generating a diagnosis or clinical plan based on symptoms. Also, ChatGPT recently passed the US Medical Licensing Exam. Demo: glass.health/ai @GlassHealthHQ
44
298
1,229
256,860
NVIDIA just made Pandas 150x faster with zero code changes. It is now directly integrated in Google Colab. All you have to do is: %load_ext cudf.pandas import pandas as pd Their RAPIDS library will automatically know if you're running on GPU or CPU and speed up your processing. You can try it here: colab.research.google.com/gi…
22
240
1,256
150,432
Currently playing around with ManimML, a python-based visualization tool for neural networks (by @alec_helbling, based on @3blue1brown) Code: github.com/helblazer811/Mani… Visualization of a Convolutional Neural Network:
16
212
1,225
351,986
A great read. Stop using the elbow criterion for k-means and how to choose the number of clusters instead (alternatives). "..researchers and reviewers should reject conclusions drawn from the elbow method." 📄 Paper: arxiv.org/abs/2212.12189
26
259
1,241
194,167
PyTorch just released an awesome tool to visualize matrices and what's happening inside them. Matrix multiplications (matmuls) are the building blocks of today’s models. It can even run in browser.
7
173
1,236
60,439
Adobe just added their first Generative AI tool to Photoshop! Big milestone. Generative Fill allows you to extend images as well as add and remove objects using simple text prompts.
19
236
1,196
221,292
Microsoft's new Florence 2 is big for Computer Vision. It's a merge between Text and Vision. With a single prompt you can instruct the model to do CV tasks like captioning, object detection, grounding, and segmentation. The best part, it only uses a single backbone to handle everything. ▸ Excels in zero-shot performance ▸ Unified model for detection, captioning, etc. ▸ FLD-5B dataset: 5B+ annotations, 126M images ▸ New benchmarks (>5.5+) on COCO, ADE20K arxiv.org/abs/2311.06242
23
221
1,202
186,562
Impressive. GigaGAN is a 1B-parameter GAN that can scale 36 times larger than StyleGAN. The model from Adobe/CMU proves that proves that GANs can be scaled to large datasets AND remain stable. Features: ▸ Latent Space Editing: supports latent interpolation, style mixing, and vector arithmetic operations. ▸ Speed: It can produce 512px images in 0.13 seconds and 4K images in 3.66 seconds. ▸ Upsampling: it can be used as upsampler for ultra-high-resolution images.
35
211
1,155
225,048
Most impressive paper I've seen this week. Generative Image Dynamics transforms still images into videos or interactive scenes. The Google team trained the model by using a dataset of motion trajectories from real-life videos of natural, oscillating motions like those seen in trees, flowers, candles, and wind-blown clothing. Web: generative-dynamics.github.i…
17
258
1,156
144,848
This is such an impressive dataset. The python package Leafmap now supports downloading Google Open Buildings, the largest building dataset, for any country with only one line of code. Notebook: lnkd.in/g2RD5Yaq GitHub: lnkd.in/gwYSagUD
7
253
1,129
148,532
Did you know a model can suddenly generalize when you continue optimizing after perfect training accuracy? It's an unexplainable behavior called Grokking observed for the first time a year ago by OpenAI Paper: arxiv.org/abs/2201.02177
38
168
1,134
256,553
Must read. A massive list of tricks to make training a language model possible on 1 consumer-level GPU and 1 day of training. 📄 Paper: lnkd.in/duwQ4jjX 🛠️ Code: lnkd.in/dSkYjK8t ✍️ Authors: @jonasgeiping , @tomgoldsteincs
6
196
1,150
120,326