We deploy robots to solve real-world problems. Get in touch: menlo.ai/talk

🍓 Ichigo-llama3.1: Local Real-Time Voice AI We bring 2 key improvements to Ichigo: - It can talk back (Yes!) - It recognizes when it can't comprehend input You can now run this little strawberry on your device! Demo on a single @nvidia 3090 GPU. 1/10
Meet Jan-nano, a 4B model that outscores DeepSeek-v3-671B using MCP. It's built on Qwen3-4B with DAPO fine-tuning and handles: - real-time web search - deep research Model + GGUF: huggingface.co/collections/M… To run it locally: - Install Jan Beta: jan.ai/docs/desktop/beta - Download Jan-nano in Jan Hub - Go to Settings -> MCP, enable MCP, and add your Serper API key for web tools. Full technical report will be published shortly.
Introducing Lucy: a 1.7B model that Googles for you. It's an agentic-search model that can even run on your phone. - Agentic search on tap - Lucy calls tools (<think></think>-aware) - Fits in your pocket: runs on CPU or mobile Under the hood: - Built on @Alibaba_Qwen's Qwen3-1.7B - Smooth multi-category rewards replace brittle if-else scoring - Task-vector RLVR optimizes the "thinking" tag for targeted search moves. Benchmarks: - SimpleQA + MCP = 78.3 - Close to Jan-nano-4B (80.7) Run locally: - Demo uses vLLM in Jan - You can spin it up with @ggerganov's llama.cpp or vLLM Models on @huggingface: - Lucy 1.7B 40K: huggingface.co/Menlo/Lucy - Lucy 1.7B 128K: huggingface.co/Menlo/Lucy-12…
AlphaMaze: Teaching LLMs to think visually. We're excited to share our findings, now live in a blog, paper, and open-source release. Language models are good at words, but what about visual-spatial reasoning? AlphaMaze is a decoder-only language model trained to think visually by solving maze puzzles: tasks humans find intuitive but LLMs typically struggle with. It improves spatial intelligence through GRPO. We trained it in two steps: 1) Supervised Fine-Tuning (SFT): taught it to predict each move. 2) Group Relative Policy Optimization (GRPO): improved decision-making and reasoning. Here's how it performed: - Baseline model: couldn't solve the maze - After SFT: 86% accuracy - With GRPO: 93% accuracy and stronger self-correction If a model can "see" spatial relationships from text, it opens up possibilities for robotics, navigation, and other tasks that need visual and sequential reasoning. - GitHub: github.com/janhq/visual-thin… - Model (@huggingface): huggingface.co/homebrewltd/A… - Paper: arxiv.org/abs/2502.14669 - Demo: alphamaze.menlo.ai/ - Blog: homebrew.ltd/blog/alpha-maze We're opening this up for collaboration. Join us in exploring new ways to improve visual reasoning in language models: discord.gg/Exe46xPMbK Shoutout to @UnslothAI for efficient training tools, @deepseek_ai and @Alibaba_Qwen for the base models, MVoT for inspiring visual reasoning approaches, and Understanding Search for the maze-dataset framework!
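GRPO (Group Relative Policy Optimization, introduced in the DeepSeekMath work) samples a group of answers for each prompt and uses the group's own reward statistics as the baseline instead of a learned value model. Below is a minimal sketch of just that advantage step. The rewards are hypothetical, the clipped policy-gradient update and KL penalty are omitted, and this is not AlphaMaze's training code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled answer against the other answers in its group.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mean = statistics.fmean(rewards)
    # Population std is an assumption here; some implementations use the sample std.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four attempts at the same maze prompt:
# 1.0 = the move sequence reached the goal, 0.0 = it did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Successful attempts get positive advantages and failures get negative ones. If every attempt in a group scores the same, all advantages are zero and that group contributes no learning signal.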
Meet 🍓 Mini Ichigo: Ichigo-llama3.2-3B-s. Mini Ichigo is a local real-time voice AI built on @AIatMeta's Llama3.2-3B. It scored 59.61 on MMLU, within about 10 points of the larger llama3.1-instruct-8b (69.40), and held up well on AudioBench with 2.58 on Open-hermes and 2.07 on Alpaca. Details on 🤗 @huggingface: huggingface.co/homebrewltd/m…
Control AI with your voice. The upcoming Llama3-s checkpoint introduces voice-based function calling. It's an early-fusion model built to give @AIatMeta's Llama 3.1 model listening capabilities. Coming soon to @huggingface & @Gradio.
Our homebrewed early-fusion speech model now has a name and a voice. Say hi to 🍓 Ichigo, the local real-time voice AI. Next up: You can try Ichigo soon on @huggingface & @Gradio.
ReZero: A small model that learns to search and never gives up 🔥 ReZero trains with synthetic search engines that force the model to retry, refine, and persist until it finds a better answer (never give up 💪). It's built on Meta's Llama 3.2 3B. Instead of optimizing for recall or speed, we train the model to retry when it's wrong, using reinforcement learning to build persistence into the search process. - Model: huggingface.co/Menlo/ReZero-… - Code: github.com/menloresearch/ReZ… Thanks to @AIatMeta for the Llama 3.2 3B base, @UnslothAI for AutoDidact (the framework we built on), and @bartowski1182 for quantizing the model!
🍓 Ichigo is smarter now with v0.4 - MMLU score up to 64.66 - Better at picking out voices in noisy settings - Tracks complex, multi-turn conversations better Links: - GitHub: github.com/homebrewltd/ichig… - Live demo: ichigo.homebrew.ltd/ - Model weights: huggingface.co/collections/h…
IchigoWhisper: Speech recognition for low-resource languages. We're building an ASR for low-resource languages, including Southeast Asian languages like Vietnamese. Low-resource languages face challenges like limited datasets and high word error rates. Existing models, including Whisper, often fail to perform well under these conditions. IchigoWhisper is built to solve this. It focuses on delivering higher accuracy and lower word error rates specifically for low-resource scenarios, making it a better solution for underrepresented languages. IchigoWhisper outperforms Whisper on low-resource languages by using regularization techniques that improve accuracy for these languages while maintaining strong English performance. #homebrewaidaily
We made progress on a robotic hand manipulation model, PoseLess. PoseLess reproduces hand poses from 2D images directly in 3D simulation, without complex inverse kinematics. Here's how 👇 PoseLess is a depth-free vision-to-joint model that eliminates the intermediate pose estimation step. By leveraging a Vision Language Model (Qwen 2.5 3B), we map 2D images directly to joint angles with surprising accuracy. Two core components make this work: - VLM Image Projection: The model processes images directly to predict joint angles, eliminating error buildup from multi-stage pipelines. - Synthetic Data Engine: Generates training examples by randomizing joint angles and visual features, creating perfect ground truth without human labeling. What this enables: - Control from monocular images (no depth sensors) - Real-world performance without real-world training data - Cross-morphology transfer from robot to human hands - Competitive accuracy with a simpler pipeline - Lower latency control Paper: arxiv.org/abs/2503.07111 GitHub: github.com/janhq/poseless HuggingFace: huggingface.co/homebrewltd/P… This work is at a very early stage and not yet thoroughly tested, and we may explore tangential routes. But it worked, and we found the results interesting nevertheless. Join our Discord community to discuss this work and collaborate on future developments: discord.gg/Exe46xPMbK
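The synthetic data engine works because the label exists before the image does: sample joint angles first, render second. Here is a minimal sketch of the sampling half. The joint names and limits are hypothetical, the rendering and visual randomization steps are left out, and this illustrates the idea only; it is not PoseLess's generator.

```python
import math
import random

# Hypothetical joint limits in radians; a real hand model defines its own
# joint names and ranges (these are NOT PoseLess's values).
JOINT_LIMITS = {
    "index_mcp": (0.0, math.radians(90)),
    "index_pip": (0.0, math.radians(100)),
    "thumb_cmc": (math.radians(-30), math.radians(60)),
}


def sample_pose(rng: random.Random) -> dict[str, float]:
    """Draw one joint configuration uniformly within each joint's limits."""
    return {name: rng.uniform(low, high) for name, (low, high) in JOINT_LIMITS.items()}


def build_dataset(n: int, seed: int = 0) -> list[dict[str, float]]:
    """Sample n poses with a fixed seed so the dataset is reproducible.

    In a full pipeline each pose would be rendered in simulation (with
    randomized lighting, textures, camera, etc.) to produce the input image;
    the sampled pose itself is the exact label, so no human labeling is needed.
    """
    rng = random.Random(seed)
    return [sample_pose(rng) for _ in range(n)]


dataset = build_dataset(1000)
```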
We're training a speech model with the Speechless method. First, training speech models is tough: collecting Q&A speech data is expensive, slow, and doesn't scale. Traditional pipelines rely on generating audio (TTS -> ASR), adding unnecessary complexity. With the Speechless method, we skip audio entirely. How Speechless training works: text is directly converted into semantic speech tokens, cutting out the noise (literally). It's simpler, saves resources, and makes training for low-resource languages actually doable. Plus, this approach scales with the amount of available text, and lets 🍓 Ichigo handle larger context windows in the next release. #homebrewaidaily
Homebrew (Jan) is now Menlo Research. Same mission, sharper focus: Menlo is an open R&D lab building the brain for robots. The work has grown. So has the team. Robotic agents, hardware — it made sense to give it all one name. Same values: - Jan, Cortex, and models stay open - Research stays public - We'll keep building in public: menlo.ai/handbook Join our "anti dumb" robot club: discord.gg/Exe46xPMbK
Llama3.1 just got ears with llama3s! 🦙👂 We're teaching Llama3.1 to listen - an open, ongoing experiment with @AIatMeta's llama3. This v0.2 can understand human speech, but it's in its early days with limitations/bugs.
🤖 Cortex: Local AI API Platform With Cortex, you can run and customize AI models locally on your device or server. We're launching the preview of Cortex, our journey to implement a local version of the OpenAI API Platform. Web: cortex.so/ - Straightforward CLI (inspired by @ollama) - Full C++ implementation, packageable into Desktop and Mobile apps - Pull from @huggingface or from Built-in Model Library - Swappable Engines (default: @ggerganov's llama.cpp, future: @onnxruntime, @nvidia's TensorRT-LLM) - Models stored in universal file formats (vs. blobs) Cortex's roadmap is to implement the full @OpenAI API including Tools, Runs, Multi-modal and Realtime APIs. Jan will use Cortex as a backend, as we implement a local alternative to "Advanced Voice Mode" (i.e. Realtime API running Speech models, such as 🍓 Ichigo). - Give it a try: cortex.so/ - Source Code: github.com/janhq/cortex.cpp - API Reference: cortex.so/api-reference/ - Community: discord.gg/Exe46xPMbK
We've welcomed some new friends on X after our 🍓 Ichigo launch, so let us introduce ourselves again! 🖖 Homebrew is a Local AI Company. We are the creators and lead maintainers of: - 👋 @jandotai: Local AI Assistant (~1.7mn downloads) - 🤖 @cortex_so: Local AI Toolkit We train open source models: - 🍓 Ichigo: Local Real-Time Voice AI We tinker with hardware: - 💡 Menlo: AI Hardware Coming Soon - ⛩️ Xanadu: GPU Cluster Coming Soon Join our Discord to see our journey: discord.com/invite/FTk2MvZwJ… We welcome business inquiries: homebrew.ltd/work-with-us
Latency? 🍓 Ichigo is a local real-time voice AI with <150ms latency.
We explained 🍓 Ichigo, the local real-time voice AI, in our first paper. It's a mixed-modal early fusion model with 111 ms latency to first token. arxiv.org/abs/2410.15316
Why are we building IchigoWhisper when Whisper already exists? We love Whisper, but it struggles with low-resource languages, those with less available training data. - IchigoWhisper outperforms Whisper in languages like Vietnamese - On Vietnamese (viVoice), IchigoWhisper achieves a ~7.5% lower Word Error Rate (WER) than Whisper - Despite its focus on low-resource languages, it performs on par with Whisper in English (LibriTTS-R) IchigoWhisper is built with a new regularization technique we developed (paper coming soon).
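The comparison above is stated in Word Error Rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. A minimal sketch of the standard computation follows. Published evaluations usually normalize text (casing, punctuation) before scoring, which is omitted here, and the example sentence is made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (Levenshtein distance over words, row by row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("xin chào các bạn", "xin chào cá bạn"))  # 0.25
```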
Introducing 🖖 Homebrew. We started as an open-source team working on Jan, a personal AI. Along the way, we encountered challenges at the product, software, hardware, and even data center levels. These challenges have inspired us to think bigger and find better solutions.
Open call for LLM researchers and audio experts! We can brew AI together - join our Discord: discord.gg/hTmEwgyrEg Blog for details: homebrew.ltd/blog/llama-lear… Code: github.com/homebrewltd/ichig… Demo on a single 3090: ichigo.homebrew.ltd/ 10/10
Inspired by @AIatMeta's Chameleon and Llama Herd papers, llama3-s (Ichigo) is an early-fusion audio and text multimodal model. We're conducting this research entirely in the open, with an open-source codebase, open data, and open weights. 2/10
Menlo Research is hiring, come build cool stuff with us. Some open roles: - Sr Backend Engineer (Tauri) - Sr Software Engineer - Research Engineer (PyTorch) - Lead Electromechanical Systems Engineer - Mechanical Engineer - Management Associate menlo.bamboohr.com/careers/
Generating 250k samples in under 30 minutes with a scalable synthetic data pipeline that is efficient in both VRAM and RAM. We've developed a pipeline built using @vllm_project and a custom Text-to-Semantic model (@AIatMeta's Llama 3.2 1B), with distributed processing on Ray across multiple GPUs. Minimal VRAM requirements keep it efficient, while high throughput enables generating 250k samples in just 20-30 minutes. For comparison, our old pipeline took 5-6 hours to generate the same number of samples and required up to 900GB of RAM. The new approach drastically improves efficiency. Sharing this in case it's helpful for others working on scalable synthetic data generation.
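The quoted timings imply roughly a 10-18x speedup. Here is a quick check of that arithmetic using only the numbers above (250k samples; 20-30 minutes for the new pipeline vs 5-6 hours for the old one). It says nothing about the RAM savings.

```python
SAMPLES = 250_000
NEW_MINUTES = (20, 30)          # new pipeline
OLD_MINUTES = (5 * 60, 6 * 60)  # old pipeline: 5-6 hours


def throughput(samples: int, minutes: float) -> float:
    """Samples generated per second."""
    return samples / (minutes * 60)


print(f"new: {throughput(SAMPLES, NEW_MINUTES[1]):.0f}-{throughput(SAMPLES, NEW_MINUTES[0]):.0f} samples/s")
print(f"old: {throughput(SAMPLES, OLD_MINUTES[1]):.0f}-{throughput(SAMPLES, OLD_MINUTES[0]):.0f} samples/s")
low = OLD_MINUTES[0] / NEW_MINUTES[1]   # fastest old run vs slowest new run
high = OLD_MINUTES[1] / NEW_MINUTES[0]  # slowest old run vs fastest new run
print(f"speedup: {low:.0f}x to {high:.0f}x")  # speedup: 10x to 18x
```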
Speechless: instruction tuning for speech, without speech. We're releasing new models, datasets, and a method for training speech instruction models without using any audio data. No waveforms, no TTS. Instead, Speechless learns to generate Whisper-compatible semantic tokens directly from text. That means you can train LLMs to follow spoken instructions without ever training on speech. This paper was just accepted to Interspeech 2025! This came out of a practical challenge: building a speech assistant without enough speech data or a high-quality TTS system. So we stopped trying to synthesize audio and started simulating the representation of speech instead. How it works: - Train a quantizer to tokenize Whisper's speech embeddings - Train Speechless (1B decoder-only LM) to translate text → semantic tokens - Use these to instruction-tune a Llama model - At inference, the model accepts real speech via Whisper, even though it was only trained on synthetic tokens. Results: - On CommonVoice VI, our synthetic-token model beats Whisper in noisy conditions - Comparable to Llama-Omni on VoiceBench, without any real spoken instructions - Some MMLU degradation, expected from mixed-modality tuning This is for teams building speech systems in languages with scarce data or low-quality TTS: a practical pipeline to bootstrap instruction-following speech models. - Paper: arxiv.org/abs/2505.17417 - Model: huggingface.co/Menlo/Speechl… - Dataset: huggingface.co/datasets/Menl… - LLM: huggingface.co/Menlo/Ichigo-… - Code: github.com/menloresearch/ich…
Jan (Menlo Research) is on TechCrunch's list of the 20 hottest open source startups of 2024. Built in the open. Always will be. Appreciate everyone building with us.
The 20 hottest open source startups of 2024 tcrn.ch/4isx53P
We're publishing the 🍓 Ichigo paper: Mixed-Modal Early-Fusion Realtime Voice Assistant. Ichigo processes speech and text together with just 111 ms latency to the first token. No separate adapters: unified handling of both modalities. Optimized for fast, accurate speech understanding. huggingface.co/papers/2410.1…
Check @huggingface's homepage, Jan‑nano is trending.
We're using @AIatMeta's Chameleon approach in our early-fusion speech model. Here's why you should too: Context: We're training Llama3-s publicly. Llama3-s is an early-fusion speech model that extends Llama3 with native listening capabilities. 🧵1/5
Meet Rick, the Unitree G1 we're hacking in the lab. When he's not sneaking out for K-TV, we're busy turning him into something useful.
AlphaSpace: Giving LLMs 3D capabilities for robotics use cases. We're exploring how language models handle simple spatial tasks, like placing objects on a table, using a tokenization approach we call AlphaSpace. The model predicts [x, y, z] coordinates using symbolic reasoning in a constrained setup. It builds on ideas from AlphaMaze, but shifts the focus from navigation to manipulation. The current version uses the DeepSeek-R1-Distill-Qwen-1.5B Instruct architecture. Paper: arxiv.org/abs/2503.18769 Model: huggingface.co/homebrewltd/A… Dataset: huggingface.co/datasets/home… Code: github.com/menloresearch/spa… Give it a try 💻, give it a like ❤️, give it a hand 🦾
ReZero is a reinforcement learning framework that doesn't give up on web search. It keeps refining its answer until it gets it right. Trained to search, not guess. Built with synthetic search engines that force it to persist. - Paper: arxiv.org/pdf/2504.11001 - Model: huggingface.co/Menlo/ReZero-… - Code: github.com/menloresearch/ReZ…
How we're training llama3-s in the next phase. We're publicly training Llama3-s, an early-fusion speech model that extends @AIatMeta's Llama3 with native listening capabilities. Currently llama3-s v0.2 works OK in a quiet environment. Demo: demo.homebrew.ltd/ But it quickly runs into limitations in a production setting: - Sensitive to sound pre-compression in incoming audio, e.g. from old AirPods - Gets confused on soundbites longer than 10 seconds - Vulnerable to nonsensical noise - Not multi-turn, English only For the upcoming training run, we're keeping the same early-fusion technique as before, inspired by @AIatMeta's Chameleon paper. For the encoder we're using semantic tokens from @OpenAI's WhisperVQ v3. This approach has worked great in our small-scale experiments, and we're excited to push it forward with new sound features. Training-wise, we plan to add a stage that trains the model on organic and synthetically generated non-speech audio. For the scope of this small scaling-law experiment, the goal is simply to reduce model hallucination. Data-wise, we're leaning on #OpenSLR, which has been a very rich speech dataset of read audiobooks from @librivox and extends to 8 languages. We're working on cleaning up this dataset, generating new synthetic semantic tokens, and overall improving the underlying dataset we're working with. We're eager to hear your thoughts! Join the discussion on our Discord: discord.gg/uakhH2S7sZ
We're in Rotterdam for Interspeech 2025. If you're attending, feel free to stop by and say hello!
Yesterday we shared AlphaSpace, a 1.5B-parameter LLM that predicts [x, y, z] for simple manipulation tasks. Today, we built a little simulation portal to demonstrate its capabilities. Try it here: alphaspace.menlo.ai/ You can test the model, run prompts, and see how it places objects in 3D space, all in-browser. Built with: - Genesis for physics - A distill of @Alibaba_Qwen 1.5B from @deepseek_ai R1 for thinking Paper: arxiv.org/abs/2503.18769 p.s. please don't DDoS us again 🙏
Give LLMs 3D understanding. New Menlo Research models now on @Kaggle: - AlphaMaze: visual reasoning tasks (ARC-AGI) - AlphaSpace: 3D spatial data for LLMs (no vision needed) Test and explore: kaggle.com/organizations/men…
Most LLMs fail the moment you ask them to do anything physical. AlphaSpace fixes that. It's ~2x better than GPT-4o on object manipulation without using vision. New blog's up: how AlphaSpace teaches models to reason about space. 🧵
If you have at least 12GB of VRAM and you're running Llama3.1 at Q4, you're over-quantizing. This is a quick comparison from our researcher @pokachi2023 on max VRAM utilization in practice, relative to the actual context size you are getting from your LLMs. - Typically, Llama3.1 8B in BF16 requires at least 24GB of VRAM to run. - But Llama3.1 8B with Q8 weights and Q6 KV cache quantization actually fits in 12GB. It runs on many consumer PCs obtainable for under $2000. - Performance-wise, the Q8 model with Q6 KV (MMLU 68.59) is close to the BF16 baseline (69.4). You'll get close to the level of performance the model authors originally intended! Here's the toolkit: github.com/hsiehjackson/RULE… Kudos to the RULER paper, which gives us a way to measure the real context size of our long-context language models. 🙏
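A back-of-the-envelope estimate shows why Q8 weights plus a ~6-bit KV cache can fit in 12GB while BF16 cannot. This sketch uses Llama 3.1 8B's published architecture (32 layers, 8 key/value heads, head dimension 128). The 32K context length is an assumption. Bit widths are nominal: real formats such as llama.cpp's Q8_0 add per-block scales (about 8.5 bits per weight), and treating "Q6 KV" as 6 bits per cached value is also an assumption. Activations and runtime overhead are ignored. It is an estimate, not how the measurements above were made.

```python
# Llama 3.1 8B architecture constants, from its published model config.
PARAMS = 8.03e9      # total parameters (approximate)
N_LAYERS = 32
N_KV_HEADS = 8       # grouped-query attention: 8 key/value heads
HEAD_DIM = 128

GIB = 1024 ** 3


def weights_gib(bits_per_weight: float) -> float:
    """Memory for the weights alone, at a nominal bits-per-weight."""
    return PARAMS * bits_per_weight / 8 / GIB


def kv_cache_gib(context_tokens: int, bits_per_value: float) -> float:
    """Memory for cached keys and values across all layers, for one sequence."""
    values_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM  # 2 = one K and one V
    return values_per_token * context_tokens * bits_per_value / 8 / GIB


CONTEXT = 32_768  # assumed context length for this illustration

for label, w_bits, kv_bits in [("BF16 weights, FP16 KV", 16, 16), ("Q8 weights, Q6 KV", 8, 6)]:
    w, kv = weights_gib(w_bits), kv_cache_gib(CONTEXT, kv_bits)
    print(f"{label}: weights {w:.1f} GiB + KV {kv:.1f} GiB = {w + kv:.1f} GiB")
```

At this context it comes to roughly 19 GiB for BF16 and 9 GiB for Q8/Q6 before overhead. The KV cache grows linearly with context, which is why the usable context size matters as much as the weights.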
Phase 3: Teaching "I cannot hear" - Fine-tuned for inaudible inputs and multi-turn conversations - Used 513 WhisperSpeech sound tokens - Leveraged @alibaba_cloud's Qwen2.5-72B and @argilla_io's Distilabel - 644 steps, 3+ hours on 8x @nvidia H100s 6/10
Huge shoutout to the teams and individuals whose work has been instrumental in this project: - @AIatMeta for the Chameleon and Llama Herd papers - @PyTorch for Torchtune - @huggingface for hosting our demo and datasets - @alibaba_cloud for their Qwen2.5-72B model - @argilla_io for Distilabel - @Collabora for WhisperSpeech - OpenSLR for speech recognition resources - @nvidia for their GPUs! ❤️ - Discord contributors gau.nerst, hydroxide, Blanchon.jl for amazing insights! - Yip Jia Qi for introducing us to the paper Discrete Audio and Speech Benchmarks. Plus, special thanks to r/LocalLLaMA members for comments that helped us improve Ichigo-llama3.1! 💙
Our makerspace llama has a new bodyguard. No more drama, llama.
We extended @GoogleAI's Gemma 3 (a 2D Vision-Language Model) with some 3D spatial understanding! At Menlo, we're exploring how standard 2D VLMs can comprehend 3D scenes directly from voxel grids. Our approach, VoxRep, slices the 3D voxel space into 2D images (like CT scans!) for the VLM to process. The model learns to extract "voxel semantics" (identifying objects, their color, location, and volume) and outputs structured data. This leverages powerful pre-trained 2D VLMs for 3D tasks, avoiding the added complexity of a 3D encoder. This work serves as a baseline for testing the decoder's ability to recognize position, shape, and semantic information in a 3D space through a simple 2D sliced-image pipeline. Paper: arxiv.org/abs/2503.21214 @huggingface: huggingface.co/Menlo/voxel-r… Researchers: @alandao_ai and Norapat Buppodom Stack: Gemma3 and ModelNet from @Princeton
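A minimal sketch of the slicing step: cut a voxel grid into one 2D image per z level and tile the slices into a single mosaic that a 2D VLM can take as one image. The grid[x][y][z] layout, the slicing axis, the tiling, and integer color IDs are our assumptions; rendering to pixels and the Gemma 3 prompt are omitted, and this is not VoxRep's implementation.

```python
Grid3D = list[list[list[int]]]  # grid[x][y][z]; hypothetical: 0 = empty, other ints = color IDs
Image2D = list[list[int]]       # image[row][col]


def z_slices(grid: Grid3D) -> list[Image2D]:
    """Cut the volume into one 2D image per z level, where image[y][x] = grid[x][y][z]."""
    size_x, size_y, size_z = len(grid), len(grid[0]), len(grid[0][0])
    return [
        [[grid[x][y][z] for x in range(size_x)] for y in range(size_y)]
        for z in range(size_z)
    ]


def tile(slices: list[Image2D], cols: int, pad: int = 0) -> Image2D:
    """Lay the slices out row-major in one mosaic image, like a sheet of CT scans."""
    h, w = len(slices[0]), len(slices[0][0])
    rows = -(-len(slices) // cols)  # ceiling division
    mosaic = [[pad] * (cols * w) for _ in range(rows * h)]
    for i, image in enumerate(slices):
        top, left = (i // cols) * h, (i % cols) * w
        for y in range(h):
            mosaic[top + y][left:left + w] = image[y]
    return mosaic


# A 2x2x3 volume with a single occupied voxel at x=1, y=0, z=2.
grid = [[[0] * 3 for _ in range(2)] for _ in range(2)]
grid[1][0][2] = 7
mosaic = tile(z_slices(grid), cols=2)
for row in mosaic:
    print(row)
```

The third slice (z=2) lands in the bottom-left cell of the 2x2 mosaic, so the occupied voxel appears at row 2, column 1; empty mosaic cells are filled with `pad`.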
What if speech could be treated like text? 🍰 IchigoWhisper makes it possible. It uses a codebook system to compress speech into discrete IDs: - Continuous speech is split into smaller, manageable segments - Each segment is assigned an ID from a predefined codebook - An approximation of the original can be reconstructed later by looking up each ID in the codebook Why tokenize data this way? - It works with existing model architectures without requiring major changes. - It scales efficiently, even with limited real-world data. 99% of the dataset for this approach is synthetically generated using a Speechless model based on this tokenizer. - Similar methods are widely adopted, like @nvidia's Cosmo using the Emu3 tokenizer. Discrete IDs make speech data easy to use in multimodal training alongside text and images. We'll be sharing a paper soon.
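The three codebook steps amount to vector quantization. Below is a minimal nearest-neighbour sketch that uses toy 2-D vectors and a hand-picked codebook in place of the learned codebook and speech features IchigoWhisper actually uses. Real codebooks are trained, and turning IDs back into audio typically needs a separate decoder, which is omitted here.

```python
import math
from collections.abc import Sequence

Vector = Sequence[float]


def nearest_code(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Return the ID of the codebook entry closest to the frame (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))


def encode(frames: Sequence[Vector], codebook: Sequence[Vector]) -> list[int]:
    """Turn a sequence of continuous feature frames into discrete IDs."""
    return [nearest_code(frame, codebook) for frame in frames]


def decode(ids: Sequence[int], codebook: Sequence[Vector]) -> list[list[float]]:
    """Look the IDs back up; this returns codebook vectors, an approximation of the input."""
    return [list(codebook[i]) for i in ids]


# Toy 2-D "frames" and a hand-picked 3-entry codebook (both hypothetical).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 1.2], [0.2, 0.8]]
ids = encode(frames, codebook)
print(ids)                    # [0, 1, 2]
print(decode(ids, codebook))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

The decoded vectors differ from the input frames: the quantization error is the price of representing speech as a short sequence of integer IDs that an LLM can treat like text tokens.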
🍰 IchigoWhisper outperforms Whisper in Vietnamese! IchigoWhisper, the ASR model we've been building, brings: - Better Vietnamese recognition than Whisper - A powerful speechTokenizer for training speech models - Near-perfect compression for speech embeddings Demo: ichigo-whisper.homebrew.ltd/ Blog on @huggingface: huggingface.co/homebrewltd/I… We'll share more soon, but it's almost ready to power your speech projects now!
Jan's getting a new look, tool use, and an agentic model that holds its own against the giants. Curious how it all works? We're hosting a casual community call to walk through: - New design - Jan-nano (agentic model) - Technical upgrades - Roadmap RSVP: lu.ma/nimqd2an
"Hey folks - I left my head at home. If you find it, please tell it I'm not mad, just disappointed."
Meet our first AI hires: Janice & Claire We've built two AI-powered Discord bots to help with Jan & Cortex. 👋 Janice – Answers questions about Jan 🤖 Claire – Answers questions about Cortex Tag them in Discord, and they'll pull answers from our docs. discord.gg/Exe46xPMbK
We're grateful to the authors of Reasoning-SQL for citing our paper on AlphaMaze! Their paper outlines a reinforcement learning framework that brings partial reward design into the Text-to-SQL task — targeting sparse feedback with measures like schema linking, syntax checks, and AI-based evaluation. It's a careful contribution to a space that benefits from structured, interpretable reasoning. - Cited work: AlphaMaze: Enhancing LLMs' Spatial Intelligence via GRPO: arxiv.org/pdf/2502.14669 - Appears in: Reasoning-SQL: RL with SQL-Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL: arxiv.org/pdf/2503.23157 Authors: - Mohammadreza Pourreza (@googlecloud) - Shayan Talaei (@Stanford) - @ruoxi_cc (@GoogleDeepMind) - @wanxingchen_ (@googlecloud) - Hailong Li (@googlecloud) - @Azaliamirh (@Stanford) - Amin Saberi (@Stanford) - @sercanarik (@googlecloud)
Tech in Asia just featured Menlo Research. Highlights: - Brain for robots - Open-source, bootstrapped, built in public - "The real investors are the people who chose to work here" Thanks to Glenn Kaonang & @techinasia. techinasia.com/menlo-researc…
We're gathering AI researchers, builders, and founders in Singapore once more! 🗓️ Wednesday, December 11, 7:00 PM - 8:30 PM 📍 National Library / Lee Kong Chian Reference Library, Singapore 🔗 RSVP: lu.ma/p1gg6wz8 This event will feature lightning talks packed with insights & challenges across 3 categories: horizons, spaces, and sound. Horizons - East Meets West: Shaping the Future of AI, Weiyi M., @01AI_Yi - Orchestrating LLM agents, John Chong Min Tan, AgentJo - Jan + Cortex, Daniel Ong, Homebrew Spaces - Reimagining Interior Design with AI, Shaun Mak, @spacejot - Automating the Construction Industry, Ethan Ow, Wenti Labs Sound - 🍓 Ichigo - Making Voice Assistants Smarter, Homebrew - AI Oral Presentation Practice, TheLucid - Where Music Meets Magic, Anand Roy, Wubble Join us to learn, share, and spark new ideas with the community. 🔴 RSVP required: lu.ma/p1gg6wz8
We're really impressed by a new study on speeding up autoregressive image generation using speculative and parallel decoding methods. The core idea: if we can generate images autoregressively, we can make the process a lot faster with the right techniques. The researchers found that not all parts of an image can be generated at the same time, because some tokens depend on others and must be created in sequence to keep the image coherent. To tackle this, they divided the image into smaller grids, based on the idea that spatially close tokens are more likely to depend on each other. This let them generate distant tokens with weaker dependencies in parallel, while keeping closely related ones in sequence. This approach achieved up to a 9.5× speedup with minimal loss in image quality, making autoregressive image generation much more efficient. Keep reading on @huggingface: huggingface.co/papers/2412.1… #homebrewaidaily
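Below is a simplified sketch of that scheduling idea under our own assumptions: a square token grid that divides evenly into regions, and one sequential token per region before parallel decoding starts. It only counts decoding steps. It is not the paper's algorithm, and fewer steps do not translate one-to-one into wall-clock speedup or say anything about image quality.

```python
Position = tuple[int, int]  # (row, col) in the image's token grid


def region_parallel_schedule(height: int, width: int, regions_per_side: int) -> list[list[Position]]:
    """Group token positions into decoding steps; positions in one step are generated together."""
    if height % regions_per_side or width % regions_per_side:
        raise ValueError("grid must divide evenly into regions")
    rh, rw = height // regions_per_side, width // regions_per_side
    origins = [(r * rh, c * rw) for r in range(regions_per_side) for c in range(regions_per_side)]
    # Phase 1: the first token of each region, one at a time, so regions that
    # are far apart still share some global context before going parallel.
    steps: list[list[Position]] = [[origin] for origin in origins]
    # Phase 2: the same in-region offset for every region at once. Neighbouring
    # tokens (inside one region) stay sequential; distant ones run in parallel.
    for dy in range(rh):
        for dx in range(rw):
            if (dy, dx) != (0, 0):
                steps.append([(r0 + dy, c0 + dx) for r0, c0 in origins])
    return steps


steps = region_parallel_schedule(16, 16, regions_per_side=4)
print(f"{len(steps)} decoding steps instead of {16 * 16}")  # 31 decoding steps instead of 256
```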
Singapore AI Showcase is happening this Tuesday, May 27! - National Library Board, Level 7 – Launch Programme Room - 7:00 – 8:30 PM (doors open 6:30 PM) - No food inside (library rules) - RSVP required: lu.ma/oazupw94 Featuring: - Yuanda Li (@tealseed) - KBHub: Crowd-sourced knowledge powered by AI - Kemi Bolatito (Scooly Technology) - Scooly Caseflow AI: How LLM is Redefining Global Mobility - Aakash Patil (Nugen) - Domain-Aligned AI for Businesses that Demand Reliability - Peeking into Generative Model Architectures - Zhu Liang (eval.16x.engineer) - Towards personalized evaluation for prompts and models - @PradyuPrasad (@NUSingapore, Govtech Safety Intern) - Calibrate Debate RSVP: lu.ma/oazupw94
Only a few hours left until the Singapore AI Showcase! 🕒 Today, 7:00 pm – 8:30 pm (Doors open at 6:30 pm) 📍 National Library Board, Level 7 – Launch Programme Room RSVP: lu.ma/asukpmrk Speakers: - @thorwebdev, @elevenlabs - Talk to DeepSeek R1 with ElevenLabs Conversational AI - Chris, @hika_search - AI search for deeper thinking - @ye_dennis, @facticityai - Finding the good and true in a world of bad and false information - Minu, SouthbridgeAI - Pickasso: Painting the true shape of your data - Giang Nguyen, @NanyangBiologics - AI-Driven Drug Discovery - Harishankar Durairaj, @SeeWiseAI - From Object Detection to Vision AI Agents - Serena Lam, Fuzzy AI 🚀 - 10x your meetings booked - Arjun, Smartbeans - AI math tutor who can draw! - Yoeven D Khemlani, @jigsawstack - Small fast models with large results Join us: lu.ma/asukpmrk
Join us for the Singapore AI Showcase tonight! We're bringing together AI researchers, engineers, and founders to share what they're building. 🔗 RSVP required: lu.ma/tem5kq0y 🕖 7 - 8:30 PM (Doors open 6:30 PM) 📍 National Library, Level 7 📺 Watch live on YouTube: piped.video/watch?v=zdKlpmWK… Talks by: AI Tools - Isaac Tay (RI) - Revel: A Semantic Search Tool Built for Students by Students - @vivekkalyansk (Cartograph) - Enriching codebase context with codegraph and documentation for AI agents - @ivanleomk (Kura) - Discovering Hidden Patterns in Customer Conversations Embodied and Interactive AI Systems - Dr Yanliang Zhang (Weston Robot) - From Lab to Startup: Addressing the Challenges in Practical Robotics - Eugene Cheah (Featherless AI / RWKV open source foundation) - Transformer attention not needed - world's largest post-transformer AI model @ 72B parameters - John Chong Min Tan (AgentJo) - AgentJo Plays Pokemon AI in Practice - @kanikamindpeers (@MindPeersCo) - AI Agents, Art and Outcomes in Mental Health - Pulak Goyal (Sustain) - No more spreadsheets - Smarter workflows for Green Buildings - Wei Hsueh (IntentBridges) - Who Owns the Answers Your AI Gives? - Ankit Kochar (AIQURIS - A TUV SUD Venture) - AI Quality and Risk management Platform RSVP here: lu.ma/tem5kq0y
From the previous checkpoint we identified several areas for improvement: - Limited multilingual capabilities - MMLU performance degradation - Nonspeech input hallucinations - Limited multi-turn conversation understanding 🍓 Ichigo addresses these limitations through a 3-phase training approach. 3/10
Ran 🍓 Ichigo on a single 3090 server to show how lightweight it is… and it's burning up! Thanks for the interest, everyone! Try it locally with the instructions here: github.com/homebrewltd/ichig…

We're hiring a Software Engineer (API Platform) at Menlo. You’ll help design the backbone of how developers interact with next-gen AI from intuitive APIs to scalable infra and model deployment. - Stack: Kubernetes, Docker, Python, TypeScript, WebRTC - Remote-first Menlo works on open, intelligent systems that anyone can use. If you love building developer platforms and shipping fast, this one's for you: menlo.bamboohr.com/careers/9… Feel free to DM or tag someone who might be a great fit.
Happy Holidays, everyone! 🎁 While you're unwrapping gifts, Ichigo is unwrapping datasets - yes, it's live training right now!
Faster open-source voice AI processing? llama3-s vs. traditional cascaded (transcription-first) approach: - 13x faster processing (skips transcription step) - Efficiency gap widens with longer audio - More responsive real-time interactions
We're meeting in Singapore at 6:30 PM today! 📍 National Library Board, Level 7 - Launch Programme Room RSVP now: lu.ma/ptx5ln1i Join us for the lineup of lightning talks: Agent - @rishdotblog (@defogdata) - Building an open-source Deep Research for your internal data: lessons learnt. - Abhishek Gupta (OneInbox AI) - Beyond Chatbots: The Rise of Voice AI Agents. Building Agents - @rachpradhan (@homebrewltd) - Bhumi: Fastest Inference Client - Gang Lee (ELGO AI) - Creating Your Own AI Workflows in Minutes. - Muhammad Usman (JAM AI) - AI orchestration made easy. Embodying Agents - Michael M. Sayre (KABAM) - Powering Intelligent Robotics with Edge and Cloud Synergy - Chen Siwei (Autolife) - Autolife Robot: Affordable AI Robots for Everyone! - Matthew (@NUSingapore) - Toaster Chan: Singaporean Waifu kaya toaster. Closing Sneak peek into Menlo's latest happening! Details & RSVP: lu.ma/ptx5ln1i
We're at Echelon Singapore 2025 this week. Come say hi at booth M48 if you’re around. We're showing the agent that gets things done and a humanoid robot.
We're bringing AI companies together the day before Tech Week Singapore ❤️ Join us for talks & chats: lu.ma/mlqwqi6x Talks line-up (so far): - Jenni AI, @Calclavia - Test with users' AI twins, @carboncopies_ai - AI Neoclouds and 300k GPU Megaclusters, @dnishball - Pretrain LLMs in 9 days, Calvin Tan
Next steps: - Better curation of training dataset - More efficient synthetic data pipeline - Establishing cascaded system, baseline, and ASR benchmarks Long-term: Develop Ichigo as a production-level tool for AI apps. Thanks to #OpenSLR for speech recognition resources!
Homebrew's Ichigo paper was cited in "Continuous Speech Tokens Make LLMs Robust Multi-Modality Learners" by Ze Yuan, Yanqing Liu, Shujie Liu, and @shengzhao8. Their research focuses on advancing multi-modality learning in LLMs, an important step toward real-time speech understanding. We appreciate 🍓 Ichigo being part of this work! @Tsinghua_Uni @microsoft arxiv.org/abs/2412.04917v1
Phase 1: Continual Pre-training on Multilingual Speech
- Shifted to a 7-language dataset
- Updated tokenizer (@collabora's WhisperSpeech)
- 8064 steps, 45+ hours on 10x @nvidia A6000s
- Used @PyTorch's Torchtune with FSDP2
4/10
🍓 Ichigo v0.5 is in training! It's fueled by 100% synthetic data, supports both English and Vietnamese, and uses our latest speech compression method, Speechless! Join our Discord to catch the training live stream: discord.gg/Exe46xPMbK
We're training models live - with cute tunes to keep it fun. Join us on Discord and see how it's done! 🫶 discord.gg/Exe46xPMbK
Singapore AI Showcase – Feb 10!
We're bringing AI researchers, builders, and founders together on February 10! Expect lightning talks, insights & chats. Join us: lu.ma/asukpmrk
📅 Feb 10, 2025 (Mon)
🕒 7:00–8:30 PM (doors open 6:30 PM)
📍 National Library Board, Level 7 – Launch Programme Room
🎟️ RSVP: lu.ma/asukpmrk
🎙️ Speakers:
- @hika_search - AI search for deeper thinking
- @facticityai - Finding the good and true in a world of bad and false information
- SouthbridgeAI - Pickasso: Painting the true shape of your data (about SouthbridgeAI)
- Nanyang Biologics - AI-Driven Drug Discovery
- @SeeWiseAI - From Object Detection to Vision AI Agents
- Fuzzy Sequence - 10x your meetings booked
- Smartbeans - AI math tutor who can draw!
RSVP required: lu.ma/asukpmrk
Data distribution comparison (v0.2 vs v0.3):
- Speech Multi-turn: 0 vs 150K
- Speech QA: 679K vs 1.33M
- Transcription: 250K vs 400K
- Noise Audio: 0 vs 8K
- Text-only: 0 vs 150K
9/10
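As a quick sanity check on the v0.2 vs v0.3 counts, they can be totaled and turned into per-category shares. This is a minimal Python sketch that assumes the rounded figures exactly as posted (1.33M is treated as 1,330,000); the dictionary and function names are illustrative and not taken from any Ichigo codebase.

```python
# Sample counts per category as posted in the thread: (v0.2, v0.3).
# Assumption: the rounded "K"/"M" figures are taken at face value.
DISTRIBUTION = {
    "Speech Multi-turn": (0, 150_000),
    "Speech QA": (679_000, 1_330_000),
    "Transcription": (250_000, 400_000),
    "Noise Audio": (0, 8_000),
    "Text-only": (0, 150_000),
}


def totals(dist: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the v0.2 and v0.3 columns separately."""
    v02 = sum(old for old, _ in dist.values())
    v03 = sum(new for _, new in dist.values())
    return v02, v03


def shares(dist: dict[str, tuple[int, int]], column: int) -> dict[str, float]:
    """Fraction of one version's samples per category (column 0 = v0.2, 1 = v0.3)."""
    total = sum(pair[column] for pair in dist.values())
    return {name: pair[column] / total for name, pair in dist.items()}


v02, v03 = totals(DISTRIBUTION)
print(v02, v03, f"{v03 / v02:.2f}x")  # 929000 2038000 2.19x
for name, share in shares(DISTRIBUTION, 1).items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions, v0.3 roughly doubles the total sample count, and Speech QA still makes up about 65% of the v0.3 mix.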
We’re discussing the latest AI papers and sharing insights in the community. Join 11,594 AI tinkerers: discord.gg/Exe46xPMbK
We're starting an experiment to give llama3-s native listening ability! 🦙
- Using sound compression for discrete audio token representation
- Early fusion of audio and text in a multimodal model
- Training on synthetic audio data
Learn more: homebrew.ltd/blog/can-llama-…
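To make "discrete audio tokens" and "early fusion" concrete, here is a minimal Python sketch: codebook indices produced by a sound-compression model are rendered as special tokens and placed in the same sequence as the text, so a single decoder sees both. The token strings (`<|sound_start|>`, `<|sound_NNNN|>`, `<|sound_end|>`) and the 512-entry codebook are assumptions for illustration; the actual llama3-s vocabulary and quantizer may differ.

```python
from typing import List

# Hypothetical special tokens; the real llama3-s vocabulary may differ.
SOUND_START = "<|sound_start|>"
SOUND_END = "<|sound_end|>"
CODEBOOK_SIZE = 512  # assumed size of the audio quantizer's codebook


def audio_codes_to_tokens(codes: List[int], codebook_size: int = CODEBOOK_SIZE) -> List[str]:
    """Render quantizer indices as discrete audio token strings, wrapped in start/end markers."""
    tokens = [SOUND_START]
    for code in codes:
        if not 0 <= code < codebook_size:
            raise ValueError(f"code {code} is outside a codebook of size {codebook_size}")
        tokens.append(f"<|sound_{code:04d}|>")
    tokens.append(SOUND_END)
    return tokens


def early_fusion_prompt(codes: List[int], text: str) -> str:
    """Early fusion: audio tokens and text share one input sequence for the same model."""
    return "".join(audio_codes_to_tokens(codes)) + " " + text


print(early_fusion_prompt([3, 127], "What did the speaker ask?"))
# <|sound_start|><|sound_0003|><|sound_0127|><|sound_end|> What did the speaker ask?
```

The point of early fusion in this sketch is that audio arrives as ordinary vocabulary entries, so speech and text pass through the same model layers; in a real setup the tokenizer and embedding table would also need entries for the new audio tokens.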
👉👈
Introducing 🖖 Homebrew. We started as an open-source team working on Jan, a personal AI. Along the way, we encountered challenges at the product, software, hardware, and even data center levels. These challenges have inspired us to think bigger and find better solutions.
At our core, Homebrew’s culture is driven by who we are: AI enthusiasts who want to build practical, useful tools for ourselves and others. We are tinkerers: always excited about new innovations, and relentlessly finding ways to improve our product.
To all our users and contributors: thank you 🙏 Our team is grateful for your bug reports, criticism, ideas, and feedback. We’re continually improving, and excited to ship new products in the months ahead.
We're bringing AI companies together one day before Tech Week Singapore!
📅 Tuesday, 8 October
🕒 6:30 PM
📍 National Library Board, Level 7
Lineup so far: Jenni, CarbonCopies, Pints AI, Jan, Cortex, Ichigo.
Register here: lu.ma/mlqwqi6x
🔴 The Singapore AI Showcase is streaming live! Tonight, you'll hear from AI researchers, engineers, and founders as they share what they've built and what they've learned along the way. Watch now: piped.video/watch?v=zdKlpmWK… Live until 8:30 PM SGT
Results:
- MMLU: Recovered from 0.42 to 0.63 (Base: 0.69)
- AudioBench: Improved scores on the Open-Hermes and Alpaca instruction tests
Limitations:
- Weak to nonsensical audio handling in multi-turn conversations
- Multilingual capability not fully explored
7/10
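One way to read the MMLU numbers is as the share of the lost score that was won back. A minimal Python sketch, assuming that framing (it is an illustration, not a metric reported in the thread): the score had dropped to 0.42, recovered to 0.63, and the base model sits at 0.69.

```python
def recovered_fraction(degraded: float, recovered: float, base: float) -> float:
    """Fraction of the drop (base - degraded) that has been regained."""
    lost = base - degraded
    if lost <= 0:
        raise ValueError("base score must be higher than the degraded score")
    return (recovered - degraded) / lost


# MMLU figures from the thread: recovered from 0.42 to 0.63, base 0.69.
frac = recovered_fraction(degraded=0.42, recovered=0.63, base=0.69)
print(f"{frac:.1%} of the lost MMLU score regained")  # 77.8% of the lost MMLU score regained
```

Read this way, about 0.21 of the 0.27-point drop is back, leaving a 0.06 gap to the base model.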
Founders, builders, and researchers in Singapore - we're gathering next Tuesday, and you're invited! ❤️ RSVP required: lu.ma/mlqwqi6x
Jan Nano is now live as a hosted endpoint on @featherlessai. You can also run it locally from the @huggingface repo. Thanks to both 🧡
You can now access @menloresearch @jandotai 4B model via Featherless too or on Hugging Face -- just pick our endpoints! featherless.ai/models/Menlo/…
Come say hi at COMPUTEX 2025! Menlo is an open R&D lab building the brain for robots. We'll be around to talk LLMs, hardware, robotics, and open source.
- Hien To Minh - DevOps
- Emre Can Kartal (@eckartal) - Marketing
#COMPUTEX2025
Take control of your tools and keep your data where it belongs - with you. Self-hosting is the future.
Replying to @thepatwalls
Mind boggling to me how many people are using home cooked /self hosted tools for this.
If you're in Colorado next week, come join Ramon at C++Now to hear how we're building open-source tooling for robotics. Talk details: schedule.cppnow.org/session/…
Today, we’re becoming 🖖 Homebrew, an AI research lab. We believe deeply in AI’s potential to push the human race forward, but recognize there are difficult infrastructure problems that require time and determination to solve.
Singapore AI Showcase is set for 19 August, featuring 9 rapid-fire talks.
- Tue 19 Aug 2025 | 7:00 – 8:30 PM (doors 6:30)
- National Library Board, Level 7 - Launch Programme Room
- RSVP: lu.ma/tvnf60ry
See you there!
We're teaching LLMs to think visually and appreciate your shoutouts! 💙 Big thanks to @fahdmirza for the video—watch it to run AlphaMaze locally: piped.video/watch?v=02n6G1kF…
Xin chào (Hello)! We're hosting our AI Meetup HCMC on Saturday, May 31 at VNG Campus!
This meetup is for anyone building, researching, or learning AI (engineers, founders, students, and researchers) who wants to share real work, meet peers, and learn together.
RSVP here: lu.ma/7q2ihbj6 (required to attend)
Big thanks to VNG Corporation for providing the venue.
Note: Most of the event will be in Vietnamese. Some talks may be in English depending on the speaker.
What's happening:
- Talks on privacy-first local LLMs, complex image reasoning with AlphaMaze, and a new 1000-hour Vietnamese speech dataset.
- Open lightning talks (bring your project or research to present): forms.gle/JvcndE6nQGjFTMrb9
- Casual networking and collaboration.
RSVP: lu.ma/7q2ihbj6
AI Meetup HCMC is live now at VNG Campus! Talks and networking are underway - join us via the livestream on Facebook if you couldn't make it in person: facebook.com/watch/live/?ref…
Tonight’s the night - are you in? 🦙 Singapore AI Showcase at the National Library Building, 7th floor. RSVP here: lu.ma/mlqwqi6x
Models can act better if we describe the world in terms they understand. AlphaSpace gives them that language.
Read more in our blog: menlo.ai/blog/alpha-space
- Code: github.com/menloresearch/spa…
- Model: huggingface.co/Menlo/AlphaSp…
Feel free to brew a cup of coffee while we work on integrating an AI model into Jan that can understand human speech.
New epic created: 👋 Jan to support its own voice models
Singapore AI Showcase May session starts soon. Talks from researchers, devs, and founders will be streamed on the Menlo Research YouTube channel. Watch here: piped.video/live/GVN15nXeb1I
Singapore AI Showcase is happening this Tuesday, May 27!
- National Library Board, Level 7 – Launch Programme Room
- 7:00 – 8:30 PM (doors open 6:30 PM)
- No food inside (library rules)
- RSVP required: lu.ma/oazupw94
Featuring:
- Yuanda Li (@tealseed) - KBHub: Crowd-sourced knowledge powered by AI
- Kemi Bolatito (Scooly Technology) - Scooly Caseflow AI: How LLM is Redefining Global Mobility
- Aakash Patil (Nugen) - Domain-Aligned AI for Businesses that Demand Reliability
- Peeking into Generative Model Architectures
- Zhu Liang (eval.16x.engineer) - Towards personalized evaluation for prompts and models
- @PradyuPrasad (@NUSingapore, Govtech Safety Intern) - Calibrate Debate
RSVP: lu.ma/oazupw94
There's gotta be a local alternative.
🍓 Ichigo, our local real-time voice AI, now talks back! Stop by booth Q125 for a demo or stay tuned for the launch. #TechWeekSingapore