Haotian Tang · Dec 31, 2024 · 2:37 PM UTC

Haotian Tang @haotiant1998

Personal update: I am excited to share that I will join @GoogleDeepMind next week after defending my PhD thesis @MITEECS earlier last month. I will be working on generative models that simulate the physical world. Looking forward to the new journey ahead in 2025!

2,185

126,321

Haotian Tang · Oct 15, 2024 · 1:48 PM UTC

Haotian Tang @haotiant1998

🚀 We're thrilled to introduce HART, an efficient AR model that generates stunning 1024x1024 images! 🎨✨ HART delivers: ⚡️ 4.5-7.7x higher throughput 🔋 6.9-13.4x less compute 🔥 top-notch FID & CLIP scores, rivaling diffusion models in quality! Code: tinyurl.com/nkvpnhyk

138

24,482

Haotian Tang · Jun 6, 2024 · 1:47 PM UTC

Haotian Tang @haotiant1998

Excited to share my #MLSys 2024 best paper 🏆 presentation on AWQ. AWQ democratizes edge LLM deployment 💻 and has been downloaded over 1 million times on Hugging Face 🙌! piped.video/dcINVsqxQgQ?si=WEuH…

AWQ: Activation-aware Weight Quantization for LLM Compression and...

Talk video for MLSys 2024 Best Paper: "AWQ: Activation-aware Weight...

youtube.com

15,030

Haotian Tang · Jun 6, 2024 · 1:49 PM UTC

Haotian Tang @haotiant1998

AWQ website: hanlab.mit.edu/projects/awq Paper: arxiv.org/abs/2306.00978 Code: github.com/mit-han-lab/llm-a… Joint work with @jilin_14, @jmtang42, @Shang_mit, Wei-Ming Chen, Wei-Chen Wang, @Guangxuan_Xiao, Xingyu Dang, Prof. @gan_chuang, Prof. @songhan_mit.

AWQ: Activation-aware Weight Quantization for LLM Compression and...

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce the cloud computing cost...

arxiv.org

1,308

Haotian Tang · Oct 15, 2024 · 1:52 PM UTC

Haotian Tang @haotiant1998

📄 Paper: arxiv.org/abs/2410.10812 🌐 Project: hanlab.mit.edu/projects/hart 🖥 Demo: hart.mit.edu 💻 Code: github.com/mit-han-lab/hart

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

We introduce Hybrid Autoregressive Transformer (HART), an autoregressive (AR) visual generation model capable of directly generating 1024x1024 images, rivaling diffusion models in image generation...

arxiv.org

1,167

Haotian Tang · May 8, 2024 · 4:51 PM UTC

Haotian Tang @haotiant1998

🔥 Try out QServe! TRT-LLM efficiency ⚡️ + PyTorch flexibility 😄: your turnkey LLM serving solution 🔑

Shang Yang @Shang_mit

8 May 2024

🔥🎉Thrilled to introduce QServe, our latest breakthrough in efficient LLM serving with W4-A8-KV4 quantization. 🚀⚡1.2-3.5x higher throughput over TensorRT-LLM. 💵 Matches TensorRT-LLM’s A100 throughput with 3x cheaper L40S GPUs. 👐 Code: github.com/mit-han-lab/qserv… (1/4)

1,081
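
For readers unfamiliar with the "W4-A8-KV4" shorthand in the post above: it denotes 4-bit weights, 8-bit activations, and a 4-bit KV cache. The sketch below only illustrates what those bit widths mean, using plain symmetric per-tensor quantization. It is an assumption-laden toy, not QServe's actual algorithm or kernels; the function names `quantize_symmetric` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale.

    Illustrative only: real systems typically use per-channel or per-group
    scales and packed integer storage.
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


# Hypothetical toy tensors, one per role in "W4-A8-KV4".
weights = [0.7, -0.3, 0.0, 0.25]
activations = [1.5, -0.2, 0.9, 0.05]
kv_cache = [0.4, 0.1, -0.4, 0.3]

w_codes, w_scale = quantize_symmetric(weights, bits=4)       # W4
a_codes, a_scale = quantize_symmetric(activations, bits=8)   # A8
kv_codes, kv_scale = quantize_symmetric(kv_cache, bits=4)    # KV4
```

Fewer bits mean a coarser grid: the 4-bit codes can take only 15 distinct values, while the 8-bit codes can take 255, which is why the rounding error on the activations is much smaller than on the weights in this toy.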

Haotian Tang · Oct 15, 2024 · 1:50 PM UTC

Haotian Tang @haotiant1998

✨ How it works: We decompose continuous latents into two parts: 🔹 Discrete tokens for the big picture, modeled by a scalable-resolution AR transformer 🔸 Residual tokens for image details, handled by a lightweight diffusion module (37M parameters, 8 sampling steps)

1,407
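
The decomposition described in the post above (a continuous latent split into a discrete token plus a residual) can be sketched in a few lines. This is a minimal toy under stated assumptions: a tiny hand-written codebook and nearest-neighbor lookup stand in for the tokenizer, and the names `nearest_code`, `decompose`, and `reconstruct` are hypothetical. It does not reproduce HART's tokenizer, AR transformer, or diffusion module; it only shows that discrete part plus residual recovers the original latent.

```python
from typing import List, Tuple

Vector = List[float]


def nearest_code(vec: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def decompose(latents: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Split each continuous latent into a discrete token id and a residual."""
    tokens: List[int] = []
    residuals: List[Vector] = []
    for vec in latents:
        idx = nearest_code(vec, codebook)
        tokens.append(idx)  # coarse "big picture" part
        residuals.append([x - c for x, c in zip(vec, codebook[idx])])  # fine detail
    return tokens, residuals


def reconstruct(tokens: List[int], residuals: List[Vector], codebook: List[Vector]) -> List[Vector]:
    """Add each residual back onto its codebook entry."""
    return [
        [c + r for c, r in zip(codebook[i], res)]
        for i, res in zip(tokens, residuals)
    ]


# Hypothetical 2-D codebook and latents, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 1.2]]

tokens, residuals = decompose(latents, codebook)
restored = reconstruct(tokens, residuals, codebook)
```

In the post's framing, the token ids would be the part modeled by the AR transformer and the residuals the part handled by the lightweight diffusion module; here both are simply computed directly so the split is easy to inspect.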

Haotian Tang · Jan 22, 2025 · 3:12 AM UTC

Haotian Tang @haotiant1998

What an achievement! Congrats to the team!

Demis Hassabis

@demishassabis

21 Jan 2025

Our latest update to our Gemini 2.0 Flash Thinking model (available here: goo.gle/4jsCqZC) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just this past Dec! Latest version also includes code execution, a 1M token context window & a reduced likelihood of thought-answer contradictions. We’ve been pioneering these types of planning systems for over a decade, starting with programs like AlphaGo, and it is exciting to see the powerful combination of these ideas with the most capable foundation models.

709