Fleek · Apr 3, 2026 · 6:10 PM UTC

please excuse the silence. we've been cooking up something cool and are excited to share more details soon

4,727

Fleek · Jan 29, 2026 · 6:51 PM UTC

NVIDIA just dropped benchmarks showing 4-bit inference loses less than 1 point vs BF16 on most tasks. It's not accuracy per request that you should be measuring. It's tasks completed per dollar. And at that metric, 4-bit wins by a landslide. Read the full blog 👇

x.com/i/article/201692737652…

NVIDIA Just Killed the "Quantization = Quality Loss" Myth

NVIDIA's new benchmarks show NVFP4 loses less than 1 point on most tasks while delivering 4x FLOPS. The quantization-kills-quality myth is officially dead. There's this take that floats around AI

8,611
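The "tasks completed per dollar" framing in the post above can be made concrete with a small sketch. Every number here is a hypothetical placeholder chosen for illustration (the success rates, the 2.5x throughput gain, the hourly price); none of them come from NVIDIA's benchmarks or from Fleek.

```python
def tasks_per_dollar(success_rate, requests_per_hour, dollars_per_hour):
    """Successfully completed tasks bought by one dollar of GPU time."""
    return success_rate * requests_per_hour / dollars_per_hour


# All numbers below are hypothetical placeholders, not measured data.
bf16 = tasks_per_dollar(success_rate=0.80, requests_per_hour=1000, dollars_per_hour=2.0)

# Assumed for illustration: 1 point lower success rate, 2.5x throughput,
# same hourly hardware price.
fp4 = tasks_per_dollar(success_rate=0.79, requests_per_hour=2500, dollars_per_hour=2.0)

print(f"BF16:  {bf16:.1f} tasks per dollar")
print(f"4-bit: {fp4:.1f} tasks per dollar")
print(f"ratio: {fp4 / bf16:.2f}x")
```

Under these assumed inputs, the small drop in per-request success is outweighed by the throughput gain. With real measurements the ratio could look quite different, so the inputs are the part worth checking.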

Fleek · Jan 29, 2026 · 6:46 PM UTC

x.com/i/article/201692737652…

NVIDIA Just Killed the "Quantization = Quality Loss" Myth

NVIDIA's new benchmarks show NVFP4 loses less than 1 point on most tasks while delivering 4x FLOPS. The quantization-kills-quality myth is officially dead. There's this take that floats around AI

13,353

Fleek · Jan 24, 2026 · 2:26 PM UTC

1/ Yesterday we announced mdspan-cute: C++23 std::mdspan syntax with CUTLASS cute layouts. One header. Zero overhead. Here's how it works 🧵

2,974

Fleek · Jan 24, 2026 · 2:26 PM UTC

7/ Layout algebra is formalized in Lean 4. 26 theorems, 0 sorry. Properties extracted to RapidCheck tests. The art/ directory has 23 SVG visualizations - we drew pictures until we understood.

2,079
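As a rough illustration of what a property extracted from a layout algebra might look like, here is a minimal sketch in plain Python. It checks one generic property (a compact row-major layout maps coordinates one-to-one onto `0..size-1`) on randomly drawn shapes. The function names are hypothetical, and this is not one of the 26 theorems or the RapidCheck tests from the repo.

```python
import itertools
import random


def layout_offset(coord, stride):
    """Map a coordinate to a linear offset: the dot product with the strides."""
    return sum(c * s for c, s in zip(coord, stride))


def row_major_stride(shape):
    """Strides of a compact row-major layout (last mode varies fastest)."""
    stride, running = [], 1
    for extent in reversed(shape):
        stride.append(running)
        running *= extent
    return tuple(reversed(stride))


def is_compact_bijection(shape, stride):
    """True if the layout hits every offset in 0..size-1 exactly once."""
    coords = itertools.product(*(range(extent) for extent in shape))
    offsets = sorted(layout_offset(c, stride) for c in coords)
    return offsets == list(range(len(offsets)))


def check_random_shapes(trials=100, seed=0):
    """Property-style check over randomly drawn small shapes."""
    rng = random.Random(seed)
    for _ in range(trials):
        rank = rng.randint(1, 3)
        shape = tuple(rng.randint(1, 5) for _ in range(rank))
        if not is_compact_bijection(shape, row_major_stride(shape)):
            return False
    return True
```

A proof assistant establishes such a property for all shapes, while a randomized check like this only samples them. That difference is why pairing proofs with extracted tests is attractive.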

Fleek · Jan 24, 2026 · 2:26 PM UTC

8/ Check out the code: github.com/weyl-ai/mdspan-cu…
Check out the proofs: github.com/weyl-ai/mdspan-cu…
/end

1,823

Fleek · Jan 23, 2026 · 1:29 PM UTC

💿 Open Source Release 💿 mdspan-cute: a zero-overhead bridge between C++23 std::mdspan and CUTLASS cute layouts. One header. Swizzled memory. No bank conflicts. Read the blog and check out the repo (links in reply)

2,199
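To show why swizzled memory avoids bank conflicts, here is a minimal sketch that models shared memory as 32 banks of one word each and a 32x32 tile. It uses a generic XOR swizzle (`col ^ row`), which is an assumption made for illustration and not necessarily the swizzle parameterization that cute or mdspan-cute uses.

```python
from collections import Counter

NUM_BANKS = 32  # assumed: 32 banks, one 4-byte word per bank slot
TILE = 32       # assumed: 32x32 tile of 4-byte elements


def linear_offset(row, col):
    """Plain row-major offset within the tile."""
    return row * TILE + col


def swizzled_offset(row, col):
    """Row-major offset with the column XOR-ed by the row index."""
    return row * TILE + (col ^ row)


def max_bank_conflict(offset_fn, col):
    """Worst-case number of threads hitting one bank when 32 threads
    (one per row) read the same column of the tile."""
    banks = Counter(offset_fn(row, col) % NUM_BANKS for row in range(TILE))
    return max(banks.values())


print("row-major column read:", max_bank_conflict(linear_offset, 5), "-way")
print("swizzled column read: ", max_bank_conflict(swizzled_offset, 5), "-way")
```

In the row-major case every thread in the column read lands on the same bank. With the XOR applied, each row shifts its columns to a different bank, so the same read spreads across all 32 banks while each element still has a unique offset.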

Fleek · Jan 23, 2026 · 1:29 PM UTC

Read the blog: weyl.ai/plan/mdspan-cute/
Check out the repo: github.com/weyl-ai/mdspan-cu…

mdspan-cute: Zero-Overhead Bridge to CUTLASS | Weyl

C++23 std::mdspan meets CUTLASS cute layouts. One header. Zero cost. 26 theorems. 0 sorry.

weyl.ai

1,444

Fleek · Jan 22, 2026 · 4:45 PM UTC

5/ Quantized RoPE already runs in:
→ LLaMA
→ Mistral
→ Most open-source inference stacks

This isn't obscure. It's foundational.

696

Fleek · Jan 22, 2026 · 4:45 PM UTC

6/ On "bit augmentation": Log/exp is a bijection. Information in = information out. You can't create precision from a reversible transformation. Thermodynamics doesn't allow it.

605
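The bijection argument above can be checked numerically with a small sketch. It assumes a hypothetical grid of 16 positive values (what a 4-bit code could represent) and shows that passing them through `log` yields exactly 16 distinct values, and that `exp` maps them back. The grid itself is made up for illustration.

```python
import math


def distinct_after(transform, values):
    """Count how many distinct outputs a transform produces."""
    return len({transform(v) for v in values})


# Hypothetical 4-bit grid: 16 positive representable levels.
levels = [i / 16 for i in range(1, 17)]

in_count = len(set(levels))
out_count = distinct_after(math.log, levels)

# The round trip recovers the inputs up to floating-point rounding.
round_trip_ok = all(
    math.isclose(math.exp(math.log(v)), v, rel_tol=1e-12) for v in levels
)

print(in_count, out_count, round_trip_ok)
```

An injective map can relabel the 16 levels but cannot produce a 17th, so the number of distinguishable states (and hence the information carried) stays the same.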

Fleek · Jan 20, 2026 · 3:56 PM UTC

1/ Yesterday we announced nix2gpu, a NixOS package for portable GPU containers. Portable containers prevent vendor inference lock-in. Here's why it's a big deal. #Nix #AIInfra

2,070

Fleek · Jan 20, 2026 · 3:56 PM UTC

7/ Why it matters: Makes distributed GPU compute easy and deterministic.
Philosophy: It's just Linux with libs; complexity is optional.
Open-source, MIT-licensed, and production-tested on Fleek machines.

853

Fleek · Jan 20, 2026 · 3:56 PM UTC

8/ Check out more info on nix2gpu:
Full blog: weyl.ai/plan/portable-nix-gp…
Repo: github.com/fleek-sh/nix2gpu
Quickstart in README. Test and send feedback!
/end

Ruining GPU Market Owners' Day with the Power of Nix | Weyl

Build containers with nix2gpu that run on any GPU market

weyl.ai

794