Kevin Black (@kvablack) | nitter

Kevin Black @kvablack

31 Oct 2024

It's been 6 months since I slammed the brakes on several PhD research projects to go work at π... 😅 super excited to finally share our results! A short 🧵 with some details:

Physical Intelligence

@physical_int

31 Oct 2024

At Physical Intelligence (π) our mission is to bring general-purpose AI into the physical world. We're excited to show the first step towards this mission - our first generalist model π₀ 🧠 🤖 Paper, blog, uncut videos: physicalintelligence.company…

9

49

653

101,754

Kevin Black @kvablack

9 Jun 2025

In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky motions at worst. But large VLAs are slow by nature. What can we do about this? An in-depth 🧵:

13

64

525

89,727

Kevin Black @kvablack

5 Jul 2023

The biggest problem with our RL diffusion paper was that nobody could run our Jax+TPU code. No more! I've reimplemented DDPO in PyTorch, plus replicated our results using LoRA for low-memory training! Links below 👇

10

44

294

90,915

Kevin Black @kvablack

24 Sep 2024

These people copy + pasted the website I designed (including the body text), changed some stuff here and there, and then released it as their own work without attribution. The best part is they didn't even change the preview thumbnail so it still says "Octo". Incredible stuff!

Chris Paxton

@chris_j_paxton

23 Sep 2024

Replying to @chris_j_paxton

TinyVLA paper: arxiv.org/pdf/2409.12514 project page: tiny-vla.github.io/

7

6

233

74,417

Kevin Black @kvablack

12 Nov 2024

Here's a link to the recording for anyone that's interested! piped.video/live/ELUMFpJCUS0…

1st Workshop on X-Embodiment Robot Learning, CoRL'24

Workshop Website & Papers: https://sites.google.com/view/xembodimen...

Kevin Black @kvablack

7 Nov 2024

If you're at #CoRL2024, come check out my talk at the X-Embodiment workshop at 1:30pm! Thanks to @KarlPertsch for inviting me to speak!

From Octo to π0: How to Train Your Generalist Robot Policy


3

17

180

20,177

Kevin Black @kvablack

7 Nov 2024

If you're at #CoRL2024, come check out my talk at the X-Embodiment workshop at 1:30pm! Thanks to @KarlPertsch for inviting me to speak!

From Octo to π0: How to Train Your Generalist Robot Policy


2

9

149

39,435

Kevin Black @kvablack

9 Jun 2025

This caption is a bit funny to me because we've put precisely zero effort into optimizing our model implementation. Thanks JAX!

"Also, despite its larger size, π0 outperforms both RDT-1B and Diffusion Policy in speed thanks to its optimized JAX implementation (all other methods are implemented in PyTorch)."


3

4

115

14,373

Kevin Black @kvablack

12 Nov 2024

My favorite slide that I made for my talk last weekend -- a very silly thought experiment in which we compare language datasets to robotics datasets (in the most shallow way possible). Yes it is to scale; I learned that the maximum shape size in Keynote is 20,000pts

Comparison of robotics and language datasets in terms of hours: OXE, π dataset, GPT-2 training dataset, and Llama 3 training dataset.


5

4

92

20,881

Kevin Black @kvablack

12 Jul 2024

I'm surprised more window manager enthusiasts don't know about yabai. This is what my macOS setup looks like -- maybe I'm just not a power user, but I don't miss i3 one bit!


Lucas Beyer (bl16)

@giffmana

11 Jul 2024

Replying to @enriquezaf_ @awesome_ruler_

No you cannot, you can only get it down to 200ms or so. At least when i had one 4y ago, and believe me i searched!

9

2

74

22,253

Kevin Black @kvablack

28 Nov 2024

Replying to @cloneofsimo

not just RoPE, tons of ppl copy the original Vaswani et al. posemb code without thinking. it's *much much worse* for diffusion/flow timestep encodings; if you're not careful, you can end up using an encoding calibrated for t ∈ [0, 1000] with t ∈ [0, 1].
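To make the failure mode concrete, here is a minimal sketch, assuming the usual Transformer-style sinusoidal encoding with frequencies spaced geometrically from 1 toward 1/`max_period`. The helper names, `dim=8`, and the 0.5 threshold are illustrative choices, not taken from any particular codebase. It counts how many embedding channels actually vary over each input range.

```python
import math

def sinusoidal_embedding(t, dim=8, max_period=10_000.0):
    """Transformer-style sinusoidal encoding of a scalar t (illustrative helper)."""
    half = dim // 2
    # Frequencies fall geometrically from 1 toward 1 / max_period.
    freqs = [max_period ** (-i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def active_channels(ts, dim=8, threshold=0.5):
    """Count embedding channels whose value changes by more than `threshold` over ts."""
    embs = [sinusoidal_embedding(t, dim) for t in ts]
    count = 0
    for c in range(dim):
        column = [e[c] for e in embs]
        if max(column) - min(column) > threshold:
            count += 1
    return count

# Same encoding, two input ranges.
integer_range = [10.0 * i for i in range(101)]   # t in [0, 1000]
unit_range = [i / 100 for i in range(101)]       # t in [0, 1]

n_integer = active_channels(integer_range)
n_unit = active_channels(unit_range)

# One possible fix (an assumption, not the only option): rescale t before encoding.
n_rescaled = active_channels([1000.0 * t for t in unit_range])
```

With t ∈ [0, 1], most channels stay nearly constant, so the network sees very little timestep signal. Rescaling t, or choosing frequencies that suit the unit interval, restores it.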

5

67

36,865

Kevin Black @kvablack

28 Aug 2023

🔥🔥🔥 wake up babe, new BridgeData just dropped 🔥🔥🔥 Are you a fan of the original BridgeData? Doesn't matter! BridgeData V2 has 60k trajectories, 24 environments, 13 skills, and 100+ objects.

4

8

62

13,100

Kevin Black @kvablack

23 Feb 2025

Replying to @kevin_zakka

bro capitalizes TeleOperate like it's a Special Move from a fantasy novel

4

57

3,257

Kevin Black @kvablack

14 Dec 2023

We combined the 3 hottest things in machine learning: transformers, diffusion, and cute animal names, and what we got was Octo🐙: an open-source, cross-embodied, generalist robot policy backbone! 1/n

1

7

58

8,872

Kevin Black @kvablack

8 May 2024

If you're at #ICLR2024, come check out my DDPO poster tomorrow -- Thurs 10:45am, poster #21, Hall B! It's crazy looking back to see how much has changed since the paper first came out nearly 1 year ago. Really makes me feel how fast things move in this field.

Sergey Levine

@svlevine

1 Oct 2023

We've updated the DDPO website with some new results for training diffusion models with RL! Our aesthetic bunny is now much more... aesthetic. Latest here: rl-diffusion.github.io/ Includes code, LoRA training for low memory, pretrained models, etc. Some highlights 👇

1

4

48

10,208

Kevin Black @kvablack

31 Oct 2024

I worked a lot on the model design, and I think we ended up with a pretty cool way to adapt a pre-trained VLM backbone for action prediction using a diffusion-style objective (we use flow matching, of course, like all the cool kids these days)

our VLM + action expert flow matching architecture for predicting robot actions


1

3

38

1,704

Kevin Black @kvablack

5 Jul 2023

Here's a link to the code: github.com/kvablack/ddpo-pyt… If you want to learn more about DDPO, you can check out the project website (rl-diffusion.github.io) or @svlevine's original thread below:

GitHub - kvablack/ddpo-pytorch: DDPO for finetuning diffusion models, implemented in PyTorch with...

DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support - kvablack/ddpo-pytorch

Sergey Levine

@svlevine

22 May 2023

We figured out how to train diffusion models with RL to generate images aligned with user goals! Our RL method gets ants to play chess and dolphins to ride bikes. Reward from powerful vision-language models (i.e., RL from AI feedback): rl-diffusion.github.io/ A 🧵👇

2

4

39

10,889

Kevin Black @kvablack

31 Oct 2024

Flow matching is a great fit for modeling continuous action distributions, especially as we scale up data collection and train on many distinct tasks/behaviors/strategies at the same time
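For readers new to flow matching, here is a rough sketch of the training target and the sampler. It assumes a straight-line path from noise (t = 0) to the action (t = 1) and uses a stand-in "perfect" velocity function in place of a trained network. All names are hypothetical, and this is not the π₀ implementation.

```python
import random

def interpolate(noise, action, t):
    """Point on the straight-line path from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Regression target for the velocity field along that path."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(predicted_velocity, noise, action):
    """Mean squared error between predicted and target velocity."""
    target = target_velocity(noise, action)
    return sum((p - v) ** 2 for p, v in zip(predicted_velocity, target)) / len(target)

def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = list(noise), 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
action = [0.3, -0.7, 0.1]                      # stand-in for one action vector
noise = [rng.gauss(0.0, 1.0) for _ in action]

# Stand-in for a trained model: returns the exact target velocity for this pair.
def perfect_velocity(x, t):
    return target_velocity(noise, action)

midpoint = interpolate(noise, action, 0.5)
loss = flow_matching_loss(perfect_velocity(midpoint, 0.5), noise, action)
sample = euler_sample(perfect_velocity, noise)
```

In practice the velocity function is a network conditioned on observations. Because the sampler starts from fresh noise each time, the same observation can yield different valid actions, which is what makes the approach attractive for multimodal behavior.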

2

6

37

5,055

Kevin Black @kvablack

19 Jun 2025

Replying to @anshulkundaje

this simply isn't true, at least not at Berkeley. students with all sorts of backgrounds get in, and everyone I know has grown significantly during their PhD. junior students are not at all like postdocs, and usually have a lot to learn about doing research.

2

33

4,376

Kevin Black @kvablack

9 Jun 2025

Our solution, real-time chunking (RTC), combines action chunking with inpainting — the actions within the inference delay are frozen, while the rest are “inpainted” in a way that’s consistent with the previous plan.
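A rough sketch of the bookkeeping this implies, under stated assumptions: actions are plain floats, the new chunk has the same horizon as the old one, and the soft weights on the overlapping region decay linearly (an arbitrary illustrative choice). The function name and arguments are hypothetical. The inpainting step itself, which would run inside the diffusion or flow sampler, is not shown.

```python
def build_inpainting_constraints(prev_chunk, executed, delay):
    """Return (targets, weights) for the next chunk.

    prev_chunk: actions from the previous plan.
    executed:   how many of them have already run when inference starts.
    delay:      how many more will run before the new chunk is ready.
    """
    horizon = len(prev_chunk)
    leftover = prev_chunk[executed:]   # part of the old plan that has not run yet
    overlap = len(leftover)
    if delay > overlap:
        raise ValueError("old plan runs out before the new chunk is ready")

    targets, weights = [], []
    for i in range(horizon):
        if i < delay:
            # These actions execute during inference, so they must stay fixed.
            targets.append(leftover[i])
            weights.append(1.0)
        elif i < overlap:
            # Still covered by the old plan: guide softly toward it.
            targets.append(leftover[i])
            weights.append((overlap - i) / (overlap - delay + 1))
        else:
            # Past the end of the old plan: nothing to stay consistent with.
            targets.append(None)
            weights.append(0.0)
    return targets, weights

prev_chunk = [float(i) for i in range(10)]
targets, weights = build_inpainting_constraints(prev_chunk, executed=4, delay=2)
```

Here the first two entries are frozen copies of the old plan, the next four are soft targets with decreasing weight, and the last four are left free for the sampler to generate.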

1

2

37

5,225

Kevin Black @kvablack

30 May 2023

We just released the code and model weights for DDPO! Excited to see what the community will do with this 😃 Project website: rl-diffusion.github.io Code (links to weights/demos inside): github.com/jannerm/ddpo

teaser figure showing progressions of images throughout RL training: a picture of a llama becoming more and more compressible by becoming smaller and with a blurry background (top), a picture of a lion becoming more and more aesthetic by adopting a minimalistic artistic style (middle), and a picture of a raccoon washing dishes becoming more and more true to the prompt (bottom)


3

1

33

7,940

Kevin Black @kvablack

9 Jun 2025

That’s all for now! This project was a long time coming, be sure to check out the full blog post and paper here: pi.website/research/real_tim…

Real-Time Action Chunking with Large Models

A real-time system for large VLAs that maintains precision and speed in the face of high latency.

1

31

2,440

Kevin Black @kvablack

31 Oct 2024

Overall, working at @physical_int has been a blast and joining was definitely the right decision. I can't believe it's only been 6 months and I can't wait for what comes next!

3

25

2,190

Kevin Black @kvablack

2 Oct 2023

DDPO updates: after fixing some numerical precision issues, of all things, the aesthetic quality results look much better! (1/4)

Sergey Levine

@svlevine

1 Oct 2023

We've updated the DDPO website with some new results for training diffusion models with RL! Our aesthetic bunny is now much more... aesthetic. Latest here: rl-diffusion.github.io/ Includes code, LoRA training for low memory, pretrained models, etc. Some highlights 👇

1

5

24

14,017

Kevin Black @kvablack

24 Sep 2024

Replying to @ZhongpaiGao

This is a common sentiment, but I disagree. A website isn't automatically a template unless it's explicitly advertised as such. Copying it without permission is no different than copying someone's artwork or writing.

1

23

1,984

Kevin Black @kvablack

31 Oct 2024

Here's a little secret: π₀-small, which also uses flow matching but not a VLM backbone, was our "main model" for 4+ months and was outperforming many strong baselines! IMO the most exciting benefit of adding the VLM init was drastically improved language following

1

23

1,429

Kevin Black @kvablack

24 Sep 2024

Replying to @chris_j_paxton

The paper is original (as far as I can tell). The website is plagiarized. The second body paragraph is almost word-for-word identical, not to mention the overall design, which was obviously copied and slightly modified.

2

1

21

3,382

Kevin Black @kvablack

6 May 2024

Finally arrived in Vienna for #ICLR2024! @mitsuhiko_nm and I will be at the first poster session tomorrow morning presenting SuSIE --- a simple recipe for generalizable robotic manipulation using a pretrained diffusion model. Come check us out at poster #69, hall B, 10:45am!

Sergey Levine

@svlevine

17 Oct 2023

Diffusion models make great images. But can they drive robots? Usually that gets complicated really fast. We figured out how to get a Stable Diffusion model (based on Instruct pix2pix) to drive robotic instruction following. Simple recipe, works on a wide range of tasks. Thread👇

2

20

8,117

Kevin Black @kvablack

9 Jun 2025

Finally, there’s a subtle issue with non-real-time inference that’s easy to overlook: distribution shift. Pauses for inference are not in the training data! We found that RTC was not only faster, but also more precise and consistent than our old synchronous strategy.

2

1

23

2,973

Kevin Black @kvablack

9 Jun 2025

Importantly, this requires no training-time changes! It’s applicable to any diffusion- or flow-based policy at inference time. With RTC, we get smooth real-time execution.

1

24

2,463

Kevin Black @kvablack

5 Jul 2023

Thanks to LoRA and mixed precision, you can now finetune Stable Diffusion with less than 10GB of GPU memory. It runs on my 1080ti!

a chart showing that LoRA + mixed-precision is able to pretty much match the performance of non-LoRA full-precision training
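A minimal sketch of why LoRA cuts memory, assuming a single linear layer and made-up sizes (a 1024x1024 projection with a rank-4 adapter). The helper names are hypothetical and this is not code from the DDPO repositories. The frozen weight W is left alone and only the low-rank factors A and B are trained.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_in, d_out, rank=None):
    """Parameter count for full finetuning (rank=None) or a rank-r LoRA adapter."""
    return d_in * d_out if rank is None else rank * (d_in + d_out)

# Tiny check: with B initialised to zeros, the adapter leaves the layer unchanged.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]          # rank 1, shape (r, d_in)
B = [[0.0], [0.0]]         # shape (d_out, r)
x = [1.0, 1.0]
y = lora_forward(W, A, B, x)

# Illustrative sizes only: one 1024x1024 projection with a rank-4 adapter.
full = trainable_params(1024, 1024)
lora = trainable_params(1024, 1024, rank=4)
```

Fewer trainable parameters means smaller gradients and optimizer state during training, and it also keeps adapter-only checkpoints small.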


2

2

17

1,679

Kevin Black @kvablack

19 Jun 2025

Replying to @anshulkundaje

yes, most admits have written at least one paper (not necessarily at top conference), but that doesn't make them "basically postdocs". also, almost everyone views their ugrad work as immature and weak compared to their PhD work -- which supports your wider argument for academia!

1

15

1,301

Kevin Black @kvablack

9 Jun 2025

Model size is not the only contributor to latency. Personally, I’m betting that the VLAs that solve physical intelligence will not be able to fit in onboard robot computers. That means we will need centralized inference servers, and we will have network latency.

2

20

1,881

Kevin Black @kvablack

31 Oct 2024

For super cool uncut videos of evaluations and all that other good stuff, check out our blog post: physicalintelligence.company…

1

18

2,675

Kevin Black @kvablack

5 Jul 2023

Shoutout to my friend @shreyaskapur and his new library Iceberg (github.com/revalo/iceberg) for the sick animation! Fun fact, the animation would also not have been possible without LoRA, because otherwise the checkpoints are way too big to save every epoch.

GitHub - revalo/iceberg: A compositional diagramming and animation library as an eDSL in Python

A compositional diagramming and animation library as an eDSL in Python - revalo/iceberg

2

16

1,057

Kevin Black @kvablack

9 Jun 2025

For smooth execution, we need to always produce the next action as soon as it’s needed. This is called a “real-time constraint”. With high-latency models, this requires concurrency: generating new actions while executing old ones. But naive concurrency does not work.

1

16

2,578

Kevin Black @kvablack

19 Jun 2025

Replying to @anshulkundaje

there is selection towards people with prior research experience, but "multiple first author papers at top conferences" is not common at all

3

14

1,307

Kevin Black @kvablack

9 Jun 2025

To prepare for this future, we added up to +200ms of artificial latency to π0.5 (>300ms total), and the speed and performance of RTC were totally unaffected!
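To put the latency numbers in terms of control steps, here is a small sketch. The 50 Hz control rate and the 50-step chunk are assumptions for illustration (the thread does not state them), and 300 ms is used as a lower bound on the total latency quoted above.

```python
import math

def delay_in_steps(latency_ms, control_hz):
    """Number of control ticks that pass while one inference call is running."""
    return math.ceil(latency_ms * control_hz / 1000)

def free_steps(horizon, latency_ms, control_hz):
    """Steps of a new chunk that are still in the future when it arrives."""
    return horizon - delay_in_steps(latency_ms, control_hz)

control_hz = 50   # assumed control rate
horizon = 50      # assumed chunk length in steps

base_delay = delay_in_steps(100, control_hz)    # roughly the latency before padding
padded_delay = delay_in_steps(300, control_hz)  # with +200 ms of artificial latency
remaining = free_steps(horizon, 300, control_hz)
```

Under these assumptions, the padded latency means more than a dozen actions have to come from the previous plan before a new chunk can take effect, so the old and new plans must agree over that window.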

1

16

1,681

Kevin Black @kvablack

4 Jul 2023

my therapist: incompressible drake isn't real, he can't hurt you
incompressible drake:

1

14

636

Kevin Black @kvablack

2 Oct 2023

tag urself im sixx ttutttas

a screenshot from the DDPO website showing reward-hacking against LLaVA, where the diffusion model generates some wacky looking text instead of the correct number of animals


2

13

1,095

Kevin Black @kvablack

13 Jun 2024

Google Brain really cooked with bfloat16

13

1,607

Kevin Black @kvablack

9 Jun 2025

I mean, technically the model is optimized... by the XLA compiler, not by a human! from arxiv.org/abs/2502.19645

1

1

13

2,281

Kevin Black @kvablack

21 Feb 2024

how to check if the latest "AI feature" rollout is being powered by a real LLM (google messages "magic compose" passes with flying colors)

11

867

Kevin Black @kvablack

24 Jul 2023

Also at ICML 👍

Katie Kang @katie_kang_

24 Jul 2023

Same 👍

1

11

2,171

Kevin Black @kvablack

21 May 2024

Octo has been accepted to RSS 2024! For the full paper, we added some juicy new experiments (including bimanual ALOHA). And of course we're also releasing some new and improved models! The best part of finally uploading to arXiv is getting those sweet sweet AK tweets 😉

AK

@_akhaliq

21 May 2024

Octo An Open-Source Generalist Robot Policy Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a

1

10

862

Kevin Black @kvablack

14 Dec 2023

I'm especially proud of `octo.data`. We knew we had to take advantage of the incredible Open X-Embodiment dataset, but it turns out that just because the data exists doesn't mean that loading it is easy! We went through a lot of pain so that you don't have to. 3/n

1

1

9

637

Kevin Black @kvablack

12 Nov 2024

Replying to @pfbudzianowski @physical_int

nice! you have a minor bug in your timestep sampling function -- it should be `(self.s - sample) / self.s`. also, you don't need `num_train_timesteps` ;)

1

6

560

Kevin Black @kvablack

23 Nov 2024

Replying to @jokrvivek @maticrobots

if productization is the bottleneck, then shouldn't there be a non-product prototype of a general-purpose humanoid, or even autonomous decluttering?

1

2

5

658

Kevin Black @kvablack

26 Nov 2024

Replying to @CyberRobooo

The robot is teleoperated. Here is the original post from @watneyrobotics, which you copied and watermarked as your own.

Watney Robotics

@watneyrobotics

9 Aug 2024

Ever seen a robot do a cannonball?

1

5

411

Kevin Black @kvablack

26 Dec 2023

Replying to @ericjang11

PyTorch always manages to be wrong 🫣

1

2

1,319

Kevin Black @kvablack

25 Sep 2024

Replying to @ZhongpaiGao @giffmana

copying someone else's original work without acknowledgement is plagiarism. technically, it's also copyright infringement (with or without acknowledgement), although academics are typically protected by fair use.

1

5

685

Kevin Black @kvablack

14 Dec 2023

Website: octo-models.github.io
Code: github.com/octo-models/octo
A long-running collaboration with @HomerWalke @KarlPertsch @kvablack @oier_mees @SudeepDasari @JoeyHejna Toby Kreiman @jianlanluo Charles Xu @youliangtan @DorsaSadigh @chelseabfinn @svlevine 4/n

1

4

608

Kevin Black @kvablack

30 May 2023

Thanks again to my awesome collaborators @michaeljanner, @du_yilun, @ikostrikov, and my advisor @svlevine

4

377

Kevin Black @kvablack

24 Nov 2024

Replying to @jokrvivek @maticrobots

I think a) is the hard part and still an open research problem with plenty of people working on it

1

4

226

Kevin Black @kvablack

23 May 2024

Replying to @ekzhang1

> JAX is the framework most LLMs are trained on
Is this true? Which companies use JAX? (genuinely curious)

1

3

454

Kevin Black @kvablack

28 Aug 2023

Sometimes, data is all you need. We got *6 different methods* -- both image-conditioned and language-conditioned, imitation learning and RL -- to achieve zero-shot generalization to new tasks, objects, and environments. Download the data for yourself 👇 rail-berkeley.github.io/brid…

BridgeData V2: A Dataset for Robot Learning at Scale

rail-berkeley.github.io

1

4

553

Kevin Black @kvablack

14 Dec 2023

And especially grateful to my co-leads, @its_dibya @HomerWalke @KarlPertsch @oier_mees. Everyone went absolutely all-in on making this into a killer project -- working together has been one whale of an opportunity! n/n

1

3

483

Kevin Black @kvablack

19 Jun 2025

Replying to @arthurallshire @anshulkundaje

I'm sure it varies by lab based on the PI's recruiting preferences. I personally have 1 ugrad paper in a very small, non-AI conference. I would say the majority of my PhD friends don't have the "extensive prior experience" background

1

2

244

Kevin Black @kvablack

28 Aug 2023

Check out Sergey's thread for more:

Sergey Levine

@svlevine

25 Aug 2023

What kind of general-purpose robotic learning algorithm can learn to perform such a huge range of skills in so many different environments, based on either language commands or goals? Let me explain😉 Thread below👇

4

664

Kevin Black @kvablack

29 Nov 2023

Replying to @BoyuanChen0

shameless plug for SuSIE (rail-berkeley.github.io/susi…) -- beats RT-2 and video prediction by quite a large margin (and we tried *very* hard to get video prediction working)

SuSIE: Subgoal Synthesis via Image Editing

rail-berkeley.github.io

3

137

Kevin Black @kvablack

13 Jul 2024

Replying to @yacineMTB

do I need to record a keyboard overlay? it's literally instant
I dislike apple in general but unfortunately I also like having an 18 hour battery life and my drivers working on the first try

3

348

Kevin Black @kvablack

4 Jun 2024

This is super useful! I've never actually measured FID because I was too lazy to install 2+ year old repos...

Kevin Frans

@kvfrans

3 Jun 2024

FID computation can be quite esoteric, here's a simple helper to do it in JAX. You can compute FID online during training! This implementation can closely match the numbers from OpenAI's guided-diffusion evaluations. Code: github.com/kvfrans/jax-fid-p… .

3

1,620

Kevin Black @kvablack

24 Nov 2024

Replying to @jokrvivek @maticrobots

that's great that you're working on that, and I'm not saying that your company isn't going to solve it ;) but calling general-purpose manipulation a product problem is like calling self-driving a product problem ever since the 2005 DARPA challenge in the Mojave desert

2

3

187

Kevin Black @kvablack

23 Nov 2024

Replying to @jokrvivek @maticrobots

but have they been successful? I guess my point here is that if a research lab has not once successfully demonstrated what you want your product to do, then it seems quite inaccurate to say that "productizing" is "the bottleneck"

1

3

198

Kevin Black @kvablack

22 May 2023

Replying to @kvablack @generatorman_ai @svlevine

Regardless, BERTScore seemed to be more than sufficient for our tasks. Also worth mentioning that for the counting issue, LLaVA directly produces the "fake" number -- so better response scoring wouldn't fix it, only a better VLM. Again, I'm sure GPT-4 could do it ;).

1

3

68

Kevin Black @kvablack

2 Oct 2023

And finally, here are the maximally aesthetic majestic animals that I can't stop staring at (4/4)

human attention maximizers


3

318

Kevin Black @kvablack

14 Oct 2024

Replying to @giffmana @yacineMTB

I was bored this afternoon so I timed it using my phone's 240fps slow-mo camera
Yabai on my 2023 MacBook Pro: 71ms (17 frames)
i3 on my deskmate's 2020 X1 Carbon: 88ms (21 frames)
seems pretty instant to me 😅

3

146

Kevin Black @kvablack

9 Jun 2025

Replying to @ramkumarkoppu

Of course! Our experiments use Pi0.5, which is more or less the same architecture. There is not yet an official implementation of RTC in openpi though.

3

1,036

Kevin Black @kvablack

26 Nov 2023

Replying to @tokenpilled65B @typedfemale

😳

2

36

Kevin Black @kvablack

14 Nov 2024

Replying to @ramkumarkoppu

pi_zero also starts from a pretrained VLM (PaliGemma), which has seen a lot of Internet data! the idea of this slide is to compare datasets rather than models (the labels are a bit confusing in that sense -- should probably read "GPT-2/Llama 3 training dataset")

2

687

Kevin Black @kvablack

22 Apr 2025

Replying to @_k_sridhar @unsorsodicorda @physical_int

yep, Pi05 is also based on PaliGemma 1.

2

68

Kevin Black @kvablack

22 May 2023

Replying to @generatorman_ai @svlevine

Thanks! In my experience, LLaVA isn't good enough to give accurate numerical scores if you ask for them directly. Even ChatGPT with few-shot prompting seemed to struggle with this. I'm sure GPT-4 could do it -- or maybe I'm just not a good enough prompt engineer :'(.

1

2

95

Kevin Black @kvablack

2 Oct 2023

Super cool new work from Google DeepMind that, most importantly, continues the legacy of the "raccoon washing dishes" prompt (3/4)

Kevin Clark @clark_kev

2 Oct 2023

@PaulVicol and I are excited to introduce DRaFT, a method that fine-tunes diffusion models on rewards (such as scores from human preference models) by backpropagating through the diffusion sampling! with @kswersk, @fleet_dj arXiv: arxiv.org/abs/2309.17400 (1/5)

1

2

474

Kevin Black @kvablack

9 Jun 2025

Replying to @giffmana

do you mean during data collection or post-hoc? post-hoc isn't possible because you don't have access to the robot's (and the world's) dynamics. I guess adding pauses during data collection could work but it feels... quite unsavory

1

2

445

Kevin Black @kvablack

12 Jul 2024

Replying to @giffmana @awesome_ruler_

I used to be exactly like you. I transitioned to macOS with yabai/skhd a year ago and quickly found it to be *better* than i3 with the right setup. 0ms switching, intuitive controls, plus edge cases like multiple displays with different resolutions "just work"

1

2

129

Kevin Black @kvablack

24 May 2023

Replying to @AntonWiehe @generatorman_ai @svlevine

We tried a few CLIP experiments but didn't get anything compelling. Stable Diffusion already uses the CLIP text encoder, so you wouldn't expect CLIP applied naively to magically improve things that Stable Diffusion is bad at. LLaVA is generally more powerful than CLIP as well.

1

2

72

Kevin Black @kvablack

9 Jun 2025

Replying to @tarikkelestemur

Kevin Black @kvablack

9 Jun 2025

Replying to @kvablack

I mean, technically the model is optimized... by the XLA compiler, not by a human! from arxiv.org/abs/2502.19645

1

347

Kevin Black @kvablack

26 May 2023

Replying to @AntonWiehe @generatorman_ai @svlevine

Good point, we used L/14 throughout. No idea how LLaVA compares to the largest CLIP models. I do think the LM backbone helps a lot with reasoning capabilities, but that might not come into play until more complex reward functions (which I hope to see in future work!)

1

36

Kevin Black @kvablack

7 Jul 2023

Replying to @payandath @svlevine

Midjourney would absolutely blow us out of the water, haha. These results are more about showing that our algorithm works and can achieve some promising results. I don't think anything based on Stable Diffusion can get anywhere near Midjourney atm.

1

1

27

Kevin Black @kvablack

2 Oct 2023

Shoutout to @carperai's @iScienceLuvr (below) and also @huggingface for integrating DDPO into their DRLX and TRL libraries, respectively (2/4)

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

27 Sep 2023

Really excited to share something I've been working on for the last couple months: DRLX - A library for doing RLHF on diffusion models! Implements the DDPO algorithm but more algorithms coming soon! Read more about the library and our experiments here: carper.ai/enhancing-diffusio…

1

1

185

Kevin Black @kvablack

19 Dec 2023

Replying to @ThePrimeagen

I was 48 hours away from having to ship a frontend thing and I decided to learn Svelte instead of trying to use React (which I had used before) and I think it saved my ass

64

Kevin Black @kvablack

7 Jul 2023

Replying to @offchan420

I think the aesthetic quality DDPO finetuning is decent, but these experiments are really quite small-scale compared to production usage. I think it's up to industry to scale up methods like DDPO and see what really works best.

1

75

Kevin Black @kvablack

15 Jul 2023

Replying to @liliyu_lili

In the SDXL tech report there's an interesting tidbit that COCO FID is negatively correlated with image quality. Their chart has SD2-1 at worse FID than SD1-5, while yours has the opposite. I'm not an expert so I might be missing something -- is there an obvious explanation?

1

160

Kevin Black @kvablack

1 Nov 2024

Replying to @Joe__Black__ @physical_int

Thanks!
1. the long term goal is to solve robot manipulation
2. on the model side, none (actions are just vectors); on the hardware/data side, probably a lot, since we would have to build or buy a hand and then collect lots of data with it

1

1

76

Kevin Black @kvablack

21 May 2024

Always feels nice to have your work shared by others!

Aran Komatsuzaki

@arankomatsuzaki

21 May 2024

🐙 Octo: An Open-Source Generalist Robot Policy Transformer-based diffusion policy, pretrained on 800k robot episodes from the Open X-Embodiment dataset proj: octo-models.github.io/ abs: arxiv.org/abs/2405.12213

1

512

Kevin Black @kvablack

14 Dec 2023

Flexibility was our #1 design principle. You can swap out different observation spaces, action spaces and training objectives with only a config change. This allowed us to get great results across 6 robot setups and 3 institutions! 2/n

1

1

237