Philipp Moritz · Oct 6, 2025 · 8:00 AM UTC

Philipp Moritz

@pcmoritz

6 Oct 2025

The Tinker API recently released by Thinking Machines will have a big impact on how people think about post-training and inference systems. To allow more people to experiment with Tinker-like systems and run them on their own hardware, we started SkyRL tx 🧸, an open source project with the goal of implementing the Tinker API. See our blog post novasky-ai.notion.site/skyrl…. We welcome contributions and are looking forward to working with the open source community 🚀

194

65,157

Philipp Moritz · Oct 1, 2025 · 6:34 PM UTC

Philipp Moritz

@pcmoritz

1 Oct 2025

Very excited to see the Tinker release by @thinkymachines! @robertnishihara and I had a chance to experiment with the API, see anyscale.com/blog/fine-tunin…. It does a nice job of providing flexibility while abstracting away GPU handling. This will be 🔥 when combined with @raydistributed for simulations, inference and data processing. Looking forward to all the experimentation this unlocks! anyscale.com/blog/massively-…

157

42,652

Philipp Moritz · Jul 10, 2024 · 7:58 PM UTC

Philipp Moritz

@pcmoritz

10 Jul 2024

If you are doing LLM inference, FP8 is almost a no-brainer (almost no accuracy loss, support 2x larger models with the same memory, up to 2x faster). We recently contributed FP8 support to vLLM -- check it out!

Anyscale

@anyscalecompute

10 Jul 2024

We’ve recently contributed FP8 support to the @vllm_project in collaboration with @neuralmagic. With this feature, you can see up to a 1.8x reduction in inter-token latency, with >99% accuracy preservation! 1/n

134

15,326
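The "2x larger models with the same memory" claim follows from simple byte counting: FP16/BF16 weights take 2 bytes per parameter, FP8 weights take 1. Here is a minimal sketch of that arithmetic. It counts weights only and ignores KV cache, activations, and runtime overhead; the function names are illustrative and are not part of vLLM.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}

GIB = 2**30


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def max_params_for_budget(budget_gib: int, dtype: str) -> int:
    """Largest parameter count whose weights fit in the given budget."""
    return (budget_gib * GIB) // BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in ("fp16", "fp8"):
        print(dtype, round(weight_memory_gib(70e9, dtype), 1), "GiB for 70B params")
    print("params that fit in 80 GiB (fp16):", max_params_for_budget(80, "fp16"))
    print("params that fit in 80 GiB (fp8): ", max_params_for_budget(80, "fp8"))
```

In practice the usable gain is somewhat below 2x, since the memory budget is shared with the KV cache and activations.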

Philipp Moritz · Oct 14, 2025 · 5:57 PM UTC

Philipp Moritz

@pcmoritz

14 Oct 2025

We are happy to release SkyRL tx 0.0.2, an open source library that implements a backend for the Thinking Machines Tinker API and allows people to set up their own Tinker-like service running on their own hardware. There are lots of new features and it is exciting to see the first contributions from the open-source community. Check it out: novasky-ai.notion.site/skyrl…

131

11,132

Philipp Moritz · Nov 3, 2025 · 6:34 PM UTC

Philipp Moritz

@pcmoritz

3 Nov 2025

We are happy to release SkyRL tx 0.1 novasky-ai.notion.site/skyrl…, an open source unified training and inference engine that supports the Tinker API. This release has many performance enhancements and also new features but most importantly RL training is now working end-to-end. If you are interested in the project and are coming to #RaySummit, we are giving a talk about SkyRL tx tomorrow (Nov 4) at 4pm, come join us!

10,266

Philipp Moritz · Oct 21, 2025 · 8:07 PM UTC

Philipp Moritz

@pcmoritz

21 Oct 2025

We are happy to announce SkyRL tx 0.0.3! SkyRL tx is an open source library that implements a backend for the Tinker API and allows people to set up their own Tinker-like service running on their own hardware. This release has full MoE support, better checkpointing and the first implementation of sampling. Check it out novasky-ai.notion.site/skyrl…

SkyRL tx v0.0.3 Release

Philipp Moritz, Tyler Griggs, and the SkyRL Team

novasky-ai.notion.site

2,083

Philipp Moritz · Dec 21, 2023 · 9:52 PM UTC

Philipp Moritz

@pcmoritz

21 Dec 2023

Thanks to vLLM, Anyscale Endpoints is at the top of the LLM performance leaderboard 🚀. We are excited to merge more advanced performance optimizations & features like speculative decoding and per-request LoRA adapters upstream soon, stay tuned!

Anyscale

@anyscalecompute

21 Dec 2023

📈We’re excited to introduce the LLMPerf leaderboard: the first public and open source leaderboard for benchmarking performance of various LLM inference providers in the market. Our goal with this leaderboard is to equip users and developers with a clear understanding of the capabilities and limitations of LLM inference solutions, featuring key providers such as @replicate, @awscloud, and @togethercompute! You can find the leaderboard here: github.com/ray-project/llmpe…

The LLMPerf leaderboard tracks three main metrics: time-to-first-token, inter-token latency, and success rate.

- Time-to-first-token (TTFT) measures the time it takes between the query and the first response of the provider. TTFT is especially important for interactive and streaming applications, such as chatbots.
- Inter-token latency measures the average time between consecutive tokens. This is important for applications that require the entirety of the response to be ready, like summarization tasks or agent use cases.
- Finally, success rate measures the number of successful responses where the inference API operates without errors. This measure reflects the reliability and stability of the API provider.

Blog announcement: anyscale.com/blog/comparing-… (1/2)

4,340
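The three metrics described in the quoted post can be computed from per-request timestamps. The following is a minimal sketch under stated assumptions: `RequestTrace` and the helper functions are hypothetical names for illustration, success rate is expressed as a fraction of requests, and this is not the LLMPerf implementation, whose exact definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class RequestTrace:
    """Timing record for one streamed request (hypothetical structure)."""
    sent_at: float                # time the query was sent, in seconds
    token_times: List[float]      # arrival time of each streamed token
    error: Optional[str] = None   # set when the request failed


def time_to_first_token(trace: RequestTrace) -> float:
    """Seconds between sending the query and receiving the first token."""
    return trace.token_times[0] - trace.sent_at


def inter_token_latency(trace: RequestTrace) -> float:
    """Average gap in seconds between consecutive tokens."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    if not gaps:
        raise ValueError("need at least two tokens to measure a gap")
    return mean(gaps)


def success_rate(traces: List[RequestTrace]) -> float:
    """Fraction of requests that completed without an error."""
    return sum(1 for t in traces if t.error is None) / len(traces)


if __name__ == "__main__":
    ok = RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0])
    failed = RequestTrace(sent_at=0.0, token_times=[], error="timeout")
    print("TTFT:", time_to_first_token(ok))
    print("inter-token latency:", inter_token_latency(ok))
    print("success rate:", success_rate([ok, failed]))
```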

Philipp Moritz · Jul 1, 2025 · 7:09 PM UTC

Philipp Moritz

@pcmoritz

1 Jul 2025

There have been a lot of open source RL libraries for training LLMs popping up recently. We took a stab at describing some of the use cases and design decisions they are optimized for: anyscale.com/blog/open-sourc…

Open Source RL Libraries for LLMs | Anyscale

Explore a technical comparison of leading Reinforcement Learning (RL) libraries for LLMs from Ray. This guide analyzes frameworks like TRL, Verl, and RAGEN to help developers choose the best tools...

anyscale.com

8,030

Philipp Moritz · Apr 24, 2025 · 4:06 PM UTC

Philipp Moritz

@pcmoritz

24 Apr 2025

Check out this recent blog post blog.vllm.ai/2025/04/23/open… which describes how OpenRLHF runs on top of @raydistributed and @vllm_project

Accelerating RLHF with vLLM, Best Practice from OpenRLHF

How OpenRLHF uses vLLM, Ray, ZeRO-3, AutoTP, Ray placement groups, and weight synchronization to accelerate PPO and RLHF sample generation for reasoning models

vllm.ai

1,263

Philipp Moritz · Feb 28, 2025 · 7:25 PM UTC

Philipp Moritz

@pcmoritz

28 Feb 2025

After using uv for a while, I think it finally solves most Python dependency problems. Ray and uv fit together perfectly to make package management on a cluster seamless. Check our blog post anyscale.com/blog/uv-ray-pai…

uv + Ray: Pain-Free Python Dependencies in Clusters | Anyscale

Pain-free Python dependencies in clusters with uv + Ray! Learn how to build lightning-fast, consistent environments for distributed applications.

anyscale.com

1,246

Philipp Moritz · Sep 10, 2025 · 10:06 PM UTC

Philipp Moritz

@pcmoritz

10 Sep 2025

Do you find it challenging to run RL / agent simulations at a large scale (e.g. dealing with docker and remote execution)? Check out our blog post anyscale.com/blog/massively-… where we show how to do it with Ray and mini-swe-agent (kudos to @KLieret)

Massively Parallel Agentic Simulations with Ray | Anyscale

anyscale.com

2,268

Philipp Moritz · Jun 26, 2025 · 4:41 PM UTC

Philipp Moritz

@pcmoritz

26 Jun 2025

🚀 We are introducing SkyRL-v0.1: A highly-modular RL library for training LLMs!

✨ Key features:
1) Simple modular design – adapt to your needs by implementing core interfaces
2) 1.8x faster training with async rollouts
3) Optional built-in gymnasium of tool-use tasks (math, code, SQL, search)

Perfect for researchers who want to prototype new RL ideas without the usual framework constraints.

Blog post: novasky-ai.notion.site/skyrl…
Try it out: github.com/NovaSky-AI/SkyRL

993

Philipp Moritz · Jul 31, 2024 · 6:44 PM UTC

Philipp Moritz

@pcmoritz

31 Jul 2024

Excited to be working with @KeertiMelkote, welcome!

Anyscale

@anyscalecompute

31 Jul 2024

Today, we’re welcoming @KeertiMelkote as CEO of Anyscale! anyscale.com/blog/welcome-ke…

858

Philipp Moritz · Jun 4, 2024 · 11:39 PM UTC

Philipp Moritz

@pcmoritz

4 Jun 2024

If you have been working on vLLM related projects (e.g. contributions to vLLM like optimizations or new features, vLLM deployment strategies, or interesting use cases and applications), consider submitting a talk proposal! The vLLM and Ray community would love to hear about it :)

Anyscale

@anyscalecompute

4 Jun 2024

There has been so much excitement and activity around this topic, that we are adding a vLLM track to the Ray Summit! If you contribute to or use @vllm_project, we want to hear from you. raysummit.anyscale.com

422

Philipp Moritz · Sep 5, 2023 · 7:02 PM UTC

Philipp Moritz

@pcmoritz

5 Sep 2023

Ray Summit is going to be a great event for open-source LLM topics. If you haven't registered yet, go join us -- I'm very excited to see everybody there!

Robert Nishihara

@robertnishihara

5 Sep 2023

Ray Summit this month will be 🔥🔥 🤯 ChatGPT creator @johnschulman2 🧙‍♀️ @bhorowitz on the AI landscape 🦹‍♂️ @hwchase17 on LangChain 🧑‍🚀 @jerryjliu0 on LlamaIndex 👨‍🎤 @zhuohan123 and @woosuk_k on vLLM 🧜 @zongheng_yang on SkyPilot 🧑‍🔧 @MetaAI on Llama-2 🧚‍♂️ @Adobe on Generative AI in Firefly 🧑‍💻 @jeffreyhuber on the Chroma vector DB 🧑‍🏫 @weights_biases on LLM observability 🧑‍🎓 @Uber @Airbnb @LinkedIn on their LLMs products 🧑‍🌾 @awscloud on Inferentia and Trainium 🧑‍💼 @googlecloud on LLMs on TPUs This is an unbelievable list. You'll also hear the nitty-gritty details of how AI gets done at @Spotify @NianticLabs @Instacart @Pinterest @Samsara @DoorDash @netflix @AntGroup @InstabaseInc @SnorkelAI @NetEaseGames_EN @LockheedMartin @clarihq and many more. On top of all that, we're running a full day of hands-on trainings where you'll go through the motions and actually build the following 🖥️ ✅ RAG versus fine-tuning ✅ Running LLMs in production ✅ Building products around stable-diffusion models ✅ Delivering AI applications at scale raysummit.anyscale.com/

442

Philipp Moritz · Aug 13, 2024 · 8:56 PM UTC

Philipp Moritz

@pcmoritz

13 Aug 2024

Looking forward to the #vLLM track at Ray Summit! Join us Sep 30-Oct 2 in SF raysummit.anyscale.com/

Ray Summit 2026 | Hosted by Anyscale

Join Ray Summit in San Francisco, Aug 24–26, for technical talks on foundation model training, multimodal AI, RL, and other AI in production systems.

anyscale.com

Robert Nishihara

@robertnishihara

13 Aug 2024

Something we're doing differently this time around, we added a #vLLM track to #RaySummit! @vllm_project is one of the most popular inference engines, and is often used together with @raydistributed for scaling LLM inference. Can't wait to hear from these companies about how they're using (and contributing to) vLLM! - @Roblox - @neuralmagic - @IBMResearch - @koredotai - @Uber - @Apple - @joinHandshake - @intel - @AlibabaGroup - @databricks - @KaikoData

1,459

Philipp Moritz · Sep 5, 2024 · 6:52 PM UTC

Philipp Moritz

@pcmoritz

5 Sep 2024

Looking forward to the Ray Summit! There will be keynotes from AI leaders like Mira Murati (CTO of OpenAI) and Anastasis Germanidis (CTO of Runway) and many talks from the Ray and vLLM community about use cases and the latest developments! Sign up at raysummit.anyscale.com

356

Philipp Moritz · Mar 18, 2025 · 8:13 PM UTC

Philipp Moritz

@pcmoritz

18 Mar 2025

Check out our recent Runway case study ❤️

Anyscale

@anyscalecompute

18 Mar 2025

Runway is pushing the limits of generative AI – proving that innovation accelerates when infrastructure gets out of the way. With Anyscale, they scale effortlessly, freeing their team to focus on building cutting-edge AI for media creation. "Anyscale enables us to push the boundaries of what’s possible in generative AI by giving us the flexibility to scale workloads seamlessly. This removes the risk around our infrastructure and allows our team to focus on innovation rather than infrastructure bottlenecks." – @agermanidis, Co-Founder & CTO @runwayml Learn how Runway scales AI-driven media creation, powered by Anyscale: anyscale.com/resources/case-…

308

Philipp Moritz · Oct 28, 2023 · 5:38 PM UTC

Philipp Moritz

@pcmoritz

28 Oct 2023

Replying to @Dorialexander @amydeng_ @robertnishihara @anyscalecompute @MosaicML @DeepInfra @replicate

On Anyscale you can do fine-tuning and hosting of the fine-tuned models like this: docs.endpoints.anyscale.com/…

242

Philipp Moritz · Jun 12, 2025 · 10:45 PM UTC

Philipp Moritz

@pcmoritz

12 Jun 2025

Check out the new end-to-end examples that @GokuMohandas, @bae_theorem, and others have been adding to the Ray documentation, e.g. docs.ray.io/en/master/ray-ov… and a number of others (multimodal LLMs, time series prediction)

325

Philipp Moritz · Oct 6, 2025 · 6:32 PM UTC

Philipp Moritz

@pcmoritz

6 Oct 2025

Replying to @jackson_stokes

I don't know what exactly the integration will look like but it is very natural to make it run very well on Ray. As well as other distributed backends if there is enough interest :)

845

Philipp Moritz · Oct 14, 2025 · 6:38 PM UTC

Philipp Moritz

@pcmoritz

14 Oct 2025

Replying to @guidotrev

We are making it a drop-in replacement (just replace the `base_url` in the SDK). Currently the basic functionality is working, but many things are still missing (a good opportunity to contribute if you see anything missing that you need!)

368

Philipp Moritz · Aug 25, 2025 · 6:58 PM UTC

Philipp Moritz

@pcmoritz

25 Aug 2025

Replying to @ariaurelium

Thanks for pointing this out, I created an issue for this here github.com/ray-project/ray/i… which describes more of the context and also a workaround. We will try to get it fixed :)

[Core] Iterating on uv runtime environment can fill up disk · Issue #55898 · ray-project/ray

What happened + What you expected to happen In some circumstances, iterating on the uv runtime environment may fill up the disk. Probably the most common setting where this happens is if /tmp/ray i...

github.com

Philipp Moritz · Oct 22, 2025 · 3:25 AM UTC

Philipp Moritz

@pcmoritz

22 Oct 2025

Replying to @miyanomamazuki @tyler_griggs_ @thinkymachines

Yes it does, the implementation is in github.com/NovaSky-AI/SkyRL/… if you are interested in checking out the code :)

[tx] Add MultiLoRA MoE layer by pcmoritz · Pull Request #432 · NovaSky-AI/SkyRL

This PR implements MultiLoRA for the MoE layer. It is implemented by routing tokens to the (num_experts, num_adapters) dimensions of the lora_A and lora_B matrices, much in the same way that tokens...

github.com
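The routing idea in the PR description can be illustrated in a few lines: each token selects a LoRA pair by its (expert, adapter) index and the low-rank update is applied with that pair. This is only a sketch under assumptions and not the code from the PR: it uses plain Python lists instead of JAX arrays, assumes top-1 expert routing with no gate weights, and assumes `lora_A` has shape (num_experts, num_adapters, d_in, rank) and `lora_B` has shape (num_experts, num_adapters, rank, d_out). All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(x: Vector, m: Matrix) -> Vector:
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(xi * row[j] for xi, row in zip(x, m)) for j in range(len(m[0]))]


def multi_lora_moe_delta(
    tokens: List[Vector],
    expert_ids: List[int],
    adapter_ids: List[int],
    lora_A: List[List[Matrix]],
    lora_B: List[List[Matrix]],
) -> List[Optional[Vector]]:
    """Compute the LoRA update x @ A[e][a] @ B[e][a] for every token."""
    num_adapters = len(lora_A[0])

    # Group token positions by a flattened (expert, adapter) index, the same
    # way tokens would be grouped by expert alone in a plain MoE layer.
    groups: Dict[int, List[int]] = defaultdict(list)
    for pos, (e, a) in enumerate(zip(expert_ids, adapter_ids)):
        groups[e * num_adapters + a].append(pos)

    out: List[Optional[Vector]] = [None] * len(tokens)
    for flat_index, positions in groups.items():
        e, a = divmod(flat_index, num_adapters)
        for pos in positions:
            hidden = vec_mat(tokens[pos], lora_A[e][a])   # project down to rank
            out[pos] = vec_mat(hidden, lora_B[e][a])      # project up to d_out
    return out
```

A real implementation would replace the inner loop with one batched matrix multiply per group; the grouping step is the part this sketch is meant to show.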

Philipp Moritz · Jul 25, 2023 · 11:30 PM UTC

Philipp Moritz

@pcmoritz

25 Jul 2023

Check out Ray 2.6 and our new example gallery: docs.ray.io/en/latest/ray-ov…

ray

@raydistributed

25 Jul 2023

Ray 2.6.1 has been released with: 🎏 Streaming responses in Serve for real-time capabilities 🎏 📀🏃‍♀️Ray Data streaming integration w/Train 🏃‍♀️☁️Distributed Training & Tuning sync with cloud storage persistence 🤖 Alpha release of the Multi-GPU Learner API 📙 Ray Gallery examples 👇

115

Philipp Moritz · Oct 15, 2025 · 9:41 AM UTC

Philipp Moritz

@pcmoritz

15 Oct 2025

Replying to @MSuryavansh

Not yet, but we'd love your contribution if you know how to implement it, the modeling code is in github.com/NovaSky-AI/SkyRL/… and very easy to understand / hack.

Philipp Moritz · Apr 27, 2023 · 7:40 PM UTC

Philipp Moritz

@pcmoritz

27 Apr 2023

Open-source Ray 2.4 upgrade speeds up generative AI model deployment venturebeat.com/ai/open-sour… via @VentureBeat

Philipp Moritz · Mar 1, 2025 · 5:12 AM UTC

Philipp Moritz

@pcmoritz

1 Mar 2025

Replying to @michalwols @anyscalecompute @astral_sh

That's not currently planned but could be done by extending the runtime environment hook github.com/ray-project/ray/b… -- get the script name, look if there is a "script" section in there, extract it into a pyproject.toml and then use that. You can actually implement this in your own runtime environment hook (which will call the existing hook) without needing to modify Ray and try it out if you want to. If you do we'd love to hear about your feedback in github.com/ray-project/ray/i… :)

ray/python/ray/_private/runtime_env/uv_runtime_env_hook.py at 6880654e6c06d650505bf30c8344eb22402...

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. - ray-project/ray

github.com
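The extraction step described in the reply above (find the "script" section in the script and turn it into a pyproject.toml) could look roughly like the following. This sketch covers only the parsing part and assumes the script uses PEP 723 inline metadata; the regular expression is the reference one from PEP 723. The function names and the placeholder project name and version are hypothetical, and wiring this into a Ray runtime environment hook is not shown.

```python
import re
from typing import Optional

# Reference regular expression from PEP 723 for inline metadata blocks.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` block, or None if there is none."""
    matches = [m for m in PEP723_BLOCK.finditer(source) if m.group("type") == "script"]
    if len(matches) > 1:
        raise ValueError("multiple script blocks found")
    if not matches:
        return None
    lines = matches[0].group("content").splitlines(keepends=True)
    # Strip the leading "# " (or bare "#") from every metadata line.
    return "".join(line[2:] if line.startswith("# ") else line[1:] for line in lines)


def to_pyproject(metadata_toml: str, name: str = "inline-script") -> str:
    """Wrap script metadata in a [project] table (placeholder name and version)."""
    return f'[project]\nname = "{name}"\nversion = "0.0.0"\n{metadata_toml}'


EXAMPLE = '''\
# /// script
# requires-python = ">=3.9"
# dependencies = [
#   "requests",
# ]
# ///
import requests
'''

if __name__ == "__main__":
    metadata = extract_script_metadata(EXAMPLE)
    if metadata is not None:
        print(to_pyproject(metadata))
```

The top-level `dependencies` and `requires-python` keys of the script block land inside the `[project]` table because they directly follow its header.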