Co-founder and CTO at @anyscalecompute. Co-creator of @raydistributed. Interested in ML, AI, computing.

San Francisco
The Tinker API recently released by Thinking Machines will have a big impact on how people think about post-training and inference systems. To allow more people to experiment with Tinker-like systems and run them on their own hardware, we started SkyRL tx 🧸, an open source project with the goal of implementing the Tinker API. See our blog post novasky-ai.notion.site/skyrl…. We welcome contributions and look forward to working with the open source community 🚀
Very excited to see the Tinker release by @thinkymachines! @robertnishihara and I had a chance to experiment with the API; see anyscale.com/blog/fine-tunin…. It does a nice job of providing flexibility while abstracting away GPU handling. This will be 🔥 when combined with @raydistributed for simulations, inference, and data processing. Looking forward to all the experimentation this unlocks! anyscale.com/blog/massively-…
If you are doing LLM inference, FP8 is almost a no-brainer (almost no accuracy loss, support for 2x larger models in the same memory, up to 2x faster). We recently contributed FP8 support to vLLM -- check it out!
We’ve recently contributed FP8 support to the @vllm_project in collaboration with @neuralmagic. With this feature, you can see up to a 1.8x reduction in inter-token latency, with >99% accuracy preservation! 1/n
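The "2x larger models with the same memory" figure follows from bytes per weight: FP16 stores each parameter in 2 bytes, FP8 in 1. A back-of-the-envelope sketch, assuming a hypothetical 70B-parameter model and counting weights only (KV cache and activations are ignored, so real-world savings will differ):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 70e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # 2 bytes per parameter
fp8_gb = weight_memory_gb(NUM_PARAMS, 1)   # 1 byte per parameter

print(f"FP16: {fp16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB, ratio: {fp16_gb / fp8_gb:.1f}x")
```

Under these assumptions the weights shrink from 140 GB to 70 GB, which is where the 2x headroom comes from.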
We are happy to release SkyRL tx 0.0.2, an open source library that implements a backend for the Thinking Machines Tinker API and allows people to set up their own Tinker-like service running on their own hardware. There are lots of new features, and it is exciting to see the first contributions from the open-source community. Check it out: novasky-ai.notion.site/skyrl…
We are happy to release SkyRL tx 0.1 novasky-ai.notion.site/skyrl…, an open source unified training and inference engine that supports the Tinker API. This release has many performance enhancements and new features, but most importantly, RL training is now working end-to-end. If you are interested in the project and are coming to #RaySummit, we are giving a talk about SkyRL tx tomorrow (Nov 4) at 4pm. Come join us!
We are happy to announce SkyRL tx 0.0.3! SkyRL tx is an open source library that implements a backend for the Tinker API and allows people to set up their own Tinker-like service running on their own hardware. This release has full MoE support, better checkpointing and the first implementation of sampling. Check it out novasky-ai.notion.site/skyrl…
Thanks to vLLM, Anyscale Endpoints is at the top of the LLM performance leaderboard 🚀. We are excited to merge more advanced performance optimizations & features like speculative decoding and per-request LoRA adapters upstream soon, stay tuned!
📈 We’re excited to introduce the LLMPerf leaderboard: the first public and open source leaderboard for benchmarking performance of various LLM inference providers in the market. Our goal with this leaderboard is to equip users and developers with a clear understanding of the capabilities and limitations of LLM inference solutions, featuring key providers such as @replicate, @awscloud, and @togethercompute! You can find the leaderboard here: github.com/ray-project/llmpe… The LLMPerf leaderboard tracks three main metrics: time-to-first-token, inter-token latency, and success rate. - Time-to-first-token (TTFT) measures the time between sending the query and receiving the first response from the provider. TTFT is especially important for interactive and streaming applications, such as chatbots. - Inter-token latency measures the average time between consecutive tokens. This is important for applications that require the entirety of the response to be ready, like summarization tasks or agent use cases. - Finally, success rate measures the number of successful responses where the inference API operates without errors. This measure reflects the reliability and stability of the API provider. Blog announcement: anyscale.com/blog/comparing-… (1/2)
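The three metrics described in the post can be computed from per-request timestamps. The sketch below is not LLMPerf's code; `RequestTrace` and the helper functions are hypothetical names, and success rate is treated here as a fraction of all requests:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestTrace:
    """Hypothetical record of one streamed request (times in seconds)."""
    sent_at: float
    token_times: List[float] = field(default_factory=list)  # arrival time of each token
    error: Optional[str] = None


def time_to_first_token(trace: RequestTrace) -> Optional[float]:
    """Time between sending the query and receiving the first token."""
    if not trace.token_times:
        return None
    return trace.token_times[0] - trace.sent_at


def inter_token_latency(trace: RequestTrace) -> Optional[float]:
    """Average gap between consecutive tokens; needs at least two tokens."""
    if len(trace.token_times) < 2:
        return None
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps)


def success_rate(traces: List[RequestTrace]) -> float:
    """Fraction of requests that completed without an error."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if t.error is None) / len(traces)


# Made-up traces: one successful streamed response and one failure.
traces = [
    RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0]),
    RequestTrace(sent_at=0.0, error="timeout"),
]
print(time_to_first_token(traces[0]), inter_token_latency(traces[0]), success_rate(traces))
```

Keeping the raw timestamps per request, rather than only aggregates, makes it possible to report percentiles later without re-running the benchmark.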
After using uv for a while, I think it finally solves most Python dependency problems. Ray and uv fit together perfectly to make package management on a cluster seamless. Check our blog post anyscale.com/blog/uv-ray-pai…
Do you find it challenging to run RL / agent simulations at a large scale (e.g. dealing with docker and remote execution)? Check out our blog post anyscale.com/blog/massively-… where we show how to do it with Ray and mini-swe-agent (kudos to @KLieret)
🚀 We are introducing SkyRL-v0.1: A highly-modular RL library for training LLMs! ✨ Key features: 1) Simple modular design – adapt to your needs by implementing core interfaces 2) 1.8x faster training with async rollouts 3) Optional built-in gymnasium of tool-use tasks (math, code, SQL, search) Perfect for researchers who want to prototype new RL ideas without the usual framework constraints. Blog post: novasky-ai.notion.site/skyrl… Try it out: github.com/NovaSky-AI/SkyRL
Excited to be working with @KeertiMelkote, welcome!
If you have been working on vLLM-related projects (e.g. contributions to vLLM like optimizations or new features, vLLM deployment strategies, or interesting use cases and applications), consider submitting a talk proposal! The vLLM and Ray community would love to hear about it :)
There has been so much excitement and activity around this topic that we are adding a vLLM track to the Ray Summit! If you contribute to or use @vllm_project, we want to hear from you. raysummit.anyscale.com
Ray Summit is going to be a great event for open-source LLM topics. If you haven't registered yet, go join us -- I'm very excited to see everybody there!
Ray Summit this month will be 🔥🔥 🤯 ChatGPT creator @johnschulman2 🧙‍♀️ @bhorowitz on the AI landscape 🦹‍♂️ @hwchase17 on LangChain 🧑‍🚀 @jerryjliu0 on LlamaIndex 👨‍🎤 @zhuohan123 and @woosuk_k on vLLM 🧜 @zongheng_yang on SkyPilot 🧑‍🔧 @MetaAI on Llama-2 🧚‍♂️ @Adobe on Generative AI in Firefly 🧑‍💻 @jeffreyhuber on the Chroma vector DB 🧑‍🏫 @weights_biases on LLM observability 🧑‍🎓 @Uber @Airbnb @LinkedIn on their LLM products 🧑‍🌾 @awscloud on Inferentia and Trainium 🧑‍💼 @googlecloud on LLMs on TPUs. This is an unbelievable list. You'll also hear the nitty-gritty details of how AI gets done at @Spotify @NianticLabs @Instacart @Pinterest @Samsara @DoorDash @netflix @AntGroup @InstabaseInc @SnorkelAI @NetEaseGames_EN @LockheedMartin @clarihq and many more. On top of all that, we're running a full day of hands-on trainings where you'll go through the motions and actually build the following 🖥️ ✅ RAG versus fine-tuning ✅ Running LLMs in production ✅ Building products around stable-diffusion models ✅ Delivering AI applications at scale raysummit.anyscale.com/
Looking forward to the #vLLM track at Ray Summit! Join us Sep 30-Oct 2 in SF raysummit.anyscale.com/
Something we're doing differently this time around: we added a #vLLM track to #RaySummit! @vllm_project is one of the most popular inference engines, and is often used together with @raydistributed for scaling LLM inference. Can't wait to hear from these companies about how they're using (and contributing to) vLLM! - @Roblox - @neuralmagic - @IBMResearch - @koredotai - @Uber - @Apple - @joinHandshake - @intel - @AlibabaGroup - @databricks - @KaikoData
Looking forward to the Ray Summit! There will be keynotes from AI leaders like Mira Murati (CTO of OpenAI) and Anastasis Germanidis (CTO of Runway) and many talks from the Ray and vLLM community about use cases and the latest developments! Sign up at raysummit.anyscale.com
Check out our recent Runway case study ❤️
Runway is pushing the limits of generative AI – proving that innovation accelerates when infrastructure gets out of the way. With Anyscale, they scale effortlessly, freeing their team to focus on building cutting-edge AI for media creation. "Anyscale enables us to push the boundaries of what’s possible in generative AI by giving us the flexibility to scale workloads seamlessly. This removes the risk around our infrastructure and allows our team to focus on innovation rather than infrastructure bottlenecks." – @agermanidis, Co-Founder & CTO @runwayml Learn how Runway scales AI-driven media creation, powered by Anyscale: anyscale.com/resources/case-…
On Anyscale you can do fine-tuning and hosting of the fine-tuned models like this: docs.endpoints.anyscale.com/…
Check out the new end-to-end examples that @GokuMohandas, @bae_theorem, and others have been adding to the Ray documentation, e.g. docs.ray.io/en/master/ray-ov… and a number of others (multimodal LLMs, time series prediction)
Replying to @jackson_stokes
I don't know exactly what the integration will look like, but it is very natural to make it run very well on Ray, as well as on other distributed backends if there is enough interest :)
Replying to @guidotrev
We are making it a drop-in replacement (just replace the `base_url` in the SDK). Currently the basic stuff is working, but many things are missing (a good opportunity to contribute if you see anything missing that you need!)
Check out Ray 2.6 and our new example gallery: docs.ray.io/en/latest/ray-ov…
Ray 2.6.1 is released with: 🎏 Streaming responses in Serve for real-time capabilities 🎏 📀🏃‍♀️Ray Data streaming integration w/Train 🏃‍♀️☁️Distributed Training & Tuning sync with cloud storage persistence 🤖 Alpha release of the Multi-GPU Learner API 📙 Ray Gallery examples 👇
Replying to @MSuryavansh
Not yet, but we'd love your contribution if you know how to implement it. The modeling code is in github.com/NovaSky-AI/SkyRL/… and is very easy to understand / hack.
Open-source Ray 2.4 upgrade speeds up generative AI model deployment venturebeat.com/ai/open-sour… via @VentureBeat
That's not currently planned, but it could be done by extending the runtime environment hook github.com/ray-project/ray/b…: get the script name, check whether the script contains a "script" section, extract that into a pyproject.toml, and then use that. You can actually implement this in your own runtime environment hook (which will call the existing hook) without needing to modify Ray, and try it out if you want to. If you do, we'd love to hear your feedback in github.com/ray-project/ray/i… :)
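The extraction step could look roughly like the sketch below. It assumes the "script" section means PEP 723 inline script metadata (a `# /// script` comment block), which the reply does not state explicitly. `extract_script_metadata` and `to_pyproject` are hypothetical helper names, no Ray API is used, and wiring the result into a runtime environment hook is left out:

```python
import re
from typing import Optional

# Regex taken from the PEP 723 reference implementation for inline metadata blocks.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the first `script` block, or None if there is none."""
    for match in PEP723_BLOCK.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Strip the leading "# " (or bare "#") comment prefix from each line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None


def to_pyproject(metadata: str, name: str = "inline-script") -> str:
    """Wrap the metadata in a minimal [project] table (name/version are placeholders).

    The top-level `dependencies` / `requires-python` keys of the script block
    become keys of [project]; any tables in the block stay top-level tables.
    """
    return f'[project]\nname = "{name}"\nversion = "0.0.0"\n' + metadata


# Made-up example script with inline metadata.
EXAMPLE = '''# /// script
# requires-python = ">=3.11"
# dependencies = ["requests"]
# ///
import requests
'''

metadata = extract_script_metadata(EXAMPLE)
pyproject = to_pyproject(metadata)
print(pyproject)
```

This works at the text level to avoid a TOML parser dependency; a more careful version would parse and validate the block before writing the file.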