📈 We’re excited to introduce the LLMPerf leaderboard: the first public, open-source leaderboard for benchmarking the performance of LLM inference providers on the market.
Our goal with this leaderboard is to give users and developers a clear understanding of the capabilities and limitations of LLM inference solutions from key providers such as @replicate, @awscloud, and @togethercompute!
You can find the leaderboard here:
github.com/ray-project/llmpe…
The LLMPerf leaderboard tracks three main metrics: time-to-first-token, inter-token latency, and success rate.
- Time-to-first-token (TTFT) measures the time between sending a request and receiving the first token back from the provider. TTFT is especially important for interactive and streaming applications, such as chatbots.
- Inter-token latency measures the average time between consecutive tokens. This is important for applications that need the entire response before they can proceed, like summarization tasks or agent use cases.
- Finally, success rate measures the fraction of requests for which the inference API returns a response without errors. This metric reflects the reliability and stability of the API provider.
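To make the three definitions concrete, here is a minimal Python sketch that computes them from per-request timing data. It assumes a hypothetical `RequestRecord` holding the send time, the arrival time of each streamed token, and an optional error; this is not LLMPerf’s actual data model or code, and LLMPerf may aggregate or count tokens differently.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class RequestRecord:
    """One benchmark request (hypothetical structure, not LLMPerf's schema)."""
    sent_at: float  # seconds: when the request was sent
    token_times: List[float] = field(default_factory=list)  # arrival time of each streamed token
    error: Optional[str] = None  # set if the API call failed


def ttft(r: RequestRecord) -> float:
    # Time-to-first-token: gap between sending the request and the first token arriving.
    return r.token_times[0] - r.sent_at


def inter_token_latency(r: RequestRecord) -> float:
    # Average gap between consecutive tokens; undefined with fewer than two tokens.
    if len(r.token_times) < 2:
        raise ValueError("need at least two tokens to measure inter-token latency")
    gaps = [b - a for a, b in zip(r.token_times, r.token_times[1:])]
    return mean(gaps)


def success_rate(records: List[RequestRecord]) -> float:
    # Fraction of requests that completed without an error.
    ok = sum(1 for r in records if r.error is None)
    return ok / len(records)


if __name__ == "__main__":
    # Illustrative timings only, not real benchmark data.
    records = [
        RequestRecord(sent_at=0.0, token_times=[0.5, 0.55, 0.60, 0.65]),
        RequestRecord(sent_at=1.0, token_times=[1.25, 1.35, 1.45]),
        RequestRecord(sent_at=2.0, error="HTTP 503"),
    ]
    # Latency metrics are computed only over successful requests.
    successful = [r for r in records if r.error is None]
    print("mean TTFT (s):", mean(ttft(r) for r in successful))
    print("mean inter-token latency (s):", mean(inter_token_latency(r) for r in successful))
    print("success rate:", success_rate(records))
```

Failed requests count against the success rate but are excluded from the latency averages, since they produce no token timings.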
Blog announcement:
anyscale.com/blog/comparing-…
(1/2)