Humanloop is the LLM evals platform for enterprises. Trusted by Gusto, Vanta and Duolingo to ship reliable AI products.

SF and London
We're thrilled to announce that the Humanloop team is joining @AnthropicAI! Our mission has always been to enable the rapid and safe adoption of AI. Now, as AI progress accelerates, we think Anthropic is the ideal home to continue this work.
25
21
459
243,077
Interact to clean your timeline: Transformers LLMs Prompt engineering CoT Constitutional AI Ray Kurzweil predictions TPU pods Attention is all you need noam shazeer Scaling laws Colab Pro ooms GPT wrappers Model distillation AGI timelines p doom NVDA VRT TSMC open weights unhobblings infinite context stargate
333
252
5,254
190,130
Today we're excited to announce that we're partnering with @CarperAI of Stability on bringing the first RLHF-trained GPT-3 like model to the open source community. This will be huge. Let us explain
7
67
438
Prompt Engineering is a thing. Here's Prompt Engineering 101 for working with LLMs like GPT-3
5
45
434
102,917
GPT-4 is now available in Humanloop Playground and API! You can start prototyping GPT-4 applications now – with best-in-class tools for LLMs.
9
60
319
83,208
Today we're excited to announce public access to Humanloop for Large Language Models! We're making it easier than ever to build incredible products with GPT-3. Sign up at humanloop.com
6
22
249
TogetherAI have brought the context window of LLaMA-2-7B to 32k tokens. This is potentially the first time an open source model with competitive performance to ChatGPT can be fine-tuned for long-context tasks. This is important for the following reasons 🧵
8
20
171
43,573
The playground may be the first way you interact with GPT-3... but it's also an IDE in disguise. We want to transform it into the best way to build useful apps with LLMs. Here's how ↓
4
22
125
Humanloop is now generally available! After 2 years of working closely with early customers, we're opening access to our full evals platform. 🧵 Here's what we've learned and how we can help you build great AI products:
17
14
112
38,736
OpenAI on Azure is now generally available. Here's why you may want to use it (and also why you may not)
Just in—new AI models on Azure OpenAI Service empower your business to deliver results at scale. Explore the possibilities → msft.it/6017exfWh #AzureOpenAI
2
19
95
33,080
There are hundreds of different use cases for Large Language Models like GPT-3. Starting today we're going to be showcasing a different awesome start-up and what they're building every day.
3
9
84
16,743
RLHF – Reinforcement Learning from Human Feedback. Models are fine-tuned using RL from human feedback. They become more helpful, less harmful and they show a huge leap in performance. An RLHF model was preferred over a 100x larger base GPT-3 model.
3
13
77
BREAKING: ChatGPT API is now publicly available! The price is *10x cheaper* than davinci openai.com/blog/introducing-… Available to use in Humanloop.
1
9
78
12,967
Our CEO @RazRazcle sat down with @rowghani of YC to discuss: • the future of Large Language Models • the huge opportunities for startups • and how to build differentiated products in the AI space Video link below ↓
6
15
75
22,683
Cohere has found that data pruning on a pre-training corpus for LLMs can result in a performance improvement while using just 30% of the training data. Here’s why this is important 🧵
5
10
69
13,021
OpenAI is the most important company in the world right now. We sat down with @sama where he revealed their future plans. He shared the ambitious vision for ChatGPT, announced plans for a new stateful API and revealed what's held back multimodal GPT. humanloop.com/blog/openai-pl…
6
14
59
16,316
humanity's last stand: AP English Lit
1
8
56
5,932
LLM startup of the day: @phindsearch
2
4
58
7,652
Excited to be working with @carperai and @StabilityAI to build an open InstructGPT model!
2
3
40
LMOps not MLOps
3
5
41
The 100k context window Claude is now available in the Humanloop Playground. Try it out, paste in a book... we're excited to see what you build with it! Model pricing details from Anthropic below
2
5
40
8,044
Want to get started with deep learning and NLP? Start by doing. Get stuck in. We've collected the best ML & NLP resources. Here's our guide 🧵👇
1
11
36
⚡️ Streaming the response from GPT-3 is a huge UX win. It feels magical as you watch the AI respond. Now available on the Humanloop Playground and Generate endpoints so you can use it in your app.
4
34
4,120
LLM Startup of the day: mem.ai (@memdotai)
2
4
32
10,959
Unveiling the new humanloop.com! Humanloop offers you the best way to rapidly build and continuously improve your NLP models. Powered by Active Learning. The best ML teams follow this approach. Here's how you can too
1
5
32
These models show far greater ability to take instruction, which has massively increased their usability. We think RLHF-tuned models will ultimately be applied to every domain and task, and these systems will unlock incredible amounts of value in the real world.
1
1
29
CarperAI will be building in public, and will be releasing the data, code and weights over the coming months. Follow along in their discord. Like what happened to Stable Diffusion, we can't wait to see the innovation that happens once these models become available for all
1
4
29
only if you want to add a lot of noise to your timeline!
2
26
6,169
OpenAI have just released text-davinci-003 • higher quality writing • handles more complex instructions • better long form content generation and now supported in @Humanloop playground!
1
25
What's the difference between an AI engineer and an ML engineer? "The AI engineer looks after the 0-1 phase whereas the ML Engineer takes on the 1-N phase" @swyx penned the viral essay 'Rise of the AI Engineer' (latent.space/p/ai-engineer) and is the founder of the @aiDotEngineer World's Fair. He joined Raza on the High Agency podcast this week to discuss what an AI engineer really is. Watch: hubs.ly/Q02DLDzj0
2
5
26
7,092
This week we locked ourselves in a country home in South England. Our goal: Launch a unique AI-powered product. • 4 teams • 48 hours • 1 rule - everything had to be built on Humanloop Here are the results 🧵
1
3
24
5,553
Follow us here for our progress along the way
1
1
21
New Release: We’ve just added OpenAI Function Calling to Humanloop! Function Calling has been hailed as one of the most significant OpenAI feature releases of 2023. However it has been tricky to get them into production. Here's how Humanloop helps solve this ⬇️
2
4
24
3,984
Big News: We've just added Evaluation Functions to Humanloop! This gives you a powerful new set of tools to evaluate model performance, allowing you to build LLM applications with much greater confidence than before. Here’s why this is a big deal 🧵
2
5
24
3,702
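To make the idea of an evaluation function concrete, here is a minimal sketch. It is not Humanloop's API: the log shape (a dict with an "output" key) and the names `is_valid_json`, `under_word_limit` and `evaluate` are all hypothetical, chosen only to show how small code checks can be run over a batch of model outputs and summarised as pass rates.

```python
import json


def is_valid_json(log):
    """Hypothetical evaluator: passes if the model output parses as JSON."""
    try:
        json.loads(log["output"])
        return True
    except (json.JSONDecodeError, TypeError):
        return False


def under_word_limit(log, limit=50):
    """Hypothetical evaluator: passes if the output has at most `limit` words."""
    return len(log["output"].split()) <= limit


def evaluate(logs, evaluators):
    """Return the pass rate of each evaluator over a batch of logs."""
    if not logs:
        raise ValueError("need at least one log to evaluate")
    return {
        name: sum(1 for log in logs if fn(log)) / len(logs)
        for name, fn in evaluators.items()
    }


# Usage: two made-up logs, one of which is valid JSON.
logs = [{"output": '{"a": 1}'}, {"output": "not json"}]
scores = evaluate(logs, {"valid_json": is_valid_json, "short": under_word_limit})
```

Pass rates like these can be tracked per prompt version, so a regression shows up as a number rather than as an impression from eyeballing a few examples.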
📍PMs in AI Meetup, London 🇬🇧 Yesterday we held a Meetup in the UCL Centre for Artificial Intelligence for product managers working on AI agents and applications. Huge thanks to all who turned up (it was a full house!) and to our speakers: • @samstphenson (Founder, @meetgranola) - who advised on making your 1 AI feature extremely effective before trying to add any more. • @Albertorizzoli (Co-founder, @V7labs) - who said to listen to user problems, not their proposed solutions (this is more true than ever with AI). • @RazRazcle (Co-founder, @humanloop) - advised to bring domain experts into the prompt engineering and evaluation process as early as possible to drive differentiated and effective AI performance. The London AI community is next level 🚀 What should be the theme of our next meetup? 👀
3
1
25
4,184
You can now talk to GPT-4 in the Humanloop discord! discord.gg/juHPCQAKKR The OpenAI live demo inspired us, so we used GPT-4 to create a GPT-4 bot! With the help of GPT-4 it only took @jordnb about 20 minutes to code this from scratch!
4
21
31,445
We're excited to host @ycombinator's London meet-up at @localglobevc's office this summer for future founders to network and learn about startups. We'll be telling the story of starting Humanloop. If you're in London, sign up through @startupschool at startupschool.org/.
1
7
23
All OpenAI models are down right now. Follow here for updates status.openai.com/incidents/… (Anthropic, Cohere, AI21 are all running well. An excellent time to toggle on a backup deployment in Humanloop)
1
2
21
6,029
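The backup-deployment idea in the outage post above can be sketched in a few lines. This is an illustration only, not how Humanloop implements it: `call_with_fallback`, `primary_down` and `backup_up` are hypothetical names, and the two provider functions are stubs standing in for real API clients.

```python
def call_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first success.

    Raises RuntimeError listing every failure if all providers fail.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary_down(prompt):
    """Stub for a provider that is currently unavailable."""
    raise ConnectionError("503 service unavailable")


def backup_up(prompt):
    """Stub for a healthy backup provider."""
    return "ok: " + prompt


# Usage: the primary fails, so the backup answers.
used, reply = call_with_fallback("hi", [("primary", primary_down), ("backup", backup_up)])
```

The order of the provider list is the whole policy here; a production setup would likely add timeouts and retries before falling through.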
Stay tuned and follow us for Prompt Engineering 201 soon 👀 We'll cover chaining, prompt injection, DSLs, hallucination protections and more!
1
19
3,086
Want to understand how Humanloop can help you build differentiated products with LLMs? Join a live demo with @RazRazcle ↓ Today, 6pm GMT / 10am PT lu.ma/32ntvz7w March 7th, 5pm GMT / 9am PT lu.ma/3pmkuooh March 9th, 6pm GMT / 10am PT lu.ma/atdckv4c
3
2
19
3,744
MCP is rapidly becoming the universal adapter for AI. Since its release in November, developers and teams have raced to adopt the standard, giving agents the tools they need to interface with the real world, from APIs to internal systems. Our latest explainer breaks down MCP: what it is, how it works, and how to get started. 🔗 Read here: humanloop.com/blog/mcp
11
2
13
2,089
1. Just ask With the advent of instruction-tuned models, these models are usable without needing to get clever about it. If in doubt, just ask.
1
17
6,559
Human feedback is critical to aligning these models to do as you want. We've built the specialised tools needed for this crucial component and are delighted to be contributing to this open-source effort.
1
1
15
New homepage just dropped
1
17
2,564
LLMs are the next big computing platform. At least as important as the internet, perhaps more. We want to empower the next million developers to build AI-first applications If you want to build successful products and apps with GPT-3, sign up now. humanloop.com/blog/llm-launc…
17
Like any good @ycombinator company, we @humanloop are always trying to listen to our customers and build what they want. Here are 5 recent feature releases driven directly by customer requests and feedback: 1) Multi-provider chat-focused playground
2
3
15
17,192
GPT-4o is so fast. Here it is compared to Turbo
1
2
16
2,094
Sam Altman (@sama) of OpenAI will be doing a fireside chat chaired by our CSO @davidobarber in London 24th May 2023 1.30-4.15pm Apply here to join! eventbrite.com/e/a-conversat…
1
5
17
4,849
Building an LLM app has evolved significantly since 2022. If you're working on an AI project, you should familiarize yourself with the anatomy of a modern LLM application. Here's a quick overview: • LLM model - the core reasoning engine; an API into @OpenAI, @AnthropicAI, @GoogleAI, or open source alternatives like @MistralAI. • Prompt template - the boilerplate instructions to your model, which are shared between requests. This is generally versioned and managed like code using formats like the .prompt file™️ . • Data sources - to provide the relevant context to the model; often referred to as retrieval augmented generation (RAG). Examples being traditional relational databases, graph databases, and vector databases like @pinecone or @trychroma. • Memory - like a data source, but that builds up a history of previous interactions with the model for re-use. • Tools - provides access to actions like API calls and code execution empowering the model to interact with external systems where appropriate. • Agent control flow - some form of looping logic that allows the model to make multiple generations to solve a task before hitting some stopping criteria. • Guardrails - a check that is run on the output of the model before returning the output to the user. This can be simple logic, for example looking for certain keywords, or another model. Often triggering fallback to human-in-the-loop workflows. These individual components represent a large and unique design space to navigate. The configuration of each one requires careful consideration; it's no longer just strictly prompt engineering.
2
17
2,573
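The components listed in the anatomy post above can be wired together in a toy sketch. Everything here is hypothetical: `fake_model` stands in for a real LLM API call, `retrieve` is a keyword lookup standing in for a vector database, and the single `search` tool is a stub. The point is only to show where each component sits in the loop.

```python
from dataclasses import dataclass, field

# Prompt template: boilerplate shared between requests.
PROMPT_TEMPLATE = "Context:\n{context}\n\nHistory:\n{history}\n\nQuestion: {question}"

# Data source: a tiny in-memory "knowledge base" (stand-in for a real store).
DOCS = {
    "refunds": "Refunds are processed within 5 days.",
    "shipping": "Orders ship within 2 days.",
}

# Tools: actions the model may request (stub implementation).
TOOLS = {"search": lambda query: f"Search stub result for: {query}"}


def retrieve(question):
    """Return every document whose key appears in the question."""
    hits = [text for key, text in DOCS.items() if key in question.lower()]
    return "\n".join(hits) or "(no context found)"


def fake_model(prompt):
    """Stand-in for the LLM: asks for a tool when it has no context."""
    if "(no context found)" in prompt:
        return "TOOL:search " + prompt.rsplit("Question: ", 1)[1]
    return "ANSWER: " + prompt.split("\n")[1]


def guardrail(output, banned=("password",)):
    """Simple keyword check run on the output before it reaches the user."""
    return not any(word in output.lower() for word in banned)


@dataclass
class App:
    memory: list = field(default_factory=list)  # history of past interactions
    max_steps: int = 3  # stopping criterion for the agent loop

    def run(self, question):
        context = retrieve(question)
        output = ""
        for _ in range(self.max_steps):  # agent control flow
            prompt = PROMPT_TEMPLATE.format(
                context=context, history="\n".join(self.memory), question=question
            )
            output = fake_model(prompt)
            if not output.startswith("TOOL:"):
                break
            name, _, arg = output[len("TOOL:"):].partition(" ")
            context = TOOLS[name](arg)  # feed the tool result back as context
        if not guardrail(output):
            output = "Escalated to a human reviewer."  # human-in-the-loop fallback
        self.memory.append(f"Q: {question} | A: {output}")
        return output


app = App()
first = app.run("How do refunds work?")
second = app.run("What is the weather?")
```

Even in this toy version, each component is a separate decision (what to retrieve, which tools to expose, when to stop, what to block), which is the design space the post describes.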
GPT-3 lets you build apps that feel like science fiction. We now have assistants that can help us • write better (@writesonic, @peppertype_ai, @sudowrite) • code better (@GitHubCopilot) • imagine better (#dalle, #midjourney)
2
15
🚨 Announcement 🚨 Just a few hours ago @AnthropicAI dropped Claude 3 - a new suite of models which outperform GPT-4 and Gemini Ultra. We're excited to announce that Claude 3 is now available on Humanloop. Bring your API Key to test, evaluate and deploy humanloop.com
1
2
15
1,481
Big product update: We’ve introduced support for multi-modal models! You can now input images and text to models like GPT-4V in our Editor or API. This presents a new frontier in the range of apps that can now be built with AI. Here’s what GPT-4V looks like in our Editor ⬇️
2
1
15
1,510
2. Few shot learning For extra clarity, provide examples. This is particularly useful if you want an uncommon tone, style or syntax. If accuracy is paramount, stuff the whole context window with examples.
1
13
5,125
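As an illustration of the few-shot tip above, here is a minimal sketch of assembling a prompt from examples. The `Input:` / `Output:` layout and the function name `build_few_shot_prompt` are assumptions made for this example; any consistent format that the model can pattern-match should serve the same purpose.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Build a prompt: instruction, worked examples, then the new input.

    `examples` is a list of (input, output) pairs shown to the model.
    """
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts += [f"Input: {source}", f"Output: {target}", ""]
    # End on an open "Output:" so the model completes the pattern.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Rewrite in pirate speak.",
    [("Hello", "Ahoy")],
    "Goodbye",
)
```

Adding more pairs to `examples` is how you would "stuff the context window" when accuracy matters more than token cost.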
Really pleased to have been covered in TechCrunch today as one of the startups leading a new wave in NLP! techcrunch.com/2022/07/28/a-…
1
2
15
Fine tuned models at scale Instead of a 4-5x higher per-token fee, you pay an hourly rate for hosting fine tuned models. This gets economical at scale. And at scale you'll definitely want to be fine tuning. (though wow at the jump for hosting Davinci vs Curie)
5
3
15
5,163
Every ML engineer should consider programmatic labelling. Here's why:
1
1
14
This is likely the first of many advancements we will see as #opensource continues to level up! You can check out the model now: together.ai/blog/llama-2-7b-…
2
15
1,621
Will Larger Context Windows Kill Retrieval Augmented Generation? As model providers continue to extend their context windows, researchers have been looking into whether or not this enhances performance over Retrieval Augmented Generation (RAG).
2
4
15
3,601
...and now introducing Obsidian AI 🪄 Write faster. Think bigger. Augment your creativity. → github.com/humanloop/obsidia…
Introducing Notion AI 🪄 Write faster. Think bigger. Augment your creativity.
1
1
14
Here’s how you connect an LLM to tools to give it extra capabilities. For example, include google search in your context window with simple syntax like {{ google(query) }}. Read more here: humanloop.com/blog/announcin…
1
3
15
2,042
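The {{ google(query) }} syntax in the post above suggests a template step that runs a tool and splices its result into the prompt. A rough sketch of that idea follows. It is not Humanloop's implementation: `render` is a hypothetical helper, the regular expression only handles a single bare variable as the argument, and `google` is a stub rather than a real search call.

```python
import re

# Matches {{ tool(arg) }} where tool and arg are simple identifiers.
TOOL_CALL = re.compile(r"\{\{\s*(\w+)\(\s*(\w+)\s*\)\s*\}\}")


def render(template, tools, variables):
    """Replace each {{ tool(arg) }} with the result of tools[tool](variables[arg])."""

    def substitute(match):
        tool_name, arg_name = match.groups()
        return tools[tool_name](variables[arg_name])

    return TOOL_CALL.sub(substitute, template)


def google(query):
    """Stub tool: a real version would call a search API."""
    return f"[stub search results for '{query}']"


rendered = render(
    "Results: {{ google(query) }}\nAnswer:",
    tools={"google": google},
    variables={"query": "llm evals"},
)
```

Because the tool output lands in the context window before the model is called, the model sees it as ordinary prompt text.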
The HELM benchmark from Stanford is a fantastic resource to understand the relative strengths of large language models Here’s the overall ranking on the 16 core tests with the win-rate over the other models.
Announcing Holistic Evaluation of Language Models (HELM) v0.2.0 with updated results on the new @OpenAI, @AI21Labs, and @CohereAI models. HELM now evaluates 34 prominent language models in a standardized way on 42 scenarios x 7 metrics.
1
2
14
3,984
Delighted to be at #AISummit London. Come talk to the team about Human in the Loop systems for NLP @humanloop #AISummit #MLOps
1
3
15
Check out CS224n from Stanford (best NLP course we know!) The videos and homeworks are online. web.stanford.edu/class/cs224… The course evolved from OGs in NLP: @chrmanning and @RichardSocher. A good amount of background while focusing on the practical and relevant research today.
1
2
15
Ever wanted to build a custom ChatGPT? Here's a guide to build it with Humanloop, powered by @nextjs, @openai / @AnthropicAI in 15 minutes. docs.humanloop.com/v4.0/docs…
1
2
15
1,879
Our feature with @EO__Global is now live on YouTube 🚀 Watch @jordnb and @RazRazcle talk about their journey to co-founding Humanloop, building an MVP at @ycombinator and their advice for AI startups piped.video/TlUZrm2jlCw?si=xyJv…
2
14
9,081
Higher Rate limit on code 1200 RPM on the codex models compared to just... 20 with OpenAI (although still not the full 3k per minute for davinci-003)
1
1
14
12,797
Subtle update you'll have missed: Finetuned gpt-3.5-turbo is now only 3x the cost of the base 3.5-turbo. It was 8x before. Finetuning just got even more compelling.
11
4,496
In fact there is a new job title emerging for generalist engineers who have a strong familiarity with LLMs and AI tooling: "The AI Engineer" @swyx Despite the claims that 'AI is coming for the jobs of software engineers first', so far it has only created a greater need for them.
1
1
11
3,315
GPT-4 usage is free but will have rate limits while in this beta period. Access to Humanloop is available as a free trial. Sign up here → humanloop.com
1
1
11
2,955
3. Plan for your instructions to be taken very literally These models have been trained to predict the most likely sequences of words. Guard against simplistic pattern matching, and be careful what you wish for.
3
10
5,167
AI Tinkerers London! Talks from: @RazRazcle - projects built on Humanloop @mysticdotai - fine-tuning #opensource models @meetcleo - chatting with your bank account and cool demos from other tinkerers. Congrats @LouisKnightWebb and @bloopdotai on a great event!
1
13
965
The Allen Institute for AI (@allen_ai) just released a 3T token dataset to support open training of large language models. This is the largest #opensource dataset released this year, with 50% more tokens than Llama2’s training data. Here’s what you need to know 🧵
3
4
12
1,490
but going from a cool demo (gpt3demo.com) to a fully-fledged production app is still too hard. • hard to evaluate • hard to make reliable (hallucinations, repetition) • hard to get real domain knowledge into these models
1
13
New version of Programmatic dropping today! 🎉 You can now add ground truth labels directly in the UI! 🤯 Programmatic uses these to give you instantaneous feedback on precision and recall. 👉 Check it out: programmatic.humanloop.com
1
13
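For readers unfamiliar with the precision and recall feedback mentioned in the Programmatic post above, here is a small sketch of how a labelling rule can be scored against ground truth labels. It is not Programmatic's code: the rule `mentions_refund`, the sample texts and their labels are invented for illustration.

```python
def precision_recall(predicted, truth):
    """Compute (precision, recall) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)      # rule fired, label positive
    fp = sum(1 for p, t in pairs if p and not t)  # rule fired, label negative
    fn = sum(1 for p, t in pairs if not p and t)  # rule missed a positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mentions_refund(text):
    """Hypothetical labelling function: flags texts that mention refunds."""
    return "refund" in text.lower()


# Invented examples with hand-assigned ground truth ("is a refund request").
texts = [
    "I want a refund",
    "Your refund policy is clear, thanks",
    "Please send my money back",
    "Where is my order?",
]
truth = [True, False, True, False]

predicted = [mentions_refund(text) for text in texts]
precision, recall = precision_recall(predicted, truth)
```

Here the rule fires on one true request and one false alarm, and misses one request phrased without the keyword, which is the kind of gap that precision and recall feedback is meant to surface.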
SO excited to share what we've been working on! We're building the best tool for programmatic labelling. It's a super powerful way to build your datasets for NLP and it's getting great feedback from early users. Sign up for access 👉 programmatic.humanloop.com
1
4
12
Replying to @RuiCarrilho5
May your timeline be blessed with fewer regrettable minutes.
1
10
2,157
Love machine learning? Join a YC-backed startup that bridges the gap between people and machine learning, allowing people to truly understand and control their models. ➡ careers.humanloop.com/ #machinelearning #activelearning #python #PyTorch Retweets appreciated! 😃
9
11
Thanks @nasscom for including Humanloop in their Generative AI Landscape Map! We're building out tooling to support the next generation of applications that will be built on top of large language models. It's great to be recognised here alongside so many amazing companies in AI!
4
12
1,608
Humanloop takes the guesswork out of prompt engineering and model development with • Feedback at scale • Experiment tracking • One click fine-tuning all brought together with a simple SDK.
1
11
Happy Tuesday: Today we're launching support for @Meta's Llama 2 in the Humanloop Playground! You can now go to humanloop.com and start testing out the model right away! No need for an API Key or any other configuration. Here's what this means ⬇️
1
2
12
1,766
Had a great discussion of LLM best practices with the @localglobevc portfolio this morning. Thanks for hosting! Great to share and learn with @bloopdotai, @OrbitalWitness, @SigmaOS and others!
1
1
12
2,103
1. Feedback Eyeballing a few examples isn't enough for production applications. So, Humanloop lets you collect end-user feedback at scale, unlocking actionable insights on how to improve your models. Discover the issues that you are missing!
1
10
Wow. Everything is tokens.
2
3
1,178
We're now @humanloop. Handle secured.
1
11
Want #machinelearning in #Zapier? Humanloop can get to the same level of accuracy as traditional supervised learning with ✨10% ✨ of the labelling requirements. ➡ humanloop.com/zapier
6
10
Watch the interview to learn how to build with the huge potential of LLMs and let us know how you're building with AI. piped.video/watch?v=hQC5O3WT…
1
1
11
2,620
Announcing Tools in Humanloop humanloop.com/blog/announcin…
3
10
1,899
Llama 3 on @groqinc vs GPT-4 "Name every country in the United Nations" The difference in speed is no joke ⬇️
1
10
1,204
One consistent piece of advice that has come up on almost every episode of the High Agency Podcast... • Start building today❗ @sourcegraph is one of the world's most popular AI coding tools. Their CTO and Cofounder @beyang shares his thoughts on this ⬇️
2
1
10
552
GPT-3.5: Chat vs Instruct Chat is biased towards talking whereas Instruct gets straight to the point. Here's an example when writing code ⬇️ @OpenAI
1
4
10
1,270
AI tool of the week: YC Idea Matcher It's an open-source semantic search engine for matching ideas with @ycombinator companies! Just give it an idea and it links you to similar YC backed startups.
3
1
10
2,027
We're excited to be working with @CarperAI, the biggest group in the open-source RLHF space! Their recently released TRLX library makes it easy to adapt LLMs from human feedback. Check it out!
1
9