We decided to make our function-calling benchmark results fully public and interactive!
Our interactive function-calling benchmark results are live! boundaryml.com/blog/sota-fun…
Check out our product hunt demo!
Replying to @kwindla
@hellovai: "We did something dumb. We made a programming language." @boundaryML's BAML is a DSL for working with LLM data output.
We've been cooking
Announcing structured output support in all languages + tool-calling for every model. Declare a prompt in a BAML file, and our tooling generates an SDK in the language of your choice.
1. Leverage our SOTA results in structured output parsing.
2. Support complex schemas, even with enums.
3. Type safety + validation.
4. Retries / fallback to another model, etc.
5. Streaming support (coming next week).
6. Feature parity and stability across all languages (since it's built using our Rust runtime).
Links below
Announcing our state-of-the-art results on the BFCL benchmark. Turns out pydantic isn't all you need.
Announcing our BFCL benchmark results for OpenAI's structured output, which also tests the *contents* of generated outputs.
1. @OpenAI "strict" function-calling (FC) is slightly worse on the 2024-08-06 model, but better on every prior model.
2. OpenAI handily beats Anthropic at function calling (+15% improvement).
3. Prompt engineering on Anthropic's Claude 3.5 Sonnet is just as good as OpenAI FC.
4. @boundaryML's BAML is still SOTA at FC on every model. With BAML, even GPT-3.5/Haiku perform on par with GPT-4o/Sonnet (-2%).
The models are not bad. The way you prompt and JSON.parse is. Here's our breakdown, and how BAML outperforms other structured generation techniques.
Up and to the right
We're converting all of YC from Langchain to BAML. One at a time.
Let us know if you have any questions, and thanks for the shoutout @anish_agarwal20!
Check out our recent podcast with @bigdata!
🆕💡🎧 Vaibhav Gupta @boundaryML: Key Ways BAML Outperforms Traditional LLM Frameworks 🚀thedataexchange.media/baml/
BAML now supports Gemini ✨, with both image and audio inputs. Here is Gemini classifying a sound into a bowling / swimming / soccer / tennis category. Demo link: promptfiddle.com/-Audio-Demo…
Our framework helps you beat the hardest LLM evals. We get these messages all the time.
A user got his hardest LLM eval working with BAML (our prompting framework) today in a few minutes. It didn't pass using Pydantic, Langchain, Instructor, you name it. Only BAML. All because it uses 4x fewer tokens to get structured data.
Replying to @skull8888888888
Awesome work!