Building a framework for AI called BAML. (YC W23). @BoundaryML

We are hiring Rust devs to work on building out a programming language (BAML) with crazy new, useful features for making AI pipelines. Our entire team is senior+ engineers. We are a team of 5. We work hard and in person in Seattle. We accept new grads. The interview is not leetcode, but a 1-week work trial. Send me a DM on X or LinkedIn or email aaron @ boundaryml dot com if interested.
2
5
33
5,569
Our team now has 4 absolutely cracked devs: one guy chose us over an $800k OpenAI offer, another writes programming languages for fun, another made FaceID on Pixel phones 100x faster. I'm just the personality hire
62
113
4,198
442,576
Replying to @levelsio
do you think they want cafes to be more of a social place vs a coworking place?
21
829
120,416
Replying to @RobynElyse
maybe we can normalize eating lollipops, maybe they can make some healthy ones
7
1
358
82,450
I just quit my Sr Engineer Amazon job after being there for 7 years to start my own thing. As @dvassallo put it, it's time to chart my own path. If I don't do it now I will never do it. I have to try. It's as simple as that.
31
5
230
Claude Sonnet just migrated our Rust repo from Minijinja 1.0 to Minijinja 2.0 in basically one-shot. I just let it run via Cursor's Background Agents feature and it was done after $5 and 10ish mins of work. I was inspired by @mitsuhiko 's posts on Claude code so I gave it a try and I'm kinda mind blown at how easy this was. All I told it to do was to clone the minijinja repo to guide itself. It did that, read the Changelog, and figured out roughly what was needed. It's just that good. PR Below
4
14
194
30,736
Clickhouse seems to be exploding. Literally every analytics company is migrating to it
9
14
154
51,507
This is amazing. Thanks for sharing. We have all been there.
100
23,337
Replying to @testaccountoki
these PRs need a Linus AI bot that looks at these patterns and yells at you
1
80
22,708
Replying to @karpathy
when i was at AWS one of the top restrictions was not deploying worldwide at once. You can't safeguard against bugs but you can reduce how many devices can be impacted at any given moment
2
1
72
4,644
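A minimal sketch of the idea in the reply above: roll a change out in waves and stop as soon as a wave looks unhealthy, so a bug only reaches the regions deployed so far. The region names, the `deploy` callback, and the `healthy` check are all hypothetical stand-ins. This is not how AWS actually implements its deployments.

```python
from typing import Callable, List


def staged_rollout(
    waves: List[List[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy wave by wave and halt after the first unhealthy wave.

    Returns the list of regions that received the change.
    """
    deployed: List[str] = []
    for wave in waves:
        for region in wave:
            deploy(region)
            deployed.append(region)
        # Check the whole wave before moving on to a wider audience.
        if not all(healthy(region) for region in wave):
            break
    return deployed


# Hypothetical regions; "us-west" is pretended to be unhealthy after deploy.
log: List[str] = []
waves = [["canary"], ["us-west"], ["us-east", "eu-west"]]
deployed = staged_rollout(waves, deploy=log.append, healthy=lambda r: r != "us-west")
```

In this run the third wave never receives the change, which is the whole point: the blast radius is capped at the waves already deployed.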
i can do pretty good impressions tbh
1
1
74
32,736
Replying to @alexalbert__
3 months with 3.5 sonnet has felt like an eternity
72
6,105
also not to forget our ex-intern @anishpalakurT who reworked our language's AST in like 2 weeks, in rust
2
1
67
31,889
Replying to @levelsio
their need to make money now may also outweigh the need for hackers to use their space to make money in the future
2
58
11,063
Replying to @theo @t3dotgg
The amount of satisfaction I get from using `await` inside a rsc is unparalleled
1
1
60
7,169
Introducing BAML -- a language to get structured output from LLMs -- with the first ever VSCode LLM Playground. BAML is fully integrated with Python and Typescript, on Day 1. (1/5). Declare your LLM function signature in BAML plus your instructions, and see a **live** preview of your full prompt in VSCode.
6
18
61
7,634
This Github wrapped for tinygrad made me lol @realGeorgeHotz
1
4
57
8,920
We're converting all of YC from Langchain to BAML. One at a time.
6
4
55
47,217
We built a DevRel AI agent that scans online forums for mentions of our devtool (BAML) and generates AI responses using our documentation. This AI intern works 24/7 while you sleep. You can prompt the agent to improve responses and approve them with 1 click. Built for AI Tinkerers hackathon in Seattle by @roeybc @foadgr @hellovai @theRealAdi159 @AITinkerers @foundations @CopilotKit @AnthropicAI @alexalbert__
2
7
52
6,399
2 more YC companies started using us this week (one from W24, another from F24)
3
2
51
4,605
Replying to @yacineMTB
it writes 50 lines of rust with no errors.
7
1
43
3,663
Replying to @theo @t3dotgg
no way there's a million flutter devs
7
39
25,300
fuck, the new gpt-4o is actually so so good at free-form speak. Beautiful prose. It's wild to feel emotional about some text written by a machine. @sama
2
5
37
4,409
Announcing our BFCL benchmark results for OpenAI's structured output, which also tests the *contents* of generated outputs.
1. @OpenAI "strict" function-calling (FC) is slightly worse on the 2024-08-06 model, but better on every prior model.
2. OpenAI handily beats Anthropic at Function Calling (+15% improvement).
3. Prompt Engineering on Anthropic's Claude 3.5 Sonnet is just as good as OpenAI FC.
4. @boundaryML's BAML is still SOTA at FC on every model. With BAML, even GPT-3.5/Haiku perform on par with GPT4o / Sonnet (-2%).
The models are not bad. The way you prompt and JSON.parse is. Here's our breakdown, and how BAML outperforms other structured generation techniques.
1
11
37
14,201
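To illustrate the "the way you JSON.parse is" point, here is a minimal sketch of a more forgiving parse step: instead of calling `json.loads` on the whole reply, scan for the first balanced `{...}` span and parse only that. The helper name `extract_first_json_object` and the sample reply are made up for illustration. This is not BAML's parser, just one simple way to tolerate prose around the JSON.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first parseable JSON object embedded in text, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Braces inside string literals must not affect the depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but invalid; try the next "{"
        start = text.find("{", start + 1)
    return None


# A chatty reply that a bare json.loads(reply) would reject.
reply = 'Sure! Here is the result: {"name": "Ada", "tags": ["a}", "b"]} Let me know if you need more.'
parsed = extract_first_json_object(reply)
```

A bare `json.loads(reply)` raises on this input because of the surrounding prose, while the scan above recovers the object.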
Replying to @RoxanaDaneshjou
The answers are better if you ask it to diagnose a patient (as if you were a doctor asking, not the patient themselves).
1
36
I made a promise to release something in 2016 but I can't possibly meet it at this point. I'll let everyone know when it is ready.
14
3
30
"function calling" is one of the most confusing names AI companies have come up with. It literally is just an extraction task. Models don't actually call your functions. You extract the right func parameters with them (and the name of the function), and call them yourself.
6
1
35
4,951
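The point above can be sketched in a few lines: the model only emits a function name and arguments as text, and your own code looks up the function and calls it. The `get_weather` function, the registry, and the hard-coded model output are all hypothetical. No real model API is called here.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    """Hypothetical local function the model is said to 'call'."""
    return f"weather for {city} in {unit}"


# Maps the names a model may emit to real callables in our program.
REGISTRY = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse the extracted name and arguments, then call the function ourselves."""
    call = json.loads(model_output)
    func = REGISTRY[call["name"]]  # raises KeyError for an unknown function name
    return func(**call["arguments"])


# Stand-in for the text a model might return for "what's the weather in Seattle?"
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Seattle"}}'
result = dispatch(fake_model_output)
```

The model's contribution ends at `fake_model_output`. Everything after that is ordinary application code.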
Replying to @alexdanilowicz
once AGI is here someone will also comment it's too slow :P
1
32
5,496
I used to be a YouTuber, so I'm currently exploring what things I can build for these creators to make their lives easier and help them raise money when they need it. There's a few more ideas/projects I have that I will be tweeting about as I go.
4
34
PSA: If you are doing PDF extraction with LLMs, convert the PDF to an image, and use Anthropic's Sonnet 3.5. It is the best model we have tested so far. Take it from one of our customers: Looking forward to more gpt4o updates though.
7
1
27
2,244
all you need for $1M ARR is 10 absolute banger tweets
2
25
4,585
Our teammate @sxlijin compared 8 different frameworks that help you get structured data from LLMs: boundaryml.com/blog/structur… (5 more in the article, can't fit them all in this image!)
6
25
22,224
langchain devs be like
4
23
2,093
Sam from Supernatural was on my plane. Can confirm he's a good parent.
2
11
16
This is it yall, will post more about our hiring process soon
16
11,973
Introducing our new state-of-the-art Agentic Framework, now available in all languages:
2
21
2,635
Replying to @vikhyatk
You realize all that stress was meaningless
3
20
2,193
We're live on YC finally. We're chugging along early access requests. If you're interested in generative AI search hit me up!
Welcome to YC, @aaronnstuff, @hellovai, and @tryGloo! Gloo helps developers connect LLMs like ChatGPT to their knowledgebases while providing guardrails to protect against hallucinated answers. Learn more at trygloo.com.
5
19
7,297
A startup is the hardest thing I’ve ever done in my career, 💯
2
18
951
Ready to launch this rocket to the stratosphere 🚀 Reddit seems to work better for us than twitter, interestingly enough.
3
19
1,116
Replying to @vikhyatk
Oh shit man, modern indentured servitude
1
19
939
Update: Our newest hire wrote a database from scratch, in Rust. He made a video about it (link below) @antoniosarosi
Our team now has 4 absolutely cracked devs: one guy chose us over an $800k OpenAI offer, another writes programming languages for fun, another made FaceID on Pixel phones 100x faster. I'm just the personality hire
3
2
15
4,355
Llama 3.2 just dropped. Here are the benchmarks vs gpt-4o-mini. It's likely nowhere near the level of Claude Sonnet 3.5 Vision yet (which in our experience outperforms all other models at the moment), but we may finally have a capable model for building offline Agents that can better interact with UIs.
3
2
18
4,008
Replying to @Teknium @teknium
what benchmarks do you test on for function-calling?
3
113
The Sound of Metal
2
16
here's me doing function-calling with the Llama 3.2 1B parameter model, using Instructor vs BAML (our prompting framework). 1 billion params. BAML: ✅ Pydantic / Instructor: ❌ BAML works without native tool-calling APIs (which aren't available for this model), and without LLM retries. You can see for yourself in the playground link below. If you're using Pydantic / Zod for structured LLM generations, you're getting worse performance out of the box with smaller models.
3
1
17
1,121
The most successful people are probably the ones that think the journey to their destination is fun instead of the destination itself being the only fun part
1
1
15
finally migrated everything to rust
1
17
1,219
We hit the r/langchain frontpage 🚀 yesterday, with great positive reactions to BAML (even if we had some skeptics). But damn, we gotta hit front page on more subreddits like 100 more times to grow faster. Doing a startup feels like trying to hit 100 homeruns in a row
1
14
650
Replying to @mitsuhiko
It's wild python spent so many years being shit at this until uv came. 20+ years of people attempting to fix the problem
1
15
1,117
BAML IS NOW AT 1k github stars y’all. It took 1 year to go from 0 to 500. It took a month-ish to go from 500 to 1k. More and more devs are switching from Langchain to BAML.
3
2
15
1,225
Replying to @iamgingertrash
I think you may just be too online tbh
1
13
317
Become a good shitposter
13
4,594
BAML Launch Week - Day 1: Announcing the BAML VSCode LLM Playground 2.0
🧪 Tests for structured outputs (works with 🐋 DeepSeek R1)
🔄 Eval History
👀 More visualizations for evals
🎙️ Improved multi-modal support
🌓 Dark AND light mode
It is literally the fastest way to iterate. Plus a spotlight of BAML users: @ProductHunt and @goveagle. Full post below 👇
4
2
16
1,084
if the models stopped getting better we'd still have years worth of things to build
2
16
848
It's crazy how quickly the goal posts move as a founder. One month you have 3 users and just want to get your 4th user, the next month you're cryin cause you're not growing 3x every week anymore.
1
16
1,085
Here's how to implement @mattshumer_ 's groundbreaking Reflection technique used in his Reflection-Llama-3.1 model in one minute -- but with GPT4o and BAML. You can use this reflection technique to generate more accurate structured data. Try it out in the link below. We don't use tool-calling APIs since it's harder for LLMs to reason with constrained generation.
3
3
16
2,484
Replying to @mckaywrigley
why not just do a structured output with a share_your_screen boolean
3
1
16
1,224
That feeling when a Japanese engineer starts his email with "Aaron-san"
1
14
Who the hell is buying my Take a Walk cover? I love you. No joke, that song alone is paying for many of my meals. It's an amazing feeling
2
15
Replying to @Japanesehouse
Your tickets sold out in Seattle :(.
1
13
I watched @tom_doerr go from 1k followers to 23.4k followers in ~5 months just by posting like 20+ times every day about different github repos / papers / and projects with just their title. basically posting the Github trending feed to X users. all you have to do is post
2
13
1,256
New BAML users this week. Keep going
1
13
514
3 months ago we got good feedback once a week. Today we get great feedback multiple times a day. The Discord is poppin. Just keep shipping.
1
14
4,337
Working hard to achieve something great tends to bring a lot of loneliness. You spend time refining your craft instead of going out. You work more after coming home from work, maybe with no potential payoff. You have to be almost delusional for it to work.
1
14
Replying to @justansub
Thanks, and yes having a clear and ambitious vision helps
12
12,001
Replying to @vikhyatk
Holy shit wtf, what made you stay? Worst for me at ec2 was oncall every 3 weeks with like 20 sev2s per week
1
13
1,085
we're gonna need a new function-calling benchmark cause we crushed it
2
13
1,535
This statement is crazy on this study
After a high dose of psilocybin, the brain desynchronizes at a massive scale, causing loss of our sense of self, time, and space. This may drive the burst of plasticity caused by psychedelics. The next day, brain activity has largely returned to normal, but an echo remains – a reset of circuits critical to the sense of self. Our study is out today in @Nature nature.com/articles/s41586-0…

https://www.nature.com/articles/s41586-024-07624-5

1
3
13
2,929
Final day of Launch Week, and we just crossed 2000 ⭐️ on Github 🚀 Today, we’re more than excited to unveil our vision and what you can expect from BAML 1.0 in 2025 — with a strong focus on enterprise scalability and developer experience. And today the spotlight is on our team without whom none of this would be possible: Read more 👇
1
7
12
4,180
about time we got some influencer marketing
1
13
634
The YC development speed boost is real. Ooooof
13
768
It really is crazy how car-dependent American (and many other) cities are, and how millions will never know what it's like to live in a walkable city.
2
10
257 days after quitting my job to work on a startup a customer finally told us they "love Gloo and use it everyday" and "if this didn't exist I'd have to find a replacement". Honestly building a startup is hard as hell. I'm injecting this into my veins and taking the win.
2
12
the number of devs using BAML (our prompting DSL) has 2x'd every month for 3 months. all these devs have already tried Instructor / Langchain / etc... the formula is to chew glass for 1 year, and write Rust
1
12
1,665
Profiling will make it slower as well fyi
1
10
2,639
I remember telling someone i had never traveled much and they said i was missing out. Well yeah.. When your family has money it's easy.
4
9
I created an AI podcast like "Smosh Reads Reddit Stories" in 3mins using NotebookLM. The quality is insanely good. This one is from r/AmITheAsshole. The main problems right now: - they tend to ramble - not funny (just informative) - too much summarization -- it'd be best to just quote the funniest comments. Imagine creating personalized audio podcasts from your top subreddits that you can listen to on a walk to catch up on your communities.
2
12
3,624
How tall does Ariana Grande have to be to become Ariana Venti
1
3
10
Replying to @frantzfries
Do you still have a gazillion slack connects?
1
8
1,421
Protip: If your coworkers fail to lock their computers when they're away from their desk, go to chatgpt and change their system prompt to only respond in assembly.
1
1
10
869
It's been 3 years since I quit my big tech job to work on startups. tl;dr is i dont feel like i'm wasting my life away since I did it, and I've grown a lot as a person in many different dimensions. When you stop optimizing for money amazing things start happening.
1
11
482
We got 100k impressions on reddit by basically helping people out with prompts
1
10
774
Our startup has conjured an insane amount of work for ourselves out of thin air
4
11
574
When i became a senior engineer at Amazon like 3 years ago I felt like my technical breadth was so tiny. Only after quitting and doing startups for the past 2ish years did I start learning again at like 4x the rate. In 2 years i learned 2 more programming languages, built desktop apps, trained ML models, and worked with so many more technologies (like postgresql — we almost never used rdbms at Amazon!). I have a clearer view of how software is built now, instead of just learning how to scale it.
2
10
641
Just a normal day at Boundary
LGTM! 👍 Straight to prod 🚀
1
1
10
2,062
We officially have SAP using BAML. Who's next?
1
1
10
546
We're currently working on running the Berkeley FC benchmark on OpenAI's structured generation. Will post results soon
2
11
660
I'm trying to claw my way back into music.
1
11
Replying to @abacaj
The first thing i found for myself during a customer project is how inconsistent human labelers were compared to LLMs
1
11
2,233
Who cares if a song is [some genre] from [some artist] made with [some software]? If it's good to your ears it's good.
3
10
Announcing structured output support in all languages + tool-calling for every model. Declare a prompt in a BAML file, and our tooling generates an SDK in the language of your choice.
1. Leverage our SOTA results in structured output parsing
2. Support complex schemas, even with enums.
3. Type safety + validation
4. Retries / fallback to another model, etc.
5. Streaming support (coming next week)
6. Feature parity and stability across all languages (since it's built using our Rust runtime).
Links below
1
2
10
4,045
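As a rough illustration of the "retries / fallback to another model" item above, here is a generic sketch in plain Python. The `call_with_fallback` helper and both client functions are hypothetical stand-ins for real model clients. This is not BAML's API or its runtime behavior, only the general pattern.

```python
from typing import Callable, List, Optional, Sequence


def call_with_fallback(
    clients: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_client: int = 2,
) -> str:
    """Try each client in order, retrying each a few times before moving on."""
    last_error: Optional[Exception] = None
    for client in clients:
        for _ in range(retries_per_client):
            try:
                return client(prompt)
            except Exception as err:  # a real client would catch narrower errors
                last_error = err
    raise RuntimeError("all clients failed") from last_error


attempts: List[str] = []


def flaky_primary(prompt: str) -> str:
    """Hypothetical primary model client that always times out."""
    attempts.append("primary")
    raise TimeoutError("primary model timed out")


def backup(prompt: str) -> str:
    """Hypothetical backup model client that succeeds."""
    attempts.append("backup")
    return f"backup: {prompt}"


result = call_with_fallback([flaky_primary, backup], "hi")
```

Here the primary is tried twice, fails both times, and the backup answers on its first attempt.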
Setting huge goals for yourself is both good and bad — it makes it easier to set yourself up for failure, but also gives you ambition and motivation
1
11
I made a "Simple Life" a cappella cover by @IAmCaseyAbrams piped.video/watch?feature=pl…. Hope you like it :).
3
8
9
There's a whole class of “Clerk for X” startups that can be built now due to React Server Components. Just import smart components that can interact with your backend and also give you client-side user flows.
2
10
2,865
and while you're here, give us a Star 💫 github.com/BoundaryML/baml
3
10
10,259
Don't really understand the habit of smoking cigarettes.
1
10
Doing function-calling / tool-use with DeepSeek R1, using BAML. Works flawlessly -- no tool APIs needed.
3
9
788
We are bringing LLM function-calling to all languages as a first-class citizen, not just Python and TypeScript. Building our compiler + runtime in rust has been a game-changer in making this happen.
1
1
10
513
Replying to @klimovx
In one gig you get state of the art results with your cute lil json parser, in another you manage some gpu cluster
9
6,042