Frontier AI Data Lab advancing AI through better data

Redwood City, California
#SnorkelAI was recognized on @Forbes as a top game-changing tech with our programmatic data labeling approach in #SnorkelFlow forbes.com/sites/forbestechc…
14
44
408
[1/5] **Spoiler alert** We trained a model with the same accuracy as GPT-3 (fine-tuned) that was 1400x smaller with 0.1% of the inference cost. How? With Data-centric Foundation Model (FM) Development in Snorkel Flow. Highlights in the thread 👇: snorkel.ai/better-not-bigger…
3
45
336
In case you missed it, our #MachineLearning Engineer, @aarti_bagul, spoke with @jaygshah22 on building platforms for AI applications. Check out the exciting conversation ↓ snkl.ai/j0h
3
41
296
Very excited to announce Snorkel v0.9, the biggest update to our open source framework for programmatically labeling, transforming & structuring training datasets for #ML. We add new core ops, algs, tutorials, and a full redesign of the core lib snorkel.org/hello-world-v-0-… #snorkelML
4
86
276
This week we brought the #AI community together to share transformative ideas, practical applications, and new research on #DataCentricAI. If you weren't able to join us or want to view the insightful talks again, check out ↓ snkl.ai/dcai21
13
19
241
How to Use Snorkel to Build AI Applications: The why, what, and how of Snorkel’s programmatic data labeling approach and the state-of-the-art #SnorkelFlow platform by our Head of Technology and Co-founder, @bradenjhancock snkl.ai/hts
6
23
218
A big part of the ML workflow is debugging. However, debugging for ML is hard! In this post, @chipro analyzes major sources of errors & their solutions at four steps: labeling, feature engineering, model training, and model evaluation. snorkel.ai/debugging-ai-appl…
4
44
216
We're excited to announce Snorkel Flow, a new data-first ML development platform based on the core ideas of Snorkel! After years of research, deployments, and user conversations, we saw that Snorkel was just the first step. Read about our path forward here: bit.ly/32kH4DU
6
49
216
We are starting a new vodcast called Snorkel #ScienceTalks, exploring some of the best ideas to make AI practical. In the 1st episode, @bradenjhancock talks to @Thom_Wolf about @HuggingFace Datasets & Transformers, and taking ML research into production. snorkel.ai/the-scientist-beh…
2
28
183
Most organizations are in early phases of machine learning adoption, and there are many misperceptions of ML production. @chipro explained the 6 common myths in her recent talk at Stanford MLSys Seminar. What other myths have you encountered? snorkel.ai/machine-learning-…
25
159
In case you missed our #MLWhiteboard, where @krandiash reviewed two exciting papers on defining and building malleable #machinelearning systems, check it out ↓ snkl.ai/bms
2
9
144
[1/5] Today, we’re excited to introduce Data-centric Foundation Model Development, a new paradigm for enterprises to use foundation models to solve complex, real-world problems. snorkel.ai/data-centric-foun…
3
29
147
Interested in becoming the newest Snorkeler? We have open roles across engineering, sales, marketing, and more to accelerate #DataCentricAI for the enterprise. Come join one of the most talented, passionate, and supportive teams in tech! ↓ snorkel.ai/careers
3
10
138
Design principles for iteratively building #AI applications with Snorkel Flow's Application Studio, by Founding Engineer and #MachineLearning Engineering Lead, @vincentsunnchen snkl.ai/iai
4
25
132
Announcing our $35M series B funding by @lightspeedvp, @greylockvc, @googleventures, In-Q-Tel, @blackrock, and @waldenvc (You might want to turn your volume on for the video 🎶)
5
24
108
In case you missed our #MLWhiteboard, where @faitpoms talked about "Forager: Rapid Data Exploration for Rapid Model Development." Check it out ↓ snkl.ai/rvl
3
4
100
In the latest episode of Snorkel #ScienceTalks, @seb_ruder, @DeepMind researcher and @bradenjhancock discuss - NLP repositories of datasets and models - New benchmarks (GLUE, SuperGLUE,...) - NLP for low-resource languages - Emerging trends in NLP Enjoy! snorkel.ai/measuring-nlp-pro…
27
103
The next episode of Snorkel #ScienceTalks is out. Tune in to learn more about applying #weaksupervision research with our Co-founder, @paroma_varma. Check it out ↓ snkl.ai/ya8
1
9
100
Team #Snorkel is made up of some of the most talented people in ML. Meet @aarti_bagul, a machine learning engineer who loves working at the intersection of state-of-the-art research and product management. Learn more about Aarti → snkl.ai/mab
2
10
90
The next episode of Snorkel #ScienceTalks is out. Tune in to learn more about @spacy_io, industrial-strength NLP, & the importance of bringing together different stakeholders in the ML dev process, from our chat with @_inesmontani, founder of @explosion_ai snorkel.ai/building-industri…
2
13
90
In case you missed our #MLWhiteboard, where @HiromuHota and @realDanFu reviewed: "Multi-Resolution Weak Supervision for Sequential Data," presented at NeurIPS 2019, check it out ↓ snkl.ai/da2
1
6
85
Team Snorkel is growing fast with some of the brightest minds in AI. Meet Hiromu Hota, a #MachineLearningEngineer who enjoys brainstorming, designing, and implementing solutions with fellow Snorkelers to make AI practical. bit.ly/3oYQyxW
2
6
84
The next episode of Snorkel #ScienceTalks is out. Tune in to learn more about #weaksupervision in #biomedicine with @jasonafries, a research scientist @Stanford and #SnorkelResearch working in #machinelearning and #healthcare snkl.ai/wsb
2
9
78
.@4shub from our UX Engineering team dives into some frontend best practices for working with lots of data in this blog post on web virtualization to optimize data-intensive app performance. Check it out ↓ snkl.ai/wvo #frontend #react
2
6
72
Two Snorkel papers at @NeurIPSconf this year! (1) *slicing functions* for monitoring and modeling critical data subsets (snorkel.org/use-cases/03-spa…); (2) handling multi-resolution weak supervision for sequential data @vincentsunnchen @paroma_varma @HazyResearch. Blog posts soon!
21
80
In the latest episode of Snorkel #ScienceTalks, @GreylockVC's Partner @SaamMotamedi and our VP of Marketing, @DevangSachdev, discuss how data scientists and machine learning engineers can get started with their startup journey. Check it out ↓ snkl.ai/y8f
1
9
68
In the latest episode of Snorkel #ScienceTalks, Snorkel's @bradenjhancock chats with @abigail_e_see on #AI's facts and myths, the challenges of natural language generation (NLG), and the path to large-scale NLG deployment. Check it out ↓ snkl.ai/nlg
1
9
70
Team #Snorkel is a cross-functional, growing team. Meet Priyal Aggarwal @priyal_aggarwal, a #MachineLearningEngineer who enjoys building the next generation of AI applications and making memes about software bugs. Learn more about Priyal → buff.ly/3gader8
1
4
72
In this paper at #ICLR2022, Chris Ré & team at @Stanford outline a new principled evaluation framework for comparing slice detection methods, & introduce a new technique motivated by their discoveries that outperforms existing methods by double digits ↓ snkl.ai/iclr2022-2
3
18
71
We need to move beyond manual #datalabeling for AI to live up to the hype. Make training data creation and management part of the development process with Snorkel Flow, the first #datacentricAI platform powered by a programmatic approach ↓ snkl.ai/pti
2
5
65
Check out this exciting talk by #MachineLearning Engineer, @priyal_aggarwal, and Founding Engineer and MLE Lead, @vincentsunnchen, discussing how modern #AI application development is transforming #MLOpsWorld2021 snkl.ai/9rj
9
62
To dig into why #DataCentricAI is the future of #AIdevelopment, check out this post by @ajratner, Snorkel AI CEO and Co-founder ↓ snkl.ai/accelerating-data-ce…
1
13
67
(1/3) Congratulations to @_albertgu, @krandiash, and our co-founder Chris Ré @HazyResearch, for their paper at #ICLR2022, winning an honorable mention for Outstanding Paper! → blog.iclr.cc/2022/04/20/anno…
1
7
46
Thank you @IgorBosilkovski @Forbes for the great overview of what we're building at Snorkel AI, tackling the training data problem with a new data-first ML platform, Snorkel Flow. It was great chatting! bit.ly/2WgXLw4
1
11
47
Meet Ryan Smith, an ML Research Engineer at Snorkel AI. He is passionate about discovering new ways to solve NLP problems, football, softball, and reading sci-fi and fantasy literature. Learn more about Ryan → snkl.ai/zf4
2
4
43
Team Snorkel is growing fast! Meet @robiriondo, our head of content, who is in charge of spreading the word about Snorkel AI. Loves family time, movies, reading, and playing World of Warcraft. Learn more about Roberto → snkl.ai/mri
3
43
Chris Ré, @SnorkelAI co-founder, and @Stanford associate professor, talks about Snorkel's research-focused journey to #DataCentricAI, from current bottlenecks in ML to tackling these with SotA programmatic labeling and weak supervision approaches ↓ snkl.ai/fv0
1
4
40
We just dropped a benchmark dataset on Hugging Face to test AI agents on real-world insurance underwriting tasks—built with CPCU experts. Most models still struggle. Here’s how to evaluate them right: 🧠 Dataset: huggingface.co/datasets/snor…
2
7
43
14,086
Hundreds of data scientists and machine learning engineers joined us last week to hear about enterprise use of #LLMs and #GenAI from @CohereAI, @Google, @HuggingFace, @McKinsey, @SambanovaAI, @SnorkelAI, @StabilityAI, and @StanfordAILab. Watch the recap: snorkel.ai/fm-summit-shows-f…
6
39
6,607
Congrats to @paroma_varma, Co-founder and Head of Solutions at Snorkel, for being recognized on @GVteam's Impact List, highlighting 25 exceptional women for #IWD2021 gv.com/impact. We are proud and privileged to work alongside you. #GVImpact
8
41
Last week @SnorkelAI was featured on @Wing_VC's #EntepriseTech30 list and on @Nasdaq. Read more about @ajratner's insights into how the shift to data-first software development shapes the enterprise AI roadmap in an interview with Nasdaq. nasdaq.com/articles/the-data…
7
39
Team Snorkel is one of a kind! Meet Aubrea Stone, a talented executive operations manager who is passionate about improving efficiency and structure for the executive team and loves family time and the Arizona sunshine. Learn more about Aubrea → snkl.ai/51o
1
34
The future of data-centric AI conference is back Aug 2-4! Join the conversation with experts from @Apple, @Bloomberg, @CapitalOne, @Google, @Harvard, @MIT, @Meta, @Mckinsey, @Nvidia, @Pinterest, @Stanford, @StateFarm, @Wellsfargo, & more. Register today 👉 x.snkl.ai/future
12
39
We’re at #NeurIPS2019! Come say 👋 at the poster session on Thu Dec 12 (10:45am - 12:45pm) to chat about *slicing functions* (poster #67) and weak supervision over *sequential data* (poster #110)!
9
34
If you want to learn about: - weak supervision - programmatic labeling for creating massive training datasets Check out @paroma_varma's webinar this morning with @aicampai. Thank you for having us! learn.xnextcon.com/event/eve…
9
37
(1/5) The Future of #DataCentricAI has come to an end. We are incredibly thankful to our speakers, panelists, and moderator, including @AndrewYNg, Anima Anandkumar, @aarti_bagul, @ajratner, Ce Zhang, @chelseabfinn, Chris Ré
6
3
38
Tune in to the MLSys Seminar Series, hosted by @HazyResearch and @StanfordAILab, on Nov 5th to hear from @ajratner on real-world challenges faced by enterprises when deploying ML systems and how to solve them using programmatic training data creation: mlsys.stanford.edu/
1
5
34
The wait is over. Join us for The Future of Data-Centric AI on Aug 3-4 at 8:30 AM Pacific Time → future.snorkel.ai
2
3
27
The second episode of Snorkel #ScienceTalks will be available on March 10th. In this episode, @seb_ruder from @DeepMind discusses advances in natural language processing.
6
36
An exciting chat between @ajratner and @simran_s_arora about new research from @HazyResearch on how prompting methods enable a 6B parameter model to outperform the 175B parameter GPT-3. Join us on Jan 17, where Simran will dive deeper into her research: snorkel.ai/event/foundation-…
1
7
31
4,467
Meet @charli3_w, a talented full-stack #softwareengineer who is helping us build Snorkel Flow across the stack. Loves gaming, yoga, reading, swimming, and playing the piano. Learn more about Charlie ↓ snkl.ai/f41
1
3
31
Team Snorkel has some of the brightest minds in #softwareengineering. Meet David Hao, a platform engineer who enjoys solving infrastructure and reliability engineering problems and loves traveling, hiking, and playing indie games. Learn more about David → snkl.ai/mdh
3
28
In case you missed our #MLwhiteboard, where @bradenjhancock talked about his #NLP research paper, "Training Classifiers with Natural Language Explanations," presented at ACL 2018, check it out ↓ snkl.ai/tcn
1
2
29
📢 @OpenAI just released their guide to model selection, and it’s music to our data-centric AI hearts! 🎶 💡"By switching from GPT-4o to GPT-4o-mini with fine-tuning, we achieved equivalent performance for less than 2% of the cost using only 1,000 labeled examples." Read more on how fine-tuning and a data-centric approach can boost performance while cutting costs: platform.openai.com/docs/gui…
2
32
3,901
We are starting a new thing: ML Whiteboard - an informal session where data scientists, ML engineers, and developers along with Snorkel AI team members join to discuss the latest research and new techniques for machine learning, deep learning, NLP, and more.
2
5
31
AI is having its Linux moment 💥 Models are open-sourced like never before. We are excited to have @huggingface as a partner at future.snorkel.ai on Jun 7-8. Join us to learn how to build predictive & #genAI apps using the latest open-source models, datasets, & tools.
1
9
31
10,307
Associate Professor at @StanfordAILab and Co-founder at @Snorkel AI, Chris Ré, will be giving the Keynote talk on #DataCentricAI at @NeurIPSConf DCAI 2021 workshop on December 14 at 1:20 PM PT. Learn more and register ↓ snkl.ai/mj1
6
29
We used weak supervision to programmatically curate instruction tuning data for open-source LLMs like Llama 2 and RedPajama, enabling more granular error analysis and higher quality, without an army of manual annotators. Links to data and models on the blog! snkl.ai/prx
1
5
30
8,276
During #IntelON, Intel Labs Senior Fellow @DubeyPradeepK, our Co-Founder & CEO, @ajratner, @teoliphant, @KaewGB, & Venkatram Vishwanath, discussed how technology, software, & innovation help steer the path forward in #AI. Check out the on-demand session → snkl.ai/r1p
1
8
29
The next episode of Snorkel #ScienceTalks will be available tomorrow. In this episode, @_inesmontani, founder of @explosion_ai and @spacy_io, talks about building industrial-strength NLP. Tune in!
4
30
Team #Snorkel is made up of some of the most talented people in #softwareengineering. Meet Sakshi Gupta, a backend software engineer who works on the ML foundations team and loves reading science fiction, improv, and snorkeling! Learn more about Sakshi → snkl.ai/ttr
2
22
Meet Molly Friederich, head of solutions marketing at Snorkel AI. She is a product marketing champion. Before Snorkel, she spent six years at @SendGrid, followed by @Twilio. Learn why Molly joined the Snorkel team ↓ snkl.ai/p78
29
We had a fantastic time meeting you all! In case you missed us, check out the write-ups about the work we presented: * Slice-based Learning: papers.nips.cc/paper/9137-sl… * WS for Sequential Data: papers.nips.cc/paper/8313-mu…
4
29
We are thrilled to sponsor the 2021 @NeurIPSConf workshop on #databases and #AI along with our friends at @RelationalAI. Join us on December 13 → snkl.ai/dbai
6
27
Introducing Application Studio, the fastest way to build AI applications without hand-labeled training data snorkel.ai/introducing-appli…
8
29
Trick-or-training data! Happy Halloween!
2
26
We’re thrilled to partner with Together AI to enable any enterprise to build proprietary LLMs on their data tailored to their specific needs. Learn how both data-centric and model-centric operations are needed to build GPT-You for your business. snkl.ai/tgth
7
28
4,342
We are excited to attend @NeurIPSConf #NeurIPS2022 on 11/28 in New Orleans where the @SnorkelAI Research team and our academic partners will present five peer-reviewed papers on the latest data-centric AI approaches listed in the 🧵 [1/7]
2
6
24
Another Snorkel-driven @NatureComms paper published this week, led by Stanford Researcher and Snorkel Research member @jasonafries: weakly supervised #NER, combining large medical ontologies for state-of-the-art performance and rapid COVID-19 research! rdcu.be/chTpg
9
26
Drawing from her experiences at #Netflix, #NVIDIA, and @SnorkelML, @chipro will talk about bridging the gap between research and production for ML at @Ai4Conferences. The talk is online and Ai4 has free passes. Join us today at 2:50 PM EST!
4
26
(1/2) The doors to our new HQ in Redwood City have been opened! We were thrilled to see our Snorkelers, most for the very first time in person, as our team has grown many times in the past 24 months.
2
2
26
At 11:15 AM Pacific, @ananyaku, ML Researcher @StanfordAILab will walk us through a tutorial on FMs and fine-tuning by selectively tuning parts of the model to preserve pretrained information & deliver better out-of-distribution performance. Sign up here: bit.ly/fm_summit_23
1
6
24
5,588
In case you missed our #MLWhiteboard, where Ryan Smith reviewed "Prompting Methods with Language Models," check it out ↓ snkl.ai/plm
1
1
25
Roshni Malani, from Snorkel AI's engineering leadership, discusses how collaboration is one of the fundamental pillars of data-centric AI in this blog post on building AI applications using #datacentricAI. Learn more ↓ snkl.ai/bai
1
1
23
#TBT to 2016 when @SnorkelAI team and researchers at @StanfordAILab introduced Data Programming - a new paradigm in which users express weak supervision strategies or domain heuristics as labeling functions at @NeurIPSConf arxiv.org/pdf/1605.07723.pdf
1
3
24
#SnorkelAI will be at #MLOps2021! @vincentsunnchen, a founding engineer and MLE lead, and @priyal_aggarwal, ML engineer, will discuss "Iterative Development Workflows for Building AI Applications." Join us on June 17 at 11:45 AM PT ↓ snkl.ai/acj
5
26
Meet Shenell Glover, our federal business development and capture manager, who helps us identify and establish federal relationships. Loves helping organizations scale financially, cooking, and being a tech enthusiast. Learn more about Shenell → snkl.ai/w13
1
23
(1/2) Large language models embed a lot of useful knowledge in their pre-trained weights, but they are typically insufficient solutions on their own, either due to knowledge gaps or the inability to transfer what they know. But there’s another way → snkl.ai/few-shot-learning
1
3
21
We are honored to be named one of America's Best Startup Employers by @Forbes 😊. This award recognizes startup companies that carry out groundbreaking work, invest in employees, and demonstrate strong growth. Read more 👇
1
4
24
9,828
It is nearly impossible to get perfect data in this imperfect world. Labels can be inexact/indirect, incomplete/limited, inaccurate/noisy, multimodal, sparse, sequential, or flawed in other ways. @AnimaAnandkumar discusses overcoming data imperfections → snkl.ai/aad
4
24
Meet Tim Sedwitz, head of revenue operations at Snorkel AI. He has deep strategy experience in enterprise software and is passionate about working out, hiking, and golfing. Learn more about Tim → snkl.ai/pbu
23
Check out @vincentsunnchen presenting Snorkel-based weak supervision methods for scene graph prediction at #iccv19, today at the SGRL Workshop and tomorrow afternoon at the conference (Poster 1.2, #130)!
Structured prediction requires large training sets, but crowdsourcing is ineffective— so, existing models ignore visual relationships without sufficient labels. Our method uses 10 relationship labels to generate training data for any scene graph model! arxiv.org/abs/1904.11622
6
25
At Snorkel AI, we believe in data, diversity, and democracy. Our team is excited to #vote and we want you to vote too. Retweet to receive a custom designed t-shirt featuring our very own Dr. Bubbles. #EveryVoteCounts
9
16
24
At 11:45 AM Pacific, @simran_s_arora, ML researcher @HazyResearch, will dive deeper into how a 6B parameter open-sourced model performed better than 175B parameter GPT-3 on 15 tasks. Check out other talks and sign up here: bit.ly/fm_summit_23
6
23
5,762
🌟 Thrilled to announce: "MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records" just won the "Best Findings Paper in Generative AI for Health" at the ML4H Symposium! 🏥💡 Authored by Scott Fleming, Alejandro Lozano, Snorkel AI Researcher - @jasonafries, and a stellar team, this groundbreaking paper introduces MedAlign. It's a vital dataset that challenges LLMs with realistic healthcare text generation tasks, addressing the complex needs of clinicians. Their findings highlight significant error rates in current LLMs and pave the way for more accurate, clinician-focused AI tools in healthcare. 👉 lnkd.in/gEm9Wbwd
5
19
2,676
We’re excited to welcome Devang Sachdev as VP of Marketing to help us bridge the gap between machine learning developers and business leaders looking to make AI their edge. snorkel.ai/07-28-2020-devang…
1
24
Finance has so much potential use for AI. @ManasJoglekar shared challenges & solutions for AI finance, drawn from his experience working with the US's top banks: * Unstructured, multimodal, long-tail data * Regulation * Changing business objectives snorkel.ai/how-to-overcome-p…
2
23
AI has so much potential to improve healthcare. However, there are still many practical and ethical challenges to be overcome for AI to deliver value. In this post, one of our engineers, @bclyang, analyzes these challenges and possible solutions. snorkel.ai/ai-challenges-in-…
4
22
The Foundation Model Summit is just 1 day away. Join us on Jan 17, 9AM PT to hear from @ajratner, @JayAlammar, @AliArsanjani, @apsdehalt, @simran_s_arora, @ananyaku, @bradenjhancock, @MysteryGuitarM, Jimmy Lin, Carlo Giovine and David Harvey. bit.ly/fm_summit_23
4
25
6,929
Check out @paroma_varma talking about Snuba, a system for automating generation of labeling functions for Snorkel, at #VLDB2019 today! Chat with her at the conference (weds. poster 1.3) if interested in Snuba, snorkel.org, or weak supervision more broadly!
5
23
Now you can try out the new 7B model that put Snorkel AI SOTA on AlpacaEval 2.0 on the @togethercompute playground! snkl.ai/tai (login required)
1
4
22
6,186
(1/5) In this post, @MayeeChen discusses Liger: a simple method that provides theoretical and empirical improvements over standard weak supervision methods and empirically outperforms KNN and adapter baselines on FM embeddings ↓ snkl.ai/cws
2
6
20
Snorkel Flow, the world’s first data-centric AI platform, is now GA ↓ snkl.ai/snorkel-ai-general-a…
3
19
Meet Victoria Lo, a frontend software engineer at Snorkel AI. She is a talented and versatile engineer, passionate about CSS, memes, and trying new things. Learn more about Victoria → snkl.ai/2ad
1
1
21
Honored to sponsor this effort to: - bring more researchers from underrepresented groups to #ICLR2021 - advance theory, methods, and tools for weak supervision!
🌟Diversity Funding🌟 To increase diversity, #WeaSuL2021 will offer subsidies to researchers from underrepresented groups to facilitate their participation in the workshop and in #ICLR2021. Please fill out this form to apply: forms.gle/URdYrvVdZBtTPQ5J9 Also, please retweet!
1
2
21