HELM Lite v1.2.0 is out!
Datasets: NarrativeQA, NaturalQuestions, OpenbookQA, MMLU, MATH, GSM8K, LegalBench, MedQA, WMT14
Results (we still need to add Claude 3, which requires more prompt finagling):
crfm.stanford.edu/helm/lite/…
Apr 26, 2024 · 5:05 AM UTC
8
38
202

