HELM Lite v1.2.0 is out! Datasets: NarrativeQA, NaturalQuestions, OpenbookQA, MMLU, MATH, GSM8K, LegalBench, MedQA, WMT14. Results (we still need to add Claude 3, which requires more prompt finagling): crfm.stanford.edu/helm/lite/…

Apr 26, 2024 · 5:05 AM UTC