Samuel L Smith @SamuelMLSmith

1 Mar 2024

We scale Hawk to 7B parameters, and Griffin to 14B. Both models exhibit power-law scaling, just like Transformers! Griffin achieves lower held-out loss than a strong Transformer baseline across all model sizes, while Hawk closes the gap as we scale training FLOPs.

Mar 1, 2024 · 11:02 AM UTC
