New follow-up work on the effects of synthetic data on model pre-training. It’s becoming increasingly clear that the model collapse issues predicted by prior work are not panning out in theory or in practice. Industry labs now even have entire synthetic data pre-training teams.
📢New preprint📢 🔄Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 A deeper dive into the effects of self-generated synthetic data on model-data feedback loops w/ @JoshuaK92829 @ApratimDey2 @MGerstgrasser @rm_rafailov @sanmikoyejo 1/9

Oct 23, 2024 · 6:28 PM UTC
