DeepCoder: Comprehensive Programming Dataset
- 24K+ verified problems with 5+ test cases each
- Sources include TACO, PrimeIntellect, LiveCodeBench, Codeforces
- Temporal split: train on problems released before August 2024, test on newer problems
- Powers DeepCoder, a state-of-the-art open-source 14B-parameter model
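
The test-case threshold and temporal split described above can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual pipeline: the `Problem` record, its field names, and the helper functions are hypothetical, the cutoff of 1 August 2024 is one reading of "pre-Aug 2024", and applying the 5-test minimum to both splits is an assumption.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout; the real dataset schema may differ.
@dataclass
class Problem:
    source: str      # e.g. "TACO", "Codeforces"
    released: date   # assumed release date of the problem
    tests: list      # test cases attached to the problem

MIN_TESTS = 5                # "5+ test cases each"
CUTOFF = date(2024, 8, 1)    # assumed boundary for "pre-Aug 2024"

def has_enough_tests(problem: Problem) -> bool:
    """Keep only problems with at least MIN_TESTS test cases."""
    return len(problem.tests) >= MIN_TESTS

def temporal_split(problems: list) -> tuple:
    """Split filtered problems by release date: older to train, newer to test."""
    kept = [p for p in problems if has_enough_tests(p)]
    train = [p for p in kept if p.released < CUTOFF]
    test = [p for p in kept if p.released >= CUTOFF]
    return train, test
```

Splitting by release date rather than at random means every test problem postdates all training problems, which reduces the chance that a test problem appears in the training data.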