Evidence is mounting that increased inference time budgets are a capabilities shift. The question is how to really use them. Longer term I believe large scale RL will allow us to discover better optimization strategies.
Salesforce releases DEI, an open AI software engineering agents org with a 55% resolve rate on SWE-Bench Lite
Discussion: huggingface.co/papers/2408.0…
We propose DEI (Diversity Empowered Intelligence), a framework that leverages SWE agents' unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, making a 25% improvement and beating most closed-source solutions.
Aug 14, 2024 · 5:57 PM UTC
5
34

