SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.