Submit to SEC-bench Leaderboard
Guidelines for contributing your model's results to the SEC-bench leaderboard.
If you are interested in submitting your model to the SEC-bench Leaderboard, please do the following:
- Fork the SEC-bench/experiments repository.
- Clone the repository. Due to this repository's large diff history, consider using `git clone --depth 1` if cloning takes too long.
- Under the task that you evaluated on (e.g. `evaluation/Patch/`), create a new folder with the model name (e.g. `swea_o3-mini`).
- Within the folder, please include the following files:
  - `report.jsonl`: A report file that summarizes the evaluation results.
  - `metadata.yaml`: Metadata for how the result is shown on the website. Please include the following fields:
    - `name`: The name of your leaderboard entry
    - `orgIcon` (optional): URL/link to an icon representing your organization
    - `oss`: `true` if your system is open-source
    - `site`: URL/link to more information about your system
    - `verified`: `false` (see below for results verification)
    - `date`: Date of submission
  - `trajs/`: Reasoning traces reflecting how your system solved each problem
  - `logs/`: SEC-bench evaluation artifacts dump
- Create a pull request to the SEC-bench/experiments repository with the new folder.
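The folder-creation steps above can be sketched as a short shell session. This is a minimal sketch, not an official script: it assumes you have already forked and cloned the repository, and the model name, site URL, and date below are illustrative placeholders to replace with your own values.

```shell
# Assumes you have already forked and cloned, e.g.:
#   git clone --depth 1 https://github.com/SEC-bench/experiments.git
# Hypothetical entry name; replace with your own model name.
MODEL=swea_o3-mini
TASK_DIR=evaluation/Patch

# Create the submission folder with the required subdirectories.
mkdir -p "$TASK_DIR/$MODEL/trajs" "$TASK_DIR/$MODEL/logs"

# Write metadata.yaml with the fields listed above (illustrative values).
cat > "$TASK_DIR/$MODEL/metadata.yaml" <<'EOF'
name: swea_o3-mini
oss: true
site: https://example.com/my-system
verified: false
date: 2025-01-01
EOF

# report.jsonl comes from your evaluation run; copy it in alongside.
# cp /path/to/report.jsonl "$TASK_DIR/$MODEL/report.jsonl"
```

After committing this folder, open the pull request against SEC-bench/experiments as described above.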
Results Verification
Submissions marked with the ✓ badge have been verified by the SEC-bench team through artifact reproduction. We run your agent in our controlled environment to confirm the reported results.
Contact
For questions about submissions, evaluation, or the benchmark itself, please contact us at hwiwonl2@illinois.edu or open an issue on GitHub.