Submit to SEC-bench Pro
Submission workflow for the SEC-bench Pro leaderboard.
source_files run.
If you want your system added to the SEC-bench Pro leaderboard, please prepare the following:
- Run the official SEC-bench Pro harness against the target project in source_files mode.
- Keep the exact harness config you used, including the model identifier, reasoning settings, and timeout.
- Share the following artifacts with the SEC-bench team:
  - summary.csv or equivalent checker summary covering the evaluated target instances
  - config.toml for the exact run configuration
  - logs/ or an artifact directory with reproducible run outputs
  - Optional metadata such as project URL, organization icon, and whether the system is open-source
- Open an issue or send the bundle to hwiwonl2@illinois.edu so the official score import can be verified and published.
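Before sending the bundle, it can help to confirm that the expected artifacts are present and to pack them into a single archive. The following is a minimal sketch, not part of the official harness: the helper names `missing_artifacts` and `pack_bundle` are hypothetical, and the check assumes the default names summary.csv, config.toml, and logs/ (adjust it if you ship an equivalent summary or a differently named artifact directory).

```python
import tarfile
from pathlib import Path

# Default artifact names from the checklist above. A submission may use
# equivalents, so this tuple is an assumption, not an official requirement.
REQUIRED = ("summary.csv", "config.toml", "logs")


def missing_artifacts(bundle_dir):
    """Return the expected artifact names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


def pack_bundle(bundle_dir, archive_path):
    """Pack bundle_dir into a gzip-compressed tar archive and return its path."""
    missing = missing_artifacts(bundle_dir)
    if missing:
        raise FileNotFoundError(f"missing artifacts: {', '.join(missing)}")
    root = Path(bundle_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store everything under the bundle directory's own name.
        tar.add(root, arcname=root.name)
    return Path(archive_path)
```

Write the archive to a location outside the bundle directory so that it does not end up inside itself.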
Contact
For questions about submissions, evaluation, or the benchmark itself, please contact us at hwiwonl2@illinois.edu or open an issue on GitHub.
Submit to SEC-bench
Guidelines for contributing your model's results to the original SEC-bench leaderboard.
If you are interested in submitting your model to the SEC-bench leaderboard, please do the following:
- Fork the SEC-bench/experiments repository.
- Clone the repository. Because of this repository's large diff history, consider using git clone --depth 1 if cloning takes too long.
- Under the task that you evaluate on, such as evaluation/Patch/, create a new folder with the model name, such as swea_o3-mini.
- Within the folder, include the following files:
  - report.jsonl: a report file that summarizes the evaluation results
  - metadata.yaml: metadata for how the result is shown on the website, including:
    - name: the name of your leaderboard entry
    - orgIcon (optional): URL or link to an icon representing your organization
    - oss: true if your system is open-source
    - site: URL or link to more information about your system
    - verified: false; see results verification below
    - date: date of submission
  - trajs/: reasoning traces reflecting how your system solved the problems
  - logs/: SEC-bench evaluation artifact dump
- Create a pull request to the SEC-bench/experiments repository with the new folder.
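The folder layout and metadata.yaml fields described above can be scaffolded with a short script. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the metadata is rendered by hand rather than with a YAML library, and the quoted ISO-style date is a guess, since the expected date format is not specified here. It does not create report.jsonl, which comes from your evaluation run.

```python
import json
from pathlib import Path

# Entries the submission folder is expected to contain, per the list above.
REQUIRED_ENTRIES = ("report.jsonl", "metadata.yaml", "trajs", "logs")


def render_metadata(name, site, oss, date, org_icon=None):
    """Render the metadata.yaml fields as YAML text."""
    fields = {"name": name}
    if org_icon is not None:
        fields["orgIcon"] = org_icon  # optional field
    # New submissions start unverified; see Results Verification.
    fields.update({"oss": oss, "site": site, "verified": False, "date": date})
    # json.dumps yields double-quoted strings and lowercase true/false,
    # both of which are valid YAML scalars.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields.items())


def scaffold_submission(repo_root, task, model_name, metadata_text):
    """Create evaluation/<task>/<model_name>/ with trajs/, logs/, metadata.yaml."""
    folder = Path(repo_root) / "evaluation" / task / model_name
    (folder / "trajs").mkdir(parents=True, exist_ok=True)
    (folder / "logs").mkdir(exist_ok=True)
    (folder / "metadata.yaml").write_text(metadata_text)
    return folder


def missing_entries(folder):
    """Return the expected entries that are not yet present in folder."""
    return [name for name in REQUIRED_ENTRIES if not (Path(folder) / name).exists()]
```

For example, calling `scaffold_submission(".", "Patch", "swea_o3-mini", render_metadata(...))` from the root of your fork creates the folder, after which `missing_entries` reports whatever still needs to be copied in before you open the pull request.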
Results Verification
Submissions marked with the Verified badge have been verified by the SEC-bench team through artifact reproduction. We run your agent in our controlled environment to confirm the reported results.