Submit to SEC-bench Leaderboard
Guidelines for contributing your model's results to the SEC-bench leaderboard.
If you are interested in submitting your model to the SEC-bench Leaderboard, please do the following:
- Fork the SEC-bench/experiments repository.
- Clone the repository. Due to this repository's large diff history, consider using `git clone --depth 1` if cloning takes too long.
- Under the task that you evaluated on (e.g. `evaluation/Patch/`), create a new folder with the model name (e.g. `swea_o3-mini`).
- Within the folder, please include the following files:
  - `report.jsonl`: A report file that summarizes the evaluation results.
  - `metadata.yaml`: Metadata for how the result is shown on the website. Please include the following fields:
    - `name`: The name of your leaderboard entry
    - `orgIcon` (optional): URL/link to an icon representing your organization
    - `oss`: `true` if your system is open-source
    - `site`: URL/link to more information about your system
    - `verified`: `false` (see below for results verification)
    - `date`: Date of submission
  - `trajs/`: Reasoning traces reflecting how your system solved each problem
  - `logs/`: SEC-bench evaluation artifacts dump
- Create a pull request to the SEC-bench/experiments repository with the new folder.
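The folder-creation steps above can be sketched as a short shell session. This is a minimal sketch, not an official script: it assumes you have already forked and cloned the repository, and the model name, site URL, and date below are illustrative placeholders to replace with your own values.

```shell
# Assumes you have already forked and cloned, e.g.:
#   git clone --depth 1 https://github.com/SEC-bench/experiments.git
# Hypothetical entry name; replace with your own model name.
MODEL=swea_o3-mini
TASK_DIR=evaluation/Patch

# Create the submission folder with the required subdirectories.
mkdir -p "$TASK_DIR/$MODEL/trajs" "$TASK_DIR/$MODEL/logs"

# Write metadata.yaml with the fields listed above (illustrative values).
cat > "$TASK_DIR/$MODEL/metadata.yaml" <<'EOF'
name: swea_o3-mini
oss: true
site: https://example.com/my-system
verified: false
date: 2025-01-01
EOF

# report.jsonl comes from your evaluation run; copy it in alongside.
# cp /path/to/report.jsonl "$TASK_DIR/$MODEL/report.jsonl"
```

After committing this folder, open the pull request against SEC-bench/experiments as described above.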
Results Verification
Submissions marked with the ✓ badge have been verified by the SEC-bench team through artifact reproduction. We run your agent in our controlled environment to confirm the reported results.
Contact
For questions about submissions, evaluation, or the benchmark itself, please contact us at hwiwonl2@illinois.edu or open an issue on GitHub.