| 2025.12.11 | Introducing CocoaBench, a framework for evaluating general agents’ compositional cognitive abilities. |
| 2025.10.15 | Check out BigCodeArena, a human-in-the-loop platform for evaluating code through execution. |
| 2025.09.20 | Guru, our exploration of cross-domain RL for LLM reasoning, is accepted to NeurIPS 2025! |
| 2025.06.20 | Check out Guru: how cross-domain RL supercharges LLM reasoning. |
| 2024.10.10 | We pre-release Decentralized Arena for automated, scalable, and transparent LLM evaluation. |
| 2024.09.20 | DRPO is accepted to the main conference of EMNLP 2024! |
| 2024.07.10 | LLM Reasoners is accepted to COLM 2024! |
| 2024.02.28 | We release StarCoder 2, a family of open LLMs for code. |
| 2024.01.16 | RepoBench gets accepted to ICLR 2024! |
| 2023.11.18 | 🥳 ToolkenGPT receives best paper award at SoCal NLP 2023! |
| 2023.09.22 | ToolkenGPT gets accepted to NeurIPS 2023 as an oral presentation! |