news | Tianyang Liu

2025.12.11	🍫 Introducing CocoaBench, a framework for evaluating general agents’ compositional cognitive abilities.
2025.10.15	🏟️ Check out BigCodeArena, a human-in-the-loop platform for evaluating code through execution.
2025.09.20	🧙🏻 Guru, our exploration of cross-domain RL for LLM reasoning, is accepted to NeurIPS 2025!
2025.06.20	🧙🏻 Check out Guru: how cross-domain RL supercharges LLM reasoning.
2024.10.10	🤖 We pre-release Decentralized Arena for automated, scalable, and transparent LLM evaluation.
2024.09.20	🎉 DRPO is accepted to the main conference of EMNLP 2024!
2024.07.10	🎉 LLM Reasoners is accepted to COLM 2024!
2024.02.28	💫 We release StarCoder 2, a family of open LLMs for code.
2024.01.16	🎉 RepoBench gets accepted to ICLR 2024!
2023.11.18	🥳 ToolkenGPT receives the best paper award at SoCal NLP 2023!
2023.09.22	🎉 ToolkenGPT gets accepted to NeurIPS 2023 as an oral presentation!