PlanRL-Bench: A Multi-Domain Reinforcement Learning Environment for Training LLM-Based Project Planners
Effective project planning requires reasoning about task dependencies, resource constraints, actor availability, and temporal dynamics, capabilities that large language models (LLMs) exhibit only in limited and unreliable ways. We introduce PlanRL-Bench, a reinforcement learning environment designed to train and evaluate LLM-based agents on resource-aware planning across four structurally distinct domains: software release management, hardware production scaling, compute program scheduling, and scientific research coordination. The environment provides a unified interface over domain-specific semantics, realistic stochasticity models (log-normal actor variance, Gaussian resource SLAs), dynamic re-planning triggers, and dual mutex contention on both actors and resources. We formalize the planning problem as a semi-Markov decision process with structured action spaces (JSON plans with diff-based revisions) and propose a GRPO-based training procedure for 32B-parameter LLMs. We construct a benchmark of 50 problems (25 medium-scale, 25 large-scale) and evaluate against classical solvers (CP-SAT, ILP), heuristic schedulers, and learned baselines. Our results show that LLM planners can match solver performance on tractable instances while scaling to problem sizes where solvers fail, and that they achieve higher resource utilization and lower contention than heuristic baselines. Code and benchmarks are available from the Immaculate research releases.
Priyesh Srivastava Immaculate priyesh@onfinance.in · Apr 20, 2026
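To make the stochasticity models named in the abstract concrete, the following is a minimal sketch of how log-normal actor variance and Gaussian resource SLAs might be sampled. It is an illustration under stated assumptions, not the environment's actual implementation: the function names, the parameters (`nominal_hours`, `sigma`, `sla_mean_hours`, `sla_std_hours`), the example values, and the choice to clip negative latencies at zero are all hypothetical.

```python
import random


def sample_task_duration(nominal_hours, sigma, rng):
    """Scale a nominal duration by a log-normal multiplier.

    Assumption: the multiplier has mu = 0, so its median is 1.0 and the
    nominal duration is the median of the sampled durations.
    """
    return nominal_hours * rng.lognormvariate(0.0, sigma)


def sample_resource_latency(sla_mean_hours, sla_std_hours, rng):
    """Draw a resource turnaround time from a Gaussian SLA model.

    Assumption: negative draws are clipped to zero, since a latency
    cannot be negative.
    """
    return max(0.0, rng.gauss(sla_mean_hours, sla_std_hours))


# Hypothetical example values; a fixed seed keeps the draws reproducible.
rng = random.Random(0)
durations = [sample_task_duration(8.0, 0.25, rng) for _ in range(1000)]
latencies = [sample_resource_latency(2.0, 0.5, rng) for _ in range(1000)]
```

One reason a log-normal multiplier suits actor variance is that it is strictly positive and right-skewed, so occasional large overruns occur while durations never fall below zero. A Gaussian is symmetric around the stated SLA, which is why the sketch needs an explicit clip at zero.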