FPMCO decomposes multi-constraint RL into KL-projection sub-problems, achieving higher reward at lower compute cost than second-order rivals on the new SCIG robotics benchmark.
DeepSeek has expanded its R1 whitepaper by 60 pages to disclose training secrets, clearing the path for a rumored V4 coding ...
This project implements an intelligent traffic signal controller using Proximal Policy Optimization (PPO), a state-of-the-art deep reinforcement learning algorithm. The system manages ...
Abstract: We present HybridEnergyEnv, an open-source, Gym-style simulation environment designed for reinforcement learning (RL) research in hybrid renewable energy systems (HRES) combining wind, solar ...
Optical computing has emerged as a powerful approach for high-speed and energy-efficient information processing. Diffractive optical networks, in particular, enable large-scale parallel computation ...
Energy consumption in mobile communication networks has become a significant challenge due to its direct impact on Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). The introduction of ...
MolAct is an Agentic RL framework that trains LLMs to design molecules through a multi-turn "Think-Tool-Observation" loop. By leveraging GRPO and a two-stage training paradigm—mastering basic editing ...
OpenAI has validated Chain-of-Thought (CoT) monitoring as a primary safety mechanism for its frontier models, explicitly naming “GPT-5 Thinking” as its most monitorable system to date. The research ...