RL Optimization PPO Algorithm

Multi-constraint reinforcement learning in complex robot environments

FPMCO decomposes multi-constraint RL into KL-projection sub-problems, achieving higher reward at lower computational cost than second-order rivals on the new SCIG robotics benchmark.

WinBuzzer

DeepSeek Reveals R1 Model Architecture Secrets Ahead of V4 Model Launch

DeepSeek has expanded its R1 whitepaper by 60 pages to disclose training secrets, clearing the path for a rumored V4 coding ...

GitHub

rajeev8008/sumo-traffic-rl-project

This project implements an intelligent traffic signal controller using Proximal Policy Optimization (PPO), a state-of-the-art deep reinforcement learning algorithm. The system intelligently manages ...

IEEE

A Custom Reinforcement Learning Environment for Hybrid Renewable Energy Systems: Design and Implementation

Abstract: We present HybridEnergyEnv, an open-source, Gym-style simulation environment designed for reinforcement learning (RL) research in hybrid renewable energy systems (HRES) combining wind, solar ...

techxplore

Reinforcement learning accelerates model-free training of optical AI systems

Optical computing has emerged as a powerful approach for high-speed and energy-efficient information processing. Diffractive optical networks, in particular, enable large-scale parallel computation ...

IEEE

PPO-EPO: Energy and Performance Optimization for O-RAN Using Reinforcement Learning

Energy consumption in mobile communication networks has become a significant challenge due to its direct impact on Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). The introduction of ...

GitHub

MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization (Work in progress)

MolAct is an Agentic RL framework that trains LLMs to design molecules through a multi-turn "Think-Tool-Observation" loop. By leveraging GRPO and a two-stage training paradigm—mastering basic editing ...

winbuzzer.com

OpenAI: GPT-5 Thinking Models Are The Most “Monitorable” Models To Date

OpenAI has validated Chain-of-Thought (CoT) monitoring as a primary safety mechanism for its frontier models, explicitly naming “GPT-5 Thinking” as its most monitorable system to date. The research ...
