Speaker
Description
Our research addresses the industrial challenge of minimising production costs in an undiscounted, continuing, partially observable setting. We argue that existing state-of-the-art reinforcement learning algorithms are unsuitable for this context. We introduce Clipped Horizon Average Reward (CHAR), a method tailored for undiscounted optimisation. CHAR is an extension that can be applied to any off-policy reinforcement learning algorithm; it exploits known characteristic times of the environment to simplify the problem. We apply CHAR to a case study of an industrial gas supplier and demonstrate superior performance in this specific environment. Finally, we benchmark our results against the standard industry algorithm and present the merits and drawbacks of our approach.
Type of presentation | Talk |
---|---|
Classification | Mainly application |
Keywords | reinforcement learning, production planning, industrial application |