15–19 Sept 2024
Leuven, Belgium
Europe/Berlin timezone

Optimising for average reward in a continuing environment: an application to industrial production planning

17 Sept 2024, 11:30
30m
Leuven, Belgium

Janseniusstraat 1, 3000 Leuven

Speaker

Paul Berhaut (Air Liquide)

Description

Our research addresses the industrial challenge of minimising production costs in an undiscounted, continuing, partially observable setting. We argue that existing state-of-the-art reinforcement learning algorithms are unsuitable for this setting. We introduce Clipped Horizon Average Reward (CHAR), a method tailored to undiscounted optimisation. CHAR is an extension that can be applied to any off-policy reinforcement learning algorithm and that exploits known characteristic times of the environment to simplify the problem. We apply CHAR to a case study from an industrial gas supplier and demonstrate its superior performance in the specific environment studied. Finally, we benchmark our results against the standard industry algorithm and present the merits and drawbacks of our approach.
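
As a rough illustration of why the objective matters in a continuing setting, the minimal Python sketch below contrasts a discounted return with an undiscounted average reward on two made-up reward streams. It is not CHAR and does not reflect the authors' implementation; the function names, the reward values and the discount factor are all assumptions chosen only to make the contrast visible.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def average_reward(rewards):
    """Mean reward per step over a finite window (undiscounted)."""
    return sum(rewards) / len(rewards)


# Hypothetical per-step rewards (negative production costs).
# Plan A is cheap for 10 steps, then expensive for 90 steps.
# Plan B is expensive for 10 steps, then cheap for 90 steps.
plan_a = [-1.0] * 10 + [-5.0] * 90
plan_b = [-5.0] * 10 + [-1.0] * 90

gamma = 0.5  # assumed, deliberately strong discounting

# Strong discounting favours plan A because early steps dominate the sum.
prefers_a_discounted = discounted_return(plan_a, gamma) > discounted_return(plan_b, gamma)

# The average-reward criterion favours plan B, which is cheaper per step overall.
prefers_b_average = average_reward(plan_b) > average_reward(plan_a)

print(prefers_a_discounted, prefers_b_average)
```

With these assumed numbers the two criteria rank the plans in opposite order, which is the kind of mismatch an undiscounted formulation is meant to avoid when long-run cost per step is what matters.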

Type of presentation: Talk
Classification: Mainly application
Keywords: reinforcement learning, production planning, industrial application

Primary author

Paul Berhaut (Air Liquide)

Co-author
