This paper proposes a real-time path planning algorithm based on the Markov decision process (MDP). The algorithm can be used in dynamic environments to guide a wheeled mobile robot to a goal. Path planning consists of two phases: a utility update phase and a policy update phase. In the utility update phase, utility values are updated using information from the observable environment. Obstacles and walls reduce the utility value, pushing the agent away from these impassable areas. The utility value of the goal is constant and is always the largest. In the policy update phase, a series of policies is obtained by maximizing the long-term total reward, and this series eventually forms a path to the goal regardless of where the agent is located. Simulations and experiments show that finding the first path takes longer because the utility values change substantially at the start, but once a path has been planned, responding to environmental changes takes little time. The proposed algorithm therefore has an advantage in dynamic environments where obstacles move in unpredictable ways.
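The two phases described above can be illustrated with a minimal grid-world sketch. This is not the paper's implementation: the deterministic four-connected motion model, the grid size, and all names and constants (`GOAL_UTILITY`, `OBSTACLE_UTILITY`, `STEP_REWARD`, `GAMMA`, `update_utilities`, `update_policy`, `follow_policy`) are assumptions chosen only to make the idea concrete.

```python
# Illustrative sketch only: every name and constant here is an assumption,
# not taken from the paper.
GOAL_UTILITY = 100.0       # held constant; the largest value in the grid
OBSTACLE_UTILITY = -100.0  # low value pushes the agent away
STEP_REWARD = -1.0         # assumed small cost per move
GAMMA = 0.9                # assumed discount factor
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def neighbors(cell, rows, cols):
    """Return the in-bounds 4-connected neighbors of a cell."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def update_utilities(utility, obstacles, goal, sweeps):
    """Utility update phase: synchronous Bellman backups over the grid."""
    rows, cols = len(utility), len(utility[0])
    for _ in range(sweeps):
        new = [row[:] for row in utility]
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    new[r][c] = GOAL_UTILITY
                elif (r, c) in obstacles:
                    new[r][c] = OBSTACLE_UTILITY
                else:
                    best = max(utility[nr][nc]
                               for nr, nc in neighbors((r, c), rows, cols))
                    new[r][c] = STEP_REWARD + GAMMA * best
        utility = new
    return utility


def update_policy(utility, obstacles, goal):
    """Policy update phase: each free cell points to its best neighbor."""
    rows, cols = len(utility), len(utility[0])
    policy = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) != goal and (r, c) not in obstacles:
                policy[(r, c)] = max(neighbors((r, c), rows, cols),
                                     key=lambda n: utility[n[0]][n[1]])
    return policy


def follow_policy(policy, start, goal, max_steps=50):
    """Chain the per-cell policies into a path from start toward the goal."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        path.append(policy[path[-1]])
    return path


goal, start = (0, 3), (3, 0)
obstacles = {(1, 1), (1, 2)}
utility = [[0.0] * 4 for _ in range(4)]

# First plan: utilities start at zero, so more sweeps are used.
utility = update_utilities(utility, obstacles, goal, sweeps=20)
path = follow_policy(update_policy(utility, obstacles, goal), start, goal)

# An obstacle appears at (0, 1); reuse the old utilities as a warm start.
moved = obstacles | {(0, 1)}
utility = update_utilities(utility, moved, goal, sweeps=5)
new_path = follow_policy(update_policy(utility, moved, goal), start, goal)
```

In this toy setting the re-plan reuses the previous utilities and needs only a few sweeps, which is one plausible reading of why responding to changes is cheaper than finding the first path. A real robot would likely need a stochastic transition model and sensor-driven obstacle updates, neither of which is modeled here.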