A new solution for Markov Decision Processes and its aerospace applications

Bertram, Joshua
Major Professor
Peng Wei
Committee Member
Journal Title
Journal ISSN
Volume Title
Research Projects
Organizational Units
Journal Issue
Electrical and Computer Engineering

Markov Decision Processes (MDPs) are a powerful technique for modelling sequential decisionmaking problems which have been used over many decades to solve problems including robotics,finance, and aerospace domains. However, MDPs are also known to be difficult to solve due toexplosion in the size of the state space which makes finding their solution intractable for manypractical problems. The traditional approaches such as value iteration required that each state inthe state space is represented as an element in an array, which eventually will exhaust the availablememory of any computer. It is not unusual to find practical problems in which the number ofstates is so large that it will never conceivably be tractable on any computer (e.g., the numberof states is of the order of the number of atoms in the universe.) Historically, this issue has beenmitigated by various means, but primarily by approximation (under the umbrella of ApproximateDynamic Programmming) where the solution of the MDP (the value function) is modelled via anapproximation function. Many linear function approximation methods have been proposed sinceMarkov Decision Processes were proposed nearly 70 years ago. More recently non-linear (e.g. deepneural net) function approximation methods have also been proposed to obtain a higher qualityestimate of the value function. While these methods help, they come with disadvantages includingloss of accuracy caused by the approximation, and a training or fitting phase which may take a longtime to converge

This thesis makes two main contributions in the area of Markov Decision Processes: (1) a novelalternative theoretical understanding of the nature of Markov Decision Processes and their solutions,and (2) a new series of algorithms that can solve a subset of MDPs extremely quickly compared tothe historical methods described above. We provide both an intuitive and mathematical descriptionof the method. We describe a progression of algorithms that demonstrate the utility of the approachin aerospace applications including guidance to goals, collision avoidance, and pursuit evasion. We start in 2D environments with simple aircraft models and end with 3D team-based pursuit evasionwhere the aircraft perform rolls and loops in a highly dynamic environment. We close by providingdiscussion and describing future research