A Methodology for Resilience-based Design of an Environmental Control and Life Support System

Author(s)
Rines, Matthew
Organizational Unit
Daniel Guggenheim School of Aerospace Engineering
The Daniel Guggenheim School of Aeronautics was established in 1931 and renamed the School of Aerospace Engineering in 1962.
Abstract
A space habitat is responsible for maintaining the health and comfort of the crew during nominal and off-nominal situations so that they can carry out the scientific duties of their mission. Under the Artemis program, NASA plans to construct a space habitat in lunar orbit and a surface habitat near the lunar south pole. As space habitats become more complex and are located farther from Earth, ensuring resilient system performance and crew safety becomes more challenging. Disturbances that occurred on the International Space Station (ISS), such as reduced efficiency of the urine processor assembly and atmospheric leaks, may pose greater risks to the crew when habitats are located farther from Earth than low Earth orbit (LEO). By incorporating resilience-based engineering into the design process, habitat designs can be analyzed based on their ability to keep the crew safe even in the event of unplanned disturbances. Resilience-based engineering calls for the habitat to be able to monitor its performance, anticipate future demands and challenges, respond to any disturbances, and learn from experience. Increasing the system resilience of a space habitat can come from the design of the habitat layout and configuration as well as from advanced resource management strategies. The environmental control and life support system (ECLSS) distributes, recycles, and produces various resources integral to the habitat's functions. On the ISS, ECLSS subsystems are statically set to operate at a prescribed level by engineers at mission control on Earth. As space habitats move beyond LEO, several factors make this approach less desirable in the case of a disturbance: greater distances increase the communication time delay, resupply times, and cost of redundancy.
Longer-term surface habitats will also become more complex, increasing the number of components that could fail, the risk of cascading failures, and the number of factors to take into account when determining how to reallocate resources. As a result of these challenges, it is beneficial for resource allocation decision-making to be autonomous. An autonomous resource allocation algorithm reduces the workload of the crew and the reliance on terrestrial decision-making. The hostility of space drives the need for rapid decision-making without requiring humans in the loop. Furthermore, the inability to plan for all possible disturbances necessitates a resource allocation strategy that can adapt and learn from experience. To meet these goals, a methodology is devised for developing resilient resource allocation strategies for an ECLSS. Reinforcement learning techniques are used to enable rapid ECLSS resource allocation in both nominal and off-nominal situations. The methodology includes the ability to learn from data collected during operation as well as from a priori training simulations, so that the strategy can better respond to disturbances not previously experienced in simulation or to natural changes in habitat dynamics over time. Reinforcement learning has previously been investigated for ECLSS resource allocation alongside optimization techniques, and the field has since made significant improvements. Several policy-based reinforcement learning algorithms are therefore compared in order to recommend which type is most effective at learning to optimally allocate resources for a space habitat. After conducting hyperparameter optimization studies for REINFORCE, deep deterministic policy gradient (DDPG), soft actor-critic (SAC), and proximal policy optimization (PPO), the trained agents were compared on objective values and computational training time.
SAC was found to perform the best of the chosen algorithms. Different update strategies are also explored to discover the most effective method for handling non-stationary dynamics that were not seen in training while still performing near-optimal actions. The SAC agent was reconfigured to continue training in deployment, updating its policy with every new piece of information at a reduced learning rate. Additionally, a context-based version of SAC was trained and deployed into simulations with disturbances; the non-context SAC was found to perform best. Two styles of reward function were compared with the continuously updating SAC agent: the first applied a linear penalty outside the target oxygen threshold, and the other used a quadratic penalty. The agent with the linear penalty was found to adapt to disturbances in real time better. The methods above were used to train and deploy an agent into a more complex habitat in which the agent observes 8 variables and takes 6 actions. The agent could learn to operate the habitat during a nominal scenario but struggled to adapt in the event of an unexpected disturbance. While a resource allocation algorithm could be developed to improve the resilience of an existing space habitat, it can also be used to further improve the resilience of a habitat that is yet to be designed. The methodology for the resource allocation algorithm can also be brought forward in the design process to the habitat's architectural design: by investigating how different habitat configurations affect the ECLSS's ability to successfully allocate resources in off-nominal scenarios, a better design can be found earlier and more cost-effectively.
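The two reward styles compared above can be illustrated with a minimal sketch. The thesis abstract does not give the actual threshold values or scale factors, so the target oxygen band and penalty coefficient below are purely illustrative assumptions; only the shape of the penalty (linear vs. quadratic deviation outside a target band, zero reward inside it) reflects the comparison described:

```python
# Illustrative reward shapes: linear vs. quadratic penalty applied when
# the oxygen level leaves a target band. Band limits and the scale
# factor k are hypothetical placeholders, not values from the thesis.

O2_LOW, O2_HIGH = 19.5, 23.5  # assumed target band (% O2 by volume)

def deviation(o2_level):
    """Distance outside the target band; 0 when the level is inside it."""
    if o2_level < O2_LOW:
        return O2_LOW - o2_level
    if o2_level > O2_HIGH:
        return o2_level - O2_HIGH
    return 0.0

def linear_reward(o2_level, k=1.0):
    # Penalty grows proportionally with the deviation.
    return -k * deviation(o2_level)

def quadratic_reward(o2_level, k=1.0):
    # Penalty grows with the square of the deviation, so small
    # excursions are penalized more gently and large ones more harshly.
    return -k * deviation(o2_level) ** 2
```

Under this sketch, both rewards are identical at one unit of deviation, but the quadratic form nearly ignores small excursions (a 0.1 deviation costs 0.01 instead of 0.1), which is one plausible reason a linear penalty could give the agent a stronger corrective signal early in a disturbance.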
Date
2023-04-30
Resource Type
Text
Resource Subtype
Dissertation