Advancing reasoning and planning in large language models via reward shaping

Author(s)
Zhuang, Yuchen
Organizational Unit
School of Computational Science and Engineering
Abstract
Recent advancements in large language models (LLMs) have significantly enhanced their reasoning and planning capabilities, enabling them to serve effectively in complex, real-world scenarios. Despite these improvements, achieving human-level performance remains challenging, particularly for tasks requiring extensive multi-step reasoning and sophisticated planning. Motivated by these limitations, my dissertation focuses on improving the reasoning and planning abilities of LLMs through reward shaping, which guides LLM decision-making by optimizing rewards toward desired outcomes. The core contributions of this thesis are organized around three key aspects of effective and robust reasoning in LLM agents: (1) advancing LLM reasoning and planning capabilities through environmental feedback, long-term memory, and external tool use (Aim 1); (2) integrating human values, goals, and distinct capabilities into LLMs through precise reward models and human-like preferences for reinforcement learning (Aim 2); and (3) improving model robustness to imperfect samples, including sparse data, out-of-distribution data, and noisy labels (Aim 3). Additionally, the dissertation emphasizes data-centric approaches to curate high-quality datasets with minimal human effort, ensuring both training efficiency and faithful evaluation. Together, these thrusts represent a cohesive, data-centric strategy for enhancing LLM capabilities, systematically improving their ability to reason, plan, and adapt efficiently in complex, real-world environments.
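The abstract refers to reward shaping without detailing its mechanics. The following is a minimal, hypothetical Python sketch of one common form of the idea: combining a sparse outcome reward with a dense intermediate signal to rank candidate agent actions. All names (shaped_reward, choose_action, the toy scorers) and the weighting scheme are illustrative assumptions, not the dissertation's actual method.

from typing import Callable, List

def shaped_reward(
    outcome_reward: float,        # sparse signal from the environment (e.g., task success)
    step_progress: float,         # dense intermediate signal (e.g., subgoal completion)
    shaping_weight: float = 0.5,  # trade-off between outcome and intermediate feedback
) -> float:
    """Combine a sparse outcome reward with a dense shaping term."""
    return outcome_reward + shaping_weight * step_progress

def choose_action(
    candidate_actions: List[str],
    score_outcome: Callable[[str], float],
    score_progress: Callable[[str], float],
) -> str:
    """Pick the candidate action whose shaped reward is highest."""
    return max(
        candidate_actions,
        key=lambda a: shaped_reward(score_outcome(a), score_progress(a)),
    )

if __name__ == "__main__":
    # Toy scorers standing in for an environment signal and a learned reward model.
    actions = ["call_search_tool", "answer_directly", "ask_clarifying_question"]
    outcome = {"call_search_tool": 0.0, "answer_directly": 1.0, "ask_clarifying_question": 0.0}
    progress = {"call_search_tool": 0.8, "answer_directly": 0.2, "ask_clarifying_question": 0.5}
    print("selected action:", choose_action(actions, outcome.get, progress.get))

In practice, the dense shaping term would come from a learned reward model or environmental feedback rather than hand-set scores; this sketch only illustrates how a shaped reward can steer an agent's action choice when the outcome signal alone is too sparse.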
Date
2025-07-10
Resource Type
Text
Resource Subtype
Dissertation