Advancing reasoning and planning in large language models via reward shaping
Author(s)
Zhuang, Yuchen
Abstract
Recent advancements in large language models (LLMs) have significantly enhanced their reasoning and planning capabilities, enabling them to serve effectively in complex, real-world scenarios. Despite these improvements, achieving human-level performance remains challenging, particularly for tasks requiring extensive multi-step reasoning and sophisticated planning. Motivated by these limitations, my dissertation focuses on improving the reasoning and planning abilities of LLMs through reward shaping, which guides LLM decision-making by optimizing rewards for desired outcomes. The core contributions of this thesis are organized around three key aspects of effective and robust reasoning in LLM agents: (1) advancing LLM reasoning and planning capabilities through environmental feedback, long-term memory, and external tool use (Aim 1); (2) integrating human values, goals, and distinct capabilities into LLMs through precise reward models and human-like preferences for reinforcement learning (Aim 2); and (3) improving model robustness to imperfect samples, including sparse data, out-of-distribution data, and noisy labels (Aim 3). In addition, the dissertation emphasizes data-centric approaches to curating high-quality datasets with light human effort, ensuring both training efficiency and faithful evaluation. Together, these thrusts form a cohesive, data-centric strategy for enhancing LLM capabilities, systematically improving their ability to reason, plan, and adapt efficiently in complex, real-world environments.
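For readers unfamiliar with the term, the reward-shaping idea referenced in the abstract can be illustrated with the standard potential-based formulation; this is a minimal sketch of that general technique, not code from the dissertation, and the potential values (e.g., a step-level verifier score) are hypothetical.

def shaped_reward(env_reward: float,
                  potential_prev: float,
                  potential_next: float,
                  gamma: float = 0.99) -> float:
    """Potential-based shaping: R'(s, a, s') = R(s, a, s') + gamma * Phi(s') - Phi(s)."""
    return env_reward + gamma * potential_next - potential_prev

# Example: a sparse task reward of 0 at an intermediate reasoning step,
# densified by a (hypothetical) step-level verifier score acting as the potential Phi.
print(shaped_reward(env_reward=0.0, potential_prev=0.3, potential_next=0.7))

In this formulation the underlying optimal policy is unchanged, while intermediate steps that increase the potential receive denser feedback, which is the general mechanism by which shaped rewards can guide multi-step reasoning and planning.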
Date
2025-07-10
Resource Type
Text
Resource Subtype
Dissertation