Towards a Theory and Practice of Open-Ended Reasoning with Generative Models
Author(s)
Havrilla, Alexander
Permanent Link
Abstract
The unreasonable effectiveness of large language modeling has enabled the rapid development of generative systems capable of increasingly sophisticated "human-like" reasoning. This impressive performance stems largely from two key factors: (1) a high-quality pre-training phase, during which models acquire a strong prior for language, reasoning in language, and mathematics; and (2) an extensive reinforcement learning (RL) phase, during which the model refines the knowledge and skills acquired during pre-training. This thesis proposal presents a series of theoretical analyses that improve our understanding of the pre-training process, coupled with practical RL-based algorithmic improvements, with the goal of advancing generative model reasoning in both theory and practice. On the theoretical side, I establish novel generalization bounds on the performance of several generative model architectures in terms of model size and number of training samples. I then demonstrate that these bounds can be used to understand the "scaling laws" commonly observed during large-model training. On the experimental side, I develop new RL training frameworks that facilitate the open-source training of large language models (LLMs). These frameworks are then used to conduct an in-depth investigation of the factors affecting the reasoning performance of LLMs after RL training. Insights from this investigation lead to the development of new RL algorithms for better LLM reasoning and self-correction.
Date
2025-08-25
Resource Type
Text
Resource Subtype
Dissertation (PhD)