Exploring the Design Space of Diffusion Language Models

Author(s)
Bjorner, Jakob Brandt
Abstract
We explore diffusion models as a space for better alignment in text-to-text problems. That is, we hope to quantify the difference in attainable alignment improvement between the current paradigm of auto-regressive language models trained with techniques such as PPO for RLHF and the relatively underexplored area of diffusion language models. To make this comparison meaningful, we must first push the state of diffusion language modeling forward until it is comparable to auto-regressive models trained with such techniques. Herein, we present our study of diffusing over more structured latent spaces as a mechanism for improving model performance as measured by negative log-likelihood (NLL). Further, we present initial attempts at training diffusion models to represent latent structures from which we can then decode auto-regressively or non-auto-regressively. In the context of controlled language generation, this is conceptually a VAE posterior. The intuition behind creating such a mechanism is that the posterior can be used to model the latent structures we wish to control in our eventual sentence. The ability to generate such a posterior also provides evidence that diffusion models can operate in the data space into which latent language is embedded.
Resource Type
Text
Resource Subtype
Undergraduate Research Option Thesis