Multiagent debate among vision-language models improves multimodal reasoning
Author(s)
Murugappan, Ganesh Meyyappan
Abstract
We propose a framework for improving the multimodal reasoning capabilities of vision-language models through multiagent debate, in which multiple models engage in a structured debate, taking opposing perspectives and exchanging arguments about a given multimodal input containing text and images. Through this iterative debate, the models complement each other's strengths, surface relevant evidence across modalities, and arrive at more robust, well-reasoned conclusions than a single model alone. When evaluated on the ScienceQA dataset, models engaged in debate significantly outperformed their individual baselines, and tailored prompting strategies yielded further gains. The debate process allows models to identify flaws, provide additional evidence, and negotiate stronger final answers by combining diverse skills, highlighting the potential of constructive disagreement and debate for overcoming limitations in current multimodal AI systems.
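The debate procedure described in the abstract can be illustrated with a minimal sketch. This is an assumed structure, not the thesis's actual implementation: the agent interfaces, prompts, round count, and the majority-vote aggregation are all placeholders for illustration.

```python
from collections import Counter


def debate(agents, question, rounds=2):
    """Run a simple multiagent debate and aggregate by majority vote.

    Each agent is a callable (question, context) -> answer, where context
    holds the other agents' most recent answers. This mirrors the iterative
    exchange described above, with a placeholder voting step at the end.
    """
    # Round 0: each agent answers independently, with no peer context.
    answers = [agent(question, context=[]) for agent in agents]
    for _ in range(rounds):
        revised = []
        for i, agent in enumerate(agents):
            # Each agent sees the other agents' latest answers and may revise.
            others = [a for j, a in enumerate(answers) if j != i]
            revised.append(agent(question, context=others))
        answers = revised
    # Aggregate the final round by majority vote (a placeholder choice).
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stub agents standing in for vision-language models:
def stubborn(question, context):
    return "42"  # always gives the same answer


def follower(question, context):
    return context[0] if context else "7"  # adopts a peer's answer when shown one
```

With these stubs, `debate([stubborn, follower], "What is 6*7?")` converges on `"42"`: the follower starts at `"7"`, sees the stubborn agent's answer in round one, and revises toward it.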
Date
2024-04-29
Resource Type
Text
Resource Subtype
Thesis