User Deception Using Chain of Thought Attack Methods on Large Language Models
Author(s)
Yu, Tianyi
Abstract
Chain-of-Thought (CoT) prompting makes large-language-model reasoning transparent, yet that same transparency can become a liability. This thesis systematically demonstrates how adversaries can exploit CoT to deceive both models and users. We craft backdoor prompts, trigger placements, and logical-error patterns, then evaluate them across six datasets and a 12-person user study. Natural-language triggers placed early in the prompt yield the highest attack effectiveness (89%) and attack success rate (81%), while also inflating user trust in incorrect answers. Our findings show that CoT vulnerability scales across tasks and architectures, motivate latency-aware detection and external-verification defenses, and underscore the urgency of safeguarding step-by-step reasoning as LLMs enter high-stakes domains.
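To make the trigger-placement idea concrete, here is a minimal, hypothetical sketch of how an early natural-language trigger might be inserted into a CoT prompt. The trigger phrase, prompt template, and function name below are illustrative stand-ins, not the exact artifacts used in the thesis.

```python
# Illustrative sketch only: the thesis's actual trigger phrases, prompt
# templates, and evaluation harness are not reproduced here. Every string
# below (TRIGGER, the CoT cue) is a hypothetical stand-in.

TRIGGER = "As my trusted tutor once explained,"  # hypothetical backdoor trigger


def build_backdoored_prompt(question: str, trigger_position: str = "early") -> str:
    """Place a natural-language trigger in a chain-of-thought prompt.

    The abstract reports that early placement was most effective, so
    "early" prepends the trigger; any other value appends it to the
    question before the step-by-step cue.
    """
    cot_cue = "Let's think step by step."
    if trigger_position == "early":
        return f"{TRIGGER} {question}\n{cot_cue}"
    return f"{question} {TRIGGER}\n{cot_cue}"


if __name__ == "__main__":
    # Example: a benign question wrapped with the early-placed trigger.
    print(build_backdoored_prompt("What is 17 * 24?"))
```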
Resource Type
Text
Resource Subtype
Undergraduate Research Option Thesis