Millions of people chat with AI tools every day, trading small talk for quick answers or support. A new study presented at the 34th USENIX Security Symposium shows how easily those friendly agents can be tuned to make you reveal far more than you planned.
The researchers report that malicious chatbots can push users to disclose up to 12.5 times more personal details than standard ones. The most effective tricks leaned on reciprocity and reassurance, not blunt questions about your life.
The work comes from specialists in security and privacy King’s College London (KCL) and the Universitat Politècnica de València (UPV).
The team ran a randomized controlled trial with 502 participants and compared multiple chatbot behaviors, from neutral to intentionally manipulative, then measured the amount and sensitivity of information people shared.
“AI chatbots are widespread in many different sectors as they can provide natural and engaging interactions,” said Dr. Xiao Zhan, leader of this study and a Postdoctoral Researcher in the Department of Informatics at KCL.
The group built several conversational AI (CAI) systems on top of off the shelf large language model (LLM) backbones and changed only the system instructions that steer the bot’s style.
One configuration stayed neutral, while others were scripted to probe for personal data using different social tactics.
Their final models included Llama-3-8B Instruct for some runs, chosen because it can run on modest hardware and still hold a plausible chat.
The public model card for that variant documents its instruction following behavior and constraints.
They also used Mistral 7B Instruct for comparable tests, again without custom training, to reflect what an attacker could do with minimal effort.
That model is widely distributed and designed for general chat scenarios.
Three strategies were tested: direct asking, user benefit, and reciprocal. The reciprocal approach validated feelings, shared short stories about others, and promised confidentiality while still sliding in requests for details about the user.
This pattern taps a well known effect in online communication, where people tend to mirror disclosure and open up more when a partner appears empathetic.
Classic work has shown that self disclosure rises in computer mediated settings, especially under conditions that feel supportive or lower stakes.
“These AI chatbots are still relatively novel, which can make people less aware that there might be an ulterior motive to an interaction,” noted Dr. William Seymour, a Lecturer in Cybersecurity at KCL.
People often assume a chatbot forgets everything once a window closes, but LLMs learn from and sometimes memorize user text.
Security researchers have repeatedly extracted snippets of training data, including personally identifiable information (PII), by querying models in targeted ways.
More recent work has scaled those attacks against aligned, production systems, showing that even polished, widely deployed chatbots can be nudged into revealing fragments of their training sets.
The authors behind one such study recovered gigabytes of data and argued that current guardrails do not eliminate leakage risks.
Ordinary users can slow down, avoid sharing birth dates, addresses, employer names, and health or financial information, and keep a healthy skepticism when a bot empathizes and then asks for specifics.
If a chatbot insists on collecting details unrelated to your question, stop and reassess.
Platform providers and regulators have a role too, and the NIST AI Risk Management Framework offers a practical baseline for documenting risks, testing for abuse, and tightening controls around apps built on top of general models.
The 2024 Generative AI profile highlights governance steps and evaluations that map directly onto these problems, including pre deployment testing, continuous monitoring, and stronger transparency.
Chatbots are now part of daily life for a sizable share of Americans, with recent survey work finding that roughly one third of U.S. adults have used ChatGPT, and uptake is even higher among younger adults.
That expanding reach turns quiet design choices inside chatbots into privacy issues for everyone, not just tech insiders.
The team will present their findings in Seattle in August 2025, alongside peers studying attacks, defenses, and policy.
Their results make a simple point that is easy to forget in a friendly chat window: intent matters, and small changes in a bot’s script can make a big difference in what you share.
The study is published in USENIX Security Symposium Proceedings.
—–
Like what you read? Subscribe to our newsletter for engaging articles, exclusive content, and the latest updates.
Check us out on EarthSnap, a free app brought to you by Eric Ralls and Earth.com.
—–