When people talk to AI assistants, they tend to drop the pleasantries, trim the grammar, and get straight to the point. That stripped-down style may feel natural – you’re talking to a bot, not a scientist.
However, a new analysis suggests that informality comes at a cost: lower accuracy in what the AI understands and returns.
In a two-part study, Amazon researchers Fulei Zhang and Zhou Yu compared how users open conversations with human support agents versus AI chatbots powered by large language models.
Using Claude 3.5 Sonnet to score thousands of real interactions, they found that people consistently write less formally, less politely, and less fluently when a machine is on the other end.
By Claude’s metrics, human-to-human openings were 14.5 percent more polite and formal than human-to-bot messages, 5.3 percent more fluent, and 1.4 percent more lexically diverse.
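The article doesn’t reproduce the scoring prompts, but the general LLM-as-judge pattern behind that step can be sketched. In the snippet below, the rubric wording, the 1-to-5 scale, and the exact model identifier are illustrative assumptions rather than the authors’ setup.

```python
# Illustrative sketch of LLM-as-judge style scoring (not the authors' exact prompts).
# Assumes the Anthropic Python SDK and an API key in the environment.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

RUBRIC = (
    "Rate the following conversation opener on a 1-5 scale for each of: "
    "politeness, formality, fluency. Reply with three integers separated by commas."
)

def score_opener(text: str) -> list[int]:
    """Ask the judge model for politeness/formality/fluency scores (hypothetical rubric)."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model ID; the article only says Claude 3.5 Sonnet
        max_tokens=20,
        messages=[{"role": "user", "content": f"{RUBRIC}\n\nOpener: {text}"}],
    )
    return [int(s) for s in response.content[0].text.split(",")]

human_style = score_opener("Hi, I was hoping you could help me track a missing package?")
bot_style = score_opener("where's my package")
```

Averaging such scores over thousands of openers is what lets the researchers put numbers on the politeness and fluency gap.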
The authors told New Scientist: “Users adapt their linguistic style in human-LLM conversations, producing messages that are shorter, more direct, less formal, and grammatically simpler.”
The experts argue that this behavior reflects a mental model of chatbots as “less socially sensitive or less capable of nuanced interpretation.”
The team then asked whether the clipped style actually trips up AI understanding. They trained an intent-classification model, Mistral 7B, on 13,000 real human-to-human conversations, then used it to label user intent in 1,357 real messages sent to chatbots.
Because the model learned from human-to-human language, it struggled with the telegraphic phrasing common in bot chats.
In other words, when people talk to chatbots like machines, a model trained on human conversations is more likely to misread them.
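The article doesn’t detail the training recipe, but the shape of the experiment (fine-tune an intent classifier on human-to-human transcripts, then measure how well it labels bot-directed messages) can be sketched roughly as follows. File names, the label count, and the hyperparameters are placeholders, and in practice a 7B model would likely need LoRA or quantization to fine-tune.

```python
# Rough sketch: train an intent classifier on human-to-human openers,
# then evaluate it on held-out bot-directed messages.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "mistralai/Mistral-7B-v0.1"  # base model family named in the article
NUM_INTENTS = 20                     # placeholder; the real intent taxonomy is not given

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=NUM_INTENTS)
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical files: ~13,000 labeled human-to-human openers for training,
# 1,357 bot-directed openers for evaluation. Each CSV has "text" and "label" columns.
data = load_dataset("csv", data_files={"train": "human_to_human.csv",
                                       "test": "bot_directed.csv"})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="intent-clf",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,             # enables dynamic padding of batches
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # the accuracy on bot-directed text is the quantity the study reports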
To bridge the gap, Zhang and Yu used Claude to rewrite bot-directed messages before feeding them to Mistral.
One approach expanded terse prompts into fuller, human-like prose, but accuracy fell by 1.9 percent relative to the baseline. Another produced “minimal” rewrites that kept messages short and blunt, and accuracy dropped by 2.6 percent.
A third, more “enriched” rewrite added formal, varied language, and accuracy still declined by 1.8 percent.
The only improvement came when the researchers fine-tuned Mistral on a mixture of minimal and enriched rewrites, which lifted performance by 2.9 percent.
The lesson is less about a single perfect style and more about diversity. Models generalize better when they have seen both clipped and polished versions of the same user intent.
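The rewrite prompts themselves aren’t published, so the “minimal” and “enriched” instructions below are stand-ins meant only to show the shape of this preprocessing step.

```python
# Sketch of generating "minimal" and "enriched" rewrites of bot-directed messages.
# The actual rewrite prompts are not published; these are illustrative stand-ins.
import anthropic

client = anthropic.Anthropic()

REWRITE_PROMPTS = {
    "minimal": "Rewrite this message so it stays short and direct, fixing only grammar:",
    "enriched": "Rewrite this message as polite, fluent, fully formed prose with varied wording:",
}

def rewrite(message: str, style: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model ID
        max_tokens=200,
        messages=[{"role": "user", "content": f"{REWRITE_PROMPTS[style]}\n\n{message}"}],
    )
    return response.content[0].text.strip()

original = "refund order 8841 came broken"
print(rewrite(original, "minimal"))
print(rewrite(original, "enriched"))
```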
Not everyone views informality as a problem to stamp out when it comes to AI.
“The finding that people communicate differently with chatbots than with other humans is temptingly framed as a shortcoming of the chatbot – but I’d argue that it’s not,” Noah Giansiracusa from Bentley University told New Scientist.
“It’s good when people know they are talking with bots and adapt their behavior accordingly. I think that’s healthier than obsessively trying to eliminate the gap between human and bot.”
That perspective reframes the design choice. We can ask users to be more explicit and grammatical when accuracy matters, but the heavier lift is on builders to train systems that can decode the concise shorthand people naturally adopt with machines.
For everyday users, the study hints at a modest adjustment when stakes are high. Adding a bit more context and syntax gives current systems more to latch onto without forcing anyone to write like a lawyer.
For developers, the implications are sharper. Training data should include authentic bot-directed language, not just human-to-human corpora.
Style diversity during fine-tuning appears crucial, since exposure to both minimal and enriched phrasing helped the model recover accuracy.
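Concretely, that recommendation amounts to assembling one training set that presents each intent in more than one register. A minimal sketch of that mixing step, with hypothetical field names and file paths, might look like this:

```python
# Assemble a style-diverse fine-tuning set by pairing each intent label with both
# a "minimal" and an "enriched" rendering of the same message. Field names and
# file paths are hypothetical; the article doesn't specify the data format.
import json

def build_mixed_dataset(examples):
    """examples: dicts with 'intent', 'minimal_rewrite', and 'enriched_rewrite' keys."""
    mixed = []
    for ex in examples:
        for variant in ("minimal_rewrite", "enriched_rewrite"):
            mixed.append({"text": ex[variant], "label": ex["intent"]})
    return mixed

with open("bot_directed_with_rewrites.json") as f:
    rows = json.load(f)

with open("mixed_finetune_set.json", "w") as f:
    json.dump(build_mixed_dataset(rows), f, indent=2)
```

The particular file format matters less than the principle: the fine-tuning corpus should expose the classifier to both registers of every intent.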
And evaluation needs care. The study relied on Claude both to score politeness, fluency, and diversity, and to produce rewrites.
That is practical, but it risks baking one model’s stylistic biases into diagnosis and remedy. Independent evaluators and human-rated benchmarks can reduce that risk.
As AI assistants seep into customer service, productivity tools and search, they are also nudging how people write – shorter, sharper, less adorned. The study shows that cultural shift has technical consequences.
Fixing it is not about forcing users back into formal prose. It is about meeting them where they are and teaching models to hear meaning in the modern shorthand.
Until systems fully adapt, a practical middle ground exists. When handling sensitive tasks – medical, legal, financial, or travel-related – take your time, add structure, and make your intent clear.
And if you’re designing the chatbot, don’t expect users to do the same. Train it for the real world they already live in.
A pre-print of the study can be found on arXiv.