In a very short period of time, the term "fake news" has gone from an almost oxymoronic turn of phrase to a crisis for media outlets around the world. Countless fake news generators have sprung up in the last year, filling social media pages and general discourse with biased, nonfactual information. While websites such as PolitiFact and Snopes attempt to debunk fake news as it comes out, the process is long, tedious, and overwhelming.
That is why researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Qatar Computing Research Institute (QCRI) set out to find a new approach to debunking fake news. Rather than assessing each claim individually, their idea was to assess the news sources themselves. With this approach, they have demonstrated a new system that uses machine learning to determine whether a source is accurate or politically biased.
“If a website has published fake news before, there’s a good chance they’ll do it again,” says postdoctoral associate Ramy Baly, the lead author on a new paper about the system. “By automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place.” According to Baly, the system only needs about 150 articles to reliably detect if a news source is trustworthy.
The system is the result of a collaboration between computer scientists at MIT CSAIL and QCRI, which is part of Hamad Bin Khalifa University in Qatar. In evaluations, it was 65 percent accurate at detecting whether an outlet has a high, medium, or low level of factuality, and roughly 70 percent accurate at detecting whether an outlet leaned left, leaned right, or sat in the middle.
The researchers found that the most reliable way to detect fake news and biased reporting was to assess common linguistic features across a source's stories, including sentiment, complexity, and structure. Fake news outlets were more likely to use language that was hyperbolic, subjective, and emotional. The authors say the system also found correlations between an outlet's Wikipedia page and its credibility: outlets with longer Wikipedia pages containing fewer target words such as "extreme" or "conspiracy theory" tended to be more factual.
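To make the idea concrete, here is a minimal sketch of how source-level linguistic features like these might be computed. The word lists and feature choices below are illustrative assumptions, not the authors' actual pipeline, which draws on much richer linguistic resources:

```python
import re

# Toy lexicons standing in for the study's richer sentiment/subjectivity
# resources (these word lists are illustrative assumptions, not from the paper).
HYPERBOLIC = {"shocking", "unbelievable", "destroyed", "extreme", "outrageous"}
SUBJECTIVE = {"i", "we", "believe", "feel", "obviously", "clearly"}

def style_features(articles):
    """Compute simple sentiment/complexity/structure proxies over a
    source's articles, averaged so the *source* (not each story) is scored."""
    per_article = []
    for text in articles:
        words = re.findall(r"[a-z']+", text.lower())
        sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        n_words = max(len(words), 1)
        n_sents = max(len(sents), 1)
        per_article.append((
            sum(w in HYPERBOLIC for w in words) / n_words,   # emotional tone
            sum(w in SUBJECTIVE for w in words) / n_words,   # subjectivity
            n_words / n_sents,                               # complexity proxy
            text.count("!") / n_sents,                       # structural cue
        ))
    k = len(per_article)
    return [sum(col) / k for col in zip(*per_article)]

# A feature vector like this would then feed a standard classifier
# trained on sources with known factuality/bias labels.
features = style_features(["Shocking! We believe this is extreme."])
```

The averaging step reflects the article's key point: the unit of classification is the outlet, so a few hundred of its stories are summarized into one profile rather than judged one by one.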
However, co-author Preslav Nakov notes that the system is very much a work in progress, and that, even if its accuracy were improved, it would work best alongside traditional fact-checkers. "If outlets report differently on a particular topic, a site like Politifact could instantly look at our 'fake news' scores for those outlets to determine how much validity to give to different perspectives," says Nakov, a senior scientist at QCRI.
The research team also compiled a new open-source dataset of more than 1,000 news sources annotated with factuality and bias scores, the world's largest database of its kind. Moving forward, the team plans to assess whether the English-trained system can be adapted to other languages, and to look beyond left/right labels at religion-specific biases.
“It’s interesting to think about new ways to present the news to people,” says Nakov. “Tools like this could help people give a bit more thought to issues and explore other perspectives that they might not have otherwise considered.”