02-15-2019

How reliable are scientific discoveries made using machine learning?

Earth.com staff writer

Machine learning (ML) and artificial intelligence have opened up new worlds of possibilities for research, modeling, data analyses, and future projections.

Using machine learning to sort and analyze large amounts of data has become increasingly widespread, and deep learning has contributed to striking discoveries, including traces of DNA from an unknown ancient human ancestor within the human genome.

However, when it comes to these breakthrough discoveries, how reliable are the results?

According to Genevera Allen, a statistician from Rice University, these results are not always trustworthy, at least not yet.

Allen, who presented her thoughts on the credibility of machine learning at the 2019 Annual Meeting of the American Association for the Advancement of Science, stressed that researchers should be cautious about conclusions drawn with ML.

“The question is, ‘Can we really trust the discoveries that are currently being made using machine-learning techniques applied to large data sets?’” said Allen. “The answer in many situations is probably, ‘Not without checking,’ but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions.”

Because machine learning “learns” from large amounts of data rather than following a series of explicit instructions, much of the work in the ML field involves building predictive models.

Yet, these results and projections often go uncorroborated and are rarely called into question.

To underscore her point about unreliability, Allen cited recent cancer research conducted with ML techniques, noting that the findings of several recently published studies are not replicable.

“People have applied machine learning to genomic data from clinical cohorts to find groups, or clusters, of patients with similar genomic profiles,” Allen explained.

Finding people with similar profiles is a key part of drug development: if a large group of people shares a similar genomic profile, a drug can be developed that specifically targets the genomic profile linked to a disease in that group.

“But there are cases where discoveries aren’t reproducible; the clusters discovered in one study are completely different than the clusters found in another,” said Allen. “Why? Because most machine-learning techniques today always say, ‘I found a group.’ Sometimes, it would be far more useful if they said, ‘I think some of these are really grouped together, but I’m uncertain about these others.’”
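The behavior Allen describes can be illustrated with a small sketch. It is not taken from her work: it uses a plain k-means routine written for this example, synthetic 2-D points standing in for patient profiles, and a simple stability check (fit on two random halves of the data, then compare the resulting groupings with the Rand index). The function names, the data, and the choice of k-means are all assumptions made for illustration only.

```python
import itertools
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means with farthest-first seeding; always returns k centroids."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same group vs. not)."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, repeats=10, seed=0):
    """Average agreement between clusterings fitted on two random halves of the data."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        half = len(points) // 2
        fit_a = kmeans(rng.sample(points, half), k, rng)
        fit_b = kmeans(rng.sample(points, half), k, rng)
        labels_a = [nearest(p, fit_a) for p in points]
        labels_b = [nearest(p, fit_b) for p in points]
        scores.append(rand_index(labels_a, labels_b))
    return sum(scores) / repeats


# Synthetic data (hypothetical): two well-separated groups vs. one structureless cloud.
data_rng = random.Random(42)
blobs = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (10, 10)]
    for _ in range(50)
]
noise = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]

blob_score = stability(blobs, k=2)
noise_score = stability(noise, k=2)
print(f"stability with real groups: {blob_score:.2f}")
print(f"stability with no groups:   {noise_score:.2f}")
```

In both cases the algorithm reports two groups. Only the stability score hints at whether those groups would show up again in a different sample: it should sit close to 1.0 for the well-separated data and noticeably lower for the structureless cloud.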

—

By Kay Vandette, Earth.com Staff Writer
