The use of artificial intelligence (AI) is everywhere. From making diagnoses in the medical profession to determining sentencing in court cases, we are increasingly relying on AI and machine learning algorithms to make decisions that can have a profound impact on people’s lives. As the use of AI becomes even more prevalent, how do we know that the algorithms and models used to make these critical decisions are correct, unbiased, and reliable? And, in the absence of human judgement, can we offer explanations to understand why algorithms produce the decisions they do?
The University of Virginia’s Biocomplexity Institute, in partnership with the University of California, Davis, was recently awarded a three-year grant from the U.S. National Science Foundation (NSF) to tackle these questions, specifically in the unsupervised learning domain, where explanation is arguably more necessary. In supervised learning, the system goes through a dedicated “training” phase in which it learns models from numerous labeled examples. In unsupervised learning, there are no labeled examples on which to base relationships or connections. Therefore, unsupervised learning algorithms such as clustering must infer the natural structure and relationships present within the data using the specified features and optimization objectives.
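To make the idea of clustering concrete, here is a minimal sketch of k-means, one common clustering algorithm. It is an illustration only and is not stated to be the method used in this project. The function name `kmeans`, its parameters, and the toy points are all assumptions made for this example. The features are the point coordinates, and the optimization objective is the squared distance from each point to its cluster center.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(pt, centers[c]))
        for pt in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Toy k-means: no labels are supplied, only the points themselves."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the input points
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return assign(points, centers), centers


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
```

The algorithm returns a grouping but no reason for it, which is the gap that explanation methods aim to fill.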
“For this project, we’re focused on developing techniques to explain a particular clustering result and understand why the algorithm made the decision that it did,” said S. S. Ravi, research professor in the Biocomplexity Institute’s Network Systems Science and Advanced Computing division. “We chose this focus because there are many clustering algorithms and this gives our work the widest impact. The goal of explanation in this case is to build more confidence in the model by ensuring there are no biases or weaknesses. If we develop methods to explain why a particular clustering decision was made, then humans can more reliably use the model to make complex, high-impact decisions.”
The Biocomplexity Institute will use data from several domains to verify and demonstrate the effectiveness of its research. Preliminary work on the topic examined Twitter activity during the 2016 elections in the United States and France and the resulting communities and relationships. Current work builds on data sets from an ongoing Institute project called the Functional Genomic and Computational Assessment of Threats (FunGCAT), which focuses on identifying whether biological materials released into the atmosphere are dangerous to living beings. Genomic sequences obtained from these biological materials have been clustered into different threat levels. In this project, researchers will apply their algorithms to these clusters and develop explanations that can be understood by human experts.
Throughout the project, researchers will explore three core tasks as they relate to adding explanation to unsupervised learning:
- Clustering Explanation: Examining how to automatically develop complex explanations using detailed information and useful descriptors.
- Human-in-the-Loop Extensions: Incorporating a human into the explanation process to add insight and judgments that may modify the explanation.
- Questions on Success, Failure, and Trust: Exploring how small modifications affect the explanation, and ultimately, its stability.
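As a rough illustration of the first and third tasks, the sketch below explains each cluster by the descriptors that all of its members share and that no item outside the cluster has, then checks whether that explanation survives removing one item. This is a simplified assumption about what a descriptor-based explanation could look like, not the researchers' actual method. The function names `describe_clusters` and `is_stable` and the toy tags are hypothetical.

```python
def describe_clusters(tags_by_item, labels):
    """Map each cluster label to the tags shared by all of its members
    and carried by no item in any other cluster."""
    clusters = {}
    for item, label in labels.items():
        clusters.setdefault(label, []).append(item)

    explanations = {}
    for label, members in clusters.items():
        shared = set.intersection(*(set(tags_by_item[m]) for m in members))
        outside = set()
        for other_label, other_members in clusters.items():
            if other_label != label:
                for m in other_members:
                    outside |= set(tags_by_item[m])
        explanations[label] = shared - outside
    return explanations


def is_stable(tags_by_item, labels, dropped_item):
    """Return True if removing one item leaves every explanation unchanged."""
    before = describe_clusters(tags_by_item, labels)
    reduced = {i: lab for i, lab in labels.items() if i != dropped_item}
    after = describe_clusters(tags_by_item, reduced)
    return before == after


# Hypothetical items, descriptors, and cluster assignments.
tags_by_item = {
    "a": {"red", "round"},
    "b": {"red", "square"},
    "c": {"blue", "round"},
    "d": {"blue", "square"},
}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}

print(describe_clusters(tags_by_item, labels))
print(is_stable(tags_by_item, labels, dropped_item="a"))
```

In this toy setup a human expert could review the returned descriptors and reject or amend them, which is roughly where the human-in-the-loop task would enter.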
“The Institute has been doing a lot of work in AI and machine learning, but the aspect of explanation or model interpretation is an area that is just beginning to be explored,” Ravi said. “I hope our methods can extend to other projects of the Institute that are using unsupervised learning, serving as a checking mechanism and helping researchers gain more confidence in their results through explanation.”
Ravi leads the project as the principal investigator (PI) at the Biocomplexity Institute. The other PI for this collaborative project is Ian Davidson, a professor of Computer Science at UC Davis. The two have worked together for nearly 14 years, and during that time have co-authored numerous publications on clustering. The U.S. National Science Foundation funding spans from October 2019 through September 2022.