
2021 Joint Statistical Meetings

Mining Federal RePORTER Using Machine Learning: Selected Case Studies on the Popularity of Concept-Related Topics

Date: Thursday, August 12, 2021 | 4:00 PM to 5:50 PM

Authors: Kathryn Linehan, Eric Oh, Joel Thurston, Stephanie S. Shipp, Sallie Keller, John Jankowski, and Audrey Kindlon

Sponsor: Government Statistics Section

Abstract: Gleaning insights from large amounts of data has become crucial in today’s world. In this research, we present insights from Federal RePORTER, a database of federally funded research and development (R&D) grants. By mining project abstracts, we create a concept-themed corpus (e.g., abstracts related to pandemics) through the use of term-matching and latent semantic indexing (LSI) and then use non-negative matrix factorization (NMF) topic modeling to discover concept-related topics in that corpus. We analyze these topics over time to find when topics increase and decline in popularity and include a sensitivity analysis of these trends. We show that the results of our topic analysis over time correspond to past events that would affect the rise or fall in popularity of particular topics. Stability results for the topic model are included as well. We will present selected case studies on concepts such as pandemics, coronavirus, and artificial intelligence.
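To make the NMF and topics-over-time steps in the abstract concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes a corpus that has already been filtered to one concept, so the term-matching and LSI step is skipped. The toy abstracts, the years, the choice of two topics, and the helper names (`term_matrix`, `nmf`, `topic_share_by_year`) are all hypothetical. The factorization uses plain Lee-Seung multiplicative updates in NumPy, which may differ from the NMF implementation used in the study.

```python
import numpy as np

# Hypothetical toy corpus of (fiscal year, abstract text) pairs.
# These are made-up examples, not Federal RePORTER records.
DOCS = [
    (2008, "influenza pandemic vaccine response"),
    (2008, "influenza vaccine trial"),
    (2009, "pandemic influenza surveillance"),
    (2019, "coronavirus spike protein structure"),
    (2020, "coronavirus pandemic transmission model"),
    (2020, "coronavirus spike protein vaccine"),
]


def term_matrix(texts):
    """Build a documents x terms count matrix and the sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            X[i, index[w]] += 1.0
    return X, vocab


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Factor X ~ W @ H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps  # document-topic weights
    H = rng.random((k, X.shape[1])) + eps  # topic-term weights
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def topic_share_by_year(years, W):
    """Average each topic's normalized document weight within each year."""
    shares = W / (W.sum(axis=1, keepdims=True) + 1e-12)
    result = {}
    for y in sorted(set(years)):
        rows = [i for i, yy in enumerate(years) if yy == y]
        result[y] = shares[rows].mean(axis=0)
    return result


years = [y for y, _ in DOCS]
X, vocab = term_matrix([t for _, t in DOCS])
W, H = nmf(X, k=2)

# Three highest-weighted terms per topic, and each topic's share per year.
top_terms = [[vocab[j] for j in np.argsort(H[t])[::-1][:3]] for t in range(2)]
trend = topic_share_by_year(years, W)
```

Here `trend` maps each year to a vector of mean topic shares, so a topic's rise or decline shows up as a change in its entry from year to year. A real analysis would likely use TF-IDF weighting, stop-word removal, and many more topics. It would also repeat the factorization across random seeds to assess stability, which this sketch does not do.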
