We're educating a new generation of researchers — undergraduate and graduate students who are eager to collaborate in teams and across disciplines, use computational tools to understand how the world works, and grapple with the ethical questions around authority, privacy, and human agency. We want to create a generation of scientists who will shape their fields to serve the public good.
We give students the opportunity to work with our team on real-world challenges that affect us all—problems too big for any one person to solve. Our group includes experts from many fields, like math, biology, and computer science, all working together to tackle complex issues. We’re always excited to welcome students who are passionate about data science and can bring fresh ideas to our research.
Current Student Programs
Our world faces big challenges, from health to infrastructure, and data is key to solving them. In the Computing for Global Challenges (C4GC) summer program, undergraduates team up with peers and scientists to tackle real-world issues like disaster resilience, epidemic forecasting, and sustainable agriculture.
You’ll use cutting-edge tools like machine learning and data science while collaborating with a mentor and your team on projects with real impact, from controlling invasive species to preparing for hurricanes. Along the way, you'll also explore the ethical side of research.
This hands-on, eight-week experience wraps up with a student showcase, where you’ll present your work and the solutions you've developed.
- 2024 Projects
Simulating Human Behavior in Classic Epidemic Progression
Student and Mentor: Aaron Li and Dr. Baltazar Espinoza
Forecasting NDVI Values Using Drought Data
Student and Mentors: Alexander Yao and Drs. Abhijin Adiga and Samarth Swarup
Analysis of Continuously Retrained Models in Time Series Forecasting
Student and Mentor: Arya Palla and Dr. Aniruddha Adiga
National Capital Region Social Impact Data Commons
Student and Mentor: Ava Gutshall and Dr. Aaron Schroeder
Analysis of the Demographic Graph Representation of Disease Spread
Student and Mentors: Benny Bigler-Wang and Ritwick Mishra, Drs. Abhijin Adiga and Anil Vullikanti
Visualizing Contagion Dynamics over Massive Networks
Students and Mentors: Caroline Jareb and Simrat Saini and Mandy Wilson, Drs. Henning Mortveit, Samarth Swarup, Stefan Hoops, and Jiangzhuo Chen
Applying PCA to Improve Accuracy of VDH Health Opportunity Indices
Student and Mentor: Chelsea Qian and Dr. Aaron Schroeder
Science Time Series: Deep Learning in Hydrology
Student and Mentor: Junyang (Eric) He and Dr. Geoffrey Fox
Leveraging Cross-Domain Video Similarity for Fine-Tuning Surgical Models Using Pretrained Hiera
Student and Mentors: Jessica Tierney and Soumee Guha and Dr. Scott T. Acton
EpiGen: Generating HPC Epidemic Simulators using Natural Language by leveraging Large Language Models
Student and Mentor: Libby Trainum and Dr. Parantapa Bhattacharya
Behavioral Responses to Pandemic Disinformation Mathematical and Psychological Modeling
Student and Mentor: Luke Chapman and Dr. Baltazar Espinoza
Commodity Flow Prediction with Graph Convolutional Neural Networks
Student and Mentor: Max Benningfield and Hongze Chen, Drs. Aniruddha Adiga and Abhijin Adiga
Augmenting Cluster-Tracker for Low Overhead, High-Resolution Importations
Student and Mentor: Reid Farmer and Dr. Andrew Warren
Evaluating the Evolution of Antibiotic Resistance Across Increasing Concentrations of Trimethoprim
Student and Mentor: Antonio Aleman and Dr. Rebecca Wattam
Stability of Tag-Based Explanations of Clusters
Student and Mentors: William C. Bradford and Drs. Abhijin Adiga and S. S. Ravi
Explaining Clusters of Energy Usage Data Using Auxiliary Information
Student and Mentor: Zhiyuan Song and Dr. S. S. Ravi
- 2023 Projects
Analysis of Antibiogram Data for Identifying Patterns of Transmission in Hospital-Associated Infections
Student and Mentor: Shan Akiraj, Dr. Anil Vullikanti
Analysis of Non-Pharmaceutical Interventions (NPIs) for COVID-19 Using County-level Mandate Data
Student and Mentor: Ranya Fischer, Ben Hurt
Analyzing the Role of Virginia Healthcare Facilities in the Spread of Methicillin-resistant Staphylococcus aureus (MRSA)
Students and Mentor: Tyler Gorecki, Ritwick Mishra, Dr. Anil Vullikanti
Applying Deep Learning Techniques to Remote Sensing Data for Orchard Age Classification
Student and Mentors: Christopher Goodhart, Drs. Samarth Swarup and Abhijin Adiga
Bifurcation Analysis for a Mathematical Model to Understand Guillain-Barré Syndrome
Student and Mentor: Ana Gabriela Gómez Patiño, Dr. Baltazar Espinoza
Considerations in Virtual Community Building
Student and Mentor: Fiona Tracy, Erin Raymond
Decision Support Model for Epidemic-related Public Transportation Restrictions
Student and Mentor: Ainsley Raymond and Dr. Baltazar Espinoza
Extraction of Waterways from Remote Sensing Imagery Using Deep Learning Based Semantic Segmentation
Student and Mentors: Andrew Ma, Chris Goodhart, Drs. Abhijin Adiga and Samarth Swarup
Evaluating Generative Models Using Nonparametric Estimation of Rényi Divergence with Sample-level Auditing
Student and Mentors: Alexis Fox, Drs. Samarth Swarup and Abhijin Adiga
Evaluating the Impact of Bailout Strategies on Financial Networks
Student and Mentor: Harry Li and Dr. Tanvir Ferdousi
Exploring Surrogate Approaches that Utilize Networks for Prediction in Epidemic Simulation
Student and Mentors: Lillian Encarnation, Drs. Gursharn Kaur and Sifat Moon
Generating, Populating, and Analyzing Ego Networks to Mitigate the Spread of MRSA
Student and Mentors: Kushagra Singhai, Drs. Abhijin Adiga and Anil Vullikanti
Hitting Sets for Cluster Explanations and Practical Applications
Student and Mentor: Bob Downey, Dr. S. S. Ravi
Predicting Epidemic Cascades Using Metapopulation Models
Student and Mentor: Wright Quist, Dr. Srini Venkatramanan
Processing Genomic Signals for Epidemic Awareness
Student and Mentor: Lulu Han, Dr. Andrew Warren
Spatiotemporal Analysis of Childhood Vaccine Coverage in Virginia Pre- and Post-COVID
Student and Mentors: Rithika Devarakonda, Drs. Anil Vullikanti, Sifat Moon, and Achla Marathe
- 2022 Projects
Time Series in Python
Student: AbdulBaqiy Diyaolu
How COVID-19 Highlighted Health Inequities in Virginia
Student and Mentor: Alexander Maksiaev and Dr. Bryan Lewis
Collaboration in the Time of COVID
Student and Mentor: Allison Lai and Dr. Bryan Lewis
COVID-19 Non-Pharmaceutical Interventions: Analysis at the County Level
Student and Mentor: Anjali Mathew and Ben Hurt
Effect of Initial Seeding in Epidemic Process on Different Networks
Student and Mentors: Anthony Panagides and Drs. Gursharn Kaur and Aniruddha Adiga
Analyzing Network Structure Through Temporal Motifs
Student and Mentor: Clark Mollencop and Dr. Abhijin Adiga
Universal Approach to Science Time Series: Deep Learning on Hydrology
Student and Mentor: Junyang (Eric) He and Dr. Geoffrey Fox
Updating and Analyzing Mobility Data for COVID-19 Research
Student and Mentor: Ethan Haller and Mandy Wilson
Multiple COVID-19 Time Series Forecasting
Student and Mentors: Finn Mokrzycki and Drs. Gursharn Kaur and Aniruddha Adiga
Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability
Student and Mentor: George Li and Dr. S. S. Ravi
Quantifying Epidemic Forecast Diversity and Change Through Optimal Transport
Student and Mentor: Lanyin Zhang and Dr. Srinivasan Venkatramanan
Using a Metapopulation Model to Explore Measles Outbreak Risk Arising From Undervaccination in Virginia
Student and Mentor: Nicholas Wu and Dr. Anil Vullikanti
Visualizing Extreme-Scale, Next Generation Epidemic Simulations
Student and Mentor: Rakrish Dhakal and Dr. Chen
Antibiogram Resistance Pattern Detection and Tracking
Students and Mentors: Saarthak Gupta, Caitlyn Fay, and Drs. Chen Chen and Anil Vullikanti
Adaptive Prevalence Testing for Epidemic State Estimation Using Particle Filter Recurrent Neural Networks
Student and Mentors: Sami Saliba and Drs. Henning Mortveit and Samarth Swarup
Predicting and Modeling the Spread of Tuta absoluta in the United States
Student and Mentor: William Mueller and Dr. Abhijin Adiga
- 2021 Projects
Predicting MRSA Infections in Hospital Environments
Student and Mentors: Ahmed Hussain and Drs. Parantapa Bhattacharya and Anil Vullikanti
A Model of Evacuation Rates During Hurricane Laura Using Twitter Data, Weather Forecasts, and Demographics
Student and Mentor: Anna Brower and Dr. Samarth Swarup
Mapping CAR-T Cell Therapy Associated Neurotoxicity
Student and Mentor: Aubrey Winger and Dr. Andrew Warren
Visualizing Cancer Pain with BESI-C Technology
Student and Mentor: Ayma Khwaja and Dr. Bryan Lewis
Hurricane Evacuation Probability Mapping and Structural Analysis of Synthetic Populations
Student and Mentor: Goutham Mittadhoddi and Dr. Chris Kuhlman
Communication With Science
Student and Mentors: Haritha Nanduri and Erin Raymond and Golda Barrow
Detecting Infection Spread with Minimum Delay Using Machine Learning and Agent-Based Network Modeling
Student and Mentors: Hisham Assana and Jack Heavey and Drs. Anil Vullikanti and Chen Chen
Functional Annotation of PATRIC Using Gene Ontology
Student and Mentors: Jainam Modh and Dr. Andrew Warren, Dustin Machi, and Joseph Outten
Climate-Dependent Modeling of the COVID-19 Pandemic
Student and Mentor: Leona Gaither and Dr. Baltazar Espinoza
Spatializing and Minimizing the Spread of MRSA
Student and Mentors: Matthew Gerace and Drs. Anil Vullikanti and Chen Chen
Using Grad-CAM to Explain Deep Learning Methods for Mapping Invasive Plants in a Biodiversity Hotspot
Student and Mentor: Neha Pattanaik and Dr. Abhijin Adiga
Analyzing the Spectral Properties of Graphs
Student and Mentor: Nicholas Palmer and Dr. Abhijin Adiga
Analyzing the Effect of Declining Measles Immunization Rates Post-Lockdown
Student and Mentors: Richard Zhou and Drs. Anil Vullikanti, Achla Marathe, and Mugdha Thakur
Automatic Tile Grid Map Production Software
Student and Mentor: Rohit Rajuladevi, Srinivasan Venkatramanan
CDI Case Detection using Machine Learning and Agent-Based Network Modeling
Student and Mentors: Vivian Ma and Drs. Anil Vullikanti and Methun Kamruzzaman
The Data Science for the Public Good (DSPG) Young Scholars program offers a 10-week summer research experience for undergraduate and graduate students from across the U.S., equipping you with the skills to use data to tackle today’s critical social challenges and inform better policy decisions.
As a DSPG Young Scholar, you’ll collaborate with fellow students, postdocs, faculty, and project partners on impactful projects that improve lives and shape public policy. This hands-on experience bridges statistics, data science, engineering, and the social sciences.
We welcome students from diverse backgrounds, and some may even continue working with us during the school year. Scholars are selected through a competitive process and receive a stipend for their participation.
Curious about what it’s like? Read about how Chase Dawson, a UVA student and recent DSPG Young Scholar, spent his summer with us.
- 2023 Projects
U.S. Army Research Institute for the Behavioral and Social Sciences
Students and Sponsor: Nakshatra Yalagach, Sara Shallenberger, and Andrew Slaughter
U.S. Census Bureau
Students: Jianing Cai, Marijke van der Geer
U.S. Department of Agriculture Economic Research Service
Students and Sponsor: Annie Xie, Steve Zhou, and John Pender
Social Impact Data Commons
Students and Sponsor: Anjali Mehta, Prashanth Wagle, Trinity Chamblin, and Mastercard Center for Inclusive Growth
Biokind Analytics
Students and Sponsor: Anjali Mehta, Annie Xie, Jianing Cai, Marijke van der Geer, Nakshatra Yalagach, Kate Lanman, Prashanth Wagle, Sara Shallenberger, Steve Zhou, Trinity Chamblin, and Alex Han
- 2022 Projects
UVA Projects
Coastal Futures: Building Capacity for Data-driven Adaptation in Rural Coastal Communities
Students and Mentors: Kishore Sundaram, Jillian Eberhart, Joshua Goldstein, and Aritra Halder
In collaboration with UVA’s Environmental Resilience Institute, the UVA BII Social and Decision Analytics Division is studying climate impact on the rural Eastern Shore of Virginia. Our goal is to develop tools and build the capacity of local communities to deal with these impacts. We are developing a representative synthetic population of these communities down to the level of individual households and farms by combining information from administrative and survey data. This data will be input into agent-based models and hydrological models measuring flood hazard, water supply, and groundwater salinization to inform stakeholder decision-making.
Impacts of Broadband Development on Rural Property Values
Students and Mentors: Donovan Cates, Kristian Olsson, Joshua Goldstein, and Aritra Halder
The UVA BII Social and Decision Analytics Division USDA Broadband Subsidy project evaluated the economic impact of various broadband initiatives in rural communities. This project implemented spatial regression discontinuity designs of residential property values and broadband download speeds from areas both inside and surrounding the program regions. Additionally, we evaluated speed test data from Ookla and illustrated the relationship between project funding and tangible improvements in internet quality.
Leveraging Existing DoD Data Towards Optimized Individual and Team Performance in the Army
Students and Mentors: Skylar Haskiell, Jillian Eberhart, Joanna Schroeder, and Joel Thurston
The UVA BII Social and Decision Analytics Division Army Research Institute Qualitative Analysis project utilized document analysis to answer the research question: What are the qualities of an individual soldier that contribute to unit performance? Through qualitative analysis, we interpreted documents to give meaning to our phenomena of interest and triangulated our interpretations for credibility. This project will inform and contextualize future quantitative modeling of Soldier and Unit performance in the U.S. Army.
Mastercard Center for Inclusive Growth and Virginia Department of Health Social Impact Data Commons
Students and Mentors: Alan Wang, Kishore Sundaram, Steve Zhou, Donovan Cates, Aaron Schroeder, and Joel Thurston
Sponsored by the Mastercard Center for Inclusive Growth and Virginia Department of Health, the UVA BII Social and Decision Analytics Division Data Commons is building an open knowledge repository that compiles data from trusted open access sources, curates data insights, and provides tools designed to track issues over time and geography. Our toolkit and methodologies allow governments and community stakeholders to access timely data and make informed policy decisions. Several data commons have been deployed focusing on crucial social equity issues across the National Capital Region, as well as between urban and rural Virginia communities.
Use of Statistical and Survey Methodology Research to Improve or Redesign Surveys: Product Innovation
Students and Mentors: Alan Wang, Steve Zhou, Neil Kattampallil
The UVA BII Social and Decision Analytics Division Product Innovation project built a proof-of-concept toolkit that enables the use of the North American Industry Classification System (NAICS) to track innovation activities sustainably using opportunity data. The toolkit accelerates Really Simple Syndication (RSS) queries and news source text extraction using open-source modules and browser automation. The collected texts are then piped to natural language processing (NLP) modules that detect business, product, and innovation status.
Emerging Digitalization Trends
Students and Mentors: Skylar Haskiell, Kristian Olsson, and Kathryn Linehan
The UVA BII Social and Decision Analytics Division NCSES Research and Development project conducted a thorough literature review of the emerging concept of “digitalization” and explored natural language processing techniques to use in identifying grant abstracts about this theme. We utilized a variety of techniques on a test corpus to discover the most accurate method, which we then applied to the Federal RePORTER database to discover research trends related to the area of digitalization.
Virginia Tech Projects
Agricultural Land Use Change in Powhatan and Goochland County
Students and Mentors: Rachel Inman, John Malla, Christopher Vest, Nazmul Huda, Samantha Rippley, Yuanyuan Wen, and Susan Chen
Goochland and Powhatan counties would like to understand land-use conversion from agriculture. This project uses publicly available geospatial data and administrative parcel records to construct a profile of land parcels over time and inform both counties about land conversion and agricultural loss. For each land parcel, we build a data frame that includes whether it parcellates, the type of crop grown if applicable, soil type, travel time to Richmond, provision of utilities, and existing land use. We then use these data to conduct geospatial and statistical analysis to understand the factors most likely associated with land-use change.
Using Remote-Sensed Data for Social and Economic Decision-Making in Zimbabwe
Students and Mentors: Frankie Fan, Ari Liverpool, Josue Navarrete, Leonard-Allen Quaye, Poonam Tajanpure, Naveen Abedin, Brianna Posadas, and Susan Chen
The Zimbabwean government has recently approved an agricultural policy framework based on climate-smart principles. Still, it contains little geographic specificity in an incredibly diverse agricultural economy. This project uses remotely sensed weather-related data to construct a spatial profile of agricultural conditions in Zimbabwe. Using geospatial analysis and statistical modeling, we assess the utility of using remotely sensed data to understand district-level poverty and its components. Our analysis provides a spatially disaggregated look at whether climate data can be used to identify at-risk regions for potential policy intervention.
Sensing Drought in the Sahel for Household Climate Resilience
Students and Mentors: Catherine Back, Milind Gupta, Riley Rudd, Poonam Tajanpure, Armine Poghosyan, and Elinor Benami
Frequent weather shocks impede poverty alleviation efforts in areas dependent on rainfed agriculture, such as the drought-prone Sahel. To help break the link between drought and distress, our DSPG team is creating a reproducible analysis pipeline to examine which historical drought indicators extracted from remotely sensed data are most closely linked with food insecurity. This analysis pipeline, in turn, can help our partners at the World Bank quickly issue relief to the most vulnerable when poor weather strikes.
Illustrating Potential Opportunities for Community Schools in Loudoun County
Students and Mentors: Amanda Ljuba, Jontayvion Osborne, Abdullah Rizwan, Nandini Das, and Chanit'a Holmes
This project examines the resources and services available for elementary schools in Sterling, Loudoun County, involved in the Community Schools Initiative. We analyze services in four key areas (Basic Needs, Emotional and Mental Health, Student Engagement and Motivation, and Family Engagement) and determine potential opportunities for improvement to meet the needs of students, families, and the community.
Assessing Livelihood Diversification in Sundarbans, India using High-Frequency Data
Students and Mentors: Siddarth Ravikanti, Taj Cole, Samantha Rippley, Nandini Das, and Chanit'a Holmes
This project aims to evaluate livelihood-diversification strategies for approximately 300 households in the Sundarbans region in India using weekly financial data. We provide insights into the effects of climate change in this area by describing and visualizing households' income, expenditure, and consumption patterns.
- 2021 Projects
UVA Biocomplexity Institute
Fostering Data Reuse: Measuring the Usability of Publicly Accessible Research Data
Students and Mentors: Emily Kurtz, Aditi Mahabal, and Akilesh S. Ramakrishna and Drs. Alyssa Mikytuck, Gizem Korkmaz, and Sarah Nusser
In this project, we investigated factors associated with the reuse of publicly accessible research data, which is data that is made freely available on a journal, repository, or other website. Funding agencies, such as NSF, mandate that data be made available to the public. However, it takes time and resources to do so. To help data sharers understand the impact of this effort and to understand if those using the data can re-use it, we studied datasets on popular data repositories, such as KNB, Figshare, and Dryad, and used R’s web scraping capabilities to gather information on heavily reused datasets. We gathered metrics like downloads, citations, views, usability scores, metadata information, dataset size, and more from thousands of datasets from six chosen repositories. We used these metrics to understand reuse, which we measured using both the number of downloads and citations. We also analyzed equity of access by utilizing information that some repositories, such as ICPSR and NSF PAR, track on the makeup of their data users and data sharers.
Manpower Planning Using Skill Data
Students and Mentors: Morgan Stockham, Asia Porter, and Stephanie Zhang and Joanna Schroeder, Drs. Josh Goldstein and Aritra Halder
We study the connection between Army jobs and the skills acquired through jobs like them to create a unique vector of skills associated with the Army. Veterans are often crowded out in job searches because they don’t know what skills they have acquired in their tenure. We utilize a unique O*NET crosswalk that connects Army MOS codes with SOC codes and then connects these to job ads from Burning Glass. This research provides an overview of skills acquired in the Army, which can inform Army veterans about the jobs they may be best suited for and the skills they can place on their resumes.
Implementing Text-Based AIs to Investigate and Measure Private Sector Software Innovation
Students and Mentors: Digvijay Ghotane, Aditi Mahabal, and Akilesh S Ramakrishna and Drs. Neil Alexander Kattampallil, Devika Mahoney-Nair, and Gizem Korkmaz
We study the landscape of product innovation in the computer software sector, leveraging publicly available opportunity data, namely news articles obtained from Dow Jones, a business news and data provider. We implement a series of Bidirectional Encoder Representations from Transformers (BERT) neural networks, a sophisticated natural language processing method, for a number of tasks. Our work developed a BERT classification model to identify news articles describing innovation broadly, making use of a training set of 600 manually labeled articles and demonstrating an accuracy rate of over 96%. This model was then applied to one year’s worth of news articles about the computer software industry to predict which articles describe innovation. We applied a different BERT algorithm to this set of predicted innovation articles for named entity recognition, which was used here to extract the company and new product names mentioned in these articles.
Classifying and Measuring Open Source Software Projects on GitHub
Students and Mentors: Crystal Zang, Cierra Oliveira, and Stephanie Zhang and Drs. Brandon Kramer and Gizem Korkmaz
Over the past few years, our research group has advanced a number of computational approaches to measure the scope and impact of open source software (OSS), including a method that evaluates the resource costs of source code development in online platforms (e.g., Robbins et al. 2018). The goal of this current project is to address how different software types may impact economic evaluations of OSS. During our 2021 Data Science for the Public Good Young Scholars Summer Program, our team has begun to develop a methodology to help researchers study different software types through the use of computational text analysis. Drawing on 10+ million repositories scraped from GitHub, the world’s largest code hosting platform, we detail an approach that classifies software using the information provided on repositories such as README files and repository descriptions. The categories are based on Fleming’s (2021) proposed classifications of software price indices and another prominent code hosting platform named SourceForge. After detailing these category types, we discuss how we use dictionary-based and unsupervised computational text analysis to classify these GitHub repositories. More specifically, we plan to probabilistically match repositories to predefined categories using text-based similarity metrics. After detailing this methodology, we talk about some potential use cases that this approach may proffer and its potential impact on developing novel economic evaluations of OSS tools.
R&D Text Corpora Filtering and Data Mining
Students and Mentors: Crystal Zang, Haleigh Tomlin, and Cierra Oliveira and Drs. Joel Thurston, Eric Oh, Stephanie Shipp, and Kathryn Linehan
We use administrative data for federal grants to discover research topics and their trends in the area of artificial intelligence (AI). Our data source is Federal RePORTER, a database of federally funded research grants that includes project abstracts and other project data such as funding agencies and start years. We filter Federal RePORTER project abstracts for those that describe projects about AI. AI is a complex and hard-to-define theme, so this filtering problem is challenging. We utilized three different filtering methods: 1) an AI term matching method proposed by the Organization for Economic Co-operation and Development (OECD), 2) a method by Eads et al., which utilizes term matching and topic modeling, and 3) a Sentence BERT (bidirectional encoder representations from transformers) method that compares the similarity between the AI Wikipedia page and each grant abstract. Each filtering method produces an AI-themed corpus on which we run a non-negative matrix factorization (NMF) topic model. Using linear regression and visualization, we analyze the topic model results to discover AI research trends in projects that were federally funded.
A Racial Equity Case Study of the Provision of Parks and Other Amenities in Arlington County
Students and Mentors: Morgan Stockham, Digvijay Ghotane, Asia Porter, and Madeline Garrett and Drs. Eric Oh, Kathryn Linehan, and Aaron Schroeder
We examine the provision of parks and their amenities by the Parks and Recreation Department in Arlington County through a racial equity lens. Combining data from the American Community Survey, Arlington Open Data Portal, CoreLogic, and scraped web information, we characterize the extent to which Arlington County is providing services that align with various communities’ needs and desires. In addition to racial (and other demographic) breakdowns, we consider other factors that may influence one’s needs for certain amenities such as car ownership, presence of young children, and type of housing (e.g. single-family home or apartment building) and their intersections with race. We then calculate isochrones to determine how long residents of each neighborhood must travel to get to various parks and access certain amenities and overlay the factors described above to determine varying levels of access.
What are the certifications that lead to a job in the Skilled Technical Workforce?
Students and Mentors: Emily Kurtz, Haleigh Tomlin, and Madeline Garrett and Drs. Vicki Lancaster, Cesar Montalvo
In this project, we are studying the non-degree credentials needed for a job in the Skilled Technical Workforce (STW). This work is important because the skilled technical workforce is a fast-growing and crucial sector of the US economy, and these STW jobs can offer a path to the middle class for millions of Americans. Despite the importance of the skilled technical workforce, there are massive data gaps between the skilled technical workforce players, namely federal and state governments, employers, and the educational institutions that provide non-degree credential training. We spent this summer bridging some of these data gaps. First, we used R’s web scraping capabilities to collect the certifications associated with the 133 skilled technical workforce occupations listed in the Department of Labor’s Occupational Information Network (O*NET). Then, we collected the certifications demanded by employers for these occupations in job ads from Burning Glass Technologies. We used Natural Language Processing techniques to standardize the job-ad certifications using the O*NET certification names. Finally, we used network analysis to visualize the connections between occupations and certifications, thus highlighting paths workers could potentially take in the STW.
UVA School of Data Science
Stroke and COVID Population: A Health Equity Analysis
Students and Mentors: Ethan Assefa, Esau Hutcherson, Suliah Apatira, Dahnielle Milton, and Rehan Javaid and Sucheta Sharma, Dr. Andrew Southerland, and Donald E. Brown
To date, COVID-19 has claimed the lives of 449,020 Americans and has infected tens of millions more. In doing so, this massively sweeping pandemic has uncovered systemic flaws that leave racial minorities carrying an unequal share of the burden. In fact, Black, Hispanic, and Indigenous Americans have 1.5x higher infection rates, 4x higher hospitalization, and 2.7x higher death rates than White Americans. Data shows that this disparity exists across all age groups with the potential for furthering devastation in minority communities. Currently, 41% of all new COVID-19 infections are assigned to persons 35-49 years old. Unfortunately, 35-44-year-old Black and Hispanic Americans are 8-11x more likely to die following COVID-19 infections compared to their White counterparts. Despite these startling statistics, racial minorities are receiving COVID-19 vaccines at dramatically lower rates, with the majority of states reporting vaccination patterns along lines of race showing a 2-4x higher vaccination rate for White vs Black Americans. The cause of these disparities in outcomes vs intervention is multifactorial due to the compounding issues of lack of access, health care bias, and the presence of negative factors impacting social determinants of health largely assigned to Black and Hispanic communities. Knowing that, we aim to investigate whether or not there are racial disparities in medical resource allocation in stroke patients who are COVID-19 positive. This project will use the N3C limited dataset. The focus will be on ischemic stroke, and we will explore whether particular patterns suggest that COVID-19 stroke outcomes/treatments and COVID-care patterns differ along lines of race.
Virginia Tech
Exploring the Skill Content of Jobs in Appalachia
Students and Mentors: Timothy Pierce, Ryan Jacobs, Austin Burcham, and Yang Cheng and Drs. Anubhab Gupta and Susan Chen
We study the industries that comprise Appalachian labor markets, the jobs these industries provide, and the skills required for these jobs. Our work uses individual-level Integrated Public Use Microdata Series (IPUMS) data and occupation-specific O*NET data to understand the skill content of labor in Appalachian communities. We construct an index of skills for each occupation and then aggregate this to the PUMA level. The result is a PUMA-level index that characterizes the skill content of workers in Appalachia. Typifying the current skill endowment of workers in Appalachia will help us understand how a transition away from the current industrial mix towards more green industries, with potentially new skill content, will affect the labor market of these rural communities.
Water Resource Management and Industry and Residential Growth in Floyd County
Students and Mentors: Esha Dwibedi, Julie Rebstock, Ryan Jacobs, and John Wright and Drs. Sarah M. Witiak and Brianna Posadas
We studied the factors that affect the water quality and quantity issues existing within Floyd County. We performed a relevant literature review and had focused discussions with both the stakeholders and relevant experts on the area’s geology and water issues, which informed our data scraping directives. We found that nearly all the residents in the county rely on well and natural spring systems for their water supply, so these systems were a major focus of our findings. Due to the lack of data on the area’s groundwater resources, we developed various models to indirectly estimate the county’s water resources. We used remote sensing data from both the GRACE satellites and the Landsat 8 satellites to develop estimates for the water quantity trends in the area. The GRACE satellite data was used to estimate temporal trends of the water table anomalies for the county. The Landsat 8 satellite imagery was used to develop a neural network model that used the Normalized Difference Water Index (NDWI) alongside precipitation, elevation, and well water depth values from counties across Virginia to estimate the water table depth for the county. Alongside looking at water quantity, we also studied the water quality issues in the county and probed into factors that might lead to potential contamination of the county’s water resources. We identified the geology of the area alongside potential surface contamination sources and household plumbing issues to be the major sources of contamination in the county. We further identified strategies that the county could utilize for sustainable and efficient use of its water resources for future industrial and residential development.
Tracking Indicators of the Economic and Social Mobility of the Black Community in Hampton Roads
Students and Mentors: Avi Seth, Matthew Burkholder, Christina Prisbe, Victor Mukora, and Kwabena Boateng and Drs. Isabel Bradburn and Chanita Holmes
Hampton Roads is a coastal region of Virginia comprising 10 cities and six counties. It represents most of the Virginia Beach-Norfolk-Newport News metropolitan statistical area, the 37th largest MSA in the United States. Black families represent 31% of the area's population, and ~15% of them are below the poverty line, nearly double the rate for the general population of Hampton Roads, 8.1% of which is below the poverty line. This project uses publicly available Census data to analyze trends and statistics on key indicators of the economic well-being of the Black community in Hampton Roads. We compare these indicators across the Hampton Roads localities and the Virginia population. We visualize the data through a dashboard to provide insights that help regional stakeholders plan policies and activities to positively affect the community.
Service Provision for Vulnerable Transition-Aged Youth in Loudoun County, Virginia
Students and Mentors: Yang Cheng, JaiDa Robinson, Julie Rebstock, Austin Burcham, and Kyle Jacobs and Drs. Isabel Bradburn and Chanita Holmes
Transition-Aged Youth (TAY), young adults ages 18-24, encounter numerous difficulties in their transition to adulthood. The transition can be especially difficult for youths “aging out” of foster care or those exiting the juvenile detention system. Motivated by the Loudoun County Human Services Strategic Plan 2019-2024, we identify the availability of services for TAY in five major areas: education, employment, housing, transportation, and health. This project uses geospatial mapping and interactive trees to identify intra-county variation in service provision and utilization. We also conducted cross-county analysis between Loudoun County and Fairfax County, which is inside Virginia, and Allegany County, which is outside Virginia. By unveiling the differences in demographics, we identified those who disproportionately served TAY.
Analyzing Vegetative Health Using Landsat 8 Satellite Imagery
Students and Mentor: Esha Dwibedi, Avi Seth, Atticus Rex, and Victor Mukora and Dr. Briana Posadas
The Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) are indices developed to assess the vegetative health and water content of plants. These indices can be calculated using different wavelengths of light captured in high-resolution satellite imagery. The goal of this project is to analyze these indices in the New River Valley using 11 bands of reflected light captured by the Landsat 8 satellite. This research combines machine learning forecasting algorithms with a literature review on using these indices in areas such as precision agriculture, groundwater detection, coastal flooding, and drought. This project uses raw satellite images taken of the region and constructs filters, indices, and subsets of the region by decomposing the wavelengths of light collected in each photograph. These subsets were supplied to a feed-forward neural network to obtain a robust prediction model for the New River Valley.
Availability of Services: Evolving Demographics, Housing, and Traffic in Rappahannock County
Students and Mentors: Timothy Pierce, Christina Prisbe, and Mousa Toure and Leonard-Allen Quaye, Drs. Anubhab Gupta and Mulugeta Kahsai
We used publicly available data from the American Community Survey (ACS) to explore questions and concerns held by stakeholders in Rappahannock County, Virginia. Our work involved the creation of a county profile for Rappahannock that displays information about age, race, income, employment, housing prices, and more. Additionally, we analyzed traffic volume data from the Virginia Department of Transportation to identify areas of increased or decreased traffic in the last ten years (2010-2020). Finally, we aggregated community services and resources into a single dashboard that allows us to visualize the availability of services to residents of the county. Using the county profile, traffic volume data, and service data, we can provide data-driven descriptions of service provision in Rappahannock County, Virginia during the last decade.
Using PICES Data to Visualize District-Level Multidimensional Poverty in Zimbabwe
Students and Mentors: Yang Cheng, Matthew Burkholder, and Atticus Rex and Sambath Jayapregasham, Drs. Susan Chen, Anubhab Gupta, and Jeffrey Alwang
Prior research suggests that poverty in Zimbabwe has increased since the period of crisis began at the turn of the millennium. According to the latest World Bank estimates, almost 49% of the population of Zimbabwe was in extreme poverty in 2020. Our stakeholders seek solutions to the economic situation. They would like more granular information presented in creative ways that allow the user to glean the multidimensional and temporal aspects of poverty in Zimbabwe. The recent availability of household surveys for public use has opened the possibility of using the data to inform evidence-based policy. This project uses data from the Poverty, Income, Consumption and Expenditure Survey (PICES) to provide granular information on poverty in Zimbabwe. We created multidimensional poverty indices (MPI) at the district and province level and decomposed them into components that focus on education, health, employment, housing conditions, living conditions, assets, agricultural assets, and access to services. We provide interactive tools that allow the user to visualize and study each component and understand its contribution to the MPI. We constructed these measures for two waves of data, in 2011 and 2017, to show the changes in poverty over time and across regions in Zimbabwe. The composition and decomposition of the MPI in this project can inform evidence-based policy and interventions for poverty reduction.
Iowa State University
“Just the Facts” on Educational Attainment
Students and Mentors: Laailah Ali, Max Ruehle, and Ellie Uhrhammer and Dr. Chris Seeger and Bailey Hanson
How does the educational path of minorities, women, or older adults differ from that of the general population in Iowa? What types of jobs do various educational pathways lead to for minorities? This project aims to develop a series of indicators that identify the post-secondary educational attainment of disproportionately impacted communities in Iowa. The team has investigated data related to educational opportunities, attainment, and outcomes of the identified population groups. These data have been cleaned and integrated into a data pipeline, to be presented as engaging, unbiased infographics and visuals through a publication series titled Just the Facts. The team has worked closely with the DHR educational attainment and economic and workforce development teams to collaborate on and share general data resources such as population, demographics, and languages spoken that are specifically related to the identified disproportionately impacted populations in Iowa.
DHR “Just the Facts” on Economic and Workforce Development
Students and Mentors: Joseph Zemmels, Avery Schoen, Dylan Mack, and Zack Johnson and Dr. Chris Seeger and Bailey Hanson
The mission of the Iowa Department of Human Rights (IDHR) is to empower underrepresented Iowans by advocating for the elimination of economic, social, and cultural barriers to full participation in civic life. To that end, we analyzed data and created indicators to identify employment and earnings opportunities for disproportionately impacted communities in Iowa. These communities include racial and ethnic minorities, women, and individuals with disabilities. We also developed a web application to explore language usage across the state of Iowa.
Iowa’s Integrated Data System for Decision-Making (Early Childhood Iowa)
Students and Mentors: Avery Schoen, Dylan Mack, and Sonyta Ung and Dr. Todd Abraham
The purpose of this project is to build an interactive dashboard for Early Childhood Iowa with the capacity to connect with I2D2 and identify national, state, and local sources. We targeted 22 indicators as the dashboard's primary data, drawing on sources including IDPH, IDSH, and CDC. Because the data for each indicator came in a different format, we implemented several tools to scrape it. Dashboard users can download the data in PDF or CSV format. We also created various visualizations so that dashboard users can analyze the data more directly.
Supporting Eat Greater Des Moines and Food Rescue in Central Iowa
Students and Mentors: Zack Johnson, Ellie Uhrhammer, and Saul Varshavsky and Dr. Adisak Sukul
This project looks at the non-profit organization Eat Greater Des Moines (EGDM) and its food rescue efforts. EGDM takes donations of surplus food from grocery and convenience stores, restaurants, and other locations and transports it to food pantries, non-profits, schools, housing locations, and other organizations that can distribute food to those who need it. In the project, the team used data provided by EGDM and other sources to demonstrate where food rescue currently happens, where it can be expanded, and what areas can benefit most from food rescue. The team has also built a data pipeline and dashboard that is sustainable for EGDM and will be used by the organization moving forward to support its food rescue efforts.
Quality of Life in Small and Shrinking Cities in Iowa
Students and Mentors: Amanda Rae, Laailah Ali, Max Ruehle, and Jack Studier and Dr. Heike Hoffman
This project focuses on factors affecting the perception of quality of life in small and shrinking rural communities in Iowa. The goal is to help communities focus their limited resources on improving quality of life rather than using scarce resources to try to grow (as this is unlikely in most towns). Residents and leaders of small rural towns are collaborators and stakeholders of the umbrella NSF project. The team is building a community information ecosystem that will be available through an online web application. This ecosystem makes use of publicly available data and links it to some proprietary data sets to help communities understand, utilize, and collect new data about their towns and peer communities. The ecosystem will use statistical modeling and cutting-edge visualization strategies to make data more accessible to stakeholders in these communities, including city staff, local leaders, and the public.
Assessing the Impact of Publicly Accessible Research Data: What Can Repositories Tell Us About Data Reuse?
Students and Mentors: Tiancheng Zhou, Jack Studier, Saul Varshavsky, and Sonyta Ung and Dr. Adisak Sukul
In this project, we aim to better understand what makes a data source reusable to another researcher by focusing on the repository component of the data-sharing ecosystem. We have explored a list of data repositories, looked for the associated metrics that suggest reuse of a data source, and analyzed factors associated with higher levels of reuse and potential impact. We used two approaches, API requests and HTML scraping, to extract the metrics from the repositories we were assigned, and we used correlation plots to analyze how each metric relates to reusability. Overall, this study is a repository-focused complement to a larger researcher-centered effort to develop a path for accelerating community readiness in creating reusable publicly accessible data products.
Virginia State University
Equity in Access to Parks in Chesterfield, Virginia
Students: R. Mousa Touré, Kyle Jacobs, and Kwabena Boateng
Chesterfield County's Department of Parks and Recreation is concerned with the changing demographics of Chesterfield County as the population increases, ages, and becomes more diverse. As Chesterfield grows, the Department of Parks and Recreation wants to ensure that the people served have equitable access to their facilities. For this project, we needed to find a way to define and quantify equity. The next leg of the project is determining equitable access, which involves an understanding of Chesterfield’s parks and their facilities. Our goals were to determine travel time and distance data for each park, create a measure of equity for each park, and rank each park based on its ability to serve its surrounding area. Using Chesterfield Parks’s GeoSpace resources, we were able to obtain detailed location and facilities data on their parks. The parks’ qualities were ranked based on total facilities to provide a rudimentary “quality score” that can be further refined with more time and analysis of other park scoring systems. Our literature review determined that the best variables for studying equity included vulnerable demographics. We collected these using the US Census’s 5-Year American Community Surveys and geospace information to determine the number of people in vulnerable demographic groups at the census tract level and to estimate the approximate number of people in these groups who are closest to the parks in their census areas. With this information and further analysis, we can get a better understanding of who is closest to which parks and, with further literature analysis, understand whether the facilities of each park best serve the communities closest to them.
Understanding Unemployment in the Prince George and Hopewell Region
Students: John Wright and JaiDa Robinson
The current unemployment rate in the Prince George and Hopewell Region is a concern for the Prince George and Hopewell Chamber of Commerce. There are several barriers to employment according to the literature; however, we focused on job demands, transportation, education levels, and skills required for employment. Using data from the American Community Survey, Virginia Employment Commission, and Jobs EQ for Workforce, the project used exploratory data analysis tools to visualize the distribution of unemployment, taking into account job demands, transportation, education levels, and skills required. In addition, we explored the demand for labor and occupational gaps in the area. Job openings appear to be unequally distributed in the area. For instance, most currently available jobs appear to be closer to the densely populated area, which has a more diverse mix of industries, whereas the sparsely populated area may have a higher travel time to work. This indicates that residents need a means of transportation to reach current job opening locations. We compared the education level requirements of current job postings to the current education level of the unemployed. For example, in Hopewell, 318 people with a high school diploma or higher are collecting unemployment benefits, and there are about 400 job ads that require one. This indicates there may be enough jobs available based on education, but job seekers may need additional skills to obtain employment. It seems that 40-50% of the current job postings for each area require hard cognitive skills such as Cash Handling and Microsoft skills, while about 35% of the current job ads require the physical ability to lift 50 lbs or more. The next step would be to explore the skill sets that the unemployed have. In addition, we would do statistical analysis to estimate the relationship between unemployment and the different barriers to employment.
Are you a current undergraduate, graduate student, or high schooler interested in working with BI?
Please contact Savanna Galambos, Assistant Director of Administrative Affairs.