Soldier Attrition

Sponsor

The U.S. Army Research Institute for the Behavioral and Social Sciences

Our collaboration with the Army Research Institute for the Behavioral and Social Sciences signals a new era for data science in the military. Together we are assessing researchers' ability to access Department of Defense (DOD) data, evaluate its quality, and integrate it, both to support decision-making related to the military population and to expand the types of questions the data might inform, in particular the study of important recurring issues for the military such as attrition. DOD collects and maintains massive amounts of administrative and survey data about its military population and their families. This effort develops research approaches that combine available DOD transactional records with federal and other open data sources to provide insights beyond those typically obtained from single-source analyses of Soldiers. Demonstrated on a large-scale study of first-term attrition among enlisted Army Soldiers, the effort walks through our data framework: discovering data, assessing data quality and fitness for use, creating intermediate data products, and developing statistical modeling approaches to handle this broad collection of data.

Project Overview

To carry out a broad-scale analysis of attrition among first-term enlisted Army Soldiers, we combine a wide variety of DOD administrative data and external survey data sources to produce detailed information on more than 700,000 Soldiers over a six-year period (2009–2015). We follow a three-fold strategy that

  1. combines a wide variety of DOD, federal, and open data sources within the Army Analytics Group (AAG) Research Facilitation Lab's (RFL) Person-Event Data Enclave (PDE) to produce detailed information,
  2. adapts a Bayesian hierarchical discrete-time hazard model to handle these data, which involve more than 80 million transactions, and
  3. produces an analysis of factors that affect attrition over this time period.

This effort is substantial in that it has brought together a broad collection of data sources within the PDE to inform the Army about attrition among enlistees who leave before their first term ends. While this collection of data likely holds rich detail regarding attrition, its size and structure make it difficult to fully extract this information. Hence we use a discrete-time Bayesian hierarchical modeling approach, which makes it possible to extract detailed effects from these data while accounting for the uncertainty and dependence within them.
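A discrete-time formulation treats each Soldier's service as a sequence of periods, each ending in either continued service or attrition. The following is a minimal sketch of that person-period expansion, not the project's actual pipeline. The field names (soldier_id, months_served, attrited) and the choice of monthly periods are assumptions made for illustration.

```python
from typing import Dict, List


def expand_to_person_periods(record: Dict) -> List[Dict]:
    """Turn one service record into one row per period at risk.

    Field names and the monthly period length are illustrative assumptions.
    """
    rows = []
    for period in range(1, record["months_served"] + 1):
        is_last = period == record["months_served"]
        rows.append({
            "soldier_id": record["soldier_id"],
            "period": period,
            # The event indicator is 1 only in the final observed period,
            # and only if the Soldier attrited rather than being censored.
            "event": int(is_last and record["attrited"]),
        })
    return rows


records = [
    {"soldier_id": "A", "months_served": 3, "attrited": True},
    {"soldier_id": "B", "months_served": 2, "attrited": False},  # censored
]
person_periods = [row for r in records for row in expand_to_person_periods(r)]
```

Each row in person_periods can then be treated as a binary outcome, which is what lets a hazard model be fit to a long table of per-period observations.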

Findings

To start, we simulated records from an agent-based model to create a digital twin of the Army for characterizing how attrition takes place. Starting with this controlled, synthetic system allows us to encode simple, probabilistic rules that affect attrition. For example, attrition probability may depend only upon the unit the Soldier belongs to and the length of time they have served. Simulated records from this model are used to refine and test the attrition model before implementing it on the full Army dataset within the PDE.
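A rule of this kind can be sketched in a few lines. The example below is only an illustration of the idea, assuming two hypothetical units, made-up monthly baseline probabilities, and an assumed doubling of risk in the first six months; none of these values come from the project's agent-based model.

```python
import random

# Hypothetical per-unit baseline monthly attrition probabilities (assumed values).
UNIT_BASELINE = {"unit_1": 0.010, "unit_2": 0.020}


def attrition_probability(unit: str, months_served: int) -> float:
    """Simple rule: risk depends only on unit and length of service."""
    # Assumed shape: risk is doubled during the first six months of service.
    early_multiplier = 2.0 if months_served < 6 else 1.0
    return UNIT_BASELINE[unit] * early_multiplier


def simulate_soldier(unit: str, term_months: int, rng: random.Random) -> dict:
    """Step one agent month by month until attrition or end of term."""
    for month in range(term_months):
        if rng.random() < attrition_probability(unit, month):
            return {"unit": unit, "months_served": month + 1, "attrited": True}
    return {"unit": unit, "months_served": term_months, "attrited": False}


rng = random.Random(0)  # fixed seed so the synthetic records are reproducible
simulated = [
    simulate_soldier(rng.choice(sorted(UNIT_BASELINE)), term_months=36, rng=rng)
    for _ in range(1000)
]
attrition_rate = sum(s["attrited"] for s in simulated) / len(simulated)
```

Because the rules that generated the records are known, a model fitted to simulated can be checked against them, which is the point of starting from a synthetic system.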

Figure 1 shows agent-based model output covering more than 10 years of enlistments and attritions. These simulated enlistment histories follow individual Soldiers over the course of 10 years.

Using this simulation approach allowed us to refine the final method, a Bayesian hierarchical discrete-time hazard model. This approach can handle multiple time scales (e.g., calendar year and time in service), random individual effects (e.g., an individual's taste for military service), and flexible, non-parametric specifications for continuous effects (e.g., age, days deployed). Efficient inference for model parameters is done via Markov chain Monte Carlo.
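To make the structure of a discrete-time hazard concrete, the sketch below computes a per-period attrition probability from a period baseline, covariate effects, and an individual effect, then chains the periods into a survival curve. The logit link is an assumption (the link function is not stated here), and all numeric values are illustrative rather than estimates from the study. In a hierarchical model the individual effect would be drawn from a population-level distribution and inferred by MCMC; here it is simply fixed.

```python
import math
from typing import List, Sequence


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hazard(baseline: float, covariates: Sequence[float],
           coefficients: Sequence[float], individual_effect: float) -> float:
    """Per-period attrition probability, assuming a logit link.

    logit(h) = period baseline + covariate effects + individual effect
    """
    linear = baseline + sum(x * b for x, b in zip(covariates, coefficients))
    return logistic(linear + individual_effect)


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """P(still serving after each period) = running product of (1 - hazard)."""
    curve, surviving = [], 1.0
    for h in hazards:
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


# Illustrative values only; none are estimates from the study.
period_baselines = [-3.0, -3.5, -4.0]
hazards = [
    hazard(a, covariates=[1.0], coefficients=[0.5], individual_effect=0.2)
    for a in period_baselines
]
curve = survival_curve(hazards)
```

A positive individual_effect raises the hazard in every period for that Soldier, which is how a single person-level term induces dependence across that person's records.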

The full model uses DOD data from a variety of sources within the PDE, including a master personnel file, an analyst file with data collected at enlistment, and a transaction file. These provide individual (de-identified) data on the enlistees and their characteristics, information from before they entered the Army such as test scores and educational attainment, and transactional information (or events) recorded after joining the Army, including promotion progression, duty location, and attrition. Other data on training, physical fitness, and disciplinary actions are linked from the Digital Training Management System (DTMS) and the Interactive Personnel Electronic Records Management System (iPERMS). Non-DOD estimates from the American Community Survey (ACS) and the Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW) are also included at the county level. These variables describe the community surrounding the location where a Soldier is stationed.
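Attaching county-level estimates to individual records amounts to a left join on a county identifier. The sketch below shows one way this could look, assuming each Soldier row carries a county code for the duty location. The field names, the covariate names, and the placeholder county codes are all hypothetical and do not reflect the actual PDE schema.

```python
from typing import Dict, List, Optional

# Hypothetical county-level covariates keyed by a placeholder county code.
county_covariates: Dict[str, Dict[str, float]] = {
    "00001": {"median_income": 52000.0, "veteran_share": 0.12},
    "00002": {"median_income": 41000.0, "veteran_share": 0.07},
}


def attach_county_covariates(
    rows: List[Dict[str, str]],
    covariates: Dict[str, Dict[str, float]],
) -> List[Dict[str, Optional[object]]]:
    """Left join: keep every Soldier row; fill None when the county is unknown."""
    fields = sorted({name for values in covariates.values() for name in values})
    joined = []
    for row in rows:
        match = covariates.get(row["duty_county_fips"], {})
        joined.append({**row, **{name: match.get(name) for name in fields}})
    return joined


soldier_rows = [
    {"soldier_id": "A", "duty_county_fips": "00001"},
    {"soldier_id": "B", "duty_county_fips": "99999"},  # no matching county
]
joined = attach_county_covariates(soldier_rows, county_covariates)
```

Keeping unmatched rows with missing values, rather than dropping them, makes gaps in the external data visible when assessing fitness for use.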

Figures 2 and 3 show some of the results from this model. In Figure 2, we see that the risk of attrition is much higher for Soldiers with observed disciplinary actions, including courts-martial, Article 15 hearings, and letters of reprimand. In Figure 3, we look at characteristics of the community around the base where a Soldier is currently serving. We find that lower-income communities are associated with a higher risk of attrition, while communities with a large veteran population are associated with a lower risk.

Team

Distinguished Professor in Biocomplexity, Biocomplexity Institute

Professor of Public Health Sciences, School of Medicine

Acting Division Director

Research Professor
