
Project Details

Funding Agency

The U.S. Army Research Institute for the Behavioral and Social Sciences

Our collaboration with the Army Research Institute for the Behavioral and Social Sciences signals a new era for data science in the military. Together we are assessing the ability of researchers to access, evaluate the quality of, and integrate Department of Defense (DOD) data to support decision-making related to the military population and to expand the types of questions the data might inform, in particular the study of important recurring issues for the military, such as attrition. DOD collects and maintains massive amounts of administrative and survey data about its military population and their families. This effort develops research approaches that combine available DOD transactional records with federal and other open data sources to provide insights that go beyond those typically obtained from single-source analyses of Soldiers. Demonstrated on a large-scale study of first-term attrition among enlisted Army Soldiers, this effort walks through our data framework: discovering data, assessing data quality and fitness for use, creating intermediate data products, and developing statistical modeling approaches to handle this broad collection of data.

 

Project Overview

To carry out a broad-scale analysis of attrition of first-term Army enlisted Soldiers, we combine a wide variety of DOD administrative data and external survey data sources to produce detailed information on over 700,000 Soldiers over six years (2009–2015). While this collection of data holds rich detail regarding attrition, its size and structure make it difficult to fully extract this information. For this reason, we use a statistical model to handle this large data set and produce an analysis of factors that affect attrition over this time period. We use a discrete-time Bayesian hierarchical modeling approach that makes it possible to extract detailed effects from these data while accounting for the uncertainty and dependence within the data. The analysis follows a three-fold strategy. This strategy

  1. combines a wide variety of DOD, federal, and open data sources within the Army Analytics Group (AAG) Research Facilitation Lab (RFL) Person-Event Data Environment (PDE) to produce detailed information,
  2. adapts a Bayesian hierarchical discrete-time hazard model to handle these data, which involve over 80 million transactions, and
  3. produces an analysis of factors that affect attrition over this time period.

This effort is substantial in that it brings together a broad collection of data sources within the PDE to inform the Army about attrition among enlistees who leave before their first term ends.
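As a rough illustration of what a discrete-time setup implies for the data, the sketch below expands one record per Soldier into one row per Soldier per period at risk, then computes a simple empirical hazard for each period. This person-period layout is the conventional input for discrete-time hazard models in general; the record format, the function names, and the toy data are assumptions made for illustration and are not taken from the project.

```python
from collections import defaultdict


def to_person_periods(records):
    """Expand (soldier_id, periods_observed, attrited) tuples into
    one (soldier_id, period, event) row per period at risk.

    The input layout is hypothetical. `attrited` marks whether the last
    observed period ended in attrition (True) or was censored (False).
    """
    rows = []
    for soldier_id, periods_observed, attrited in records:
        for t in range(1, periods_observed + 1):
            # The event indicator is 1 only in the final period of an attrition.
            event = 1 if (attrited and t == periods_observed) else 0
            rows.append((soldier_id, t, event))
    return rows


def empirical_hazard(rows):
    """Share of Soldiers at risk in each period who attrite in that period."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _, t, event in rows:
        at_risk[t] += 1
        events[t] += event
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Toy, made-up records: two attritions and two censored histories.
records = [("A", 3, True), ("B", 2, False), ("C", 1, True), ("D", 3, False)]
rows = to_person_periods(records)
hazard = empirical_hazard(rows)
print(len(rows), hazard)
```

Because every Soldier contributes one row for each period served, the row count grows much faster than the head count, which is one reason a data set of this kind becomes large.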

Findings

To start, we simulated records from an agent-based model to create a digital twin of the Army for characterizing how attrition takes place. Starting with this controlled, synthetic system allows us to encode simple, probabilistic rules that affect attrition. For example, attrition probability may depend only upon the unit the Soldier belongs to and the length of time they have served. Simulated records from this model are used to refine and test the attrition model before implementing it on the full Army dataset within the PDE.
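A minimal sketch of such a rule-based simulation follows, assuming attrition probability depends only on unit and months served, as in the example above. The unit names, the probabilities, the 36-month term, and the function names are all hypothetical values chosen for illustration; they do not describe the project's agent-based model.

```python
import random

# Hypothetical baseline monthly attrition probability for each unit.
UNIT_BASE = {"unit_a": 0.010, "unit_b": 0.020}


def attrition_prob(unit, months_served):
    """Illustrative rule: unit baseline plus extra risk in the first 6 months."""
    early_bump = 0.03 if months_served < 6 else 0.0
    return UNIT_BASE[unit] + early_bump


def simulate_soldier(rng, unit, term_months=36):
    """Step month by month; return (months served, attrited flag)."""
    for month in range(term_months):
        if rng.random() < attrition_prob(unit, month):
            return month + 1, True  # attrited during this month
    return term_months, False  # served out the full (assumed) term


def simulate_cohort(n, seed=0):
    """Simulate n Soldiers with randomly assigned units; seeded for repeatability."""
    rng = random.Random(seed)
    units = sorted(UNIT_BASE)
    records = []
    for i in range(n):
        unit = rng.choice(units)
        months, attrited = simulate_soldier(rng, unit)
        records.append({"id": i, "unit": unit, "months": months, "attrited": attrited})
    return records


cohort = simulate_cohort(1000)
rate = sum(r["attrited"] for r in cohort) / len(cohort)
print(f"simulated attrition rate: {rate:.3f}")
```

Because the rules that generate the records are known, an attrition model fitted to the simulated output can be checked against them before it is applied to real data.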

Figure 1 shows agent-based model output covering more than 10 years of enlistments and attritions. Each simulated enlistment history follows an individual Soldier over that period.

Using this simulation approach allowed us to refine the final method, a Bayesian hierarchical discrete-time hazard model. This approach can handle multiple time scales (e.g., year and time in service), random individual effects (e.g., an individual’s taste for military service), and flexible, non-parametric specifications for continuous effects (e.g., age, days deployed). Efficient inference for model parameters is done via Markov chain Monte Carlo (MCMC).
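One common way to write down a model with these ingredients is sketched below. It is a generic form under stated assumptions, not the project's exact specification: the logit link, the normal distribution for the individual effect, and all of the notation are assumptions introduced here.

```latex
\begin{align*}
h_{it} &= \Pr(T_i = t \mid T_i \ge t,\ \mathbf{x}_{it},\ u_i), \\
\operatorname{logit}(h_{it}) &= \alpha_{s(i,t)} + \gamma_{y(t)}
  + \mathbf{x}_{it}^{\top}\boldsymbol{\beta}
  + \sum_{j} f_j(z_{ijt}) + u_i, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2).
\end{align*}
```

Here $h_{it}$ is the probability that Soldier $i$ attrites in period $t$ given service up to that period, $\alpha_{s(i,t)}$ is a baseline effect for time in service, $\gamma_{y(t)}$ is a calendar-year effect, $\mathbf{x}_{it}^{\top}\boldsymbol{\beta}$ collects categorical and binary covariates, each $f_j$ is a smooth function of a continuous covariate $z_{ijt}$ such as age or days deployed, and $u_i$ is the random individual effect. In a Bayesian treatment, priors are placed on all of these terms and the posterior is explored with MCMC.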

The full model uses DOD data from a variety of sources within the PDE, including a master personnel file, an analyst file with data collected at enlistment, and a transaction file. These provide individual (de-identified) data on the enlistees and their characteristics, information from before they entered the Army such as test scores and educational attainment, and transactional information (or events) recorded after joining the Army, including information on promotion progression, duty location, and attrition. Other data on training, physical fitness, and disciplinary actions are linked from the Digital Training Management System (DTMS) and the Interactive Personnel Electronic Records Management System (IPERMS). Non-DOD estimates from the American Community Survey (ACS) and the Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW) are also included at the county level. These variables offer a description of the community surrounding the location where a Soldier is stationed.
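The county-level linkage described above can be pictured as a left join from Soldier records to a table of community covariates. The sketch below assumes a county FIPS code as the join key; that key, the field names, and every value shown are hypothetical placeholders and are not drawn from the PDE, ACS, or QCEW.

```python
# Hypothetical county-level covariates keyed by a 5-digit county code.
county_covariates = {
    "00001": {"pct_snap": 12.5, "pct_veteran": 9.1},
    "00002": {"pct_snap": 8.0, "pct_veteran": 14.3},
}

# Hypothetical de-identified rows carrying the duty-station county per period.
person_periods = [
    {"id": "s1", "period": 1, "county_fips": "00001"},
    {"id": "s1", "period": 2, "county_fips": "00002"},
    {"id": "s2", "period": 1, "county_fips": "99999"},  # no matching county
]


def attach_county_covariates(rows, covariates):
    """Left join: keep every row and fill None where the county is unmatched."""
    fields = sorted({name for values in covariates.values() for name in values})
    linked = []
    for row in rows:
        match = covariates.get(row["county_fips"], {})
        linked.append({**row, **{name: match.get(name) for name in fields}})
    return linked


linked = attach_county_covariates(person_periods, county_covariates)
for row in linked:
    print(row)
```

Keeping unmatched rows, rather than dropping them, makes gaps in the external data visible so that they can be handled explicitly when modeling.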

Figures 2 and 3 show some of the results from this model. In Figure 2, we see that the risk of attrition is much higher for Soldiers for whom we observe disciplinary actions, including courts-martial, Article 15 hearings, and letters of reprimand. In Figure 3, we look at characteristics of the community around the base where a Soldier is currently serving. We find that lower-income communities are associated with a higher risk of attrition, while communities with a large veteran population are associated with a lower risk.

 

Figures

Figure 1. This plot shows the trajectories of individual enlistees in an agent-based model. Each vertical line shows a Soldier’s history: the line begins at the time of enlistment, color shows the Soldier’s unit assignment over time, and the plotting symbol shows whether the Soldier attrited or served out their initial term. Simulated records from the agent-based model are used to refine and test our statistical methodology before implementing models on the full Army enlisted population.
Figure 2. Sample outputs from the Bayesian attrition model. These show the fitted relative hazard rate, measuring the risk of attrition by disciplinary actions including courts-martial, Article 15 hearings, and letters of reprimand. Each is recorded as a binary factor based on whether an infraction of this type is observed in the Interactive Personnel Electronic Records Management System (IPERMS). Each of these outcomes is associated with a higher risk of attrition.
Figure 3. The fitted relative hazard rate from the Bayesian attrition model, measuring the risk of attrition by properties of the community around the base where the Soldier is currently stationed, measured using 5-year county-level estimates from the American Community Survey. Left: The percent of people in the surrounding community who receive food stamps or Supplemental Nutrition Assistance Program (SNAP) benefits. A higher percent on SNAP is associated with a higher likelihood of Soldier attrition; these bases tend to be in lower-income areas. Right: The percent of the population who are veterans. Soldiers in areas with a large veteran population tend to see lower-than-average attrition rates.