Project Details
The U.S. Army Research Institute for the Behavioral and Social Sciences
Our collaboration with the Army Research Institute for the Behavioral and Social Sciences signals a new era for data science in the military. Together we are assessing the ability of researchers to access, evaluate the quality of, and integrate Department of Defense (DOD) data to support decision-making related to the military population and to expand the types of questions the data might inform, in particular the study of important recurring issues for the military, such as attrition. DOD collects and maintains massive amounts of administrative and survey data about its military population and their families. This effort develops research approaches that combine available DOD transactional records with federal and other open data sources to provide insights beyond those typically obtained from single-source analyses of Soldiers. Demonstrated on a large-scale study of first-term attrition among enlisted Army Soldiers, this effort walks through our data framework to discover data, assess data quality and fitness for use, create intermediate data products, and develop statistical modeling approaches to handle this broad collection of data.
To carry out a broad-scale analysis of attrition among first-term enlisted Army Soldiers, we combine a wide variety of DOD administrative data and external survey data sources to produce detailed information on over 700,000 Soldiers over six years (2009 to 2015). While this collection of data holds rich detail regarding attrition, its size and structure make that information difficult to extract fully. For this reason, we use a statistical model to handle this large data set and analyze the factors that affect attrition over this period. Specifically, we use a discrete-time Bayesian hierarchical modeling approach that makes it possible to extract detailed effects from these data while accounting for the uncertainty and dependence within them. We follow a three-fold strategy for this analysis. This strategy
- combines a wide variety of DOD, Federal, and open data sources within the Army Analytics Group (AAG) Research Facilitation Lab's (RFL) Person-Event Data Enclave (PDE) to produce detailed information,
- adapts a Bayesian hierarchical discrete-time hazard model to handle these data, which involve over 80 million transactions, and
- produces an analysis of factors that affect attrition over this time period.
This effort is substantial in that it brings together a broad collection of data sources within the PDE to inform the Army about attrition among enlistees who leave before their first term ends.
To start, we simulated records from an agent-based model to create a digital twin of the Army for characterizing how attrition takes place. Starting with this controlled, synthetic system allows us to encode simple, probabilistic rules that affect attrition. For example, attrition probability may depend only upon the unit the Soldier belongs to and the length of time they have served. Simulated records from this model are used to refine and test the attrition model before implementing it on the full Army dataset within the PDE.
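As a rough illustration of the kind of rule described above, the sketch below simulates enlistment histories in which the monthly attrition probability depends only on a Soldier's unit and months served. It is a minimal sketch, not the project's agent-based model: the unit names, the probabilities, the "higher risk in the first six months" rule, and the 36-month term are all invented for illustration.

```python
import random

# Hypothetical per-unit baseline monthly attrition probabilities (illustrative values only).
UNIT_BASE_HAZARD = {"unit_a": 0.010, "unit_b": 0.020}


def monthly_hazard(unit, months_served):
    """Attrition probability for one month, depending only on unit and time served."""
    # Assumed rule: risk is doubled during the first six months, then flat.
    early_multiplier = 2.0 if months_served < 6 else 1.0
    return UNIT_BASE_HAZARD[unit] * early_multiplier


def simulate_soldier(unit, term_months, rng):
    """Return (months_served, attrited) for one simulated Soldier."""
    for month in range(term_months):
        if rng.random() < monthly_hazard(unit, month):
            return month + 1, True   # attrited during this month
    return term_months, False        # completed the term


def simulate_cohort(n_soldiers, term_months=36, seed=0):
    """Simulate one enlistment record per Soldier with a reproducible seed."""
    rng = random.Random(seed)
    units = sorted(UNIT_BASE_HAZARD)
    records = []
    for soldier_id in range(n_soldiers):
        unit = rng.choice(units)
        months, attrited = simulate_soldier(unit, term_months, rng)
        records.append(
            {"id": soldier_id, "unit": unit, "months_served": months, "attrited": attrited}
        )
    return records
```

Because the rules that generate these records are known exactly, an attrition model fit to them can be checked against the true unit and time-served effects before it is run on real data.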
Figure 1 shows agent-based model output for over 10 years of enlistments and attritions. Each simulated enlistment history follows an individual Soldier over this period.
This simulation approach allowed us to refine the final method, a Bayesian hierarchical discrete-time hazard model. This approach can handle multiple time scales (e.g., year and length of service), random individual effects (e.g., an individual's taste for military service), and flexible, non-parametric specifications for continuous effects (e.g., age, days deployed). Inference for model parameters is carried out efficiently via Markov chain Monte Carlo (MCMC).
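To make the discrete-time hazard idea concrete, the sketch below expands an enlistment history into one row per month at risk and evaluates a Bernoulli likelihood for those rows. It is only a sketch under stated assumptions: the logistic link, the monthly time step, and every function and parameter name are illustrative choices rather than the project's specification. In a Bayesian fit, MCMC would combine a likelihood of this form with priors to sample the baseline, the coefficients, and the individual effects.

```python
import math


def logistic(x):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def person_period_rows(soldier_id, months_served, attrited):
    """Expand one enlistment history into one row per month at risk."""
    rows = []
    for month in range(1, months_served + 1):
        # The event indicator is 1 only in the final month of a Soldier who attrited.
        event = 1 if (attrited and month == months_served) else 0
        rows.append({"id": soldier_id, "month": month, "event": event})
    return rows


def hazard(month, x, baseline, beta, individual_effect=0.0):
    """Assumed form: h_t = logistic(baseline[t] + x . beta + individual effect)."""
    linear = baseline[month - 1] + sum(b * v for b, v in zip(beta, x)) + individual_effect
    return logistic(linear)


def log_likelihood(rows, covariates, baseline, beta):
    """Bernoulli log-likelihood summed over person-period rows."""
    total = 0.0
    for row in rows:
        h = hazard(row["month"], covariates[row["id"]], baseline, beta)
        total += math.log(h) if row["event"] else math.log(1.0 - h)
    return total
```

For example, `person_period_rows(7, 3, True)` yields three rows whose event indicators are 0, 0, and 1, so a Soldier who attrites in month 3 contributes two "survived" observations and one "attrited" observation.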
The full model uses DOD data from a variety of sources within the PDE, including a master personnel file, an analyst file with data collected at enlistment, and a transaction file. These provide individual (de-identified) data on the enlistees and their characteristics, information from before they entered the Army such as test scores and educational attainment, and transactional information (or events) after joining the Army, including promotion progression, duty location, and attrition. Other data on training, physical fitness, and disciplinary actions are linked from the Digital Training Management System (DTMS) and the Interactive Personnel Electronic Records Management System (iPERMS). Non-DOD estimates from the American Community Survey (ACS) and the Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW) are also included at the county level. These variables describe the community surrounding the location where a Soldier is stationed.
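The county-level linkage described above amounts to a left join of community estimates onto Soldier records by duty location. The sketch below shows one way this could look. The field names (`duty_county_fips`, `median_income`, `veteran_share`) and the use of a county FIPS code as the join key are assumptions made for illustration, not the PDE's actual schema.

```python
def attach_county_covariates(soldier_periods, county_stats):
    """Left-join county-level estimates onto Soldier-period records by county code.

    Records whose county has no estimates keep None for the community fields,
    so missing linkages stay visible instead of being silently dropped.
    """
    linked = []
    for record in soldier_periods:
        community = county_stats.get(record["duty_county_fips"], {})
        merged = dict(record)  # copy so the input records are not modified
        merged["median_income"] = community.get("median_income")
        merged["veteran_share"] = community.get("veteran_share")
        linked.append(merged)
    return linked


# Invented example values, for illustration only.
example_periods = [
    {"id": 1, "month": 1, "duty_county_fips": "00001"},
    {"id": 2, "month": 1, "duty_county_fips": "99999"},
]
example_county_stats = {"00001": {"median_income": 50000, "veteran_share": 0.10}}
example_linked = attach_county_covariates(example_periods, example_county_stats)
```

Keeping unmatched records with empty community fields, instead of discarding them, is a design choice that lets the share of unlinked records be checked during data-quality assessment.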
Figures 2 and 3 show some of the results from this model. In Figure 2, we see that the risk of attrition is much higher for Soldiers with observed disciplinary actions, including courts-martial, Article 15 hearings, and letters of reprimand. In Figure 3, we look at characteristics of the community around the base where a Soldier is currently serving. We find that lower-income communities are associated with a higher risk of attrition, while communities with a large veteran population are associated with a lower risk.
Figures