Development of standard loss analysis model using big data: Focusing on Republic of Korea army

The Republic of Korea Army conducts simulations during peacetime using the ground operation resource requirements analysis model (GORRAM) to determine potential wartime losses based on the latest operation plan. Although war-game simulation can yield reliable results, it takes a considerable amount of time and effort to build a database and generate scenarios. Therefore, a study is required to supplement the detailed war-game simulation method so that expected losses can be determined quickly. Using the data built into GORRAM, we tested the significance of four factors using beta regression analysis. While multiple regression is most commonly used to model such relationships, beta regression is a powerful method for modeling response variables in the (0, 1) range, such as the loss ratio. We verified that three factors, namely ‘topography’, ‘operational posture’, and ‘friend/foe power ratio’, were related to loss. This study proposes a new method for calculating the expected loss in real time, overcoming a limitation of existing war-game simulation methods.


Ⅰ. INTRODUCTION
It is important to predict the outcome of a battle in wartime and to prepare resources, such as personnel, equipment, fuel, and ammunition, based on the predicted results. Recently, Ukraine had not fully stockpiled the artillery ammunition required for the war and therefore could not respond effectively to Russia's attacks. The Center for Army Analysis and Simulations (CAAS) has been analyzing wartime resource requirements since the 1980s. The analysis of wartime resource requirements is conducted whenever the operation plan or environment changes. CAAS conducted the analysis using the ground operation resource requirements analysis model (GORRAM) developed in 2010. Using this model, we analyzed the wartime resource requirements for personnel, equipment, fuel, ammunition, repair parts, and materials.
The results of wartime resource requirement analysis (WRRA) are used as the basis for writing various war documents related to inventory requirements and military force, such as the joint strategic objective plan (JSOP) and oil war stockpile documents, according to the directive of the Korea Ministry of Defense. Although war-game simulation can provide relatively reliable results (De Lima Filho et al., 2022; Mittal & Davidson, 2020; Turnitsa, Blais, & Tolk, 2021), it cannot deliver an estimate of the expected loss within a short period of time, because the war game must first be simulated (e.g., Brathen, Seehuus, & Mevassvik, 2021; Hujer, Kratky, & Farlik, 2020).
During the recent combined command post training (CCPT), we had to analyze the expected loss reflecting the operational environment within a short period of time. This demand is a great challenge because it takes several weeks to simulate a changed operating environment using war-game simulation. Therefore, a study is required to supplement detailed war-game simulation methods and to quickly determine expected losses by reflecting real-time changes in battlefield situations during war and CCPT. The purpose of the present study is to develop a standard loss analysis model that overcomes the limitations of the existing war-game model by using a regression equation to predict the loss of equipment and troops. The three areas of focus in the standard loss analysis model are as follows.
First, using statistical techniques, we verified which of the factors identified in previous studies and documents, such as topography, operational posture, and force ratio, were related to the loss of personnel and equipment. Second, we calculated the regression formula for loss to predict the loss of personnel and equipment based on the war-game simulation data accumulated in GORRAM.
Finally, we developed a program that could be used during wartime and CCPT. The statistical method used in this study was a beta regression model. The beta regression model is a type of generalized linear regression for modeling the relationship between explanatory variables and a response variable bounded between 0 and 1, such as a ratio or proportion.
The remainder of this paper is organized as follows. Section 2 briefly introduces the force scoring mechanisms, including the background of the standard loss analysis model in the ROK Army. In Section 3, we provide the estimated regression coefficients using the beta regression method. Section 4 verifies the regression results of the standard loss analysis model. The development, configuration, and verification of the standard loss analysis model are explained in Section 5. Finally, we conclude this paper with a brief summary and discuss future work in Section 6. The approximate analytical procedures are presented in Table 1.
This methodology had some limitations, primarily owing to the lack of accounting for situation-dependent combined arms effects. To overcome this limitation, the RAND Corporation, an American nonprofit global policy think tank founded in 1948, developed the situational force scoring (SFS) methodology (Allen, 1992). The SFS was developed to improve the representation of ground force close combat and to provide an alternative extrapolation mechanism for use in the more detailed weapon-on-weapon model.

Definition of terms and factor effects
Prior to verifying the loss factors, we define the terms related to loss. The operational posture is the form of operation in which the attacker and the defender perform their planned missions.
The friend/foe force ratio expresses the difference between the ally's and the opponent's military force as a ratio based on the power index. The operational tempo is the relative operational speed of the enemy. The loss factors and their effects described here are based on previous research, including 'How to make war: A comprehensive guide to modern warfare' (Dunnigan, 2003). Existing studies, such as 'How to make war,' divided the loss-influencing factors into operational, environmental, and intangible factors, as listed in Table 2. These loss factors were verified using statistical methods. For reference, the number in parentheses for each loss factor indicates the degree of impact on the loss.
Weapon effectiveness and standard loss analysis / Lee, Seungho⋅Lim, Changkyu⋅Lee, Chahwa⋅Cho, Yongju⋅Kim, Jaeoh 163

Overview of Beta Regression
As proposed by Ferrari and Cribari-Neto (2004), beta regression is a statistical method in which the dependent variable is assumed to follow a beta distribution. Beta regression employs a link function to relate a linear predictor on the real line to the mean of the response in the bounded interval (0, 1) and then fits the beta distribution using maximum likelihood estimation. Beta regression is most readily applied to the modeling of rates and proportions, given the constrained interval. Because of the flexibility of the underlying beta distribution, it is utilized in a wide range of disciplines, including medicine, finance and economics, and social science.
Because the beta distribution is an extremely flexible distribution, beta regression can be useful for dependent variables y ∈ (0, 1), as shown in Figure 1. A beta regression model is a maximum likelihood derivation combining link functions, such as logit and probit, with a reparametrization of the beta distribution. While multiple linear regression is the most widely used approach for modeling a variable with multiple explanatory variables when the output variable ranges over (−∞, ∞), beta regression is particularly useful for modeling ratios because the beta distribution is constrained to (0, 1).

<Figure 1> Beta densities for different parameter values
A typical beta probability density function is as follows:

$$f(y; p, q) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, y^{p-1}(1-y)^{q-1}, \quad 0 < y < 1 \tag{1}$$

where $p, q > 0$ and $\Gamma(\cdot)$ denotes the gamma function. These parameters can be transformed to $\mu = p/(p+q)$ and $\phi = p+q$, as follows:

$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0 < y < 1 \tag{2}$$

where $0 < \mu < 1$ and $\phi > 0$ is a precision parameter. Then $E(y) = \mu$ and $\mathrm{VAR}(y) = \mu(1-\mu)/(1+\phi)$, and the response variable follows a beta distribution, which is denoted as $y \sim B(\mu, \phi)$. Given random variables $y_1, \ldots, y_n$ such that $y_i \sim B(\mu_i, \phi)$, the beta regression is defined as follows:

$$g(\mu_i) = x_i^{\top}\beta = \eta_i \tag{3}$$

where $g(\cdot): (0,1) \to \mathbb{R}$ is a link function that is strictly increasing and twice differentiable; $\beta$ is a vector of regression coefficients; $x_i$ is a vector of independent variables; and $\eta_i$ is a linear predictor. Then, $\beta$ is estimated using maximum likelihood. Equations (2) and (3) assume that the precision parameter $\phi$ is identical for all observations. However, this assumption may not be applicable to many practical problems. Simas et al. assumed $y_i \sim B(\mu_i, \phi_i)$ and expanded the regression model as follows [6]:

$$g_1(\mu_i) = \eta_{1i} = x_i^{\top}\beta, \qquad g_2(\phi_i) = \eta_{2i} = z_i^{\top}\gamma \tag{4}$$

where $\beta$ and $\gamma$ denote regression vectors; $x_i$ and $z_i$ denote independent variable vectors; and $\eta_{1i}$ and $\eta_{2i}$ denote the linear predictors. Such models are referred to as variable dispersion beta regression models.
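The mean and precision parameterization described above can be turned into a log-likelihood in a few lines. The following is a minimal Python sketch, not the code used in this study: it assumes a logit link for the mean and a log link for the precision (the source does not state which links were used), and the function names and the small design matrices `X`, `Z`, and responses `y` are hypothetical illustrations.

```python
import math

def beta_logpdf(y, mu, phi):
    """Log density of a beta distribution with mean mu and precision phi, 0 < y < 1."""
    p = mu * phi            # first shape parameter, p = mu * phi
    q = (1.0 - mu) * phi    # second shape parameter, q = (1 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
            + (p - 1.0) * math.log(y) + (q - 1.0) * math.log(1.0 - y))

def inv_logit(eta):
    """Inverse of the logit link: maps a real-valued predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta, gamma, X, Z, y):
    """Log-likelihood of a variable dispersion beta regression.

    Assumed links (not confirmed by the source): logit for the mean,
    log for the precision.
    """
    total = 0.0
    for x_i, z_i, y_i in zip(X, Z, y):
        eta1 = sum(b * x for b, x in zip(beta, x_i))   # linear predictor for the mean
        eta2 = sum(g * z for g, z in zip(gamma, z_i))  # linear predictor for the precision
        total += beta_logpdf(y_i, inv_logit(eta1), math.exp(eta2))
    return total

# Hypothetical toy data: intercept plus one covariate, constant precision.
X = [[1.0, 0.0], [1.0, 1.0]]
Z = [[1.0], [1.0]]
y = [0.02, 0.05]
print(log_likelihood([-3.5, 0.8], [4.0], X, Z, y))
```

Maximizing this function over `beta` and `gamma` with a numerical optimizer would give the maximum likelihood estimates. With `Z` reduced to a single intercept column, as in the toy data, the model collapses to the constant-precision case.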

Verification of loss factors using statistical method
Among the various loss factors in Table 2, we selected four factors, namely topography, operational posture, force ratio, and operational period, for which big data had been accumulated in GORRAM. We used beta regression to verify these factors.

<Table 3> Verification results
From the regression output, three of the four loss-influencing factors were observed to be significant. Among the three, the force ratio, that is, the difference between the ally's and the opponent's military force based on the power index, had the largest influence on the daily loss rate. Operational posture, the form of operations for the planned mission of the attacker and defender, and the effects of topographic characteristics and weather were also observed to be statistically significant explanatory variables, although their influence on the daily loss rate was much smaller than that of the force ratio. The operational period (number of days) was observed to be statistically insignificant in explaining the loss rate.

Building the database
The simulation output is stored in GORRAM in a tabular format, with the loss of resources indicated by the type of personnel and equipment for each corps/division and day of battle. To construct a database useful for this study, the original data were organized according to the three variables identified as significant in predicting the loss rate: topography, operational posture, and force ratio. The appropriate classifications for topography and operational posture listed in Table 4 were applied based on domain knowledge of the operational condition and function of each corps/division in the ROK Army. Topography is divided into flatland, hill, and mountain because the loss differs depending on where the fighting takes place.
From the same perspective, the type of operation is divided into attack, defense, fixing, and reserve.
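The organization of the records described above can be illustrated with a short Python sketch. It is only an assumption about how such a table could be encoded for regression: the `UnitDay` fields `strength` and `losses`, the choice of baseline categories, and the helper names are hypothetical and are not taken from GORRAM.

```python
from dataclasses import dataclass

# Category levels taken from the text; the first level of each is used as the baseline.
TOPOGRAPHY = ("flatland", "hill", "mountain")
POSTURE = ("attack", "defense", "fixing", "reserve")

@dataclass
class UnitDay:
    """One corps/division on one day of battle (hypothetical record layout)."""
    topography: str
    posture: str
    force_ratio: float
    strength: float  # assets on hand at the start of the day (hypothetical field)
    losses: float    # assets lost during the day (hypothetical field)

def dummy(value, levels):
    """Dummy-code a category, dropping the first level as the reference."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]

def to_row(rec):
    """Return (x, y): the design-matrix row and the daily loss rate."""
    x = ([1.0]                                  # intercept
         + dummy(rec.topography, TOPOGRAPHY)    # hill, mountain
         + dummy(rec.posture, POSTURE)          # defense, fixing, reserve
         + [rec.force_ratio])
    y = rec.losses / rec.strength
    return x, y

x, y = to_row(UnitDay("hill", "defense", 1.5, 200.0, 4.0))
print(x, y)
```

Because the beta distribution is defined only on the open interval (0, 1), days with a loss rate of exactly 0 or 1 would need separate treatment before fitting. How the study handled such cases is not stated.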

Calculating the regression formula
After building the database, we calculated the beta regression formula, as listed in Table 6, using the loss-influencing factors verified in Section 3: topography, operational posture, and force ratio. The values in the regression formula are displayed as random numbers for military security reasons. Although not all regression formulas were included in this paper owing to military security concerns, we were able to achieve findings that were supported by activities. All standard losses for the important equipment listed in Table 6 can be predicted.
We employed a beta regression model because the response variable, the loss rate, lies between 0 and 1.
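To make concrete how a fitted regression formula yields a predicted loss rate, the following Python sketch evaluates a linear predictor and maps it into (0, 1). The logit link is an assumption, and every coefficient in `COEF` is a hypothetical placeholder: the actual values are withheld in the paper, and these numbers were chosen only so that the signs agree with the qualitative findings reported in the text.

```python
import math

# Hypothetical coefficients, NOT values from the study (the real ones are withheld).
COEF = {
    "intercept": -4.0,
    "topography": {"flatland": 0.0, "hill": -0.3, "mountain": -0.6},
    "posture": {"attack": 0.0, "defense": -0.2, "fixing": -0.5, "reserve": -1.5},
    "force_ratio": 0.4,
}

def predict_loss_rate(topography, posture, force_ratio, coef=COEF):
    """Predicted daily loss rate under an assumed logit link."""
    eta = (coef["intercept"]
           + coef["topography"][topography]
           + coef["posture"][posture]
           + coef["force_ratio"] * force_ratio)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit keeps the result in (0, 1)

print(predict_loss_rate("flatland", "attack", 2.0))
print(predict_loss_rate("mountain", "defense", 2.0))
```

Whatever the coefficient values, the inverse link guarantees a prediction strictly between 0 and 1, which an ordinary linear regression formula would not.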

Development and configuration of the model
A standard loss analysis model was developed using C#. C# is an object-oriented programming language developed by Microsoft that is intended to combine the computing power of C++ with the programming ease of Visual Basic. We chose C# as the development tool in consideration of processing speed and maintenance after development. Figure [...] of war participants, such as officers and men. The model can help in preparing for war effectively by predicting losses from D-day to D+60. These numbers were blanked out for military security reasons.
<Figure 3> The result window of the standard loss analysis model

Verification of the standard loss analysis model
To verify the standard loss analysis model, predictions were generated using the beta regression formula by varying each factor while keeping the others constant. The numbers have been blanked out for security reasons. For instance, in comparing the loss rate across topography types with the other variables held constant, it was confirmed that the loss rate on flatland was higher than those on hills and mountains, as shown in Figure 4.

<Figure 4> Comparison of loss rate under different types of topography
Similarly, the loss rate was higher in the simulations where the force ratio was higher, as shown in Figure 5. These relationships are consistent with the results of existing studies.
Beyond this agreement, the present study made it feasible to estimate the loss rate more accurately. This serves as an essential foundation for building various war preparation scenarios.
<Figure 5> Comparison of loss rate according to the force ratio
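The verification procedure in this section, varying one factor while holding the others constant, can be sketched in Python as follows. The `toy_model` below is a hypothetical stand-in with made-up coefficients, not the fitted model, and `BASELINE` is an arbitrary reference scenario chosen for illustration.

```python
import math

def one_at_a_time(model, baseline, factor, values):
    """Evaluate model for each value of one factor, holding the rest at baseline."""
    results = []
    for value in values:
        scenario = dict(baseline)  # copy so the baseline itself is never modified
        scenario[factor] = value
        results.append((value, model(**scenario)))
    return results

def toy_model(topography, posture, force_ratio):
    """Hypothetical stand-in for the fitted formula (made-up coefficients, logit link)."""
    topo = {"flatland": 0.0, "hill": -0.3, "mountain": -0.6}[topography]
    post = {"attack": 0.0, "defense": -0.2, "fixing": -0.5, "reserve": -1.5}[posture]
    eta = -4.0 + topo + post + 0.4 * force_ratio
    return 1.0 / (1.0 + math.exp(-eta))

BASELINE = {"topography": "hill", "posture": "attack", "force_ratio": 1.0}

for value, rate in one_at_a_time(toy_model, BASELINE, "topography",
                                 ["flatland", "hill", "mountain"]):
    print(value, round(rate, 4))

for value, rate in one_at_a_time(toy_model, BASELINE, "force_ratio",
                                 [0.5, 1.0, 2.0, 4.0]):
    print(value, round(rate, 4))
```

A check of this kind only confirms that the direction of each effect matches expectations. It does not by itself establish the accuracy of the predicted magnitudes.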

Ⅵ. SUMMARY AND FUTURE WORKS
To overcome the limitations of existing war-game simulations, the CAAS has conducted research on the standard loss analysis model. Using beta regression on the simulation data of GORRAM, we verified the loss factors of topography, operational posture, and force ratio. Based on the verification, we calculated a regression formula to predict the loss of personnel and equipment. Finally, we developed a standard loss analysis model using C# and verified the model by comparing its results with those of existing studies. With this model, we analyzed the wartime resource requirements within several hours, compared with the 1-2 weeks required by the existing war-game model. Through this study, we were able to overcome the time limitation of the existing war-game model and to improve on the accuracy of the method that simply evaluates loss using the weapon system index. In addition, the output of this study will be used as a means of judging the expected loss by reflecting the changed operational environment from this joint US-ROK exercise. This model will be distributed to corps and divisions so that they can analyze operational plans by comparing the losses expected under each plan.
Finally, we must update the regression formula using more recent simulation data. In addition, we will expand the variety of target equipment from 10-20 types to 100-120 types. After improving the standard loss analysis model by the end of this year, it will be used on a trial basis in the next ROK-US CCPT.