Random Forest Model with Prediction Error Decomposition Function
The random forest is one of the most widely used machine learning techniques. It achieves high prediction accuracy, and its hyperparameters are easier to tune than those of other models such as gradient boosting machines (GBMs) and neural networks (NNs). The basic idea of aggregating multiple randomized decision trees is straightforward, and the number of tools supporting its interpretability (e.g., feature importance and partial dependence plots) has grown in recent years. Given these strengths, we believe the random forest can be a powerful model for actuaries. Moreover, research on the statistical properties of the random forest has made great progress, which could contribute significantly to estimating the error distribution and to building risk management tools.
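To make the core idea concrete, the following is a minimal, stdlib-only sketch of "aggregating multiple randomized decision trees" for regression: each tree here is reduced to a depth-1 stump fitted on a bootstrap resample, and predictions are averaged across the ensemble. The function names and the toy step-function data are illustrative, not the paper's implementation.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (a stump) on 1-D data:
    one threshold chosen to minimize the sum of squared errors,
    with a mean prediction in each of the two leaves."""
    best_sse, best_split = float("inf"), None
    for t in sorted(set(xs))[1:]:                       # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if sse < best_sse:
            best_sse, best_split = sse, (t, ml, mr)
    t, ml, mr = best_split
    return lambda x: ml if x < t else mr

def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap (in-bag) resample,
    then predict by averaging over all stumps."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)

# Toy data: a step function plus Gaussian noise.
data_rng = random.Random(1)
xs = [i / 100 for i in range(100)]
ys = [(0.0 if x < 0.5 else 1.0) + data_rng.gauss(0, 0.1) for x in xs]
rf = fit_forest(xs, ys)
print(rf(0.25), rf(0.75))   # averaged predictions on each side of the step
```

A full random forest additionally grows each tree to depth greater than one and randomizes the candidate features at each split; the bootstrap-plus-averaging mechanism shown here is the part the rest of the paper builds on.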
In this paper, we first review how the random forest works in regression problems and then introduce a method for estimating prediction errors with it, focusing on its statistical properties and on making effective use of the in-bag and out-of-bag samples produced by bootstrap sampling during forest growth. We show that the random forest can provide error distributions in addition to point estimates, a major advantage over other machine learning models such as GBMs and NNs. We also propose a random-forest-specific method for decomposing the prediction error for each prediction target into process error, parameter error, and other error, and confirm that it is practical in terms of computational complexity. We show that the random forest enables us to evaluate the process error individually for each target, which is difficult with a GLM. Lastly, we demonstrate the effectiveness of the proposed method through numerical experiments on artificial data and real datasets, and discuss its applications.
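The in-bag/out-of-bag mechanism underlying the error estimation can be sketched as follows. For each bootstrap resample we record which observations were in-bag, so that every observation can later be predicted using only the resamples that never saw it; the resulting out-of-bag residuals form an empirical error distribution rather than a single point estimate. This stdlib-only sketch uses a constant (mean) predictor in place of a full tree, and does not implement the paper's process/parameter/other decomposition; the data and names are illustrative.

```python
import random
import statistics

rng = random.Random(0)
y = [rng.gauss(10.0, 2.0) for _ in range(500)]   # synthetic target, true noise sd = 2
n, n_boot = len(y), 200

# For each bootstrap "tree" (here simply the in-bag mean, the simplest
# possible predictor), record its in-bag indices so each observation can
# later be predicted only by resamples for which it was out-of-bag.
oob_preds = [[] for _ in range(n)]
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]   # in-bag indices
    in_bag = set(idx)
    fit = sum(y[i] for i in idx) / n             # this resample's prediction
    for i in range(n):
        if i not in in_bag:                      # i is out-of-bag for this resample
            oob_preds[i].append(fit)

# Out-of-bag residual per observation: together these residuals give an
# empirical error distribution from which spread and quantiles can be read.
resid = [y[i] - sum(p) / len(p) for i, p in enumerate(oob_preds) if p]
resid.sort()
print(statistics.stdev(resid))                   # roughly recovers the noise sd
q05 = resid[int(0.05 * len(resid))]
q95 = resid[int(0.95 * len(resid))]
print(q05, q95)                                  # an empirical 90% error interval
```

In an actual random forest the same bookkeeping applies per tree, and the out-of-bag predictions vary with the features of each observation, which is what allows the process error to be evaluated individually for each prediction target.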
Find the Q&A here: Q&A on 'New Actuarial Approaches by Using Data'