Objective

According to IBM, random forest is a commonly used machine learning algorithm that combines the output of multiple decision trees to reach a single result. Random forest handles both classification and regression problems. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output of the random forest is the mean prediction of the individual trees.
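The two aggregation rules can be sketched in a few lines of base R. The tree outputs below are hypothetical values invented for illustration; they are not results from this analysis.

```r
# Hypothetical predictions from four regression trees for one respondent
tree_predictions <- c(4.2, 3.8, 5.1, 4.5)

# Regression: the forest prediction is the mean of the tree predictions
forest_regression <- mean(tree_predictions)

# Hypothetical class votes from five classification trees
tree_votes <- c("resist", "accept", "accept", "accept", "resist")

# Classification: the forest prediction is the class chosen by most trees
forest_class <- names(which.max(table(tree_votes)))

forest_regression
forest_class
```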

This vignette demonstrates random forest regression with two objectives. The first is to develop a model that identifies the most important explanatory variables associated with the target variable, reaction to organisational change. The second is to predict an individual’s reaction to change and evaluate the accuracy of these predictions.

The original data set comprised 616 respondents from 10 public and private sector organisations experiencing organisational change. Respondents reported self-efficacy, irrational ideas, maladaptive defence mechanisms, emotion, behavioural intentions and reaction towards change in their organisation.

Workflow

The raw data set was wrangled and tidied before processing. A brief exploratory analysis was then conducted to understand the variables, comprising a statistical summary, a correlation analysis of the explanatory variables and a density histogram of the outcome variable.

The random forest model was developed and fitted on the training data using a workflow that considered resampling methods, feature engineering, model specifications and hyperparameter optimisation. The results of the trained model were then reviewed to identify the important predictor variables associated with reaction to organisational change.

The trained model was then applied to the unseen test data to predict the outcome variable. The model’s performance on the test data was evaluated with regression metrics and a scatterplot comparing actual outcomes with model-predicted outcomes.

Results

1. Explore data

The data set was filtered to analyse only those respondents who reported experiencing significant organisational change, leaving the 218 respondents summarised in Table 1 (column n). Table 1 is a statistical summary of each proposed explanatory variable.

Table 1 Statistical summary of explanatory variables
variable vars n mean sd median trimmed mad min max range skew kurtosis se
self_efficacy 1 218 5.56 0.80 5.65 5.64 0.76 1.71 6.94 5.24 -1.21 2.80 0.05
needs_approval 2 218 4.20 1.39 4.00 4.23 1.48 1.00 7.00 6.00 -0.14 -0.70 0.09
fears_failure 3 218 4.08 1.51 4.00 4.09 1.48 1.00 7.00 6.00 -0.01 -0.85 0.10
labelling_blame 4 218 3.19 1.32 3.00 3.11 1.48 1.00 7.00 6.00 0.54 -0.32 0.09
catastrophising 5 218 3.73 1.32 3.50 3.69 1.48 1.00 7.00 6.00 0.14 -0.89 0.09
managing_feelings 6 218 3.91 1.30 4.00 3.91 1.48 1.00 6.50 5.50 -0.08 -0.68 0.09
anxious_thoughts 7 218 4.12 1.16 4.00 4.15 0.74 1.50 7.00 5.50 -0.18 -0.39 0.08
avoidance 8 218 2.25 0.98 2.00 2.14 0.74 1.00 5.50 4.50 1.04 0.91 0.07
past_influences 9 218 2.79 1.31 2.50 2.71 1.48 1.00 6.00 5.00 0.58 -0.72 0.09
facing_reality 10 218 4.56 1.43 5.00 4.62 1.48 1.00 7.00 6.00 -0.34 -0.78 0.10
passive_existence 11 218 4.23 1.23 4.00 4.26 1.48 1.00 7.00 6.00 -0.15 -0.47 0.08
dissociation 12 218 2.73 1.20 2.50 2.61 0.74 1.00 6.50 5.50 0.89 0.27 0.08
displacement 13 218 3.06 1.28 3.00 3.01 1.48 1.00 7.00 6.00 0.34 -0.52 0.09
isolation_of_affect 14 218 3.40 1.50 3.50 3.37 2.22 1.00 7.00 6.00 0.16 -0.94 0.10
reaction_formation 15 218 4.14 1.23 4.00 4.14 1.48 1.50 7.00 5.50 0.11 -0.59 0.08
denial 16 218 2.51 1.04 2.00 2.44 0.74 1.00 6.50 5.50 0.77 0.30 0.07
projection 17 218 2.50 1.11 2.00 2.40 0.74 1.00 6.00 5.00 0.86 0.07 0.07
passive_aggression 18 218 2.59 1.06 2.50 2.49 0.74 1.00 6.50 5.50 0.92 0.67 0.07
acting_out 19 218 3.56 1.34 3.50 3.55 1.48 1.00 6.50 5.50 0.04 -0.87 0.09
emotion 20 218 3.80 1.13 3.80 3.79 1.11 1.00 6.90 5.90 0.09 -0.22 0.08
behavioural_intentions 21 218 5.08 1.11 5.25 5.15 1.19 1.75 7.00 5.25 -0.58 -0.17 0.08

Chart 1 supports Table 1, showing correlations between the proposed explanatory variables.

Chart 2 summarises the numeric outcome variable, reaction to organisational change.
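The three exploratory steps can be sketched as follows. This is a minimal illustration on simulated data, not the author's code: the data frame `change`, its values and the coefficients used to build the outcome are all assumptions. The column layout of Table 1 matches what `psych::describe()` returns, and psych is attached in the session information, so it is used here.

```r
library(psych)

set.seed(42)
n <- 218  # same n as Table 1; the values themselves are simulated

# Simulated 1-7 scale scores standing in for three explanatory variables
change <- data.frame(
  self_efficacy = runif(n, 1, 7),
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)

# Simulated outcome with an arbitrary dependence on two predictors
change$reaction <- 0.5 * change$emotion +
  0.4 * change$behavioural_intentions + rnorm(n, sd = 0.8)

predictors <- setdiff(names(change), "reaction")

# Statistical summary (n, mean, sd, median, skew, kurtosis and so on)
summary_table <- describe(change[, predictors])
print(summary_table, digits = 2)

# Pairwise Pearson correlations between the explanatory variables
cor_matrix <- cor(change[, predictors])
round(cor_matrix, 2)

# Density histogram of the outcome with a kernel density overlay
hist(change$reaction, freq = FALSE,
     main = "Reaction to organisational change (simulated)",
     xlab = "Reaction")
lines(density(change$reaction))
```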

2. Random forest regression

2.1 Train model

2.1.1 Build model

Model building began by randomly splitting the data into training and testing sets at a 3:1 ratio using stratified sampling. Stratifying on the outcome variable gives the training and testing sets similar distributions of reaction scores. The training set was then resampled using five repeats of 10-fold cross-validation. The recipe for the random forest was a standard formula with no additional feature engineering. Finally, the model was specified and combined with the recipe in a workflow.
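The build steps described above might look like this in tidymodels. This is a sketch under stated assumptions rather than the author's code: the data are simulated, the number of trees is not reported in the vignette (500 is a placeholder), and the ranger engine is inferred from the session information.

```r
library(tidymodels)

set.seed(2024)
n <- 200

# Simulated stand-in for the filtered survey data (hypothetical values)
change <- data.frame(
  self_efficacy = runif(n, 1, 7),
  avoidance = runif(n, 1, 7),
  denial = runif(n, 1, 7),
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)
change$reaction <- 0.5 * change$emotion +
  0.4 * change$behavioural_intentions + rnorm(n, sd = 0.8)

# 3:1 split, stratified on the numeric outcome (binned internally)
change_split <- initial_split(change, prop = 0.75, strata = reaction)
change_train <- training(change_split)
change_test <- testing(change_split)

# Five repeats of 10-fold cross-validation on the training set
change_folds <- vfold_cv(change_train, v = 10, repeats = 5)

# Recipe: a plain formula with no extra feature engineering steps
rf_recipe <- recipe(reaction ~ ., data = change_train)

# Model specification with mtry and min_n marked for tuning
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) %>%
  set_mode("regression") %>%
  set_engine("ranger")

# Workflow bundling the recipe and the model
rf_workflow <- workflow() %>%
  add_recipe(rf_recipe) %>%
  add_model(rf_spec)

rf_workflow
```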

Because the best hyperparameter values are not known in advance, many candidate models were fitted with different values and their performance compared. Tuning is computationally intensive, so it was run with parallel processing. Chart 3 shows the results of the initial tuning, in which values of min_n (the minimum number of data points a node must contain to be split further) and mtry (the number of predictors randomly sampled at each split) were assessed against the regression metric, root mean squared error (RMSE).
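An initial tuning pass of this kind can be sketched with `tune_grid()`. The settings here are deliberately reduced so the example runs quickly (5 folds, 200 trees, 10 candidate combinations) and are assumptions, as are the simulated data. The session information lists both doParallel and future; this sketch uses a future back end with two workers, which may differ from the author's setup.

```r
library(tidymodels)

set.seed(2024)
n <- 200

# Simulated stand-in for the survey data (hypothetical values)
change <- data.frame(
  self_efficacy = runif(n, 1, 7),
  avoidance = runif(n, 1, 7),
  denial = runif(n, 1, 7),
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)
change$reaction <- 0.5 * change$emotion +
  0.4 * change$behavioural_intentions + rnorm(n, sd = 0.8)

change_split <- initial_split(change, prop = 0.75, strata = reaction)
change_train <- training(change_split)

# Reduced resampling for speed (the vignette used 5 x 10-fold)
change_folds <- vfold_cv(change_train, v = 5)

rf_workflow <- workflow() %>%
  add_recipe(recipe(reaction ~ ., data = change_train)) %>%
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = 200) %>%
      set_mode("regression") %>%
      set_engine("ranger")
  )

# Run the resampled fits in parallel on two background R sessions
future::plan(future::multisession, workers = 2)

set.seed(345)
rf_tuned <- tune_grid(
  rf_workflow,
  resamples = change_folds,
  grid = 10,                  # 10 automatically chosen candidate combinations
  metrics = metric_set(rmse)
)

# Return to sequential processing
future::plan(future::sequential)

# Best candidates by cross-validated RMSE, and a plot of RMSE by parameter
show_best(rf_tuned, metric = "rmse", n = 5)
autoplot(rf_tuned)
```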

Guided by the initial tuning results in Chart 3, a targeted short-list of parameter combinations was then investigated to identify the best-performing combination. Chart 4 shows the result.

With parameter tuning complete, the workflow and model were finalised for fitting.
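The targeted search and finalisation steps can be sketched with a regular grid, `select_best()` and `finalize_workflow()`. The parameter ranges below are hypothetical, since the vignette does not report the short-list, and the data and resampling settings are simulated and reduced for speed.

```r
library(tidymodels)

set.seed(2024)
n <- 200

# Simulated stand-in for the survey data (hypothetical values)
change <- data.frame(
  self_efficacy = runif(n, 1, 7),
  avoidance = runif(n, 1, 7),
  denial = runif(n, 1, 7),
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)
change$reaction <- 0.5 * change$emotion +
  0.4 * change$behavioural_intentions + rnorm(n, sd = 0.8)

change_split <- initial_split(change, prop = 0.75, strata = reaction)
change_train <- training(change_split)
change_folds <- vfold_cv(change_train, v = 5)  # reduced for speed

rf_workflow <- workflow() %>%
  add_recipe(recipe(reaction ~ ., data = change_train)) %>%
  add_model(
    rand_forest(mtry = tune(), min_n = tune(), trees = 200) %>%
      set_mode("regression") %>%
      set_engine("ranger")
  )

# Hypothetical short-list: 3 x 3 regular grid over narrowed ranges
rf_grid <- grid_regular(
  mtry(range = c(2, 4)),
  min_n(range = c(5, 15)),
  levels = 3
)

set.seed(456)
rf_tuned <- tune_grid(
  rf_workflow,
  resamples = change_folds,
  grid = rf_grid,
  metrics = metric_set(rmse)
)

# Combination with the lowest cross-validated RMSE
best_params <- select_best(rf_tuned, metric = "rmse")

# Replace the tune() placeholders with the selected values
final_workflow <- finalize_workflow(rf_workflow, best_params)

best_params
final_workflow
```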

2.1.2 Fit and review model

The tuned model was fitted on the training set. Chart 5 illustrates the importance of each explanatory variable in predicting the target variable, reaction to organisational change. The most important variables were behavioural intentions and emotion.
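Fitting a tuned model and extracting variable importance can be sketched as below. Several details are assumptions: the data are simulated (with the outcome built from emotion and behavioural intentions purely for illustration), the hyperparameter values are placeholders, and permutation importance is one of several measures ranger offers; the vignette does not state which was used for Chart 5.

```r
library(tidymodels)

set.seed(2024)
n <- 200

# Simulated stand-in for the survey data (hypothetical values)
change <- data.frame(
  self_efficacy = runif(n, 1, 7),
  avoidance = runif(n, 1, 7),
  denial = runif(n, 1, 7),
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)
change$reaction <- 0.5 * change$emotion +
  0.4 * change$behavioural_intentions + rnorm(n, sd = 0.8)

change_split <- initial_split(change, prop = 0.75, strata = reaction)
change_train <- training(change_split)

# Placeholder hyperparameters; importance must be requested from ranger
rf_spec <- rand_forest(mtry = 3, min_n = 5, trees = 500) %>%
  set_mode("regression") %>%
  set_engine("ranger", importance = "permutation")

rf_fit <- workflow() %>%
  add_recipe(recipe(reaction ~ ., data = change_train)) %>%
  add_model(rf_spec) %>%
  fit(data = change_train)

# Pull the underlying ranger object and sort its importance scores
ranger_fit <- extract_fit_engine(rf_fit)
importance <- sort(ranger_fit$variable.importance, decreasing = TRUE)

importance
barplot(rev(importance), horiz = TRUE, las = 1,
        main = "Permutation importance (simulated data)")
```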

2.2 Test model

2.2.1 Predict on test data

The model fitted on the training data was then applied to the unseen testing data to predict the outcome variable. Table 2 compares actual reaction and predicted reaction (.pred) in a small sample of observations extracted from the testing data.

Table 2 Sample of outcome variable predictions in test data
reaction .pred self_efficacy needs_approval fears_failure labelling_blame catastrophising managing_feelings anxious_thoughts avoidance past_influences facing_reality passive_existence dissociation displacement isolation_of_affect reaction_formation denial projection passive_aggression acting_out emotion behavioural_intentions
2 2.34 6.35 5.5 4.0 4.0 5.5 3.5 3.0 1.0 4.5 6.5 5.0 4.5 5.5 3.0 3.0 2.5 2.0 2.5 4.5 2.95 4.40
2 3.68 4.35 6.0 6.0 4.5 5.5 6.0 5.5 1.5 3.5 6.0 6.5 2.0 5.5 2.5 2.0 2.0 5.5 2.0 5.5 2.05 5.05
3 3.04 6.35 6.0 7.0 1.0 3.0 4.0 4.0 1.0 1.5 7.0 4.5 1.5 1.0 1.0 4.0 1.5 1.0 1.5 6.5 2.55 4.90
6 5.84 6.53 4.5 2.5 2.0 2.5 2.0 5.5 1.5 2.0 5.0 5.0 2.5 1.0 3.0 4.5 2.0 1.5 1.0 5.5 4.80 5.85
5 5.03 5.82 3.5 3.5 2.5 5.0 3.5 4.0 3.5 3.5 5.5 5.0 3.5 4.0 6.0 5.5 2.0 2.5 2.5 5.0 3.85 5.25
6 5.59 6.29 2.5 3.0 1.0 1.0 4.0 2.5 1.0 1.0 4.5 2.5 1.5 1.0 1.5 2.5 1.0 1.0 1.0 1.5 3.95 6.25
5 4.30 5.00 5.0 5.0 5.0 3.0 4.0 5.0 3.0 2.5 3.0 3.0 2.5 3.5 4.5 6.5 2.0 2.0 3.0 5.0 3.90 5.20
6 5.42 5.71 2.0 2.5 2.0 2.0 2.5 2.0 2.0 3.0 2.5 3.0 2.0 2.0 3.0 3.5 2.0 2.0 2.0 2.0 3.75 5.75
2 2.87 5.71 3.0 3.0 2.0 3.0 3.0 4.0 2.0 2.0 3.0 2.0 2.0 3.0 4.0 5.0 2.0 2.0 2.0 3.0 3.75 3.40
3 2.16 4.94 4.5 4.0 5.0 5.0 4.0 4.5 3.0 5.0 5.5 5.5 2.5 4.5 4.5 3.5 3.5 4.0 4.0 5.5 2.40 4.00
2 2.96 4.47 4.0 1.5 2.0 2.0 3.5 2.0 1.5 2.0 4.0 5.0 2.0 2.0 5.5 4.0 2.0 2.0 2.0 3.5 2.65 4.85
6 5.29 5.47 5.5 4.0 3.0 4.5 3.5 5.5 1.5 1.5 7.0 4.5 4.0 4.0 6.0 4.0 2.5 2.0 3.0 3.5 4.10 5.85
2 3.08 6.12 2.0 4.5 2.5 2.0 2.0 1.5 2.5 1.0 3.0 4.5 1.5 2.0 1.5 3.5 1.5 2.0 1.5 1.0 3.75 4.30
1 1.85 3.24 5.5 6.0 4.5 5.0 4.0 5.0 3.0 5.0 5.0 4.5 3.5 3.5 3.5 6.0 4.0 4.5 4.0 5.0 1.55 3.60
2 3.35 5.18 7.0 7.0 5.0 6.5 6.5 6.0 4.0 3.0 7.0 6.0 1.5 6.0 2.5 5.0 2.0 5.5 2.5 6.0 2.60 4.60
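A table with this layout can be produced by predicting on the held-out rows and binding the predictions to the test data. The sketch below uses simulated data and placeholder hyperparameters, so its numbers are unrelated to Table 2; only the column arrangement (reaction, .pred, then the predictors) mirrors it.

```r
library(tidymodels)

set.seed(2024)
n <- 200

# Simulated stand-in for the survey data (hypothetical values)
change <- data.frame(
  self_efficacy = runif(n, 1, 7),
  avoidance = runif(n, 1, 7),
  denial = runif(n, 1, 7),
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)
change$reaction <- 0.5 * change$emotion +
  0.4 * change$behavioural_intentions + rnorm(n, sd = 0.8)

change_split <- initial_split(change, prop = 0.75, strata = reaction)
change_train <- training(change_split)
change_test <- testing(change_split)

# Fit on the training set only (placeholder hyperparameters)
rf_fit <- workflow() %>%
  add_recipe(recipe(reaction ~ ., data = change_train)) %>%
  add_model(
    rand_forest(mtry = 3, min_n = 5, trees = 500) %>%
      set_mode("regression") %>%
      set_engine("ranger")
  ) %>%
  fit(data = change_train)

# Predict the unseen test rows and place actual and predicted side by side
test_predictions <- predict(rf_fit, new_data = change_test) %>%
  bind_cols(change_test) %>%
  relocate(reaction, .pred)

head(test_predictions)
```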


2.2.2 Evaluate model on test data

Table 3 summarises key regression metrics for the test set.

Table 3 Regression metrics for the test set
.metric .estimator .estimate .config
rmse standard 1.0498 Preprocessor1_Model1
rsq standard 0.6949 Preprocessor1_Model1

The scatterplot in Chart 6 compares the actual reaction to change with the predicted reaction for the test set. The dotted line through the origin (y = x) represents a perfect model, in which every predicted value equals the true value. Overall, the model delivered a favourable outcome on the test set, with a coefficient of determination (R²) of approximately 0.69 (Table 3).
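For readers who want to see how the two metrics are calculated, the base R sketch below applies them to the 15 rows shown in Table 2. Because this is only a sample of the test set, the values will not match Table 3. R² is computed here as the squared correlation between actual and predicted values, which is how yardstick's `rsq()` defines it.

```r
# Actual and predicted reaction for the 15 sample rows in Table 2
actual <- c(2, 2, 3, 6, 5, 6, 5, 6, 2, 3, 2, 6, 2, 1, 2)
predicted <- c(2.34, 3.68, 3.04, 5.84, 5.03, 5.59, 4.30, 5.42,
               2.87, 2.16, 2.96, 5.29, 3.08, 1.85, 3.35)

# Root mean squared error: typical size of a prediction error
rmse <- sqrt(mean((actual - predicted)^2))

# Coefficient of determination as the squared Pearson correlation
rsq <- cor(actual, predicted)^2

round(c(rmse = rmse, rsq = rsq), 4)

# Actual versus predicted, with the dotted y = x line of a perfect model
plot(actual, predicted, xlim = c(1, 7), ylim = c(1, 7),
     xlab = "Actual reaction", ylab = "Predicted reaction")
abline(a = 0, b = 1, lty = 3)
```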


References:

Self-efficacy was measured using the ‘Self-efficacy scale: Construction and validation’ by Sherer, Maddux, Mercandante, Prentice-Dunn and Rogers, published in Psychological Reports.
Irrational ideas were measured using the ‘Irrational belief scale’ developed by Malouff and Schutte, published in the Sourcebook of Adult Assessment Strategies, based on Ellis and Harper’s work, published in A New Guide to Rational Living.
Maladaptive defence mechanisms were measured using selected items from ‘The Defense Style Questionnaire’ by Andrews, Singh and Bond, published in The Journal of Nervous and Mental Disease.
Emotion was measured using ‘A semantic differential mood scale’ by Lorr and Wunderlich, published in the Journal of Clinical Psychology.


Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version    date (UTC) lib source
##  backports      1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
##  base64enc      0.1-3      2015-07-28 [1] CRAN (R 4.4.0)
##  broom        * 1.0.6      2024-05-17 [1] CRAN (R 4.4.0)
##  bslib          0.7.0      2024-03-29 [1] CRAN (R 4.4.0)
##  cachem         1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  class          7.3-22     2023-05-03 [2] CRAN (R 4.4.0)
##  cli            3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
##  codetools      0.2-20     2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.4.1)
##  cvms         * 1.6.1      2024-02-27 [1] CRAN (R 4.4.0)
##  data.table   * 1.15.4     2024-03-30 [1] CRAN (R 4.4.0)
##  devtools       2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dials        * 1.2.1      2024-02-22 [1] CRAN (R 4.4.0)
##  DiceDesign     1.10       2023-12-07 [1] CRAN (R 4.4.0)
##  digest         0.6.36     2024-06-23 [1] CRAN (R 4.4.1)
##  doParallel   * 1.0.17     2022-02-07 [1] CRAN (R 4.4.0)
##  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate       0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
##  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
##  foreach      * 1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
##  fs             1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
##  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
##  future         1.33.2     2024-03-26 [1] CRAN (R 4.4.0)
##  future.apply   1.11.2     2024-03-28 [1] CRAN (R 4.4.0)
##  generics       0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  GGally       * 2.2.1      2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
##  ggstats        0.6.0      2024-04-05 [1] CRAN (R 4.4.0)
##  globals        0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
##  glue           1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
##  gower          1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
##  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.4.0)
##  gtable         0.3.5      2024-04-22 [1] CRAN (R 4.4.0)
##  hardhat        1.4.0      2024-06-02 [1] CRAN (R 4.4.0)
##  here         * 1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  highr          0.11       2024-05-26 [1] CRAN (R 4.4.0)
##  hms            1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv         1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  infer        * 1.0.7      2024-03-25 [1] CRAN (R 4.4.0)
##  ipred          0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
##  iterators    * 1.0.14     2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite       1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
##  kableExtra   * 1.4.0      2024-01-24 [1] CRAN (R 4.4.0)
##  knitr          1.48       2024-07-07 [1] CRAN (R 4.4.1)
##  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
##  later          1.3.2      2023-12-06 [1] CRAN (R 4.4.0)
##  lattice        0.22-6     2024-03-20 [2] CRAN (R 4.4.0)
##  lava           1.8.0      2024-03-05 [1] CRAN (R 4.4.0)
##  lhs            1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  listenv        0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
##  lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  MASS           7.3-60.2   2024-04-24 [2] local
##  Matrix         1.7-0      2024-03-22 [2] CRAN (R 4.4.0)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mime           0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  mnormt         2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
##  modeldata    * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
##  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
##  nlme           3.1-164    2023-11-27 [2] CRAN (R 4.4.0)
##  nnet           7.3-19     2023-05-03 [2] CRAN (R 4.4.0)
##  parallelly     1.37.1     2024-02-29 [1] CRAN (R 4.4.0)
##  parsnip      * 1.2.1      2024-03-22 [1] CRAN (R 4.4.0)
##  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild       1.4.4      2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload        1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
##  plyr           1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
##  prodlim        2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
##  profvis        0.3.8      2023-05-02 [1] CRAN (R 4.4.0)
##  promises       1.3.0      2024-04-05 [1] CRAN (R 4.4.0)
##  psych        * 2.4.6.26   2024-06-27 [1] CRAN (R 4.4.1)
##  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
##  R6             2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
##  ranger       * 0.16.0     2023-11-12 [1] CRAN (R 4.4.0)
##  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp           1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
##  readr        * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
##  recipes      * 1.1.0      2024-07-04 [1] CRAN (R 4.4.1)
##  remotes        2.5.0      2024-03-17 [1] CRAN (R 4.4.0)
##  repr           1.1.7      2024-03-22 [1] CRAN (R 4.4.0)
##  rlang          1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown      2.27       2024-05-17 [1] CRAN (R 4.4.0)
##  rpart          4.1.23     2023-12-05 [2] CRAN (R 4.4.0)
##  rprojroot      2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rsample      * 1.2.1      2024-03-25 [1] CRAN (R 4.4.0)
##  rstudioapi     0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
##  sass           0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales       * 1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny          1.8.1.1    2024-04-02 [1] CRAN (R 4.4.0)
##  skimr        * 2.1.5      2022-12-23 [1] CRAN (R 4.4.0)
##  stringi        1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  stringr      * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
##  survival       3.5-8      2024-02-14 [2] CRAN (R 4.4.0)
##  svglite        2.1.3      2023-12-08 [1] CRAN (R 4.4.0)
##  systemfonts    1.1.0      2024-05-15 [1] CRAN (R 4.4.0)
##  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidymodels   * 1.2.0      2024-03-25 [1] CRAN (R 4.4.0)
##  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
##  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
##  timeDate       4032.109   2023-12-14 [1] CRAN (R 4.4.0)
##  tune         * 1.2.1      2024-04-18 [1] CRAN (R 4.4.0)
##  tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usemodels    * 0.2.0      2022-02-18 [1] CRAN (R 4.4.0)
##  usethis        2.2.3      2024-02-19 [1] CRAN (R 4.4.0)
##  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  vip          * 0.4.1      2023-08-21 [1] CRAN (R 4.4.0)
##  viridisLite    0.4.2      2023-05-02 [1] CRAN (R 4.4.0)
##  withr          3.0.0      2024-01-16 [1] CRAN (R 4.4.0)
##  workflows    * 1.1.4      2024-02-19 [1] CRAN (R 4.4.0)
##  workflowsets * 1.1.0      2024-03-21 [1] CRAN (R 4.4.0)
##  xfun           0.46       2024-07-18 [1] CRAN (R 4.4.1)
##  xml2           1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
##  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml           2.3.9      2024-07-05 [1] CRAN (R 4.4.1)
##  yardstick    * 1.3.1      2024-03-21 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────