Objective

According to IBM, logistic regression (also known as a logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring, based on a given data set of independent variables.
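
As a minimal sketch of that idea (base R, not taken from the vignette's own code), a logistic regression combines the independent variables into a linear predictor on the log-odds scale and passes it through the logistic function to obtain a probability. The coefficients and predictor values below are hypothetical and serve only as an illustration.

```r
# Hypothetical intercept and slopes for two predictors (illustration only).
beta <- c(intercept = -1.0, x1 = 0.8, x2 = -0.5)

# One hypothetical observation.
x <- c(x1 = 2.0, x2 = 1.0)

# Linear predictor on the log-odds (logit) scale.
log_odds <- beta[["intercept"]] + sum(beta[c("x1", "x2")] * x)

# The logistic function maps log-odds to a probability in (0, 1).
# plogis() is base R's logistic function, 1 / (1 + exp(-log_odds)).
probability <- plogis(log_odds)

# qlogis() is the inverse (the logit) and recovers the log-odds.
recovered_log_odds <- qlogis(probability)
```

A log-odds of zero corresponds to a probability of 0.5; positive values push the probability towards one and negative values towards zero.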

This vignette applies a machine learning workflow to fit a multivariable binomial logistic regression on a dataset of 208 respondents experiencing significant organisational change. Respondents reported self-efficacy, irrational ideas, maladaptive defence mechanisms, emotion, behavioural intentions and reaction towards change in their organisation.

The model is trained on one part of the data, then tested on the remainder and evaluated for accuracy in predicting the target (outcome) variable.

Workflow

The raw data set was wrangled and tidied before processing. Because a binomial logistic regression requires a two-level outcome, the outcome variable, measured on a seven-point Likert scale, was replaced with a binary variable. A brief exploratory analysis, comprising a statistical summary, correlations and comparative plots, was conducted to understand the variables.

Building the logit model commenced with splitting the data into training and testing sets in a 75:25 (3:1) ratio. Stratified sampling on the response variable ensured that each of its two levels appeared in roughly the same proportion in the test set as in the training set.
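
The split described above can be sketched in base R as follows. This is a stand-in rather than the vignette's code: the session information lists the rsample package, whose initial_split() function offers prop and strata arguments for the same purpose. The data frame, its column names and the seed are hypothetical.

```r
set.seed(123)  # hypothetical seed, for reproducibility of this sketch only

# Simulated stand-in for the filtered data: 208 rows with a binary outcome.
n <- 208
dat <- data.frame(
  id = seq_len(n),
  reaction = factor(sample(c("oppose", "support"), n, replace = TRUE,
                           prob = c(0.4, 0.6)))
)

# Stratified 75:25 split: sample 75% of the rows within each outcome level.
train_idx <- unlist(lapply(split(dat$id, dat$reaction), function(ids) {
  ids[sample.int(length(ids), size = round(0.75 * length(ids)))]
}))

train <- dat[dat$id %in% train_idx, ]
test  <- dat[!dat$id %in% train_idx, ]

# Class proportions should be close in both sets.
prop.table(table(train$reaction))
prop.table(table(test$reaction))
```

Sampling within each level, rather than from the whole data set, is what keeps the share of each outcome level similar across the two sets.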

Model specifications and feature engineering were formulated before fitting the model on the training data and reviewing the results.
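
A minimal sketch of this step in base R is shown below, under stated assumptions. The data are simulated, only two of the predictors are included, and centring and scaling is assumed as the normalisation step; the vignette may have used a different transformation and a different modelling interface (the session information lists recipes and parsnip).

```r
set.seed(42)  # hypothetical seed

# Simulated training data: two hypothetical predictors and a binary outcome.
n <- 150
train <- data.frame(
  emotion = runif(n, 1, 7),
  behavioural_intentions = runif(n, 1, 7)
)
true_log_odds <- -6 + 0.6 * train$emotion + 0.8 * train$behavioural_intentions
train$reaction <- factor(
  ifelse(runif(n) < plogis(true_log_odds), "support", "oppose"),
  levels = c("oppose", "support")
)

# Feature engineering: centre and scale each predictor with training-set statistics.
predictors <- c("emotion", "behavioural_intentions")
centre <- colMeans(train[predictors])
spread <- apply(train[predictors], 2, sd)
train_scaled <- train
train_scaled[predictors] <- scale(train[predictors], center = centre, scale = spread)

# Binomial logistic regression; glm() models the probability of the second
# factor level ("support" here).
fit <- glm(reaction ~ emotion + behavioural_intentions,
           data = train_scaled, family = binomial)

summary(fit)$coefficients
```

The same centre and spread values would be reused to transform the test set, so that no information from the test data leaks into the preprocessing.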

The model was then applied to the unseen test set to predict the response variable, reaction to organisational change. The model’s prediction performance was evaluated with a confusion matrix heatmap, model fit statistics and a ROC curve.

Results

1. Explore variables

Before building the logit model, the outcome variable (reaction to organisational change) was transformed from an ordinal variable to a binary variable. The following tables show the conversion of the outcome variable from a seven-point Likert scale (Table 1) to a binary scale (Table 2) with corresponding frequencies. Because the focus is on predicting either support for or opposition to change, neutral responses on the Likert scale were excluded from the binary outcome variable.

Table 1 Original seven-point Likert scale
Reaction to change   Freq
Totally Oppose         73
Oppose                 71
Partially Oppose       90
Neutral               124
Partially Support     189
Support                23
Totally Support        46

Table 2 New binary scale for logistic regression
Reaction to change   Freq
oppose                184
support               359
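
The recoding from seven levels to two can be sketched in base R as below. The responses are hypothetical, not the study data, and the variable names are chosen for illustration only.

```r
# Hypothetical responses on the seven-point scale (not the study data).
likert_levels <- c("Totally Oppose", "Oppose", "Partially Oppose", "Neutral",
                   "Partially Support", "Support", "Totally Support")
reaction_7pt <- factor(
  c("Oppose", "Neutral", "Support", "Totally Support", "Partially Oppose",
    "Neutral", "Partially Support", "Totally Oppose"),
  levels = likert_levels
)

# Drop neutral responses, then collapse the remaining six levels into two.
keep <- reaction_7pt != "Neutral"
reaction_binary <- factor(
  ifelse(grepl("Oppose", as.character(reaction_7pt[keep])), "oppose", "support"),
  levels = c("oppose", "support")
)

table(reaction_binary)
```

In this toy vector, the two neutral responses are removed and the remaining six split evenly into three "oppose" and three "support".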

The data set was filtered to analyse only those respondents who reported experiencing significant organisational change. Table 3 is a statistical summary of the proposed explanatory variables for the logistic regression.

Table 3 Statistical summary of explanatory variables
variable n mean sd median trimmed mad min max range skew kurtosis se
self_efficacy 208 5.55 0.81 5.65 5.63 0.78 1.71 6.94 5.24 −1.19 2.71 0.06
needs_approval 208 4.19 1.40 4.00 4.22 1.48 1.00 7.00 6.00 −0.14 −0.73 0.10
fears_failure 208 4.06 1.51 4.00 4.07 1.48 1.00 7.00 6.00 0.00 −0.83 0.10
labelling_blame 208 3.19 1.32 3.00 3.11 1.48 1.00 7.00 6.00 0.55 −0.31 0.09
catastrophising 208 3.70 1.32 3.50 3.66 1.48 1.00 7.00 6.00 0.15 −0.89 0.09
managing_feelings 208 3.88 1.31 4.00 3.88 1.48 1.00 6.50 5.50 −0.05 −0.72 0.09
anxious_thoughts 208 4.10 1.16 4.00 4.13 0.74 1.50 7.00 5.50 −0.17 −0.38 0.08
avoidance 208 2.25 0.99 2.00 2.14 0.74 1.00 5.50 4.50 1.05 0.90 0.07
past_influences 208 2.79 1.30 2.50 2.71 1.48 1.00 6.00 5.00 0.56 −0.76 0.09
facing_reality 208 4.54 1.44 5.00 4.60 1.48 1.00 7.00 6.00 −0.35 −0.80 0.10
passive_existence 208 4.23 1.24 4.00 4.26 1.48 1.00 7.00 6.00 −0.18 −0.46 0.09
dissociation 208 2.74 1.21 2.50 2.61 0.74 1.00 6.50 5.50 0.90 0.24 0.08
displacement 208 3.04 1.28 3.00 2.99 1.48 1.00 7.00 6.00 0.34 −0.50 0.09
isolation_of_affect 208 3.41 1.49 3.50 3.38 2.22 1.00 7.00 6.00 0.16 −0.92 0.10
reaction_formation 208 4.17 1.25 4.00 4.17 1.48 1.50 7.00 5.50 0.07 −0.65 0.09
denial 208 2.51 1.03 2.00 2.44 0.74 1.00 6.50 5.50 0.79 0.40 0.07
projection 208 2.49 1.11 2.00 2.39 0.74 1.00 6.00 5.00 0.88 0.09 0.08
passive_aggression 208 2.62 1.08 2.50 2.52 0.74 1.00 6.50 5.50 0.88 0.55 0.07
acting_out 208 3.55 1.34 3.50 3.54 1.48 1.00 6.50 5.50 0.04 −0.88 0.09
emotion 208 3.80 1.15 3.80 3.79 1.19 1.00 6.90 5.90 0.09 −0.32 0.08
behavioural_intentions 208 5.08 1.13 5.25 5.15 1.26 1.75 7.00 5.25 −0.57 −0.21 0.08
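
Several of the columns in Table 3 follow directly from simple formulas, which the base R sketch below illustrates on hypothetical scores. The column names resemble the output of describe() from the psych package, which is listed in the session information, but that attribution is an assumption; the helper function here is hypothetical and covers only some of the columns.

```r
# Summary columns for one numeric vector, using base R only.
summarise_variable <- function(x) {
  c(n = length(x),
    mean = mean(x),
    sd = sd(x),
    median = median(x),
    mad = mad(x),                    # scaled by 1.4826 by default
    min = min(x),
    max = max(x),
    range = max(x) - min(x),
    se = sd(x) / sqrt(length(x)))    # standard error of the mean
}

# Hypothetical scores on a 1-7 scale (not the study data).
scores <- c(4.0, 5.5, 3.0, 6.0, 4.5, 2.5, 5.0, 4.0)
round(summarise_variable(scores), 2)

# Consistency check against the needs_approval row of Table 3:
# sd = 1.40 and n = 208 imply se = sd / sqrt(n), which rounds to 0.10.
round(1.40 / sqrt(208), 2)
```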

Chart 1 supports Table 3, showing the correlation coefficient between each pair of variables.

Because of the number of explanatory variables under consideration, two separate pairs plots were prepared. Chart 2 shows the relationship between irrational ideas and the outcome variable, reaction to organisational change. Chart 3 shows the relationship between maladaptive defence mechanisms and the same outcome variable.

2. Build and fit model

The logistic regression model was specified with additional feature engineering to normalise predictor variables. The model was then fitted to the training data. Table 4 shows model fit statistics for the training set arranged by p-value.

Table 4 Model fit statistics (arranged by p.value)
term estimate std.error statistic p.value
behavioural_intentions 5.5018 1.5519 3.5452 0.0004
emotion 3.7447 1.1378 3.2912 0.0010
reaction_formation −1.4276 0.5780 −2.4698 0.0135
past_influences 1.4822 0.7784 1.9043 0.0569
passive_aggression 1.2442 0.7388 1.6842 0.0921
isolation_of_affect 1.0488 0.6621 1.5842 0.1132
passive_existence −1.0412 0.7086 −1.4693 0.1417
projection −1.0643 0.8141 −1.3074 0.1911
displacement 0.9718 0.7580 1.2820 0.1998
anxious_thoughts 0.7126 0.5741 1.2413 0.2145
avoidance 0.7410 0.6106 1.2136 0.2249
facing_reality −0.8787 0.7411 −1.1857 0.2357
denial −0.9507 0.8341 −1.1398 0.2544
dissociation −0.8328 0.7811 −1.0662 0.2863
acting_out −0.6300 0.6772 −0.9303 0.3522
labelling_blame 0.6776 0.7691 0.8810 0.3783
self_efficacy 0.4152 0.6488 0.6399 0.5222
managing_feelings 0.4103 0.6813 0.6023 0.5470
needs_approval 0.3831 0.7651 0.5007 0.6166
catastrophising 0.2974 0.6966 0.4269 0.6695
fears_failure 0.2394 0.6650 0.3601 0.7188

Note that the order of the less significant variables in Table 4, and later in Chart 5, may differ depending on the random split of observations into the training and test sets. Chart 4 illustrates the training set’s most significant explanatory variables, those with a p-value less than 0.05.

Chart 5 visualises estimates for each explanatory variable in this random split.
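
The statistic and p.value columns in Table 4 are consistent with a Wald test, in which the statistic is the estimate divided by its standard error and the p-value is two-sided under the standard normal distribution. The base R sketch below recomputes both from four rows of Table 4; treating the columns this way is an inference from the numbers rather than something the vignette states.

```r
# Estimates and standard errors copied from the first four rows of Table 4.
coefs <- data.frame(
  term = c("behavioural_intentions", "emotion", "reaction_formation",
           "past_influences"),
  estimate = c(5.5018, 3.7447, -1.4276, 1.4822),
  std.error = c(1.5519, 1.1378, 0.5780, 0.7784)
)

# Wald z statistic and two-sided p-value from the standard normal distribution.
coefs$statistic <- coefs$estimate / coefs$std.error
coefs$p.value <- 2 * pnorm(-abs(coefs$statistic))

# Arrange by p-value and keep terms significant at the 0.05 level.
coefs <- coefs[order(coefs$p.value), ]
significant <- coefs[coefs$p.value < 0.05, ]

round(coefs$statistic, 4)
round(coefs$p.value, 4)
```

The recomputed values agree with Table 4 up to rounding of the published estimates and standard errors, and the 0.05 filter keeps the first three terms while dropping past_influences.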

3. Model prediction and evaluation

The model fitted on the training data was applied to the unseen test data to predict the response variable. Table 5 compares the actual reaction (reaction) with the predicted probabilities of each reaction (.pred_oppose and .pred_support, the last two columns) for a small sample of cases extracted from the test set.

Table 5 A sample of predictions from the test set
self_efficacy needs_approval fears_failure labelling_blame catastrophising managing_feelings anxious_thoughts avoidance past_influences facing_reality passive_existence dissociation displacement isolation_of_affect reaction_formation denial projection passive_aggression acting_out emotion behavioural_intentions reaction .pred_oppose .pred_support
6.35 5.50 4.00 4.00 5.50 3.50 3.00 1.00 4.50 6.50 5.00 4.50 5.50 3.00 3.00 2.50 2.00 2.50 4.50 2.95 4.40 oppose 0.9626 0.0374
6.00 3.50 4.50 2.50 4.00 5.00 4.50 2.50 4.00 5.50 4.00 4.00 4.00 5.50 4.00 2.50 4.00 3.50 3.50 2.85 3.20 oppose 0.9995 0.0005
6.53 4.50 2.50 2.00 2.50 2.00 5.50 1.50 2.00 5.00 5.00 2.50 1.00 3.00 4.50 2.00 1.50 1.00 5.50 4.80 5.85 support 0.1995 0.8005
5.71 6.00 5.00 3.00 4.00 4.50 5.50 3.50 2.00 4.50 3.50 3.00 3.50 4.50 3.50 2.50 2.00 3.00 3.00 3.85 5.70 support 0.0002 0.9998
4.41 6.00 3.00 5.50 5.00 6.00 5.50 3.00 4.50 6.00 5.00 2.00 2.50 4.00 5.50 2.00 4.00 4.50 3.50 2.55 3.35 oppose 0.9991 0.0009
5.59 6.50 6.00 2.50 6.00 5.50 6.00 1.50 5.00 6.00 5.00 1.50 4.00 1.00 3.50 3.00 2.50 2.50 2.50 2.25 4.80 support 0.9119 0.0881
5.59 4.00 3.50 3.50 2.50 2.50 3.50 2.00 2.50 6.00 3.00 2.50 4.00 4.50 4.00 2.00 2.50 3.00 5.00 3.95 5.50 support 0.0257 0.9743
6.18 4.50 6.50 2.00 5.00 4.00 5.50 3.50 2.50 5.00 5.50 2.00 3.50 4.00 5.00 3.00 2.00 2.50 5.00 5.10 6.20 support 0.0000 1.0000
5.71 2.00 2.50 2.00 2.00 2.50 2.00 2.00 3.00 2.50 3.00 2.00 2.00 3.00 3.50 2.00 2.00 2.00 2.00 3.75 5.75 support 0.0291 0.9709
5.24 6.00 6.00 4.00 4.00 5.00 3.50 3.00 4.50 5.50 5.00 4.50 4.50 1.50 3.00 3.50 4.00 4.50 5.50 2.75 3.55 oppose 0.9994 0.0006
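
Predicted probabilities become predicted classes once a threshold is applied. The base R sketch below does this for the ten rows of Table 5, using only the reaction and .pred_oppose columns. The 0.5 threshold is an assumption, since the vignette does not state one, and the .pred_class column name is hypothetical.

```r
# Actual outcome and predicted probability of "oppose" for the ten rows in Table 5.
sample_preds <- data.frame(
  reaction = c("oppose", "oppose", "support", "support", "oppose",
               "support", "support", "support", "support", "oppose"),
  .pred_oppose = c(0.9626, 0.9995, 0.1995, 0.0002, 0.9991,
                   0.9119, 0.0257, 0.0000, 0.0291, 0.9994)
)

# Assign the class with the larger predicted probability (assumed 0.5 threshold).
sample_preds$.pred_class <- ifelse(sample_preds$.pred_oppose > 0.5,
                                   "oppose", "support")

# Rows where the predicted class differs from the actual reaction.
misclassified <- which(sample_preds$.pred_class != sample_preds$reaction)
misclassified
```

Under that threshold, nine of the ten sampled cases are classified correctly; the sixth row, an actual supporter with a predicted probability of 0.9119 for "oppose", is the one misclassification.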


The predictive performance of the machine learning logistic regression model was evaluated with a confusion matrix, model statistics and ROC curve.

Chart 6, a confusion matrix, summarises predictions by comparing the predicted response with the actual response for reaction to change. The confusion matrix shows 83 per cent accuracy (true positive and true negative predictions combined). False positive (top left) and false negative (bottom right) predictions account for the remaining 17 per cent.

Table 6 summarises the binomial logistic regression prediction performance.

Table 6 Summary of logistic regression prediction metrics
.metric               .estimator  .estimate
accuracy              binary      0.8302
kap                   binary      0.6595
sens                  binary      0.8696
spec                  binary      0.8000
ppv                   binary      0.7692
npv                   binary      0.8889
mcc                   binary      0.6638
j_index               binary      0.6696
bal_accuracy          binary      0.8348
detection_prevalence  binary      0.4906
precision             binary      0.7692
recall                binary      0.8696
f_meas                binary      0.8163
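
Each metric in Table 6 can be derived from the four cells of a binary confusion matrix, as the base R sketch below shows. The cell counts are an assumption: they are not read from Chart 6 but are integer counts for a test set of 53 cases that reproduce the rounded values in Table 6, with one outcome level treated as the positive class.

```r
# Assumed cell counts, inferred from the rounded values in Table 6.
tp <- 20; fn <- 3; fp <- 6; tn <- 24
n <- tp + fn + fp + tn

accuracy <- (tp + tn) / n
sens <- tp / (tp + fn)          # sensitivity, also called recall
spec <- tn / (tn + fp)          # specificity
ppv <- tp / (tp + fp)           # positive predictive value, also called precision
npv <- tn / (tn + fn)           # negative predictive value
j_index <- sens + spec - 1      # Youden's J
bal_accuracy <- (sens + spec) / 2
detection_prevalence <- (tp + fp) / n
f_meas <- 2 * ppv * sens / (ppv + sens)
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cohen's kappa: observed agreement corrected for chance agreement.
p_chance <- ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n^2
kap <- (accuracy - p_chance) / (1 - p_chance)

round(c(accuracy = accuracy, kap = kap, sens = sens, spec = spec, ppv = ppv,
        npv = npv, mcc = mcc, j_index = j_index, bal_accuracy = bal_accuracy,
        detection_prevalence = detection_prevalence, f_meas = f_meas), 4)
```

Precision and recall repeat ppv and sens under different names, which is why those pairs of rows in Table 6 carry identical values.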

The ROC curve (receiver operating characteristic curve) plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at all classification thresholds. AUC (area under the curve) measures the entire two-dimensional area underneath the ROC curve. It evaluates how well a logistic regression model separates positive and negative outcomes across every possible threshold. An AUC from 0.9 to 1 is regarded as “A” grade in classification performance. Chart 7 illustrates the ROC curve for this model, with an AUC of 0.8986, which falls just below that band.
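
How a ROC curve and its AUC are built can be sketched in base R on a small hypothetical example. The labels and scores below are invented for illustration and do not come from the vignette's test set, so the resulting AUC is unrelated to the 0.8986 reported for the model.

```r
# Hypothetical labels and predicted probabilities of the positive class.
actual <- c(1, 1, 1, 0, 0, 0, 0)           # 1 = positive, 0 = negative
score  <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1)

# One ROC point per threshold: true positive rate against false positive rate.
thresholds <- sort(unique(c(Inf, score)), decreasing = TRUE)
roc <- data.frame(
  threshold = thresholds,
  tpr = sapply(thresholds, function(t) mean(score[actual == 1] >= t)),
  fpr = sapply(thresholds, function(t) mean(score[actual == 0] >= t))
)
roc$specificity <- 1 - roc$fpr

# AUC by the trapezoidal rule over the ROC points.
auc_trapezoid <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

# Equivalent rank-based form: the probability that a randomly chosen positive
# case scores higher than a randomly chosen negative case.
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc_rank <- (sum(rank(score)[actual == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)
```

Both calculations give 11/12 for this toy example: of the twelve positive-negative pairs, the positive case has the higher score in eleven.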


References:

Self-efficacy was measured using the ‘Self-efficacy scale: Construction and validation’ by Sherer, Maddux, Mercandante, Prentice-Dunn and Rogers, published in Psychological Reports.
Irrational ideas were measured using the ‘Irrational belief scale’ developed by Malouff and Schutte, published in the Sourcebook of Adult Assessment Strategies, based on Ellis and Harper’s work, published in A New Guide to Rational Living.
Maladaptive defence mechanisms were measured using selected items from ‘The Defense Style Questionnaire’ by Andrews, Singh and Bond, published in The Journal of Nervous and Mental Disease.
Emotion was measured using ‘A semantic differential mood scale’ by Lorr and Wunderlich, published in the Journal of Clinical Psychology.


Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package       * version    date (UTC) lib source
##  backports       1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
##  bestNormalize * 1.9.1      2023-08-18 [1] CRAN (R 4.4.0)
##  broom         * 1.0.6      2024-05-17 [1] CRAN (R 4.4.0)
##  bslib           0.7.0      2024-03-29 [1] CRAN (R 4.4.0)
##  butcher         0.3.4      2024-04-11 [1] CRAN (R 4.4.0)
##  cachem          1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  checkmate       2.3.1      2023-12-04 [1] CRAN (R 4.4.0)
##  class           7.3-22     2023-05-03 [2] CRAN (R 4.4.0)
##  cli             3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
##  codetools       0.2-20     2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.4.1)
##  cvms          * 1.6.1      2024-02-27 [1] CRAN (R 4.4.0)
##  data.table    * 1.15.4     2024-03-30 [1] CRAN (R 4.4.0)
##  devtools        2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dials         * 1.2.1      2024-02-22 [1] CRAN (R 4.4.0)
##  DiceDesign      1.10       2023-12-07 [1] CRAN (R 4.4.0)
##  digest          0.6.36     2024-06-23 [1] CRAN (R 4.4.1)
##  doParallel      1.0.17     2022-02-07 [1] CRAN (R 4.4.0)
##  doRNG           1.8.6      2023-01-16 [1] CRAN (R 4.4.0)
##  dplyr         * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate        0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
##  fansi           1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
##  farver          2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap         1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  fontawesome     0.5.2      2023-08-19 [1] CRAN (R 4.4.0)
##  forcats       * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
##  foreach         1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
##  fs              1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
##  furrr           0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
##  future          1.33.2     2024-03-26 [1] CRAN (R 4.4.0)
##  future.apply    1.11.2     2024-03-28 [1] CRAN (R 4.4.0)
##  generics        0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  GGally        * 2.2.1      2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2       * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
##  ggstats         0.6.0      2024-04-05 [1] CRAN (R 4.4.0)
##  globals         0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
##  glue            1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
##  gower           1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
##  GPfit           1.0-8      2019-02-08 [1] CRAN (R 4.4.0)
##  gt            * 0.11.0     2024-07-09 [1] CRAN (R 4.4.1)
##  gtable          0.3.5      2024-04-22 [1] CRAN (R 4.4.0)
##  gtExtras      * 0.5.0      2023-09-15 [1] CRAN (R 4.4.0)
##  hardhat         1.4.0      2024-06-02 [1] CRAN (R 4.4.0)
##  here          * 1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  highr           0.11       2024-05-26 [1] CRAN (R 4.4.0)
##  hms             1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools       0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets     1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv          1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  infer         * 1.0.7      2024-03-25 [1] CRAN (R 4.4.0)
##  ipred           0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
##  iterators       1.0.14     2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib       0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite        1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
##  knitr           1.48       2024-07-07 [1] CRAN (R 4.4.1)
##  labeling        0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
##  later           1.3.2      2023-12-06 [1] CRAN (R 4.4.0)
##  lattice         0.22-6     2024-03-20 [2] CRAN (R 4.4.0)
##  lava            1.8.0      2024-03-05 [1] CRAN (R 4.4.0)
##  lhs             1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
##  lifecycle       1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  listenv         0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
##  lubridate     * 1.9.3      2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  MASS            7.3-60.2   2024-04-24 [2] local
##  Matrix          1.7-0      2024-03-22 [2] CRAN (R 4.4.0)
##  memoise         2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mime            0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  mnormt          2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
##  modeldata     * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
##  munsell         0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
##  nlme            3.1-164    2023-11-27 [2] CRAN (R 4.4.0)
##  nnet            7.3-19     2023-05-03 [2] CRAN (R 4.4.0)
##  nortest         1.0-4      2015-07-30 [1] CRAN (R 4.4.0)
##  paletteer       1.6.0      2024-01-21 [1] CRAN (R 4.4.0)
##  parallelly      1.37.1     2024-02-29 [1] CRAN (R 4.4.0)
##  parsnip       * 1.2.1      2024-03-22 [1] CRAN (R 4.4.0)
##  pillar          1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild        1.4.4      2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload         1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
##  plyr            1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
##  pROC            1.18.5     2023-11-01 [1] CRAN (R 4.4.0)
##  prodlim         2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
##  profvis         0.3.8      2023-05-02 [1] CRAN (R 4.4.0)
##  promises        1.3.0      2024-04-05 [1] CRAN (R 4.4.0)
##  psych         * 2.4.6.26   2024-06-27 [1] CRAN (R 4.4.1)
##  purrr         * 1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
##  R6              2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
##  RColorBrewer    1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp            1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
##  readr         * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
##  recipes       * 1.1.0      2024-07-04 [1] CRAN (R 4.4.1)
##  rematch2        2.1.2      2020-05-01 [1] CRAN (R 4.4.0)
##  remotes         2.5.0      2024-03-17 [1] CRAN (R 4.4.0)
##  rlang           1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown       2.27       2024-05-17 [1] CRAN (R 4.4.0)
##  rngtools        1.5.2      2021-09-20 [1] CRAN (R 4.4.0)
##  rpart           4.1.23     2023-12-05 [2] CRAN (R 4.4.0)
##  rprojroot       2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rsample       * 1.2.1      2024-03-25 [1] CRAN (R 4.4.0)
##  rstudioapi      0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
##  sass            0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales        * 1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny           1.8.1.1    2024-04-02 [1] CRAN (R 4.4.0)
##  stringi         1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  stringr       * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
##  survival        3.5-8      2024-02-14 [2] CRAN (R 4.4.0)
##  tibble        * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidymodels    * 1.2.0      2024-03-25 [1] CRAN (R 4.4.0)
##  tidyr         * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect      1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse     * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
##  timechange      0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
##  timeDate        4032.109   2023-12-14 [1] CRAN (R 4.4.0)
##  tune          * 1.2.1      2024-04-18 [1] CRAN (R 4.4.0)
##  tzdb            0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker      1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usethis         2.2.3      2024-02-19 [1] CRAN (R 4.4.0)
##  utf8            1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs           0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  vip           * 0.4.1      2023-08-21 [1] CRAN (R 4.4.0)
##  withr           3.0.0      2024-01-16 [1] CRAN (R 4.4.0)
##  workflows     * 1.1.4      2024-02-19 [1] CRAN (R 4.4.0)
##  workflowsets  * 1.1.0      2024-03-21 [1] CRAN (R 4.4.0)
##  xfun            0.46       2024-07-18 [1] CRAN (R 4.4.1)
##  xml2            1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
##  xtable          1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml            2.3.9      2024-07-05 [1] CRAN (R 4.4.1)
##  yardstick     * 1.3.1      2024-03-21 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────