Stepwise Binomial Logistic Regression

Objective

According to IBM, logistic regression (also known as the logit model) is often used for classification and predictive analytics. It estimates the probability of an event occurring from a given set of independent variables. Because the model output is a probability, the predicted value is bounded between 0 and 1.

The vignette conducts a stepwise logistic regression on a dataset of 208 respondents experiencing significant organisational change. Respondents reported self-efficacy, irrational ideas, maladaptive defence mechanisms, emotion, behavioural intentions and reaction towards change in their organisation.

This vignette has two objectives. First, model and identify statistically significant relationships between the outcome and explanatory variables. Second, predict outcomes and evaluate the accuracy of those predictions.

Workflow

The raw data set was wrangled and tidied before processing. Since this was a logistic regression, the outcome variable, a seven-point Likert scale, was replaced with a binary variable. Next, a brief exploratory analysis comprising a statistical summary, correlation, and comparative analysis was conducted to understand the variables.

A stepwise binomial logistic regression was then conducted to identify statistically significant explanatory variables, and fit statistics for the stepwise model were reviewed.

In the final section of this vignette, the model was used to predict outcomes. Its prediction performance was evaluated with a confusion matrix heatmap, model fit statistics and a ROC curve.

Results

1. Explore variables

Before building the logit model, the outcome variable (reaction to organisational change) was transformed from an interval variable to a binary variable. The following tables show the conversion of the outcome variable from a seven-point Likert scale (Table 1) to a binary scale (Table 2) with corresponding frequencies. Scale measures opposing change were coded as “0”, and measures supporting change were coded as “1”. The neutral measure on the Likert scale was dropped from the binary scale.

Table 1 Original seven-point Likert scale
Reaction to change	Freq
Totally Oppose	73
Oppose	71
Partially Oppose	90
Neutral	124
Partially Support	189
Support	23
Totally Support	46

Table 2 New binary scale for logistic regression
Reaction to change	Freq
0	184
1	359
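
The recoding described above can be sketched as a simple lookup. This is a minimal Python illustration, not the code used to produce the vignette; the function name `to_binary` and the use of `None` to mark dropped responses are assumptions made for the example.

```python
# Hypothetical helper: map the seven-point Likert labels from Table 1
# onto the binary outcome used for the logistic regression.
LIKERT_TO_BINARY = {
    "Totally Oppose": 0,
    "Oppose": 0,
    "Partially Oppose": 0,
    "Partially Support": 1,
    "Support": 1,
    "Totally Support": 1,
}


def to_binary(label):
    """Return 0 (opposes), 1 (supports), or None for Neutral or any unknown label."""
    return LIKERT_TO_BINARY.get(label)


# Illustrative responses only; these are not rows from the data set.
responses = ["Oppose", "Neutral", "Support", "Partially Oppose"]
recoded = [to_binary(r) for r in responses]

# Neutral responses are dropped before modelling.
binary_only = [value for value in recoded if value is not None]
print(binary_only)
```

Keeping the mapping in one dictionary makes the coding decision explicit and easy to audit.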

The data set was then filtered to retain only those respondents who reported experiencing significant organisational change (n = 208). Table 3 provides a statistical summary of the proposed explanatory variables.

Table 3 Statistical summary of explanatory variables
variable	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
self_efficacy	208	5.55	0.81	5.65	5.63	0.78	1.71	6.94	5.24	−1.19	2.71	0.06
needs_approval	208	4.19	1.40	4.00	4.22	1.48	1.00	7.00	6.00	−0.14	−0.73	0.10
fears_failure	208	4.06	1.51	4.00	4.07	1.48	1.00	7.00	6.00	0.00	−0.83	0.10
labelling_blame	208	3.19	1.32	3.00	3.11	1.48	1.00	7.00	6.00	0.55	−0.31	0.09
catastrophising	208	3.70	1.32	3.50	3.66	1.48	1.00	7.00	6.00	0.15	−0.89	0.09
managing_feelings	208	3.88	1.31	4.00	3.88	1.48	1.00	6.50	5.50	−0.05	−0.72	0.09
anxious_thoughts	208	4.10	1.16	4.00	4.13	0.74	1.50	7.00	5.50	−0.17	−0.38	0.08
avoidance	208	2.25	0.99	2.00	2.14	0.74	1.00	5.50	4.50	1.05	0.90	0.07
past_influences	208	2.79	1.30	2.50	2.71	1.48	1.00	6.00	5.00	0.56	−0.76	0.09
facing_reality	208	4.54	1.44	5.00	4.60	1.48	1.00	7.00	6.00	−0.35	−0.80	0.10
passive_existence	208	4.23	1.24	4.00	4.26	1.48	1.00	7.00	6.00	−0.18	−0.46	0.09
dissociation	208	2.74	1.21	2.50	2.61	0.74	1.00	6.50	5.50	0.90	0.24	0.08
displacement	208	3.04	1.28	3.00	2.99	1.48	1.00	7.00	6.00	0.34	−0.50	0.09
isolation_of_affect	208	3.41	1.49	3.50	3.38	2.22	1.00	7.00	6.00	0.16	−0.92	0.10
reaction_formation	208	4.17	1.25	4.00	4.17	1.48	1.50	7.00	5.50	0.07	−0.65	0.09
denial	208	2.51	1.03	2.00	2.44	0.74	1.00	6.50	5.50	0.79	0.40	0.07
projection	208	2.49	1.11	2.00	2.39	0.74	1.00	6.00	5.00	0.88	0.09	0.08
passive_aggression	208	2.62	1.08	2.50	2.52	0.74	1.00	6.50	5.50	0.88	0.55	0.07
acting_out	208	3.55	1.34	3.50	3.54	1.48	1.00	6.50	5.50	0.04	−0.88	0.09
emotion	208	3.80	1.15	3.80	3.79	1.19	1.00	6.90	5.90	0.09	−0.32	0.08
behavioural_intentions	208	5.08	1.13	5.25	5.15	1.26	1.75	7.00	5.25	−0.57	−0.21	0.08

Chart 1 complements Table 3, showing the correlation coefficients between pairs of explanatory variables.

Because of the number of explanatory variables under consideration, two separate pairs plots were prepared. Chart 2 shows the relationship between irrational ideas and the outcome variable, reaction to organisational change. Chart 3 shows the relationship between maladaptive defence mechanisms and the same outcome variable.

2. Stepwise logistic regression

A binomial stepwise logistic regression was conducted using both forward selection and backward elimination. Both procedures selected the same statistically significant explanatory variables.

Table 4 summarises the coefficient estimates for the stepwise model, with their standard errors, test statistics and p-values.

Table 4 Stepwise logistic regression coefficient estimates (arranged by p.value)
term	estimate	std.error	statistic	p.value
(Intercept)	−27.3799	4.6754	−5.8562	0.0000
behavioural_intentions	3.0873	0.5860	5.2685	0.0000
emotion	2.3993	0.5055	4.7461	0.0000
reaction_formation	−0.7021	0.2517	−2.7898	0.0053
past_influences	0.8061	0.2986	2.6996	0.0069
anxious_thoughts	0.6389	0.2959	2.1593	0.0308
avoidance	0.6689	0.3334	2.0062	0.0448
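
The estimates in Table 4 are on the log-odds scale. Exponentiating a coefficient gives an odds ratio: the multiplicative change in the odds of the outcome coded “1” for a one-unit increase in that variable, holding the others constant. The short Python sketch below applies that conversion to the Table 4 estimates; the variable names are illustrative, not taken from the original analysis.

```python
import math

# Coefficient estimates (log-odds) copied from Table 4.
coefficients = {
    "behavioural_intentions": 3.0873,
    "emotion": 2.3993,
    "reaction_formation": -0.7021,
    "past_influences": 0.8061,
    "anxious_thoughts": 0.6389,
    "avoidance": 0.6689,
}

# Odds ratio = exp(coefficient).
odds_ratios = {term: math.exp(beta) for term, beta in coefficients.items()}

for term, ratio in odds_ratios.items():
    print(f"{term}: {ratio:.2f}")
```

An odds ratio above 1 indicates that higher scores raise the odds of the outcome, and a ratio below 1 (here only reaction_formation) indicates that they lower it.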

In place of the coefficient of determination (R²) as a measure of fit, a pseudo-R2 value is adopted when the outcome variable is nominal or ordinal. There are several variants of pseudo-R2. Table 5 shows pseudo-R2 ranging from 0.6908 to 0.8996 for the selected variants.

Table 5 Pseudo-R2 variants for stepwise model
variant	pseudo-R2
McFadden	0.6908
Nagelkerke	0.8199
VeallZimmermann	0.8409
McKelveyZavoina	0.8996
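
As an illustration of how one of these variants is defined, McFadden's pseudo-R2 compares the log-likelihood of the fitted model with that of an intercept-only (null) model: 1 − logLik(model) / logLik(null). The Python sketch below uses a small set of hypothetical outcomes and predicted probabilities, not the vignette's data, so its result is unrelated to the values in Table 5.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """McFadden pseudo-R2: 1 - logLik(model) / logLik(null)."""
    base_rate = sum(y) / len(y)  # the null model predicts the overall event rate
    ll_model = log_likelihood(y, p)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1 - ll_model / ll_null


# Hypothetical outcomes and predicted probabilities, for illustration only.
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.6, 0.9]

r2 = mcfadden_r2(y_toy, p_toy)
print(round(r2, 4))
```

The function assumes every predicted probability lies strictly between 0 and 1; values of exactly 0 or 1 would need to be clipped before taking logarithms.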

3. Prediction and evaluation

3.1 Model prediction

The next step involved predicting each respondent’s reaction to organisational change. Table 6 compares the predicted probability of supporting change (prediction) with the actual reaction for a small sample of cases, alongside the values of the significant explanatory variables.

Table 6 Sample of model predictions
anxious_thoughts	avoidance	past_influences	reaction_formation	emotion	behavioural_intentions	reaction	prediction
3.00	1.00	4.50	3.00	2.95	4.40	0	0.0684
5.50	2.00	5.50	4.00	2.45	4.40	0	0.1914
4.50	2.50	4.00	4.00	2.85	3.20	0	0.0033
5.50	1.50	3.50	2.00	2.05	5.05	0	0.2816
4.00	1.00	1.50	4.00	2.55	4.90	0	0.0109
6.00	3.00	5.50	4.00	4.50	6.25	1	1.0000
3.50	2.00	3.50	4.00	1.90	1.85	0	0.0000
5.50	1.50	2.00	4.50	4.80	5.85	1	0.9943
5.00	1.00	3.50	4.00	4.70	4.65	0	0.8936
6.00	1.50	2.00	3.50	2.70	4.90	0	0.1439
2.00	2.00	2.00	5.50	2.35	4.35	0	0.0004
4.00	1.00	2.00	3.00	3.80	5.05	1	0.5145
6.50	4.00	1.00	4.00	3.95	5.80	1	0.9921
4.00	1.50	4.00	2.50	3.60	5.10	1	0.8839
5.50	3.50	2.00	3.50	3.85	5.70	1	0.9886
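
The link between Table 4 and the prediction column of Table 6 can be sketched as follows: the coefficients are combined with a respondent's scores into a linear predictor, which the logistic function converts to a probability. This Python example is a minimal illustration using the rounded values printed in the tables, so results may differ from Table 6 in the fourth decimal place. The function name and the 0.5 classification threshold are assumptions.

```python
import math

# Intercept and coefficients copied from Table 4.
intercept = -27.3799
coefficients = {
    "anxious_thoughts": 0.6389,
    "avoidance": 0.6689,
    "past_influences": 0.8061,
    "reaction_formation": -0.7021,
    "emotion": 2.3993,
    "behavioural_intentions": 3.0873,
}


def predict_probability(row):
    """Logistic function applied to the linear predictor for one respondent."""
    eta = intercept + sum(coefficients[name] * value for name, value in row.items())
    return 1 / (1 + math.exp(-eta))


# First and ninth rows of Table 6.
first_row = {"anxious_thoughts": 3.00, "avoidance": 1.00, "past_influences": 4.50,
             "reaction_formation": 3.00, "emotion": 2.95, "behavioural_intentions": 4.40}
ninth_row = {"anxious_thoughts": 5.00, "avoidance": 1.00, "past_influences": 3.50,
             "reaction_formation": 4.00, "emotion": 4.70, "behavioural_intentions": 4.65}

p_first = predict_probability(first_row)
p_ninth = predict_probability(ninth_row)

# Assumed threshold of 0.5 for converting a probability into a predicted class.
class_first = int(p_first >= 0.5)
class_ninth = int(p_ninth >= 0.5)
print(round(p_first, 3), class_first)
print(round(p_ninth, 3), class_ninth)
```

Under the assumed threshold, the ninth row is classified as “1” although the actual reaction in Table 6 is “0”, which makes it a misclassified case.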

3.2 Model evaluation

The predictive performance of the stepwise model was evaluated with a confusion matrix, model statistics and ROC curve.

The confusion matrix in Chart 4 cross-tabulates the predicted against the actual reaction to change. It shows good performance for the stepwise model, with 92.8 per cent of cases classified correctly (true positives and true negatives). False positive (top left) and false negative (bottom right) predictions account for the remaining 7.2 per cent.

Table 7 summarises the stepwise model prediction performance.

Table 7 Summary of model prediction metrics
.metric	.estimator	.estimate
accuracy	binary	0.9279
kap	binary	0.8529
sens	binary	0.9328
spec	binary	0.9213
ppv	binary	0.9407
npv	binary	0.9111
mcc	binary	0.8530
j_index	binary	0.8541
bal_accuracy	binary	0.9271
detection_prevalence	binary	0.5673
precision	binary	0.9407
recall	binary	0.9328
f_meas	binary	0.9367
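
The metrics in Table 7 all derive from the four cells of the confusion matrix. The Python sketch below shows the standard formulas. The cell counts are not read from Chart 4; they are back-calculated from Table 7 and the sample size of 208, and should be treated as an inference. Which outcome level counts as “positive” is likewise left as the metrics define it.

```python
import math

# Assumed counts, back-calculated from Table 7 with n = 208.
tp, fn, fp, tn = 111, 8, 7, 82
n = tp + fn + fp + tn

accuracy = (tp + tn) / n
sens = tp / (tp + fn)          # sensitivity, also called recall
spec = tn / (tn + fp)          # specificity
ppv = tp / (tp + fp)           # positive predictive value, also called precision
npv = tn / (tn + fn)           # negative predictive value
f_meas = 2 * ppv * sens / (ppv + sens)
j_index = sens + spec - 1      # Youden's J
bal_accuracy = (sens + spec) / 2
detection_prevalence = (tp + fp) / n

# Cohen's kappa: agreement beyond what the marginal totals give by chance.
expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
kap = (accuracy - expected) / (1 - expected)

# Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("kap", kap), ("sens", sens),
                    ("spec", spec), ("ppv", ppv), ("npv", npv), ("mcc", mcc),
                    ("j_index", j_index), ("bal_accuracy", bal_accuracy),
                    ("detection_prevalence", detection_prevalence),
                    ("f_meas", f_meas)]:
    print(f"{name}: {value:.4f}")
```

With these counts, each formula reproduces the corresponding Table 7 estimate to four decimal places, which supports the back-calculation.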

The ROC curve (receiver operating characteristic curve) plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at all classification thresholds. AUC (area under the curve) measures the entire two-dimensional area underneath the ROC curve, summarising how well a logistic regression model separates positive and negative outcomes across every possible threshold. An AUC from 0.9 to 1 is commonly regarded as “A” grade classification performance. Chart 5 illustrates the ROC curve for the stepwise model, with an AUC of 0.972.
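
To make the ROC and AUC definitions concrete, the Python sketch below computes one ROC point and the AUC for the 15 sample cases in Table 6 only. AUC is calculated here as the share of (positive, negative) pairs in which the positive case receives the higher predicted probability, which is equivalent to the area under the ROC curve. Because it uses a small sample, the result is not expected to match the 0.972 reported for the full data set; the function names are illustrative.

```python
# (actual reaction, predicted probability) pairs copied from Table 6.
sample = [
    (0, 0.0684), (0, 0.1914), (0, 0.0033), (0, 0.2816), (0, 0.0109),
    (1, 1.0000), (0, 0.0000), (1, 0.9943), (0, 0.8936), (0, 0.1439),
    (0, 0.0004), (1, 0.5145), (1, 0.9921), (1, 0.8839), (1, 0.9886),
]


def roc_point(pairs, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    return fp / (fp + tn), tp / (tp + fn)


def auc(pairs):
    """Share of positive-negative pairs ranked correctly; ties count as half."""
    positives = [p for y, p in pairs if y == 1]
    negatives = [p for y, p in pairs if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in positives for b in negatives)
    return wins / (len(positives) * len(negatives))


fpr, tpr = roc_point(sample, 0.5)
sample_auc = auc(sample)
print(fpr, tpr, sample_auc)
```

Sweeping the threshold from 1 down to 0 and plotting each (false positive rate, true positive rate) pair traces out the ROC curve.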

References:

Self-efficacy was measured using the ‘Self-efficacy scale: Construction and validation’ by Sherer, Maddux, Mercandante, Prentice-Dunn and Rogers, published in Psychological Reports.
Irrational ideas were measured using the ‘Irrational belief scale’ developed by Malouff and Schutte, published in the Sourcebook of Adult Assessment Strategies, based on Ellis and Harper’s work, published in A New Guide to Rational Living.
Maladaptive defence mechanisms were measured using selected items from ‘The Defense Style Questionnaire’ by Andrews, Singh and Bond, published in The Journal of Nervous and Mental Disease.
Emotion was measured using ‘A semantic differential mood scale’ by Lorr and Wunderlich, published in the Journal of Clinical Psychology.

Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version    date (UTC) lib source
##  backports      1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
##  boot           1.3-30     2024-02-26 [2] CRAN (R 4.4.0)
##  broom        * 1.0.6      2024-05-17 [1] CRAN (R 4.4.0)
##  bslib          0.7.0      2024-03-29 [1] CRAN (R 4.4.0)
##  cachem         1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.4.0)
##  checkmate      2.3.1      2023-12-04 [1] CRAN (R 4.4.0)
##  class          7.3-22     2023-05-03 [2] CRAN (R 4.4.0)
##  cli            3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
##  codetools      0.2-20     2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.4.1)
##  cvms         * 1.6.1      2024-02-27 [1] CRAN (R 4.4.0)
##  data.table   * 1.15.4     2024-03-30 [1] CRAN (R 4.4.0)
##  DescTools    * 0.99.54    2024-02-03 [1] CRAN (R 4.4.0)
##  devtools       2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dials        * 1.2.1      2024-02-22 [1] CRAN (R 4.4.0)
##  DiceDesign     1.10       2023-12-07 [1] CRAN (R 4.4.0)
##  digest         0.6.36     2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  e1071          1.7-14     2023-12-06 [1] CRAN (R 4.4.0)
##  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate       0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
##  Exact          3.3        2024-07-21 [1] CRAN (R 4.4.1)
##  expm           0.999-9    2024-01-11 [1] CRAN (R 4.4.0)
##  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  fontawesome    0.5.2      2023-08-19 [1] CRAN (R 4.4.0)
##  forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
##  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
##  fs             1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
##  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
##  future         1.33.2     2024-03-26 [1] CRAN (R 4.4.0)
##  future.apply   1.11.2     2024-03-28 [1] CRAN (R 4.4.0)
##  generics       0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  GGally       * 2.2.1      2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
##  ggstats        0.6.0      2024-04-05 [1] CRAN (R 4.4.0)
##  gld            2.6.6      2022-10-23 [1] CRAN (R 4.4.0)
##  globals        0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
##  glue           1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
##  gower          1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
##  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.4.0)
##  gt           * 0.11.0     2024-07-09 [1] CRAN (R 4.4.1)
##  gtable         0.3.5      2024-04-22 [1] CRAN (R 4.4.0)
##  gtExtras     * 0.5.0      2023-09-15 [1] CRAN (R 4.4.0)
##  hardhat        1.4.0      2024-06-02 [1] CRAN (R 4.4.0)
##  here         * 1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  highr          0.11       2024-05-26 [1] CRAN (R 4.4.0)
##  hms            1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv         1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  httr           1.4.7      2023-08-15 [1] CRAN (R 4.4.0)
##  infer        * 1.0.7      2024-03-25 [1] CRAN (R 4.4.0)
##  ipred          0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
##  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite       1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
##  knitr          1.48       2024-07-07 [1] CRAN (R 4.4.1)
##  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
##  later          1.3.2      2023-12-06 [1] CRAN (R 4.4.0)
##  lattice        0.22-6     2024-03-20 [2] CRAN (R 4.4.0)
##  lava           1.8.0      2024-03-05 [1] CRAN (R 4.4.0)
##  lhs            1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  listenv        0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
##  lmom           3.0        2023-08-29 [1] CRAN (R 4.4.0)
##  lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  MASS           7.3-60.2   2024-04-24 [2] local
##  Matrix         1.7-0      2024-03-22 [2] CRAN (R 4.4.0)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mime           0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  mnormt         2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
##  modeldata    * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
##  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
##  mvtnorm        1.2-5      2024-05-21 [1] CRAN (R 4.4.0)
##  nlme           3.1-164    2023-11-27 [2] CRAN (R 4.4.0)
##  nnet           7.3-19     2023-05-03 [2] CRAN (R 4.4.0)
##  paletteer      1.6.0      2024-01-21 [1] CRAN (R 4.4.0)
##  parallelly     1.37.1     2024-02-29 [1] CRAN (R 4.4.0)
##  parsnip      * 1.2.1      2024-03-22 [1] CRAN (R 4.4.0)
##  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild       1.4.4      2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload        1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
##  plyr           1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
##  pROC         * 1.18.5     2023-11-01 [1] CRAN (R 4.4.0)
##  prodlim        2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
##  profvis        0.3.8      2023-05-02 [1] CRAN (R 4.4.0)
##  promises       1.3.0      2024-04-05 [1] CRAN (R 4.4.0)
##  proxy          0.4-27     2022-06-09 [1] CRAN (R 4.4.0)
##  psych        * 2.4.6.26   2024-06-27 [1] CRAN (R 4.4.1)
##  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
##  R6             2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
##  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp           1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
##  readr        * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
##  readxl         1.4.3      2023-07-06 [1] CRAN (R 4.4.0)
##  recipes      * 1.1.0      2024-07-04 [1] CRAN (R 4.4.1)
##  rematch2       2.1.2      2020-05-01 [1] CRAN (R 4.4.0)
##  remotes        2.5.0      2024-03-17 [1] CRAN (R 4.4.0)
##  rlang          1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown      2.27       2024-05-17 [1] CRAN (R 4.4.0)
##  rootSolve      1.8.2.4    2023-09-21 [1] CRAN (R 4.4.0)
##  rpart          4.1.23     2023-12-05 [2] CRAN (R 4.4.0)
##  rprojroot      2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rsample      * 1.2.1      2024-03-25 [1] CRAN (R 4.4.0)
##  rstudioapi     0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
##  sass           0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales       * 1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny          1.8.1.1    2024-04-02 [1] CRAN (R 4.4.0)
##  stringi        1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  stringr      * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
##  survival       3.5-8      2024-02-14 [2] CRAN (R 4.4.0)
##  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidymodels   * 1.2.0      2024-03-25 [1] CRAN (R 4.4.0)
##  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
##  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
##  timeDate       4032.109   2023-12-14 [1] CRAN (R 4.4.0)
##  tune         * 1.2.1      2024-04-18 [1] CRAN (R 4.4.0)
##  tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usethis        2.2.3      2024-02-19 [1] CRAN (R 4.4.0)
##  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  withr          3.0.0      2024-01-16 [1] CRAN (R 4.4.0)
##  workflows    * 1.1.4      2024-02-19 [1] CRAN (R 4.4.0)
##  workflowsets * 1.1.0      2024-03-21 [1] CRAN (R 4.4.0)
##  xfun           0.46       2024-07-18 [1] CRAN (R 4.4.1)
##  xml2           1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
##  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml           2.3.9      2024-07-05 [1] CRAN (R 4.4.1)
##  yardstick    * 1.3.1      2024-03-21 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────