Multivariate Binomial Logistic Regression

Objective

According to IBM, logistic regression (also known as the logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring based on a given data set of independent variables. Since the outcome is a probability, the predicted value is bounded between 0 and 1.
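As a minimal sketch of why the predicted value stays between 0 and 1, the inverse-logit (logistic) function below maps any log-odds value to a probability. The helper name inv_logit and the example log-odds values are illustrative assumptions, not part of the original analysis.

```r
# Inverse-logit (logistic) function: maps any log-odds value into (0, 1).
# inv_logit is a hypothetical helper name used only for illustration.
inv_logit <- function(log_odds) 1 / (1 + exp(-log_odds))

log_odds <- c(-5, -1, 0, 1, 5)
probability <- inv_logit(log_odds)

# Base R's plogis() computes the same transformation.
data.frame(log_odds, probability, plogis = plogis(log_odds))
```

A log-odds of 0 corresponds to a probability of 0.5, and large negative or positive log-odds approach 0 and 1 without reaching them.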

This vignette explores overall satisfaction with service quality. Service quality is critical for business success. It is essential for attracting new customers and retaining existing customers. This vignette demonstrates a multivariate binomial logistic regression with a binary outcome variable (overall level of satisfaction) and several explanatory variables (service quality measures). The data set contains observations from 395 respondents in a business-to-business (B2B) relationship. The data were gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. SERVQUAL consists of five dimensions, described as follows:

Tangibles – the appearance of physical facilities, equipment, personnel and communication materials
Assurance – knowledge and ability to inspire trust and confidence
Reliability – ability to perform service dependably and accurately
Responsiveness – willingness to help and provide prompt service
Empathy – providing caring and individualised attention.

This vignette has two objectives. First, model and identify statistically significant relationships between the outcome and explanatory variables. Second, predict outcomes and evaluate the accuracy of those predictions.

Workflow

The raw data set was wrangled and tidied before processing. Since this was a logistic regression, the outcome variable, a seven-point Likert scale, was replaced with a binary variable. A brief exploratory analysis, comprising a statistical summary and visualisations, was conducted to understand the variables.

A multivariate binomial logistic regression was then fitted to the service quality data set to identify the significance of the explanatory variables (Model 1). The most significant explanatory variable was explored in more detail by fitting a simple binomial logistic regression and visualising the result (Model 2).

In the final section of this vignette, outcomes were predicted with Model 1. The model’s prediction performance was evaluated with a confusion matrix heatmap, model fit statistics and an ROC curve.

Results

1. Explore variables

Before building the logit model, the outcome variable (overall satisfaction) was transformed from a seven-point Likert-scale variable to a binary variable. The following tables show the conversion of the outcome variable from the seven-point Likert scale (Table 1) to a binary scale (Table 2) with corresponding frequencies. Scale measures recording dissatisfaction were coded as “0”, and measures recording satisfaction were coded as “1”. The neutral measure on the Likert scale (29 responses) was dropped from the binary scale, leaving 395 of the original 424 responses.

Table 1 Original seven-point Likert scale
Overall level of satisfaction	Frequency
Very Dissatisfied	3
Dissatisfied	15
Partially Dissatisfied	16
Neutral	29
Partially Satisfied	39
Satisfied	245
Very Satisfied	77

Table 2 New binary scale for logistic regression
Overall level of satisfaction	Frequency
0	34
1	361
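The recoding from Table 1 to Table 2 can be sketched in base R as follows. The responses are rebuilt from the Table 1 frequencies purely for illustration, and the object names (likert_levels, responses, binary) are assumptions rather than names taken from the original code.

```r
# Rebuild the responses from the Table 1 frequencies (illustrative only).
likert_levels <- c("Very Dissatisfied", "Dissatisfied", "Partially Dissatisfied",
                   "Neutral", "Partially Satisfied", "Satisfied", "Very Satisfied")
freq <- c(3, 15, 16, 29, 39, 245, 77)
responses <- factor(rep(likert_levels, times = freq), levels = likert_levels)

# Position on the scale: 1-3 = dissatisfied, 4 = neutral, 5-7 = satisfied.
score <- as.integer(responses)
binary <- ifelse(score < 4, 0L, ifelse(score > 4, 1L, NA_integer_))

# Drop the neutral responses, as described in the text.
binary <- binary[!is.na(binary)]
table(binary)
```

The resulting counts should match Table 2: 34 observations coded 0 and 361 coded 1.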

Table 3 is a statistical summary of the five explanatory variables.

Table 3 Statistical summary of service quality measures
variable	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
tangibles	395	5.99	0.86	6.00	6.11	0.74	2.00	7.00	5.00	−1.80	4.83	0.04
assurance	395	5.62	1.03	6.00	5.72	0.74	2.00	7.00	5.00	−1.06	0.76	0.05
reliability	395	5.54	1.28	6.00	5.73	0.74	1.00	7.00	6.00	−1.34	1.29	0.06
responsiveness	395	5.40	1.35	6.00	5.58	1.48	1.00	7.00	6.00	−1.10	0.77	0.07
empathy	395	5.48	1.14	5.50	5.58	0.74	1.00	7.00	6.00	−0.93	1.11	0.06

Chart 1 (box plot) shows the distribution of the five explanatory variables. All five have a median of 5.5 or higher on the seven-point scale, indicating favourable levels of service quality.

Chart 2 (pairs plot) shows the pairwise relationships among the explanatory variables and the binary outcome variable, overall satisfaction.

2. Build and fit models

2.1 Model 1 Multivariate logistic regression

Model 1 was formulated to include all five explanatory service quality measures. Output 1 presents the coefficient estimates and fit statistics for Model 1.

Output 1 Results of multivariate logistic regression


Call:
glm(formula = overall_satisfaction ~ ., family = binomial(link = "logit"), 
    data = b2b_binomial_df)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -11.3168     2.4099  -4.696 2.65e-06 ***
tangibles        0.2753     0.3735   0.737 0.461099    
assurance        0.3168     0.4583   0.691 0.489364    
reliability      0.5265     0.3347   1.573 0.115740    
responsiveness   1.4966     0.3978   3.762 0.000169 ***
empathy          0.5142     0.4085   1.259 0.208082    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 231.757  on 394  degrees of freedom
Residual deviance:  62.493  on 389  degrees of freedom
AIC: 74.493

Number of Fisher Scoring iterations: 8

In place of the coefficient of determination (R²) as a measure of fit, a pseudo-R2 value is adopted when the outcome variable is nominal or ordinal. There are several variants of pseudo-R2. Table 4 shows pseudo-R2 for Model 1, ranging from 0.7303 to 0.8112 for the selected variants.

Table 4 Pseudo-R2 variants for Model 1
variant	pseudo-R2
McFadden	0.7303
Nagelkerke	0.7852
VeallZimmermann	0.8112
McKelveyZavoina	0.7879
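As a rough check, three of the variants in Table 4 can be approximated from the deviances printed in Output 1. This sketch assumes ungrouped binary data, for which the deviance equals minus twice the log-likelihood, and uses the standard textbook formulas. The McKelvey-Zavoina variant is omitted because it needs the fitted linear predictor rather than the deviances alone.

```r
# Values copied from Output 1.
null_dev  <- 231.757   # null deviance
resid_dev <- 62.493    # residual deviance
n         <- 395       # number of observations

lr <- null_dev - resid_dev  # likelihood-ratio statistic

mcfadden         <- 1 - resid_dev / null_dev
cox_snell        <- 1 - exp(-lr / n)
nagelkerke       <- cox_snell / (1 - exp(-null_dev / n))
veall_zimmermann <- (lr / (lr + n)) / (null_dev / (null_dev + n))

round(c(McFadden = mcfadden, Nagelkerke = nagelkerke,
        VeallZimmermann = veall_zimmermann), 3)
```

Because the printed deviances are rounded, the results agree with Table 4 only to about three decimal places.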

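Output 1 highlights “responsiveness” as the only explanatory variable with a statistically significant association with overall satisfaction (p = 0.000169). The remaining four service quality measures are not significant at the 0.05 level once responsiveness is in the model.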
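To make the coefficients easier to interpret, the estimates in Output 1 can be converted to odds ratios with approximate 95 per cent Wald intervals. This is a sketch built from the printed estimates and standard errors. The data frame coefs and its column names are assumptions for illustration.

```r
# Estimates and standard errors copied from Output 1.
coefs <- data.frame(
  term      = c("tangibles", "assurance", "reliability", "responsiveness", "empathy"),
  estimate  = c(0.2753, 0.3168, 0.5265, 1.4966, 0.5142),
  std_error = c(0.3735, 0.4583, 0.3347, 0.3978, 0.4085)
)

z <- qnorm(0.975)  # about 1.96 for a 95 per cent Wald interval
coefs$odds_ratio <- exp(coefs$estimate)
coefs$ci_low     <- exp(coefs$estimate - z * coefs$std_error)
coefs$ci_high    <- exp(coefs$estimate + z * coefs$std_error)

coefs
```

Under these assumptions, a one-point increase in responsiveness multiplies the odds of satisfaction by roughly 4.5 with the other measures held constant, and responsiveness is the only term whose interval excludes an odds ratio of 1.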

2.2 Model 2 Univariate logistic regression

Based on the findings of Model 1, Model 2 is a univariate logistic regression between the outcome variable and the explanatory variable, responsiveness. Output 2 shows fit statistics for Model 2.

Output 2 Results of univariate logistic regression


Call:
glm(formula = overall_satisfaction ~ responsiveness, family = binomial(link = "logit"), 
    data = b2b_binomial_df)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -6.667      1.151  -5.795 6.84e-09 ***
responsiveness    2.213      0.320   6.916 4.66e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 231.76  on 394  degrees of freedom
Residual deviance:  76.00  on 393  degrees of freedom
AIC: 80

Number of Fisher Scoring iterations: 8

Chart 3 visualises Model 2, the univariate logistic regression. It shows that low levels of responsiveness are associated with client dissatisfaction, whereas high levels of responsiveness are associated with a high probability of client satisfaction.

Table 5 shows pseudo-R2 for Model 2 (responsiveness against the overall level of client satisfaction), ranging from 0.6721 to 0.7648 for the different variants.
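The fitted curve for Model 2 can be approximated directly from the coefficients in Output 2. The sketch below evaluates the predicted probability of satisfaction at each whole responsiveness score and finds the score at which that probability is 0.5. The variable names are illustrative assumptions.

```r
# Coefficients copied from Output 2.
intercept <- -6.667
slope     <- 2.213

responsiveness <- seq(1, 7, by = 1)
p_satisfied <- plogis(intercept + slope * responsiveness)

# Responsiveness score at which the fitted probability equals 0.5.
midpoint <- -intercept / slope

data.frame(responsiveness, p_satisfied = round(p_satisfied, 3))
midpoint
```

With these rounded coefficients the midpoint is close to a responsiveness score of 3, so scores below about 3 lean towards dissatisfaction and scores above it lean towards satisfaction.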

Table 5 Pseudo-R2 variants for Model 2
variant	pseudo-R2
McFadden	0.6721
Nagelkerke	0.7342
VeallZimmermann	0.7648
McKelveyZavoina	0.7288

3. Prediction and evaluation

3.1 Model 1 Prediction

Model 1 was used to predict overall satisfaction for each observation in the data set. Table 6 compares the actual overall satisfaction with the predicted probability of satisfaction (the prediction column) for a small sample of cases.

Table 6 Sample of Model 1 predictions
tangibles	assurance	reliability	responsiveness	empathy	overall_satisfaction	prediction
7.0	6.5	6.5	6.5	7.0	1	0.9999
4.5	4.0	6.0	5.5	4.5	1	0.9926
6.5	5.5	4.5	2.5	3.5	1	0.5314
6.0	6.0	5.5	6.0	6.0	1	0.9993
7.0	7.0	7.0	7.0	7.0	1	1.0000
6.0	6.0	6.0	5.0	5.5	1	0.9967
6.0	5.5	5.0	5.5	3.0	1	0.9888
3.0	4.0	4.0	1.5	2.5	0	0.0269
6.0	6.0	6.0	4.5	4.0	1	0.9850
6.0	5.0	3.0	2.0	4.0	0	0.1898
6.0	6.0	6.0	5.5	5.0	1	0.9980
5.0	5.0	5.0	5.0	5.0	1	0.9870
4.5	3.0	2.0	2.0	4.5	0	0.0591
2.0	3.5	3.0	2.5	3.5	0	0.0733
6.5	4.0	6.0	5.0	5.5	1	0.9946
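The predictions in Table 6 can be approximately reproduced by hand from the Model 1 coefficients in Output 1. The sketch below does this for three rows of the table. The object names (beta, cases) are assumptions, and small differences in the last decimal place are expected because the printed coefficients are rounded.

```r
# Coefficients copied from Output 1.
beta <- c(intercept = -11.3168, tangibles = 0.2753, assurance = 0.3168,
          reliability = 0.5265, responsiveness = 1.4966, empathy = 0.5142)

# Three rows copied from Table 6 (rows 1, 3 and 8).
cases <- data.frame(
  tangibles      = c(7.0, 6.5, 3.0),
  assurance      = c(6.5, 5.5, 4.0),
  reliability    = c(6.5, 4.5, 4.0),
  responsiveness = c(6.5, 2.5, 1.5),
  empathy        = c(7.0, 3.5, 2.5)
)

# Linear predictor (log-odds), then the inverse logit to get a probability.
X <- cbind(1, as.matrix(cases))
log_odds <- drop(X %*% beta)
prediction <- plogis(log_odds)

round(prediction, 3)
```

With a fitted glm object, predict() with type = "response" returns the same kind of probabilities without the manual matrix arithmetic.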

3.2 Model 1 Evaluation

The predictive performance of Model 1 was evaluated with a confusion matrix, model statistics and ROC curve.

Chart 4 (confusion matrix) summarises the predictions by cross-tabulating predicted against actual client satisfaction. The confusion matrix shows good performance for Model 1, recording 97.7 per cent accuracy (true positives and true negatives combined). False positive (top left) and false negative (bottom right) predictions account for the remaining 2.3 per cent.

Table 7 summarises Model 1 prediction performance.

Table 7 Summary of Model 1 prediction metrics
.metric	.estimator	.estimate
accuracy	binary	0.9772
kap	binary	0.8492
sens	binary	0.9917
spec	binary	0.8235
ppv	binary	0.9835
npv	binary	0.9032
mcc	binary	0.8502
j_index	binary	0.8152
bal_accuracy	binary	0.9076
detection_prevalence	binary	0.9215
precision	binary	0.9835
recall	binary	0.9917
f_meas	binary	0.9876
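Most of the metrics in Table 7 follow from four confusion-matrix counts. The counts used below are an assumption: they were back-calculated from the sensitivity, specificity and class totals reported in this vignette, not read from Chart 4. The sketch shows how the standard definitions combine them.

```r
# Counts back-calculated from Table 7 and Table 2 (an assumption, not read from Chart 4).
tp <- 358; fn <- 3   # actual satisfied: 361
tn <- 28;  fp <- 6   # actual dissatisfied: 34
n  <- tp + fn + tn + fp

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)   # recall
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)   # precision
npv         <- tn / (tn + fn)
f_meas      <- 2 * ppv * sensitivity / (ppv + sensitivity)
j_index     <- sensitivity + specificity - 1
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

round(c(accuracy = accuracy, sens = sensitivity, spec = specificity,
        ppv = ppv, npv = npv, f_meas = f_meas, j_index = j_index, mcc = mcc), 4)
```

Under these assumed counts the values agree with the corresponding rows of Table 7. The comparatively low specificity reflects the small number of dissatisfied clients (34), so a handful of misclassified cases has a large effect.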

The ROC curve (receiver operating characteristic curve) plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at all classification thresholds. AUC (area under the curve) measures the entire two-dimensional area underneath the ROC curve. It summarises how well a logistic regression model separates positive and negative outcomes across every possible threshold. An AUC from 0.9 to 1 is commonly regarded as “A” grade classification performance. Chart 5 illustrates the ROC curve for Model 1, with an AUC of 0.985.
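To illustrate how ROC points and AUC are computed, the sketch below uses only the 15 sample cases from Table 6, so its result is not the AUC of 0.985 reported for the full data set. The helper roc_point and the chosen thresholds are illustrative assumptions.

```r
# Actual outcomes and predicted probabilities copied from Table 6.
actual <- c(1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
prediction <- c(0.9999, 0.9926, 0.5314, 0.9993, 1.0000, 0.9967, 0.9888, 0.0269,
                0.9850, 0.1898, 0.9980, 0.9870, 0.0591, 0.0733, 0.9946)

# One point on the ROC curve at a given classification threshold.
roc_point <- function(threshold) {
  predicted <- as.integer(prediction >= threshold)
  c(threshold = threshold,
    tpr = sum(predicted == 1 & actual == 1) / sum(actual == 1),  # sensitivity
    fpr = sum(predicted == 1 & actual == 0) / sum(actual == 0))  # 1 - specificity
}
roc <- t(sapply(c(0.1, 0.5, 0.99), roc_point))
roc

# AUC as the share of (satisfied, dissatisfied) pairs ranked correctly; ties count half.
pos <- prediction[actual == 1]
neg <- prediction[actual == 0]
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
auc
```

In this small sample every satisfied case has a higher predicted probability than every dissatisfied case, so the sample AUC is 1. On the full data set some pairs are ranked incorrectly, which is why the reported AUC is slightly lower.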

Reference:

Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. The SERVQUAL methodology is documented in Delivering Quality Service: Balancing Customer Perceptions and Expectations by Zeithaml, Parasuraman and Berry.

Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version    date (UTC) lib source
##  backports      1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
##  boot           1.3-30     2024-02-26 [2] CRAN (R 4.4.0)
##  broom        * 1.0.6      2024-05-17 [1] CRAN (R 4.4.0)
##  bslib          0.7.0      2024-03-29 [1] CRAN (R 4.4.0)
##  cachem         1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.4.0)
##  checkmate      2.3.1      2023-12-04 [1] CRAN (R 4.4.0)
##  class          7.3-22     2023-05-03 [2] CRAN (R 4.4.0)
##  cli            3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
##  codetools      0.2-20     2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.4.1)
##  cvms         * 1.6.1      2024-02-27 [1] CRAN (R 4.4.0)
##  data.table   * 1.15.4     2024-03-30 [1] CRAN (R 4.4.0)
##  DescTools    * 0.99.54    2024-02-03 [1] CRAN (R 4.4.0)
##  devtools       2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dials        * 1.2.1      2024-02-22 [1] CRAN (R 4.4.0)
##  DiceDesign     1.10       2023-12-07 [1] CRAN (R 4.4.0)
##  digest         0.6.36     2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  e1071          1.7-14     2023-12-06 [1] CRAN (R 4.4.0)
##  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate       0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
##  Exact          3.3        2024-07-21 [1] CRAN (R 4.4.1)
##  expm           0.999-9    2024-01-11 [1] CRAN (R 4.4.0)
##  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  fontawesome    0.5.2      2023-08-19 [1] CRAN (R 4.4.0)
##  forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
##  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
##  fs             1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
##  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
##  future         1.33.2     2024-03-26 [1] CRAN (R 4.4.0)
##  future.apply   1.11.2     2024-03-28 [1] CRAN (R 4.4.0)
##  generics       0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  GGally       * 2.2.1      2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
##  ggstats        0.6.0      2024-04-05 [1] CRAN (R 4.4.0)
##  gld            2.6.6      2022-10-23 [1] CRAN (R 4.4.0)
##  globals        0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
##  glue           1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
##  gower          1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
##  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.4.0)
##  gt           * 0.11.0     2024-07-09 [1] CRAN (R 4.4.1)
##  gtable         0.3.5      2024-04-22 [1] CRAN (R 4.4.0)
##  gtExtras     * 0.5.0      2023-09-15 [1] CRAN (R 4.4.0)
##  hardhat        1.4.0      2024-06-02 [1] CRAN (R 4.4.0)
##  here         * 1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  highr          0.11       2024-05-26 [1] CRAN (R 4.4.0)
##  hms            1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv         1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  httr           1.4.7      2023-08-15 [1] CRAN (R 4.4.0)
##  infer        * 1.0.7      2024-03-25 [1] CRAN (R 4.4.0)
##  ipred          0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
##  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite       1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
##  knitr          1.48       2024-07-07 [1] CRAN (R 4.4.1)
##  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
##  later          1.3.2      2023-12-06 [1] CRAN (R 4.4.0)
##  lattice        0.22-6     2024-03-20 [2] CRAN (R 4.4.0)
##  lava           1.8.0      2024-03-05 [1] CRAN (R 4.4.0)
##  lhs            1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  listenv        0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
##  lmom           3.0        2023-08-29 [1] CRAN (R 4.4.0)
##  lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  MASS           7.3-60.2   2024-04-24 [2] local
##  Matrix         1.7-0      2024-03-22 [2] CRAN (R 4.4.0)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mgcv           1.9-1      2023-12-21 [2] CRAN (R 4.4.0)
##  mime           0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  mnormt         2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
##  modeldata    * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
##  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
##  mvtnorm        1.2-5      2024-05-21 [1] CRAN (R 4.4.0)
##  nlme           3.1-164    2023-11-27 [2] CRAN (R 4.4.0)
##  nnet           7.3-19     2023-05-03 [2] CRAN (R 4.4.0)
##  paletteer      1.6.0      2024-01-21 [1] CRAN (R 4.4.0)
##  parallelly     1.37.1     2024-02-29 [1] CRAN (R 4.4.0)
##  parsnip      * 1.2.1      2024-03-22 [1] CRAN (R 4.4.0)
##  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild       1.4.4      2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload        1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
##  plyr           1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
##  pROC         * 1.18.5     2023-11-01 [1] CRAN (R 4.4.0)
##  prodlim        2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
##  profvis        0.3.8      2023-05-02 [1] CRAN (R 4.4.0)
##  promises       1.3.0      2024-04-05 [1] CRAN (R 4.4.0)
##  proxy          0.4-27     2022-06-09 [1] CRAN (R 4.4.0)
##  psych        * 2.4.6.26   2024-06-27 [1] CRAN (R 4.4.1)
##  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
##  R6             2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
##  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp           1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
##  readr        * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
##  readxl         1.4.3      2023-07-06 [1] CRAN (R 4.4.0)
##  recipes      * 1.1.0      2024-07-04 [1] CRAN (R 4.4.1)
##  rematch2       2.1.2      2020-05-01 [1] CRAN (R 4.4.0)
##  remotes        2.5.0      2024-03-17 [1] CRAN (R 4.4.0)
##  rlang          1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown      2.27       2024-05-17 [1] CRAN (R 4.4.0)
##  rootSolve      1.8.2.4    2023-09-21 [1] CRAN (R 4.4.0)
##  rpart          4.1.23     2023-12-05 [2] CRAN (R 4.4.0)
##  rprojroot      2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rsample      * 1.2.1      2024-03-25 [1] CRAN (R 4.4.0)
##  rstudioapi     0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
##  sass           0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales       * 1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny          1.8.1.1    2024-04-02 [1] CRAN (R 4.4.0)
##  stringi        1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  stringr      * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
##  survival       3.5-8      2024-02-14 [2] CRAN (R 4.4.0)
##  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidymodels   * 1.2.0      2024-03-25 [1] CRAN (R 4.4.0)
##  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
##  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
##  timeDate       4032.109   2023-12-14 [1] CRAN (R 4.4.0)
##  tune         * 1.2.1      2024-04-18 [1] CRAN (R 4.4.0)
##  tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usethis        2.2.3      2024-02-19 [1] CRAN (R 4.4.0)
##  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  withr          3.0.0      2024-01-16 [1] CRAN (R 4.4.0)
##  workflows    * 1.1.4      2024-02-19 [1] CRAN (R 4.4.0)
##  workflowsets * 1.1.0      2024-03-21 [1] CRAN (R 4.4.0)
##  xfun           0.46       2024-07-18 [1] CRAN (R 4.4.1)
##  xml2           1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
##  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml           2.3.9      2024-07-05 [1] CRAN (R 4.4.1)
##  yardstick    * 1.3.1      2024-03-21 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────