Multivariate Binomial Logistic Regression
According to IBM, logistic regression (also known as the logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring based on a given data set of independent variables. Since the outcome is a probability, the dependent variable is bounded between 0 and 1.
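As a minimal sketch of why the outcome is bounded, the logistic (inverse-logit) function maps any real-valued linear predictor to a probability between 0 and 1. The function name `inverse_logit` is an illustrative choice and not part of the original analysis; base R's `plogis()` computes the same quantity.

```r
# Inverse-logit: converts a linear predictor (log-odds) into a probability.
inverse_logit <- function(eta) {
  1 / (1 + exp(-eta))
}

# Log-odds ranging from strongly negative to strongly positive.
eta <- c(-10, -2, 0, 2, 10)
probabilities <- inverse_logit(eta)
print(round(probabilities, 4))

# Base R's plogis() is the same function.
print(all.equal(probabilities, plogis(eta)))
```

A log-odds of 0 corresponds to a probability of exactly 0.5, and the output approaches but never reaches 0 or 1.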
This vignette explores overall satisfaction with service quality. Service quality is critical for business success: it is essential for attracting new customers and retaining existing ones. The vignette demonstrates a multivariate binomial logistic regression with a binary outcome variable (overall level of satisfaction) and several explanatory variables (service quality measures). The data set contains observations from 395 respondents in a business-to-business (B2B) relationship. The data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. SERVQUAL consists of five dimensions, which serve as the explanatory variables in this vignette: tangibles, assurance, reliability, responsiveness and empathy.
This vignette has two objectives. First, model and identify statistically significant relationships between the outcome and explanatory variables. Second, predict outcomes and evaluate the accuracy of those predictions.
The raw data set was wrangled and tidied before processing. Since this was a logistic regression, the outcome variable, a seven-point Likert scale, was replaced with a binary variable. A brief exploratory analysis, comprising a statistical summary and visualisations, was conducted to understand the variables.
A multivariate binomial logistic regression was then conducted on the service quality data set to identify the significance of the explanatory variables (Model 1). The most significant explanatory variable was explored in more detail by conducting a simple binomial logistic regression and visualising the result (Model 2).
In the final section of this vignette, outcomes are predicted with Model 1. The model’s prediction performance is evaluated with a confusion matrix heatmap, model fit statistics and an ROC curve.
Before building the logit model, the outcome variable (overall satisfaction) was transformed from an interval variable to a binary variable. The following tables show the conversion of the outcome variable from a seven-point Likert scale (Table 1) to a binary scale (Table 2) with corresponding frequencies. Scale measures recording dissatisfaction were coded as “0”, and measures recording satisfaction were coded as “1”. The neutral measure on the Likert scale was dropped from the binary scale, so the 29 neutral responses in Table 1 are excluded and 395 of the 424 responses remain.

Table 1 Original seven-point Likert scale

| Overall level of satisfaction | Freq |
|---|---|
| Very Dissatisfied | 3 |
| Dissatisfied | 15 |
| Partially Dissatisfied | 16 |
| Neutral | 29 |
| Partially Satisfied | 39 |
| Satisfied | 245 |
| Very Satisfied | 77 |

Table 2 New binary scale for logistic regression

| Overall level of satisfaction | Freq |
|---|---|
| 0 | 34 |
| 1 | 361 |
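The recoding from Table 1 to Table 2 can be sketched as follows. The response vector is rebuilt from the Table 1 frequencies rather than from the survey data itself, and the object names are illustrative assumptions.

```r
# Frequencies copied from Table 1 (seven-point Likert scale).
likert_levels <- c("Very Dissatisfied", "Dissatisfied", "Partially Dissatisfied",
                   "Neutral", "Partially Satisfied", "Satisfied", "Very Satisfied")
likert_freq <- c(3, 15, 16, 29, 39, 245, 77)

# Rebuild one response per respondent as an ordered factor.
responses <- factor(rep(likert_levels, times = likert_freq),
                    levels = likert_levels, ordered = TRUE)

# Drop neutral responses, then code dissatisfaction as 0 and satisfaction as 1.
kept <- responses[responses != "Neutral"]
overall_satisfaction <- ifelse(kept > "Neutral", 1L, 0L)

binary_freq <- table(overall_satisfaction)
print(binary_freq)
```

The resulting counts are 34 for "0" and 361 for "1", matching Table 2.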
Table 3 is a statistical summary of the five explanatory variables.

Table 3 Statistical summary of service quality measures

| variable | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tangibles | 395 | 5.99 | 0.86 | 6.00 | 6.11 | 0.74 | 2.00 | 7.00 | 5.00 | −1.80 | 4.83 | 0.04 |
| assurance | 395 | 5.62 | 1.03 | 6.00 | 5.72 | 0.74 | 2.00 | 7.00 | 5.00 | −1.06 | 0.76 | 0.05 |
| reliability | 395 | 5.54 | 1.28 | 6.00 | 5.73 | 0.74 | 1.00 | 7.00 | 6.00 | −1.34 | 1.29 | 0.06 |
| responsiveness | 395 | 5.40 | 1.35 | 6.00 | 5.58 | 1.48 | 1.00 | 7.00 | 6.00 | −1.10 | 0.77 | 0.07 |
| empathy | 395 | 5.48 | 1.14 | 5.50 | 5.58 | 0.74 | 1.00 | 7.00 | 6.00 | −0.93 | 1.11 | 0.06 |
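The `se` column in Table 3 can be cross-checked from the `sd` and `n` columns. The sketch below assumes `se` is the standard error of the mean, sd / sqrt(n); the source does not state how the table was produced, so this is an inference.

```r
# Sample size and standard deviations copied from Table 3.
n <- 395
sd_values <- c(tangibles = 0.86, assurance = 1.03, reliability = 1.28,
               responsiveness = 1.35, empathy = 1.14)

# Standard error of the mean: sd / sqrt(n).
se_values <- sd_values / sqrt(n)
print(round(se_values, 2))
```

Rounded to two decimals, the values agree with the `se` column of Table 3.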
Chart 1, a box plot, shows the distribution of the five explanatory variables. All five have a median of 5.5 or higher on the seven-point scale (Table 3), indicating favourable levels of service quality.
Chart 2, a pairs plot, compares the relationships between the explanatory variables and the binary outcome variable, overall satisfaction.
Model 1 was formulated to include all five explanatory service quality measures. Output 1 presents the coefficient estimates and fit statistics for Model 1.
Output 1 Results of multivariate logistic regression

    Call:
    glm(formula = overall_satisfaction ~ ., family = binomial(link = "logit"),
        data = b2b_binomial_df)

    Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
    (Intercept)     -11.3168     2.4099  -4.696 2.65e-06 ***
    tangibles         0.2753     0.3735   0.737 0.461099
    assurance         0.3168     0.4583   0.691 0.489364
    reliability       0.5265     0.3347   1.573 0.115740
    responsiveness    1.4966     0.3978   3.762 0.000169 ***
    empathy           0.5142     0.4085   1.259 0.208082
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 231.757  on 394  degrees of freedom
    Residual deviance:  62.493  on 389  degrees of freedom
    AIC: 74.493

    Number of Fisher Scoring iterations: 8
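For readers without access to the survey data, the shape of the Model 1 call can be sketched on simulated data. Everything about `sim_df` (its size, the random scores and the effect sizes used to generate the outcome) is a made-up assumption; only the `glm()` call mirrors Output 1, and the estimates will not match it.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

# Hypothetical stand-in for the survey data: five scores on a 1-7 scale.
n <- 400
sim_df <- data.frame(
  tangibles      = sample(1:7, n, replace = TRUE),
  assurance      = sample(1:7, n, replace = TRUE),
  reliability    = sample(1:7, n, replace = TRUE),
  responsiveness = sample(1:7, n, replace = TRUE),
  empathy        = sample(1:7, n, replace = TRUE)
)

# Simulate a binary outcome driven mainly by responsiveness (made-up effects).
eta <- -4 + 0.8 * sim_df$responsiveness + 0.2 * sim_df$reliability
sim_df$overall_satisfaction <- rbinom(n, size = 1, prob = plogis(eta))

# Same call shape as Output 1: the outcome regressed on all other columns.
model_1_sketch <- glm(overall_satisfaction ~ .,
                      family = binomial(link = "logit"),
                      data = sim_df)

print(summary(model_1_sketch)$coefficients)
```

The `~ .` formula regresses the outcome on every other column in the data frame, which is why the data frame should contain only the outcome and the intended explanatory variables.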
In place of the coefficient of determination (R2) as a measure of fit, a pseudo-R2 value is adopted when the outcome variable is nominal or ordinal. There are several variants of pseudo-R2. Table 4 shows pseudo-R2 values for Model 1, ranging from 0.7303 to 0.8112 across the selected variants.

Table 4 Pseudo-R2 variants for Model 1

| variant | pseudo-R2 |
|---|---|
| McFadden | 0.7303 |
| Nagelkerke | 0.7852 |
| VeallZimmermann | 0.8112 |
| McKelveyZavoina | 0.7879 |
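The McFadden variant can be reproduced by hand from the deviances reported in Output 1 and Output 2. For a binary outcome recorded one row per respondent, the deviance equals minus twice the log-likelihood, so McFadden's pseudo-R2 reduces to 1 minus the ratio of residual to null deviance. The sketch below relies on that identity; the helper name `mcfadden` is illustrative, and the other variants need the fitted model and are not reproduced here.

```r
# Deviances copied from Output 1 (Model 1) and Output 2 (Model 2).
null_deviance       <- 231.757
residual_deviance_1 <- 62.493
residual_deviance_2 <- 76.00

# McFadden pseudo-R2 for an ungrouped binary outcome.
mcfadden <- function(residual_deviance, null_deviance) {
  1 - residual_deviance / null_deviance
}

mcfadden_1 <- mcfadden(residual_deviance_1, null_deviance)
mcfadden_2 <- mcfadden(residual_deviance_2, null_deviance)
print(c(model_1 = mcfadden_1, model_2 = mcfadden_2))

# AIC = residual deviance + 2 * number of estimated coefficients.
aic_1 <- residual_deviance_1 + 2 * 6  # intercept + five measures
aic_2 <- residual_deviance_2 + 2 * 2  # intercept + responsiveness
print(c(model_1 = aic_1, model_2 = aic_2))
```

Because the printed deviances are themselves rounded, the hand-computed values agree with the McFadden rows of Tables 4 and 5 only to about four decimal places. The AIC values match those printed in the two outputs.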
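Output 1 highlights “responsiveness” as the only explanatory variable with a statistically significant association with overall satisfaction (p < 0.001). None of the other four measures is significant at the 0.05 level.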
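The significance of responsiveness can be checked from the estimate and standard error in Output 1. The sketch below recomputes the Wald z statistic and its two-sided p-value, and also converts the coefficient to an odds ratio; the odds ratio is an added interpretation aid and does not appear in the original output.

```r
# Estimate and standard error for responsiveness, copied from Output 1.
estimate  <- 1.4966
std_error <- 0.3978

# Wald z statistic and two-sided p-value from the standard normal distribution.
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))

# Exponentiating a logit coefficient gives the odds ratio per one-point increase.
odds_ratio <- exp(estimate)

print(signif(c(z = z_value, p = p_value, odds_ratio = odds_ratio), 4))
```

Under Model 1, a one-point increase in responsiveness multiplies the odds of satisfaction by roughly 4.5, holding the other four measures constant.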
Based on the findings of Model 1, Model 2 is a univariate logistic regression of the outcome variable on a single explanatory variable, responsiveness. Output 2 shows the coefficient estimates and fit statistics for Model 2.
Output 2 Results of univariate logistic regression

    Call:
    glm(formula = overall_satisfaction ~ responsiveness, family = binomial(link = "logit"),
        data = b2b_binomial_df)

    Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)      -6.667      1.151  -5.795 6.84e-09 ***
    responsiveness    2.213      0.320   6.916 4.66e-12 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 231.76  on 394  degrees of freedom
    Residual deviance:  76.00  on 393  degrees of freedom
    AIC: 80

    Number of Fisher Scoring iterations: 8
Chart 3 visualises Model 2, the univariate logistic regression. It illustrates that low levels of responsiveness are associated with client dissatisfaction. In contrast, high levels of responsiveness are associated with a high probability of client satisfaction.
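The curve described above can be traced numerically from the Model 2 coefficients in Output 2. This is a sketch using the rounded coefficients as printed, so the values are approximate; the variable names are illustrative.

```r
# Coefficients copied from Output 2.
intercept <- -6.667
slope     <- 2.213

# Predicted probability of satisfaction at each whole responsiveness score.
responsiveness <- 1:7
p_satisfied <- plogis(intercept + slope * responsiveness)
print(round(data.frame(responsiveness, p_satisfied), 4))

# Responsiveness score at which the predicted probability crosses 0.5.
crossover <- -intercept / slope
print(round(crossover, 2))
```

With these coefficients the predicted probability crosses 0.5 at a responsiveness score of about 3, so scores below 3 point towards dissatisfaction and scores above 3 towards satisfaction.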
Table 5 shows pseudo-R2 values for Model 2 (overall client satisfaction regressed on responsiveness), ranging from 0.6721 to 0.7648 across the selected variants.

Table 5 Pseudo-R2 variants for Model 2

| variant | pseudo-R2 |
|---|---|
| McFadden | 0.6721 |
| Nagelkerke | 0.7342 |
| VeallZimmermann | 0.7648 |
| McKelveyZavoina | 0.7288 |
Model 1 was used to predict the probability of overall satisfaction for each observation in the data set. Table 6 compares the actual overall satisfaction with the predicted probability for a small sample of cases.

Table 6 Sample of Model 1 predictions

| tangibles | assurance | reliability | responsiveness | empathy | overall_satisfaction | prediction |
|---|---|---|---|---|---|---|
| 7.0 | 6.5 | 6.5 | 6.5 | 7.0 | 1 | 0.9999 |
| 4.5 | 4.0 | 6.0 | 5.5 | 4.5 | 1 | 0.9926 |
| 6.5 | 5.5 | 4.5 | 2.5 | 3.5 | 1 | 0.5314 |
| 6.0 | 6.0 | 5.5 | 6.0 | 6.0 | 1 | 0.9993 |
| 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 1 | 1.0000 |
| 6.0 | 6.0 | 6.0 | 5.0 | 5.5 | 1 | 0.9967 |
| 6.0 | 5.5 | 5.0 | 5.5 | 3.0 | 1 | 0.9888 |
| 3.0 | 4.0 | 4.0 | 1.5 | 2.5 | 0 | 0.0269 |
| 6.0 | 6.0 | 6.0 | 4.5 | 4.0 | 1 | 0.9850 |
| 6.0 | 5.0 | 3.0 | 2.0 | 4.0 | 0 | 0.1898 |
| 6.0 | 6.0 | 6.0 | 5.5 | 5.0 | 1 | 0.9980 |
| 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 1 | 0.9870 |
| 4.5 | 3.0 | 2.0 | 2.0 | 4.5 | 0 | 0.0591 |
| 2.0 | 3.5 | 3.0 | 2.5 | 3.5 | 0 | 0.0733 |
| 6.5 | 4.0 | 6.0 | 5.0 | 5.5 | 1 | 0.9946 |
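The predictions in Table 6 can be approximated by hand from the Model 1 coefficients in Output 1. The sketch below applies the rounded coefficients to three rows of Table 6; because of that rounding, results agree with the table only to about three decimal places. The 0.5 classification cut-off is an assumption, as the source does not state the threshold used.

```r
# Coefficients copied from Output 1.
coefs <- c(intercept = -11.3168, tangibles = 0.2753, assurance = 0.3168,
           reliability = 0.5265, responsiveness = 1.4966, empathy = 0.5142)

# Three rows of service quality scores copied from Table 6.
cases <- rbind(
  c(5.0, 5.0, 5.0, 5.0, 5.0),
  c(3.0, 4.0, 4.0, 1.5, 2.5),
  c(6.5, 5.5, 4.5, 2.5, 3.5)
)

# Linear predictor = intercept + sum of coefficient * score, then inverse-logit.
eta <- coefs[["intercept"]] + cases %*% coefs[-1]
prediction <- as.vector(plogis(eta))

# Classify as satisfied (1) when the probability is at least 0.5 (assumed cut-off).
predicted_class <- as.integer(prediction >= 0.5)

print(round(prediction, 4))
print(predicted_class)
```

The third case shows how a low responsiveness score (2.5) pulls the predicted probability down to just above 0.5 even though tangibles and assurance are rated highly.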
The predictive performance of Model 1 was evaluated with a confusion matrix, model statistics and ROC curve.
Chart 4, a confusion matrix, summarises predictions by comparing the predicted against the actual response for client satisfaction. The confusion matrix shows good performance for Model 1, with 97.7 per cent of predictions correct (true positives and true negatives). False positive (top left) and false negative (bottom right) predictions account for the remaining 2.3 per cent.
Table 7 summarises Model 1 prediction performance.

Table 7 Summary of Model 1 prediction metrics

| .metric | .estimator | .estimate |
|---|---|---|
| accuracy | binary | 0.9772 |
| kap | binary | 0.8492 |
| sens | binary | 0.9917 |
| spec | binary | 0.8235 |
| ppv | binary | 0.9835 |
| npv | binary | 0.9032 |
| mcc | binary | 0.8502 |
| j_index | binary | 0.8152 |
| bal_accuracy | binary | 0.9076 |
| detection_prevalence | binary | 0.9215 |
| precision | binary | 0.9835 |
| recall | binary | 0.9917 |
| f_meas | binary | 0.9876 |
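Most of the metrics in Table 7 follow directly from the four cells of the confusion matrix. The cell counts below are not given in the text; they are back-calculated from the sensitivity and specificity in Table 7 and the class totals in Table 2 (361 satisfied, 34 dissatisfied), so treat them as an inference. The positive class is taken to be "1" (satisfied).

```r
# Counts back-calculated from Table 7 and Table 2 (an inference, not read off Chart 4).
tp <- 358; fn <- 3   # 361 satisfied clients
tn <- 28;  fp <- 6   # 34 dissatisfied clients
n  <- tp + fn + tn + fp

accuracy     <- (tp + tn) / n
sens         <- tp / (tp + fn)          # also called recall
spec         <- tn / (tn + fp)
ppv          <- tp / (tp + fp)          # also called precision
npv          <- tn / (tn + fn)
j_index      <- sens + spec - 1
bal_accuracy <- (sens + spec) / 2
f_meas       <- 2 * ppv * sens / (ppv + sens)

# Cohen's kappa: observed agreement corrected for chance agreement.
p_chance <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
kap <- (accuracy - p_chance) / (1 - p_chance)

# Matthews correlation coefficient.
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(round(c(accuracy = accuracy, kap = kap, sens = sens, spec = spec,
              ppv = ppv, npv = npv, mcc = mcc, j_index = j_index,
              bal_accuracy = bal_accuracy, f_meas = f_meas), 4))
```

The gap between sensitivity (0.9917) and specificity (0.8235) reflects the class imbalance: with only 34 dissatisfied clients, six misclassified cases have a large effect on specificity.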
The ROC curve (receiver operating characteristic curve) plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at all classification thresholds. AUC (area under the curve) measures the entire two-dimensional area underneath the ROC curve. It evaluates how well a logistic regression model classifies positive and negative outcomes across every possible threshold. An AUC from 0.9 to 1 is regarded as “A” grade in classification performance. Chart 5 illustrates the ROC curve for Model 1, with an AUC of 0.985.
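AUC can also be read as the probability that a randomly chosen satisfied client receives a higher predicted probability than a randomly chosen dissatisfied client. The sketch below computes AUC from that rank-based identity in base R. The function name `auc_rank` and the toy data are illustrative assumptions; this is not the method or data used to produce Chart 5.

```r
# AUC via the rank-sum (Mann-Whitney) identity; ties receive average ranks.
auc_rank <- function(labels, scores) {
  ranks <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(ranks[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up toy data, not the survey results.
labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)

# Three of the four positive/negative pairs are ordered correctly, giving 0.75.
print(auc_rank(labels, scores))
```

An AUC of 0.5 corresponds to random ordering and 1 to perfect separation of the two classes.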
Reference:
Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. The SERVQUAL methodology is documented in Delivering Quality Service: Balancing Customer Perceptions and Expectations by Zeithaml, Parasuraman and Berry.
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.4.0 (2024-04-24 ucrt)
## os Windows 11 x64 (build 22631)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_Australia.utf8
## ctype English_Australia.utf8
## tz Australia/Brisbane
## date 2024-07-30
## pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
## boot 1.3-30 2024-02-26 [2] CRAN (R 4.4.0)
## broom * 1.0.6 2024-05-17 [1] CRAN (R 4.4.0)
## bslib 0.7.0 2024-03-29 [1] CRAN (R 4.4.0)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.4.0)
## checkmate 2.3.1 2023-12-04 [1] CRAN (R 4.4.0)
## class 7.3-22 2023-05-03 [2] CRAN (R 4.4.0)
## cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
## codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.0)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.1)
## cvms * 1.6.1 2024-02-27 [1] CRAN (R 4.4.0)
## data.table * 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
## DescTools * 0.99.54 2024-02-03 [1] CRAN (R 4.4.0)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
## dials * 1.2.1 2024-02-22 [1] CRAN (R 4.4.0)
## DiceDesign 1.10 2023-12-07 [1] CRAN (R 4.4.0)
## digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.1)
## dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
## e1071 1.7-14 2023-12-06 [1] CRAN (R 4.4.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
## evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
## Exact 3.3 2024-07-21 [1] CRAN (R 4.4.1)
## expm 0.999-9 2024-01-11 [1] CRAN (R 4.4.0)
## fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
## farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
## fontawesome 0.5.2 2023-08-19 [1] CRAN (R 4.4.0)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
## foreach 1.5.2 2022-02-02 [1] CRAN (R 4.4.0)
## fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
## furrr 0.3.1 2022-08-15 [1] CRAN (R 4.4.0)
## future 1.33.2 2024-03-26 [1] CRAN (R 4.4.0)
## future.apply 1.11.2 2024-03-28 [1] CRAN (R 4.4.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
## GGally * 2.2.1 2024-02-14 [1] CRAN (R 4.4.0)
## ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
## ggstats 0.6.0 2024-04-05 [1] CRAN (R 4.4.0)
## gld 2.6.6 2022-10-23 [1] CRAN (R 4.4.0)
## globals 0.16.3 2024-03-08 [1] CRAN (R 4.4.0)
## glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
## gower 1.0.1 2022-12-22 [1] CRAN (R 4.4.0)
## GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.4.0)
## gt * 0.11.0 2024-07-09 [1] CRAN (R 4.4.1)
## gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
## gtExtras * 0.5.0 2023-09-15 [1] CRAN (R 4.4.0)
## hardhat 1.4.0 2024-06-02 [1] CRAN (R 4.4.0)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.4.0)
## highr 0.11 2024-05-26 [1] CRAN (R 4.4.0)
## hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
## htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
## httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
## httr 1.4.7 2023-08-15 [1] CRAN (R 4.4.0)
## infer * 1.0.7 2024-03-25 [1] CRAN (R 4.4.0)
## ipred 0.9-15 2024-07-18 [1] CRAN (R 4.4.1)
## iterators 1.0.14 2022-02-05 [1] CRAN (R 4.4.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
## jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
## knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1)
## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
## later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
## lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.0)
## lava 1.8.0 2024-03-05 [1] CRAN (R 4.4.0)
## lhs 1.2.0 2024-06-30 [1] CRAN (R 4.4.1)
## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
## listenv 0.9.1 2024-01-29 [1] CRAN (R 4.4.0)
## lmom 3.0 2023-08-29 [1] CRAN (R 4.4.0)
## lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
## MASS 7.3-60.2 2024-04-24 [2] local
## Matrix 1.7-0 2024-03-22 [2] CRAN (R 4.4.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
## mgcv 1.9-1 2023-12-21 [2] CRAN (R 4.4.0)
## mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
## mnormt 2.1.1 2022-09-26 [1] CRAN (R 4.4.0)
## modeldata * 1.4.0 2024-06-19 [1] CRAN (R 4.4.1)
## munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
## mvtnorm 1.2-5 2024-05-21 [1] CRAN (R 4.4.0)
## nlme 3.1-164 2023-11-27 [2] CRAN (R 4.4.0)
## nnet 7.3-19 2023-05-03 [2] CRAN (R 4.4.0)
## paletteer 1.6.0 2024-01-21 [1] CRAN (R 4.4.0)
## parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.4.0)
## parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.4.0)
## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
## pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
## pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.1)
## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
## pROC * 1.18.5 2023-11-01 [1] CRAN (R 4.4.0)
## prodlim 2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
## promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
## proxy 0.4-27 2022-06-09 [1] CRAN (R 4.4.0)
## psych * 2.4.6.26 2024-06-27 [1] CRAN (R 4.4.1)
## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
## Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.1)
## readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
## readxl 1.4.3 2023-07-06 [1] CRAN (R 4.4.0)
## recipes * 1.1.0 2024-07-04 [1] CRAN (R 4.4.1)
## rematch2 2.1.2 2020-05-01 [1] CRAN (R 4.4.0)
## remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
## rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
## rootSolve 1.8.2.4 2023-09-21 [1] CRAN (R 4.4.0)
## rpart 4.1.23 2023-12-05 [2] CRAN (R 4.4.0)
## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.0)
## rsample * 1.2.1 2024-03-25 [1] CRAN (R 4.4.0)
## rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
## sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.0)
## scales * 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
## shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
## stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
## stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
## survival 3.5-8 2024-02-14 [2] CRAN (R 4.4.0)
## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
## tidymodels * 1.2.0 2024-03-25 [1] CRAN (R 4.4.0)
## tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
## timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
## timeDate 4032.109 2023-12-14 [1] CRAN (R 4.4.0)
## tune * 1.2.1 2024-04-18 [1] CRAN (R 4.4.0)
## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
## usethis 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
## withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
## workflows * 1.1.4 2024-02-19 [1] CRAN (R 4.4.0)
## workflowsets * 1.1.0 2024-03-21 [1] CRAN (R 4.4.0)
## xfun 0.46 2024-07-18 [1] CRAN (R 4.4.1)
## xml2 1.3.6 2023-12-04 [1] CRAN (R 4.4.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
## yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.1)
## yardstick * 1.3.1 2024-03-21 [1] CRAN (R 4.4.0)
##
## [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
## [2] C:/Program Files/R/R-4.4.0/library
##
## ──────────────────────────────────────────────────────────────────────────────