Objective

According to IBM, logistic regression (also known as the logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring based on a given data set of independent variables. Since the outcome is a probability, the predicted value of the dependent variable is bounded between 0 and 1.
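
For reference, the standard form of the binomial logit model can be written as below. The symbols (p for the probability of the event, x_1 to x_k for the independent variables, and the beta coefficients) are generic notation introduced here for illustration and do not come from the original analysis.

```latex
\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
```

The right-hand expression always lies strictly between 0 and 1, which is why the fitted value can be read as a probability.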

This vignette explores overall satisfaction with service quality. Service quality is critical for business success. It is essential for attracting new customers and retaining existing customers. This vignette demonstrates a multivariate binomial logistic regression with a binary outcome variable (overall level of satisfaction) and several explanatory variables (service quality measures). The data set contains observations from 395 respondents in a business-to-business (B2B) relationship. The data were gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. SERVQUAL consists of five dimensions:

  • Tangibles – the appearance of physical facilities, equipment, personnel and communication materials
  • Assurance – knowledge and ability to inspire trust and confidence
  • Reliability – ability to perform service dependably and accurately
  • Responsiveness – willingness to help and provide prompt service
  • Empathy – providing caring and individualised attention.

This vignette has two objectives. First, model and identify statistically significant relationships between the outcome and explanatory variables. Second, predict outcomes and evaluate the accuracy of those predictions.

Workflow

The raw data set was wrangled and tidied before processing. Since this was a logistic regression, the outcome variable, a seven-point Likert scale, was replaced with a binary variable. A brief exploratory analysis, comprising a statistical summary and visualisations, was then conducted to understand the variables.

A multivariate binomial logistic regression was then fitted to the service quality data set to identify the significance of each explanatory variable (Model 1). The most significant explanatory variable was explored in more detail with a simple binomial logistic regression and a visualisation of the result (Model 2).

In the final section of this vignette, outcomes were predicted with Model 1. The model’s prediction performance was evaluated with a confusion matrix heatmap, model fit statistics and a ROC curve.

Results

1. Explore variables

Before building the logit model, the outcome variable (overall satisfaction) was transformed from a seven-point Likert scale variable to a binary variable. The following tables show the conversion of the outcome variable from the seven-point Likert scale (Table 1) to a binary scale (Table 2) with corresponding frequencies. Scale measures recording dissatisfaction were coded as “0”, and measures recording satisfaction were coded as “1”. The 29 neutral responses on the Likert scale were dropped, leaving 395 observations on the binary scale.

Table 1 Original seven-point Likert scale
Overall level of satisfaction   Freq
Very Dissatisfied                  3
Dissatisfied                      15
Partially Dissatisfied            16
Neutral                           29
Partially Satisfied               39
Satisfied                        245
Very Satisfied                    77

Table 2 New binary scale for logistic regression
Overall level of satisfaction   Freq
0                                 34
1                                361
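
The recoding described above can be sketched in base R as follows. The vector of responses is rebuilt from the Table 1 frequencies purely for illustration; the object names (`likert_levels`, `responses`, `kept`) are hypothetical and the original wrangling code may differ.

```r
# Rebuild the seven-point responses from the frequencies reported in Table 1.
likert_levels <- c("Very Dissatisfied", "Dissatisfied", "Partially Dissatisfied",
                   "Neutral", "Partially Satisfied", "Satisfied", "Very Satisfied")
likert_freq <- c(3, 15, 16, 29, 39, 245, 77)
responses <- factor(rep(likert_levels, times = likert_freq), levels = likert_levels)

# Drop the neutral midpoint, then code dissatisfied as 0 and satisfied as 1.
kept <- responses[responses != "Neutral"]
overall_satisfaction <- ifelse(as.integer(kept) < 4L, 0L, 1L)

binary_freq <- table(overall_satisfaction)
print(binary_freq)
```

The resulting counts match Table 2: 34 observations coded 0 and 361 coded 1.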

Table 3 is a statistical summary of the five explanatory variables.

Table 3 Statistical summary of service quality measures
variable          n  mean    sd  median  trimmed   mad   min   max  range   skew  kurtosis    se
tangibles       395  5.99  0.86    6.00     6.11  0.74  2.00  7.00   5.00  −1.80      4.83  0.04
assurance       395  5.62  1.03    6.00     5.72  0.74  2.00  7.00   5.00  −1.06      0.76  0.05
reliability     395  5.54  1.28    6.00     5.73  0.74  1.00  7.00   6.00  −1.34      1.29  0.06
responsiveness  395  5.40  1.35    6.00     5.58  1.48  1.00  7.00   6.00  −1.10      0.77  0.07
empathy         395  5.48  1.14    5.50     5.58  0.74  1.00  7.00   6.00  −0.93      1.11  0.06

Chart 1, a box plot, shows the distribution of the five explanatory variables. All five indicate favourable levels of service quality, with medians of 5.5 to 6 on the seven-point scale (Table 3).

Chart 2, a pairs plot, compares the relationships among the explanatory variables and the binary outcome variable, overall satisfaction.

2. Build and fit models

2.1 Model 1 Multivariate logistic regression

Model 1 was formulated to include all five explanatory service quality measures. Output 1 presents the coefficient estimates and fit statistics for Model 1.

Output 1 Results of multivariate logistic regression


Call:
glm(formula = overall_satisfaction ~ ., family = binomial(link = "logit"), 
    data = b2b_binomial_df)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -11.3168     2.4099  -4.696 2.65e-06 ***
tangibles        0.2753     0.3735   0.737 0.461099    
assurance        0.3168     0.4583   0.691 0.489364    
reliability      0.5265     0.3347   1.573 0.115740    
responsiveness   1.4966     0.3978   3.762 0.000169 ***
empathy          0.5142     0.4085   1.259 0.208082    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 231.757  on 394  degrees of freedom
Residual deviance:  62.493  on 389  degrees of freedom
AIC: 74.493

Number of Fisher Scoring iterations: 8
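
The call shown in Output 1 can be reproduced in outline as follows. The survey data are not included in this vignette, so the sketch simulates a stand-in `b2b_binomial_df` with the same column names. The simulated scores, the seed, and the use of the reported coefficients to generate outcomes are all assumptions for illustration, so the printed estimates will not match Output 1.

```r
set.seed(42)  # arbitrary seed, illustration only
n <- 395
measures <- c("tangibles", "assurance", "reliability", "responsiveness", "empathy")

# Simulated stand-in scores on a 1-7 scale in half-point steps (NOT the survey data).
scores <- as.data.frame(
  matrix(sample(seq(1, 7, by = 0.5), n * length(measures), replace = TRUE),
         nrow = n, dimnames = list(NULL, measures))
)

# Generate a binary outcome from the coefficients reported in Output 1.
beta <- c(0.2753, 0.3168, 0.5265, 1.4966, 0.5142)
eta <- as.vector(-11.3168 + as.matrix(scores) %*% beta)
b2b_binomial_df <- cbind(overall_satisfaction = rbinom(n, 1, plogis(eta)), scores)

# Same call as in Output 1: "~ ." regresses the outcome on every other column.
model_1 <- glm(overall_satisfaction ~ ., family = binomial(link = "logit"),
               data = b2b_binomial_df)
print(summary(model_1)$coefficients)
```

The `~ .` shorthand is convenient here because the data frame holds only the outcome and the five service quality measures; any extra column would silently enter the model.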

In place of the coefficient of determination (R2) as a measure of fit, a pseudo-R2 value is adopted when the outcome variable is nominal or ordinal. There are several variants of pseudo-R2. Table 4 shows pseudo-R2 for Model 1, ranging from 0.7303 to 0.8112 across the selected variants.

Table 4 Pseudo-R2 variants for Model 1
variant pseudo-R2
McFadden 0.7303
Nagelkerke 0.7852
VeallZimmermann 0.8112
McKelveyZavoina 0.7879
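
Three of these variants can be recovered, approximately, from the deviances printed in Output 1. The sketch below uses the textbook formulas and assumes ungrouped binary data, where the deviance equals minus twice the log-likelihood. The original values were presumably produced by a package function (DescTools appears in the session information), so small rounding differences are expected. The McKelvey-Zavoina variant needs the fitted linear predictor and cannot be rebuilt from the printed output.

```r
# Deviances and sample size as reported in Output 1.
null_dev  <- 231.757
resid_dev <- 62.493
n         <- 395

# For ungrouped binary data the saturated log-likelihood is 0, so deviance = -2 * logLik.
mcfadden   <- 1 - resid_dev / null_dev
cox_snell  <- 1 - exp((resid_dev - null_dev) / n)
nagelkerke <- cox_snell / (1 - exp(-null_dev / n))

# Veall-Zimmermann rescales the likelihood-ratio statistic by its upper bound.
lr_stat <- null_dev - resid_dev
veall_zimmermann <- (lr_stat / (lr_stat + n)) * ((null_dev + n) / null_dev)

print(round(c(McFadden = mcfadden, Nagelkerke = nagelkerke,
              VeallZimmermann = veall_zimmermann), 3))
```

The same arithmetic confirms the reported AIC: the residual deviance plus twice the six estimated parameters gives 62.493 + 12 = 74.493.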

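Output 1 identifies “responsiveness” as the only explanatory variable that is statistically significant (p < 0.001), with a positive association with overall satisfaction. The other four measures are not significant at the 0.05 level.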
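
To give the responsiveness coefficient a more tangible reading, it can be converted to an odds ratio. The sketch below uses only the estimate and standard error printed in Output 1. The 95 per cent Wald interval is an approximation added here for illustration and was not part of the original analysis.

```r
# Estimate and standard error for responsiveness, as reported in Output 1.
estimate  <- 1.4966
std_error <- 0.3978

# Exponentiating a logit coefficient gives an odds ratio per one-point increase.
odds_ratio <- exp(estimate)

# Approximate 95% Wald interval (an assumption; profile-likelihood intervals may differ).
wald_ci <- exp(estimate + c(-1, 1) * qnorm(0.975) * std_error)

print(round(c(odds_ratio = odds_ratio, lower = wald_ci[1], upper = wald_ci[2]), 2))
```

Holding the other four measures constant, each one-point increase in responsiveness multiplies the odds of satisfaction by roughly 4.5.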

2.2 Model 2 Univariate logistic regression

Based on the findings of Model 1, Model 2 is a univariate logistic regression of the outcome variable on a single explanatory variable, responsiveness. Output 2 shows the coefficient estimates and fit statistics for Model 2.

Output 2 Results of univariate logistic regression


Call:
glm(formula = overall_satisfaction ~ responsiveness, family = binomial(link = "logit"), 
    data = b2b_binomial_df)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -6.667      1.151  -5.795 6.84e-09 ***
responsiveness    2.213      0.320   6.916 4.66e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 231.76  on 394  degrees of freedom
Residual deviance:  76.00  on 393  degrees of freedom
AIC: 80

Number of Fisher Scoring iterations: 8

Chart 3 visualises Model 2, the univariate logistic regression. It shows that low levels of responsiveness are associated with client dissatisfaction, whereas high levels of responsiveness are associated with a high probability of client satisfaction.
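
The shape of the fitted curve can be traced directly from the Model 2 coefficients in Output 2. This is a minimal sketch; the variable names are illustrative and the whole-number grid of responsiveness scores is an assumption made for display.

```r
# Coefficients as reported in Output 2.
intercept <- -6.667
slope     <- 2.213

# Predicted probability of satisfaction across the 1-7 responsiveness scale.
responsiveness <- 1:7
probability <- plogis(intercept + slope * responsiveness)
print(round(data.frame(responsiveness, probability), 3))

# Responsiveness score at which the predicted probability crosses 0.5.
midpoint <- -intercept / slope
print(round(midpoint, 2))
```

The predicted probability crosses 0.5 at a responsiveness score of about 3, so scores well below 3 point towards dissatisfaction and scores above it towards satisfaction.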

Table 5 shows pseudo-R2 for Model 2, the regression of overall client satisfaction on responsiveness. The values range from 0.6721 to 0.7648 across the different variants.

Table 5 Pseudo-R2 variants for Model 2
variant pseudo-R2
McFadden 0.6721
Nagelkerke 0.7342
VeallZimmermann 0.7648
McKelveyZavoina 0.7288

3. Prediction and evaluation

3.1 Model 1 Prediction

Model 1 was used to predict the probability of overall satisfaction for each observation in the data set. Table 6 compares the actual overall satisfaction with the predicted probability for a small sample of cases.

Table 6 Sample of Model 1 predictions
tangibles assurance reliability responsiveness empathy overall_satisfaction prediction
7.0 6.5 6.5 6.5 7.0 1 0.9999
4.5 4.0 6.0 5.5 4.5 1 0.9926
6.5 5.5 4.5 2.5 3.5 1 0.5314
6.0 6.0 5.5 6.0 6.0 1 0.9993
7.0 7.0 7.0 7.0 7.0 1 1.0000
6.0 6.0 6.0 5.0 5.5 1 0.9967
6.0 5.5 5.0 5.5 3.0 1 0.9888
3.0 4.0 4.0 1.5 2.5 0 0.0269
6.0 6.0 6.0 4.5 4.0 1 0.9850
6.0 5.0 3.0 2.0 4.0 0 0.1898
6.0 6.0 6.0 5.5 5.0 1 0.9980
5.0 5.0 5.0 5.0 5.0 1 0.9870
4.5 3.0 2.0 2.0 4.5 0 0.0591
2.0 3.5 3.0 2.5 3.5 0 0.0733
6.5 4.0 6.0 5.0 5.5 1 0.9946
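
The predictions in Table 6 can be checked by hand from the Model 1 coefficients. The sketch below applies the inverse logit to three rows copied from the table. It uses the rounded coefficients printed in Output 1, so results agree with the table only to about three decimal places. In the original workflow the equivalent step was presumably a call such as `predict(model, type = "response")`, which is an assumption.

```r
# Model 1 coefficients as reported in Output 1 (intercept first).
beta <- c(-11.3168, 0.2753, 0.3168, 0.5265, 1.4966, 0.5142)

# Three rows copied from Table 6.
cases <- rbind(
  c(tangibles = 5.0, assurance = 5.0, reliability = 5.0,
    responsiveness = 5.0, empathy = 5.0),
  c(6.5, 5.5, 4.5, 2.5, 3.5),
  c(3.0, 4.0, 4.0, 1.5, 2.5)
)

# Linear predictor, then the inverse logit to obtain probabilities.
eta <- cbind(1, cases) %*% beta
prediction <- as.vector(plogis(eta))
print(round(prediction, 3))
```

The second case is instructive: despite high tangibles and assurance scores, a responsiveness score of 2.5 pulls the predicted probability down to about 0.53.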

3.2 Model 1 Evaluation

The predictive performance of Model 1 was evaluated with a confusion matrix, model statistics and a ROC curve.

Chart 4, a confusion matrix, summarises the predictions by comparing the predicted response against the actual response for client satisfaction. It shows good performance for Model 1, with 97.7 per cent accuracy (true positives and true negatives combined). False positive (top left) and false negative (bottom right) predictions account for the remaining 2.3 per cent.

Table 7 summarises Model 1 prediction performance.

Table 7 Summary of Model 1 prediction metrics
.metric .estimator .estimate
accuracy binary 0.9772
kap binary 0.8492
sens binary 0.9917
spec binary 0.8235
ppv binary 0.9835
npv binary 0.9032
mcc binary 0.8502
j_index binary 0.8152
bal_accuracy binary 0.9076
detection_prevalence binary 0.9215
precision binary 0.9835
recall binary 0.9917
f_meas binary 0.9876
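
Most of the metrics in Table 7 follow from the four cells of the confusion matrix. The cell counts are not reported in the text; the counts below were inferred from the class totals in Table 2 and the rounded sensitivity and specificity in Table 7, so they should be treated as an assumption. The metric names follow the table.

```r
# Confusion matrix cells inferred from Table 2 (34 zeros, 361 ones) and Table 7 (assumption).
tp <- 358; fn <- 3   # actual 1: predicted 1 / predicted 0
tn <- 28;  fp <- 6   # actual 0: predicted 0 / predicted 1
n  <- tp + fn + tn + fp

accuracy <- (tp + tn) / n
sens     <- tp / (tp + fn)   # sensitivity, identical to recall
spec     <- tn / (tn + fp)
ppv      <- tp / (tp + fp)   # positive predictive value, identical to precision
npv      <- tn / (tn + fn)
f_meas   <- 2 * ppv * sens / (ppv + sens)
j_index  <- sens + spec - 1
mcc      <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(round(c(accuracy = accuracy, sens = sens, spec = spec, ppv = ppv,
              npv = npv, f_meas = f_meas, j_index = j_index, mcc = mcc), 4))
```

With only 34 dissatisfied clients in the data, specificity (0.8235) is the metric to watch: six misclassified cases account for the whole gap between it and sensitivity.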

The ROC (receiver operating characteristic) curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at all classification thresholds. AUC (area under the curve) measures the entire two-dimensional area underneath the ROC curve. It summarises how well a logistic regression model separates positive and negative outcomes across every possible threshold. An AUC from 0.9 to 1 is commonly regarded as “A” grade classification performance. Chart 5 illustrates the ROC curve for Model 1, with an AUC of 0.985.
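
The mechanics of the ROC curve and AUC can be illustrated in base R on the 15 cases listed in Table 6. This is a sketch on a small sample only, so its AUC differs from the 0.985 reported for the full data set. The original chart was presumably drawn with a dedicated package (pROC and yardstick both appear in the session information), which is an assumption.

```r
# Actual outcomes and Model 1 predictions copied from the 15 rows of Table 6.
actual <- c(1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1)
prediction <- c(0.9999, 0.9926, 0.5314, 0.9993, 1.0000, 0.9967, 0.9888, 0.0269,
                0.9850, 0.1898, 0.9980, 0.9870, 0.0591, 0.0733, 0.9946)

# ROC coordinates: at each threshold, classify as 1 when prediction >= threshold.
thresholds <- sort(unique(c(Inf, prediction)), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) mean(prediction[actual == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(prediction[actual == 0] >= t))

# AUC via the rank (Mann-Whitney) identity: the share of (1, 0) pairs ranked correctly.
n1 <- sum(actual == 1)
n0 <- sum(actual == 0)
auc <- (sum(rank(prediction)[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)
```

In this sample every satisfied client receives a higher predicted probability than every dissatisfied client, so the sample AUC is 1. On the full data set a few pairs are ranked the wrong way round, which brings the AUC down to 0.985.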


Reference:

Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. The SERVQUAL methodology is documented in Delivering Quality Service: Balancing Customer Perceptions and Expectations by Zeithaml, Parasuraman and Berry.


Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version    date (UTC) lib source
##  backports      1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
##  boot           1.3-30     2024-02-26 [2] CRAN (R 4.4.0)
##  broom        * 1.0.6      2024-05-17 [1] CRAN (R 4.4.0)
##  bslib          0.7.0      2024-03-29 [1] CRAN (R 4.4.0)
##  cachem         1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.4.0)
##  checkmate      2.3.1      2023-12-04 [1] CRAN (R 4.4.0)
##  class          7.3-22     2023-05-03 [2] CRAN (R 4.4.0)
##  cli            3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
##  codetools      0.2-20     2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.4.1)
##  cvms         * 1.6.1      2024-02-27 [1] CRAN (R 4.4.0)
##  data.table   * 1.15.4     2024-03-30 [1] CRAN (R 4.4.0)
##  DescTools    * 0.99.54    2024-02-03 [1] CRAN (R 4.4.0)
##  devtools       2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dials        * 1.2.1      2024-02-22 [1] CRAN (R 4.4.0)
##  DiceDesign     1.10       2023-12-07 [1] CRAN (R 4.4.0)
##  digest         0.6.36     2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  e1071          1.7-14     2023-12-06 [1] CRAN (R 4.4.0)
##  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate       0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
##  Exact          3.3        2024-07-21 [1] CRAN (R 4.4.1)
##  expm           0.999-9    2024-01-11 [1] CRAN (R 4.4.0)
##  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  fontawesome    0.5.2      2023-08-19 [1] CRAN (R 4.4.0)
##  forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
##  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
##  fs             1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
##  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
##  future         1.33.2     2024-03-26 [1] CRAN (R 4.4.0)
##  future.apply   1.11.2     2024-03-28 [1] CRAN (R 4.4.0)
##  generics       0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  GGally       * 2.2.1      2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
##  ggstats        0.6.0      2024-04-05 [1] CRAN (R 4.4.0)
##  gld            2.6.6      2022-10-23 [1] CRAN (R 4.4.0)
##  globals        0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
##  glue           1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
##  gower          1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
##  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.4.0)
##  gt           * 0.11.0     2024-07-09 [1] CRAN (R 4.4.1)
##  gtable         0.3.5      2024-04-22 [1] CRAN (R 4.4.0)
##  gtExtras     * 0.5.0      2023-09-15 [1] CRAN (R 4.4.0)
##  hardhat        1.4.0      2024-06-02 [1] CRAN (R 4.4.0)
##  here         * 1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  highr          0.11       2024-05-26 [1] CRAN (R 4.4.0)
##  hms            1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv         1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  httr           1.4.7      2023-08-15 [1] CRAN (R 4.4.0)
##  infer        * 1.0.7      2024-03-25 [1] CRAN (R 4.4.0)
##  ipred          0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
##  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite       1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
##  knitr          1.48       2024-07-07 [1] CRAN (R 4.4.1)
##  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
##  later          1.3.2      2023-12-06 [1] CRAN (R 4.4.0)
##  lattice        0.22-6     2024-03-20 [2] CRAN (R 4.4.0)
##  lava           1.8.0      2024-03-05 [1] CRAN (R 4.4.0)
##  lhs            1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  listenv        0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
##  lmom           3.0        2023-08-29 [1] CRAN (R 4.4.0)
##  lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  MASS           7.3-60.2   2024-04-24 [2] local
##  Matrix         1.7-0      2024-03-22 [2] CRAN (R 4.4.0)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mgcv           1.9-1      2023-12-21 [2] CRAN (R 4.4.0)
##  mime           0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  mnormt         2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
##  modeldata    * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
##  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
##  mvtnorm        1.2-5      2024-05-21 [1] CRAN (R 4.4.0)
##  nlme           3.1-164    2023-11-27 [2] CRAN (R 4.4.0)
##  nnet           7.3-19     2023-05-03 [2] CRAN (R 4.4.0)
##  paletteer      1.6.0      2024-01-21 [1] CRAN (R 4.4.0)
##  parallelly     1.37.1     2024-02-29 [1] CRAN (R 4.4.0)
##  parsnip      * 1.2.1      2024-03-22 [1] CRAN (R 4.4.0)
##  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild       1.4.4      2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload        1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
##  plyr           1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
##  pROC         * 1.18.5     2023-11-01 [1] CRAN (R 4.4.0)
##  prodlim        2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
##  profvis        0.3.8      2023-05-02 [1] CRAN (R 4.4.0)
##  promises       1.3.0      2024-04-05 [1] CRAN (R 4.4.0)
##  proxy          0.4-27     2022-06-09 [1] CRAN (R 4.4.0)
##  psych        * 2.4.6.26   2024-06-27 [1] CRAN (R 4.4.1)
##  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
##  R6             2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
##  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp           1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
##  readr        * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
##  readxl         1.4.3      2023-07-06 [1] CRAN (R 4.4.0)
##  recipes      * 1.1.0      2024-07-04 [1] CRAN (R 4.4.1)
##  rematch2       2.1.2      2020-05-01 [1] CRAN (R 4.4.0)
##  remotes        2.5.0      2024-03-17 [1] CRAN (R 4.4.0)
##  rlang          1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown      2.27       2024-05-17 [1] CRAN (R 4.4.0)
##  rootSolve      1.8.2.4    2023-09-21 [1] CRAN (R 4.4.0)
##  rpart          4.1.23     2023-12-05 [2] CRAN (R 4.4.0)
##  rprojroot      2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rsample      * 1.2.1      2024-03-25 [1] CRAN (R 4.4.0)
##  rstudioapi     0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
##  sass           0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales       * 1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny          1.8.1.1    2024-04-02 [1] CRAN (R 4.4.0)
##  stringi        1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  stringr      * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
##  survival       3.5-8      2024-02-14 [2] CRAN (R 4.4.0)
##  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidymodels   * 1.2.0      2024-03-25 [1] CRAN (R 4.4.0)
##  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
##  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
##  timeDate       4032.109   2023-12-14 [1] CRAN (R 4.4.0)
##  tune         * 1.2.1      2024-04-18 [1] CRAN (R 4.4.0)
##  tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usethis        2.2.3      2024-02-19 [1] CRAN (R 4.4.0)
##  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  withr          3.0.0      2024-01-16 [1] CRAN (R 4.4.0)
##  workflows    * 1.1.4      2024-02-19 [1] CRAN (R 4.4.0)
##  workflowsets * 1.1.0      2024-03-21 [1] CRAN (R 4.4.0)
##  xfun           0.46       2024-07-18 [1] CRAN (R 4.4.1)
##  xml2           1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
##  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml           2.3.9      2024-07-05 [1] CRAN (R 4.4.1)
##  yardstick    * 1.3.1      2024-03-21 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────