Machine Learning Multivariate Regression
This vignette demonstrates machine learning linear regression to predict client satisfaction with service quality. Service quality is critical for business success. It is essential for attracting new customers and retaining existing customers.
The data set contains observations from 424 clients in a business-to-business (B2B) relationship. Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. SERVQUAL consists of five dimensions: tangibles, assurance, reliability, responsiveness and empathy.
In addition to these dimensions, and to assist with statistical modelling, each client’s overall satisfaction with service quality was also measured.
The raw data set was wrangled and tidied before processing. A brief exploratory analysis, comprising a statistical summary, distribution visualisation and correlation analysis, was conducted to understand the variables.
Building the machine learning model commenced by randomly splitting the data into training and testing sets at a 75:25 ratio (a 3:1 split). Stratified sampling on the response variable was used so that the distribution of overall satisfaction in the test set was roughly the same as in the training set.
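A stratified 75:25 split of this kind can be sketched with the rsample package, which appears in the session info. The survey data is not distributed with this vignette, so the data frame `servqual` below is a simulated, hypothetical stand-in, and the object names are illustrative only.

```r
library(rsample)

# Hypothetical stand-in for the survey data: 424 clients, scores on a 1-7 scale.
set.seed(2024)
n <- 424
servqual <- data.frame(
  responsiveness = sample(1:7, n, replace = TRUE),
  reliability    = sample(1:7, n, replace = TRUE)
)
servqual$overall_satisfaction <- pmin(7, pmax(1, round(
  0.5 * servqual$responsiveness + 0.4 * servqual$reliability + rnorm(n, 0.5, 1)
)))

# 75:25 split, stratified on the response variable.
sq_split <- initial_split(servqual, prop = 0.75, strata = overall_satisfaction)
sq_train <- training(sq_split)
sq_test  <- testing(sq_split)

# Compare the distribution of the response in each partition.
round(prop.table(table(sq_train$overall_satisfaction)), 2)
round(prop.table(table(sq_test$overall_satisfaction)), 2)
```

Because the response is numeric, rsample stratifies on quantile bins rather than on each individual score, so the two printed distributions should be similar but not identical.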
Next, the model specification and feature engineering steps were formulated, the model was fitted to the training data and the results were reviewed.
The model was then applied to the unseen test set to predict the response variable, overall satisfaction, and its prediction performance was evaluated. Finally, the importance of the predictor variables in explaining overall satisfaction was estimated for the test set and compared with the result for the training set.
Table 1 is a statistical summary of the five explanatory variables and the response variable, overall satisfaction.
| variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tangibles | 1 | 424 | 5.92 | 0.90 | 6.0 | 6.05 | 0.74 | 2 | 7 | 5 | -1.61 | 3.51 | 0.04 |
| assurance | 2 | 424 | 5.52 | 1.07 | 6.0 | 5.61 | 0.74 | 2 | 7 | 5 | -0.85 | 0.09 | 0.05 |
| reliability | 3 | 424 | 5.43 | 1.32 | 6.0 | 5.60 | 0.74 | 1 | 7 | 6 | -1.13 | 0.60 | 0.06 |
| responsiveness | 4 | 424 | 5.29 | 1.37 | 6.0 | 5.44 | 1.48 | 1 | 7 | 6 | -0.90 | 0.23 | 0.07 |
| empathy | 5 | 424 | 5.39 | 1.17 | 5.5 | 5.47 | 1.48 | 1 | 7 | 6 | -0.78 | 0.71 | 0.06 |
| overall satisfaction | 6 | 424 | 5.66 | 1.23 | 6.0 | 5.86 | 0.00 | 1 | 7 | 6 | -1.63 | 2.52 | 0.06 |
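The `se` column in Table 1 is consistent with the standard error of the mean, sd / sqrt(n). The short check below recomputes it from the rounded `sd` values in the table; the vector names are illustrative and small rounding differences are possible.

```r
# Values copied from Table 1 (rounded as printed).
n  <- 424
sd <- c(tangibles = 0.90, assurance = 1.07, reliability = 1.32,
        responsiveness = 1.37, empathy = 1.17, overall_satisfaction = 1.23)

# Standard error of the mean for each variable.
se <- sd / sqrt(n)
round(se, 2)
```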
Chart 1 uses combined violin and box plots to show the distribution of the five explanatory variables and the response variable. The box plots indicate favourable levels of service quality for every variable, with medians between 5.5 and 6.0 against a maximum score of 7 (Table 1).
The correlation heatmap in Chart 2 shows that all variables are positively correlated, with coefficients ranging from 0.55 to 0.85. The intangible measures are more closely correlated with one another than with the tangible measure.
The regression model was specified with additional feature engineering to normalise the predictor variables and mitigate skewness. The model was fitted to the training set, and Table 2 summarises the resulting coefficient estimates.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 5.69 | 0.04 | 136.95 | 0.00 |
| responsiveness | 0.46 | 0.10 | 4.86 | 0.00 |
| reliability | 0.27 | 0.09 | 3.00 | 0.00 |
| empathy | 0.24 | 0.08 | 2.94 | 0.00 |
| assurance | 0.17 | 0.08 | 2.06 | 0.04 |
| tangibles | -0.02 | 0.07 | -0.35 | 0.73 |
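A fit of this shape can be sketched with tidymodels. The vignette does not show its recipe, so the steps below are assumptions: a Yeo-Johnson transformation stands in for the skewness correction (the original may have used a bestNormalize step instead), followed by centring and scaling. The training frame `sq_train` is simulated and hypothetical.

```r
library(tidymodels)

# Hypothetical training data: 317 rows, five correlated predictors and a response.
set.seed(2024)
n <- 317
latent <- rnorm(n)
sq_train <- tibble(
  tangibles            = 5.9 + 0.5 * latent + rnorm(n, sd = 0.7),
  assurance            = 5.5 + 0.8 * latent + rnorm(n, sd = 0.6),
  reliability          = 5.4 + 1.0 * latent + rnorm(n, sd = 0.7),
  responsiveness       = 5.3 + 1.1 * latent + rnorm(n, sd = 0.7),
  empathy              = 5.4 + 0.9 * latent + rnorm(n, sd = 0.7),
  overall_satisfaction = 5.7 + 0.9 * latent + rnorm(n, sd = 0.7)
)

# Assumed feature engineering: reduce skewness, then centre and scale.
sq_recipe <- recipe(overall_satisfaction ~ ., data = sq_train) |>
  step_YeoJohnson(all_numeric_predictors()) |>
  step_normalize(all_numeric_predictors())

# Ordinary least squares via the lm engine.
sq_model <- linear_reg() |> set_engine("lm")

sq_fit <- workflow() |>
  add_recipe(sq_recipe) |>
  add_model(sq_model) |>
  fit(data = sq_train)

# Coefficient table with the same columns as Table 2.
coefs <- sq_fit |> extract_fit_parsnip() |> tidy()
coefs
```

With centred predictors the intercept equals the mean of the response in the training data, which would explain why the intercept in Table 2 (5.69) sits close to the overall mean satisfaction in Table 1 (5.66).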
Table 3 summarises model performance metrics on the training set. The adjusted R² on the training set was 0.6379.
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6436 | 0.6379 | 0.7397 | 112.3364 | 0 | 5 | -351.2092 | 716.4183 | 742.7307 | 170.1798 | 311 | 317 |
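Several entries in Table 3 follow from one another through standard least-squares identities. The sketch below recomputes adjusted R², sigma, AIC and BIC from the other printed values; the variable names are illustrative, and agreement is only up to the rounding of the table.

```r
# Values copied from Table 3.
r2       <- 0.6436     # r.squared
n        <- 317        # nobs
p        <- 5          # df (number of predictors)
rss      <- 170.1798   # deviance (residual sum of squares)
df_resid <- 311        # df.residual = n - p - 1
loglik   <- -351.2092  # logLik

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Residual standard error.
sigma <- sqrt(rss / df_resid)

# Information criteria; k counts 5 slopes, the intercept and the error variance.
k   <- p + 2
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k

round(c(adj_r2 = adj_r2, sigma = sigma, AIC = aic, BIC = bic), 4)
```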
Chart 3 illustrates the importance of explanatory variables modelled on the randomly split training data.
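The vignette does not state how importance was scored. One common choice for a linear model, and the one the vip package listed in the session info uses by default for `lm` fits, is the absolute value of each coefficient's t-statistic. Assuming that scoring, the ranking for the training set can be derived directly from the `statistic` column of Table 2:

```r
# t-statistics copied from Table 2 (intercept excluded).
coefs <- data.frame(
  term      = c("responsiveness", "reliability", "empathy", "assurance", "tangibles"),
  statistic = c(4.86, 3.00, 2.94, 2.06, -0.35)
)

# Assumed importance score: absolute t-statistic, largest first.
coefs$importance <- abs(coefs$statistic)
ranked <- coefs[order(-coefs$importance), ]
ranked
```

Under this assumption responsiveness and reliability rank first and second, and tangibles ranks last.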
The model fitted on the training data was applied to the unseen test data to predict the response variable, overall satisfaction. Table 4 compares actual and predicted overall satisfaction (.pred) in a small sample of observations extracted from the test set.
| tangibles | assurance | reliability | responsiveness | empathy | overall_satisfaction | .pred |
|---|---|---|---|---|---|---|
| 7.0 | 6.5 | 6.5 | 6.5 | 7.0 | 7 | 6.86 |
| 6.5 | 5.5 | 4.5 | 2.5 | 3.5 | 6 | 4.20 |
| 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7 | 7.44 |
| 6.0 | 4.0 | 3.5 | 2.0 | 4.0 | 4 | 3.93 |
| 2.0 | 3.5 | 3.0 | 2.5 | 3.5 | 3 | 3.90 |
| 6.0 | 5.0 | 2.0 | 1.5 | 4.5 | 2 | 3.73 |
| 6.5 | 6.5 | 6.5 | 6.5 | 6.5 | 7 | 6.71 |
| 6.0 | 4.5 | 4.5 | 4.5 | 4.0 | 6 | 4.69 |
| 4.0 | 4.0 | 3.5 | 4.0 | 4.5 | 4 | 4.51 |
| 6.0 | 6.0 | 6.5 | 7.0 | 7.0 | 7 | 7.03 |
Table 5 summarises regression metrics on the test set. The R² estimate for the test set was 0.6999, slightly higher than the adjusted R² for the training set (0.6379).
| .metric | .estimator | .estimate |
|---|---|---|
| rmse | standard | 0.6825 |
| rsq | standard | 0.6999 |
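To make the two metrics concrete, the sketch below computes them by hand for the ten observations shown in Table 4. RMSE is the square root of the mean squared prediction error, and the `rsq` metric in yardstick is the squared correlation between actual and predicted values. The results apply to this small sample only and will differ from the full test set figures in Table 5.

```r
# Actual and predicted overall satisfaction, copied from Table 4.
actual <- c(7, 6, 7, 4, 3, 2, 7, 6, 4, 7)
pred   <- c(6.86, 4.20, 7.44, 3.93, 3.90, 3.73, 6.71, 4.69, 4.51, 7.03)

# Root mean squared error: typical size of a prediction error, in scale points.
rmse <- sqrt(mean((actual - pred)^2))

# R-squared as the squared Pearson correlation of actual and predicted values.
rsq <- cor(actual, pred)^2

round(c(rmse = rmse, rsq = rsq), 4)
```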
Chart 4 compares actual overall satisfaction in the test set with the model’s predictions. The dotted line through the origin (x = y) represents a perfect model, for which every predicted value equals the actual value. The chart shows that the model predicts higher overall satisfaction more accurately than lower overall satisfaction. A likely reason is that the data contains relatively few observations with low overall satisfaction (see Chart 1), so the model had little information to learn from in that range.
Chart 5 shows responsiveness and reliability are estimated to be the most important predictor variables in the test set explaining the overall level of client satisfaction with service quality. This is consistent with the results in the training set shown in Chart 3.
Reference:
Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. The SERVQUAL methodology is documented in Delivering Quality Service: Balancing Customer Perceptions and Expectations by Zeithaml, Parasuraman and Berry.
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.4.0 (2024-04-24 ucrt)
## os Windows 11 x64 (build 22631)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_Australia.utf8
## ctype English_Australia.utf8
## tz Australia/Brisbane
## date 2024-07-30
## pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
## bestNormalize * 1.9.1 2023-08-18 [1] CRAN (R 4.4.0)
## broom * 1.0.6 2024-05-17 [1] CRAN (R 4.4.0)
## bslib 0.7.0 2024-03-29 [1] CRAN (R 4.4.0)
## butcher 0.3.4 2024-04-11 [1] CRAN (R 4.4.0)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
## class 7.3-22 2023-05-03 [2] CRAN (R 4.4.0)
## cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
## codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.0)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.1)
## data.table * 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
## dials * 1.2.1 2024-02-22 [1] CRAN (R 4.4.0)
## DiceDesign 1.10 2023-12-07 [1] CRAN (R 4.4.0)
## digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.1)
## doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.4.0)
## doRNG 1.8.6 2023-01-16 [1] CRAN (R 4.4.0)
## dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
## evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
## fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
## farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
## foreach 1.5.2 2022-02-02 [1] CRAN (R 4.4.0)
## fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
## furrr 0.3.1 2022-08-15 [1] CRAN (R 4.4.0)
## future 1.33.2 2024-03-26 [1] CRAN (R 4.4.0)
## future.apply 1.11.2 2024-03-28 [1] CRAN (R 4.4.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
## GGally * 2.2.1 2024-02-14 [1] CRAN (R 4.4.0)
## ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
## ggstats 0.6.0 2024-04-05 [1] CRAN (R 4.4.0)
## globals 0.16.3 2024-03-08 [1] CRAN (R 4.4.0)
## glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
## gower 1.0.1 2022-12-22 [1] CRAN (R 4.4.0)
## GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.4.0)
## gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
## hardhat 1.4.0 2024-06-02 [1] CRAN (R 4.4.0)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.4.0)
## highr 0.11 2024-05-26 [1] CRAN (R 4.4.0)
## hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
## htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
## httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
## infer * 1.0.7 2024-03-25 [1] CRAN (R 4.4.0)
## ipred 0.9-15 2024-07-18 [1] CRAN (R 4.4.1)
## iterators 1.0.14 2022-02-05 [1] CRAN (R 4.4.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
## jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
## kableExtra * 1.4.0 2024-01-24 [1] CRAN (R 4.4.0)
## knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1)
## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
## later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
## lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.0)
## lava 1.8.0 2024-03-05 [1] CRAN (R 4.4.0)
## lhs 1.2.0 2024-06-30 [1] CRAN (R 4.4.1)
## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
## listenv 0.9.1 2024-01-29 [1] CRAN (R 4.4.0)
## lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
## MASS 7.3-60.2 2024-04-24 [2] local
## Matrix 1.7-0 2024-03-22 [2] CRAN (R 4.4.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
## mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
## mnormt 2.1.1 2022-09-26 [1] CRAN (R 4.4.0)
## modeldata * 1.4.0 2024-06-19 [1] CRAN (R 4.4.1)
## munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
## nlme 3.1-164 2023-11-27 [2] CRAN (R 4.4.0)
## nnet 7.3-19 2023-05-03 [2] CRAN (R 4.4.0)
## nortest 1.0-4 2015-07-30 [1] CRAN (R 4.4.0)
## parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.4.0)
## parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.4.0)
## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
## pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
## pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.1)
## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
## prodlim 2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
## promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
## psych * 2.4.6.26 2024-06-27 [1] CRAN (R 4.4.1)
## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
## Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.1)
## readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
## recipes * 1.1.0 2024-07-04 [1] CRAN (R 4.4.1)
## remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
## rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
## rngtools 1.5.2 2021-09-20 [1] CRAN (R 4.4.0)
## rpart 4.1.23 2023-12-05 [2] CRAN (R 4.4.0)
## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.0)
## rsample * 1.2.1 2024-03-25 [1] CRAN (R 4.4.0)
## rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
## sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.0)
## scales * 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
## shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
## stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
## stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
## survival 3.5-8 2024-02-14 [2] CRAN (R 4.4.0)
## svglite 2.1.3 2023-12-08 [1] CRAN (R 4.4.0)
## systemfonts 1.1.0 2024-05-15 [1] CRAN (R 4.4.0)
## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
## tidymodels * 1.2.0 2024-03-25 [1] CRAN (R 4.4.0)
## tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
## timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
## timeDate 4032.109 2023-12-14 [1] CRAN (R 4.4.0)
## tune * 1.2.1 2024-04-18 [1] CRAN (R 4.4.0)
## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
## usethis 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
## vip * 0.4.1 2023-08-21 [1] CRAN (R 4.4.0)
## viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.4.0)
## withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
## workflows * 1.1.4 2024-02-19 [1] CRAN (R 4.4.0)
## workflowsets * 1.1.0 2024-03-21 [1] CRAN (R 4.4.0)
## xfun 0.46 2024-07-18 [1] CRAN (R 4.4.1)
## xml2 1.3.6 2023-12-04 [1] CRAN (R 4.4.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
## yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.1)
## yardstick * 1.3.1 2024-03-21 [1] CRAN (R 4.4.0)
##
## [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
## [2] C:/Program Files/R/R-4.4.0/library
##
## ──────────────────────────────────────────────────────────────────────────────