Machine Learning Multivariate Regression

Objective

This vignette demonstrates machine learning linear regression to predict client satisfaction with service quality. Service quality is critical for business success. It is essential for attracting new customers and retaining existing customers.

The data set contains observations from 424 clients in a business-to-business (B2B) relationship. Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. SERVQUAL consists of five dimensions, described as follows:

Tangibles – the appearance of physical facilities, equipment, personnel and communication materials
Assurance – knowledge and ability to inspire trust and confidence
Reliability – ability to perform service dependably and accurately
Responsiveness – willingness to help and provide prompt service
Empathy – providing caring and individualised attention.

In addition to the above dimensions, and to assist with statistical modelling, the survey also measured each client’s overall satisfaction with service quality.

Workflow

The raw data set was wrangled and tidied before processing. A brief exploratory analysis, comprising a statistical summary, distribution visualisation and correlation analysis, was conducted to understand the variables.

Building and implementing the machine learning model commenced by randomly splitting the data into training and testing sets at a 75:25 ratio (a 3:1 split). Stratified sampling on the response variable was used so that each level of overall satisfaction appeared in roughly the same proportion in the test set as in the training set.
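The stratified split described above can be sketched as follows. This is a minimal Python illustration using only the standard library, not the code used for this vignette (which was run in R). The function name `stratified_split`, the seed and the toy satisfaction scores are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(rows, stratum_of, train_fraction=0.75, seed=42):
    """Split rows into train and test sets, stratum by stratum.

    stratum_of is a function returning the stratum label for a row
    (here, the overall satisfaction score).
    """
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group the rows by stratum label.
    groups = defaultdict(list)
    for row in rows:
        groups[stratum_of(row)].append(row)

    train, test = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        # Take the same fraction from every stratum.
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Toy data (hypothetical, not the survey data): 100 clients, mostly satisfied.
toy_scores = [7] * 40 + [6] * 40 + [4] * 12 + [2] * 8
toy_rows = [{"overall_satisfaction": score} for score in toy_scores]

train_rows, test_rows = stratified_split(
    toy_rows, stratum_of=lambda row: row["overall_satisfaction"]
)
print(len(train_rows), len(test_rows))
```

Because each stratum is split separately, rare low-satisfaction scores keep roughly the same share of rows in both sets, which a purely random split does not guarantee.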

The model specification and feature engineering steps were then formulated before fitting the model to the training data and reviewing the results.

The model was then applied to the unseen test set to predict the response variable, overall satisfaction, and its prediction performance on the test set was evaluated. Finally, the importance of each predictor variable in explaining overall satisfaction was estimated for the test set and compared with the result for the training set.

Results

1. Explore data

Table 1 is a statistical summary of the five explanatory variables and the response variable, overall satisfaction.

Table 1 Service quality statistical summary
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
tangibles	1	424	5.92	0.90	6.0	6.05	0.74	2	7	5	-1.61	3.51	0.04
assurance	2	424	5.52	1.07	6.0	5.61	0.74	2	7	5	-0.85	0.09	0.05
reliability	3	424	5.43	1.32	6.0	5.60	0.74	1	7	6	-1.13	0.60	0.06
responsiveness	4	424	5.29	1.37	6.0	5.44	1.48	1	7	6	-0.90	0.23	0.07
empathy	5	424	5.39	1.17	5.5	5.47	1.48	1	7	6	-0.78	0.71	0.06
overall satisfaction	6	424	5.66	1.23	6.0	5.86	0.00	1	7	6	-1.63	2.52	0.06

Chart 1 combines violin and box plots to show the distribution of the five explanatory variables and the response variable. The box plots show favourable levels of service quality for every variable, with medians of 5.5 to 6.0 against a maximum score of 7 (see Table 1).

Chart 2 is a correlation heatmap showing that all variables are positively correlated, with coefficients ranging from 0.55 to 0.85. The intangible measures (assurance, reliability, responsiveness and empathy) are more closely correlated with one another than with the tangibles measure.

2. Build and fit model

The regression model was specified with additional feature engineering to normalise predictor variables and mitigate skewness. The model was fitted to the training set, resulting in fit statistics summarised in Table 2.

Table 2 Model fit statistics on the training set (arranged by p.value)
term	estimate	std.error	statistic	p.value
(Intercept)	5.69	0.04	136.95	0.00
responsiveness	0.46	0.10	4.86	0.00
reliability	0.27	0.09	3.00	0.00
empathy	0.24	0.08	2.94	0.00
assurance	0.17	0.08	2.06	0.04
tangibles	-0.02	0.07	-0.35	0.73
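The vignette does not state which transformation was used to normalise the predictors and mitigate skewness before fitting. One common option, sketched here in Python purely as an assumption, is a rank-based inverse normal transform: each value is replaced by the standard normal quantile of its (tie-averaged) rank. The function names and the `(rank - 0.5) / n` convention are choices made for this illustration; the input is the tangibles column from the test-set sample shown later in Table 4.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def rank_normalise(values):
    """Map values to standard normal quantiles of their ranks."""
    n = len(values)
    standard_normal = NormalDist()
    # (rank - 0.5) / n always lies strictly between 0 and 1.
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Tangibles scores from the ten sample rows in Table 4 (left-skewed).
tangibles = [7.0, 6.5, 7.0, 6.0, 2.0, 6.0, 6.5, 6.0, 4.0, 6.0]
normalised = rank_normalise(tangibles)
for raw, z in zip(tangibles, normalised):
    print(f"{raw:>4} -> {z:6.2f}")
```

The transform preserves the ordering of the scores while pulling in the long left tail, so a single very low score such as 2.0 no longer sits far from the rest of the data.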

Table 3 summarises model performance metrics on the training set. The adjusted R² on the training set data was 0.6379.

Table 3 Fit metrics on training set
r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.6436	0.6379	0.7397	112.3364	0	5	-351.2092	716.4183	742.7307	170.1798	311	317
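Several entries in Table 3 can be recomputed from the others using the standard formulas for an ordinary least squares fit, which is a useful consistency check. The Python sketch below assumes the model has five predictors plus an intercept, and that AIC and BIC count the residual variance as an estimated parameter; the variable names are illustrative.

```python
import math

# Values copied from Table 3.
r_squared = 0.6436
n_obs = 317
n_predictors = 5
deviance = 170.1798      # residual sum of squares
df_residual = 311        # n_obs - n_predictors - 1
log_lik = -351.2092

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Residual standard error (sigma).
sigma = math.sqrt(deviance / df_residual)

# Assumption: parameters = intercept + five slopes + residual variance.
n_params = n_predictors + 2
aic = -2 * log_lik + 2 * n_params
bic = -2 * log_lik + math.log(n_obs) * n_params

print(f"adj.r.squared {adj_r_squared:.4f}")
print(f"sigma         {sigma:.4f}")
print(f"AIC           {aic:.4f}")
print(f"BIC           {bic:.4f}")
```

The recomputed values agree with Table 3 to within rounding of the four-decimal inputs.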

Chart 3 illustrates the importance of explanatory variables modelled on the randomly split training data.
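The vignette does not state how variable importance was measured. One common model-specific choice for linear regression, assumed here for illustration, is the absolute value of each coefficient's t-statistic. The Python sketch below ranks the predictors that way using the statistic column of Table 2; the variable names are illustrative.

```python
# t-statistics copied from the "statistic" column of Table 2
# (the intercept is excluded because it is not a predictor).
t_statistics = {
    "responsiveness": 4.86,
    "reliability": 3.00,
    "empathy": 2.94,
    "assurance": 2.06,
    "tangibles": -0.35,
}

# Assumption: importance is measured as the absolute t-statistic.
importance = sorted(
    ((name, abs(t)) for name, t in t_statistics.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in importance:
    print(f"{name:<15} {score:.2f}")
```

Under this assumption responsiveness and reliability rank highest and tangibles lowest on the training set.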

3. Model prediction and evaluation

The model fitted on the training data was applied to the unseen test data to predict the response variable, overall satisfaction. Table 4 compares actual and predicted overall satisfaction (.pred) in a small sample of observations extracted from the test set.

Table 4 A sample of predictions on the test set
tangibles	assurance	reliability	responsiveness	empathy	overall_satisfaction	.pred
7.0	6.5	6.5	6.5	7.0	7	6.86
6.5	5.5	4.5	2.5	3.5	6	4.20
7.0	7.0	7.0	7.0	7.0	7	7.44
6.0	4.0	3.5	2.0	4.0	4	3.93
2.0	3.5	3.0	2.5	3.5	3	3.90
6.0	5.0	2.0	1.5	4.5	2	3.73
6.5	6.5	6.5	6.5	6.5	7	6.71
6.0	4.5	4.5	4.5	4.0	6	4.69
4.0	4.0	3.5	4.0	4.5	4	4.51
6.0	6.0	6.5	7.0	7.0	7	7.03

Table 5 summarises regression metrics on the test set. The R² estimate for the test set was 0.6999, slightly higher than the R² for the training set (0.6436, or 0.6379 adjusted).

Table 5 Predictive performance metrics on the test set
.metric	.estimator	.estimate
rmse	standard	0.6825
rsq	standard	0.6999
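The two metrics in Table 5 can be illustrated on the ten sample rows from Table 4. The Python sketch below assumes that rmse is the root mean squared error and that rsq is the squared Pearson correlation between actual and predicted values; the function names are illustrative. Because it uses only ten rows rather than the full test set, its results will differ from Table 5.

```python
import math

# Actual and predicted overall satisfaction from Table 4.
actual = [7, 6, 7, 4, 3, 2, 7, 6, 4, 7]
predicted = [6.86, 4.20, 7.44, 3.93, 3.90, 3.73, 6.71, 4.69, 4.51, 7.03]


def rmse(truth, estimate):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - e) ** 2 for t, e in zip(truth, estimate)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rsq(truth, estimate):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_e = sum(estimate) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimate))
    ss_t = sum((t - mean_t) ** 2 for t in truth)
    ss_e = sum((e - mean_e) ** 2 for e in estimate)
    return cov ** 2 / (ss_t * ss_e)


sample_rmse = rmse(actual, predicted)
sample_rsq = rsq(actual, predicted)
print(f"rmse {sample_rmse:.4f}")
print(f"rsq  {sample_rsq:.4f}")
```

The largest squared errors in this sample come from rows where the actual score and the prediction disagree by more than one point, such as the second and sixth rows of Table 4.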

Chart 4 plots the actual level of satisfaction in the test set against the model's prediction. The dotted line through the origin (x = y) represents a perfect model, where every predicted value would equal the actual value in the test set. The chart shows that the model predicts higher overall satisfaction more accurately than lower overall satisfaction. A likely reason is that the data contains far fewer observations with low overall satisfaction than with high overall satisfaction (see Chart 1), so the model has little information to learn from at the low end of the scale.

Chart 5 shows that responsiveness and reliability are estimated to be the most important predictors of overall client satisfaction with service quality in the test set. This is consistent with the results for the training set shown in Chart 3.

Reference

Data was gathered using a custom-designed survey instrument based on the SERVQUAL theoretical framework. The SERVQUAL methodology is documented in Delivering Quality Service: Balancing Customer Perceptions and Expectations by Zeithaml, Parasuraman and Berry.

Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package       * version    date (UTC) lib source
##  backports       1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
##  bestNormalize * 1.9.1      2023-08-18 [1] CRAN (R 4.4.0)
##  broom         * 1.0.6      2024-05-17 [1] CRAN (R 4.4.0)
##  bslib           0.7.0      2024-03-29 [1] CRAN (R 4.4.0)
##  butcher         0.3.4      2024-04-11 [1] CRAN (R 4.4.0)
##  cachem          1.1.0      2024-05-16 [1] CRAN (R 4.4.0)
##  class           7.3-22     2023-05-03 [2] CRAN (R 4.4.0)
##  cli             3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
##  codetools       0.2-20     2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.4.1)
##  data.table    * 1.15.4     2024-03-30 [1] CRAN (R 4.4.0)
##  devtools        2.4.5      2022-10-11 [1] CRAN (R 4.4.0)
##  dials         * 1.2.1      2024-02-22 [1] CRAN (R 4.4.0)
##  DiceDesign      1.10       2023-12-07 [1] CRAN (R 4.4.0)
##  digest          0.6.36     2024-06-23 [1] CRAN (R 4.4.1)
##  doParallel      1.0.17     2022-02-07 [1] CRAN (R 4.4.0)
##  doRNG           1.8.6      2023-01-16 [1] CRAN (R 4.4.0)
##  dplyr         * 1.1.4      2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate        0.24.0     2024-06-10 [1] CRAN (R 4.4.0)
##  fansi           1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
##  farver          2.1.2      2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap         1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
##  forcats       * 1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
##  foreach         1.5.2      2022-02-02 [1] CRAN (R 4.4.0)
##  fs              1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
##  furrr           0.3.1      2022-08-15 [1] CRAN (R 4.4.0)
##  future          1.33.2     2024-03-26 [1] CRAN (R 4.4.0)
##  future.apply    1.11.2     2024-03-28 [1] CRAN (R 4.4.0)
##  generics        0.1.3      2022-07-05 [1] CRAN (R 4.4.0)
##  GGally        * 2.2.1      2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2       * 3.5.1      2024-04-23 [1] CRAN (R 4.4.0)
##  ggstats         0.6.0      2024-04-05 [1] CRAN (R 4.4.0)
##  globals         0.16.3     2024-03-08 [1] CRAN (R 4.4.0)
##  glue            1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
##  gower           1.0.1      2022-12-22 [1] CRAN (R 4.4.0)
##  GPfit           1.0-8      2019-02-08 [1] CRAN (R 4.4.0)
##  gtable          0.3.5      2024-04-22 [1] CRAN (R 4.4.0)
##  hardhat         1.4.0      2024-06-02 [1] CRAN (R 4.4.0)
##  here          * 1.0.1      2020-12-13 [1] CRAN (R 4.4.0)
##  highr           0.11       2024-05-26 [1] CRAN (R 4.4.0)
##  hms             1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools       0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets     1.6.4      2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv          1.6.15     2024-03-26 [1] CRAN (R 4.4.0)
##  infer         * 1.0.7      2024-03-25 [1] CRAN (R 4.4.0)
##  ipred           0.9-15     2024-07-18 [1] CRAN (R 4.4.1)
##  iterators       1.0.14     2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib       0.1.4      2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite        1.8.8      2023-12-04 [1] CRAN (R 4.4.0)
##  kableExtra    * 1.4.0      2024-01-24 [1] CRAN (R 4.4.0)
##  knitr           1.48       2024-07-07 [1] CRAN (R 4.4.1)
##  labeling        0.4.3      2023-08-29 [1] CRAN (R 4.4.0)
##  later           1.3.2      2023-12-06 [1] CRAN (R 4.4.0)
##  lattice         0.22-6     2024-03-20 [2] CRAN (R 4.4.0)
##  lava            1.8.0      2024-03-05 [1] CRAN (R 4.4.0)
##  lhs             1.2.0      2024-06-30 [1] CRAN (R 4.4.1)
##  lifecycle       1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
##  listenv         0.9.1      2024-01-29 [1] CRAN (R 4.4.0)
##  lubridate     * 1.9.3      2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
##  MASS            7.3-60.2   2024-04-24 [2] local
##  Matrix          1.7-0      2024-03-22 [2] CRAN (R 4.4.0)
##  memoise         2.0.1      2021-11-26 [1] CRAN (R 4.4.0)
##  mime            0.12       2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 4.4.0)
##  mnormt          2.1.1      2022-09-26 [1] CRAN (R 4.4.0)
##  modeldata     * 1.4.0      2024-06-19 [1] CRAN (R 4.4.1)
##  munsell         0.5.1      2024-04-01 [1] CRAN (R 4.4.0)
##  nlme            3.1-164    2023-11-27 [2] CRAN (R 4.4.0)
##  nnet            7.3-19     2023-05-03 [2] CRAN (R 4.4.0)
##  nortest         1.0-4      2015-07-30 [1] CRAN (R 4.4.0)
##  parallelly      1.37.1     2024-02-29 [1] CRAN (R 4.4.0)
##  parsnip       * 1.2.1      2024-03-22 [1] CRAN (R 4.4.0)
##  pillar          1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild        1.4.4      2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload         1.4.0      2024-06-28 [1] CRAN (R 4.4.1)
##  plyr            1.8.9      2023-10-02 [1] CRAN (R 4.4.0)
##  prodlim         2024.06.25 2024-06-24 [1] CRAN (R 4.4.1)
##  profvis         0.3.8      2023-05-02 [1] CRAN (R 4.4.0)
##  promises        1.3.0      2024-04-05 [1] CRAN (R 4.4.0)
##  psych         * 2.4.6.26   2024-06-27 [1] CRAN (R 4.4.1)
##  purrr         * 1.0.2      2023-08-10 [1] CRAN (R 4.4.0)
##  R6              2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
##  RColorBrewer    1.1-3      2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp            1.0.13     2024-07-17 [1] CRAN (R 4.4.1)
##  readr         * 2.1.5      2024-01-10 [1] CRAN (R 4.4.0)
##  recipes       * 1.1.0      2024-07-04 [1] CRAN (R 4.4.1)
##  remotes         2.5.0      2024-03-17 [1] CRAN (R 4.4.0)
##  rlang           1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown       2.27       2024-05-17 [1] CRAN (R 4.4.0)
##  rngtools        1.5.2      2021-09-20 [1] CRAN (R 4.4.0)
##  rpart           4.1.23     2023-12-05 [2] CRAN (R 4.4.0)
##  rprojroot       2.0.4      2023-11-05 [1] CRAN (R 4.4.0)
##  rsample       * 1.2.1      2024-03-25 [1] CRAN (R 4.4.0)
##  rstudioapi      0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
##  sass            0.4.9      2024-03-15 [1] CRAN (R 4.4.0)
##  scales        * 1.3.0      2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
##  shiny           1.8.1.1    2024-04-02 [1] CRAN (R 4.4.0)
##  stringi         1.8.4      2024-05-06 [1] CRAN (R 4.4.0)
##  stringr       * 1.5.1      2023-11-14 [1] CRAN (R 4.4.0)
##  survival        3.5-8      2024-02-14 [2] CRAN (R 4.4.0)
##  svglite         2.1.3      2023-12-08 [1] CRAN (R 4.4.0)
##  systemfonts     1.1.0      2024-05-15 [1] CRAN (R 4.4.0)
##  tibble        * 3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
##  tidymodels    * 1.2.0      2024-03-25 [1] CRAN (R 4.4.0)
##  tidyr         * 1.3.1      2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect      1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse     * 2.0.0      2023-02-22 [1] CRAN (R 4.4.0)
##  timechange      0.3.0      2024-01-18 [1] CRAN (R 4.4.0)
##  timeDate        4032.109   2023-12-14 [1] CRAN (R 4.4.0)
##  tune          * 1.2.1      2024-04-18 [1] CRAN (R 4.4.0)
##  tzdb            0.4.0      2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker      1.0.1      2021-11-30 [1] CRAN (R 4.4.0)
##  usethis         2.2.3      2024-02-19 [1] CRAN (R 4.4.0)
##  utf8            1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs           0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
##  vip           * 0.4.1      2023-08-21 [1] CRAN (R 4.4.0)
##  viridisLite     0.4.2      2023-05-02 [1] CRAN (R 4.4.0)
##  withr           3.0.0      2024-01-16 [1] CRAN (R 4.4.0)
##  workflows     * 1.1.4      2024-02-19 [1] CRAN (R 4.4.0)
##  workflowsets  * 1.1.0      2024-03-21 [1] CRAN (R 4.4.0)
##  xfun            0.46       2024-07-18 [1] CRAN (R 4.4.1)
##  xml2            1.3.6      2023-12-04 [1] CRAN (R 4.4.0)
##  xtable          1.8-4      2019-04-21 [1] CRAN (R 4.4.0)
##  yaml            2.3.9      2024-07-05 [1] CRAN (R 4.4.1)
##  yardstick     * 1.3.1      2024-03-21 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────