Objective

This vignette demonstrates supervised cluster analysis. Wikipedia describes cluster analysis as the process of grouping a set of objects so that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).

Cluster analysis is widely used in business, traditionally in marketing for market (customer) segmentation. In market segmentation, cluster analysis identifies the similarities and differences across customer segments in product/service preferences and customer behaviour. Cluster analysis is also widely used in other fields and disciplines to reveal similarities and differences in data.

This vignette conducts a supervised cluster analysis on a targeted set of 10 organisations implementing major change. The analysis aims to assign these organisations to clusters based on employees' emotional attributes towards the change. This vignette compares the results of two clustering algorithms: hierarchical and k-means.

Workflow

The workflow was similar for both clustering algorithms. The data frame was pre-processed and tidied: the variables were reviewed for statistical integrity and the organisation names were relabelled. The 10 organisations comprised two corporations, one Federal Government Agency, one State Government Agency, one State Government Authority, three State Government Departments and two Local Government Authorities.
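The vignette does not show its pre-processing code. A minimal base-R sketch of the integrity checks and relabelling might look like the following. It assumes a hypothetical wide table with one row per organisation, an anonymised `org` code column and item columns named `emo1` to `emo4`; all names and values are synthetic placeholders, not the survey data.

```r
# Synthetic stand-in for the raw data (hypothetical column names and values).
set.seed(1)
raw <- data.frame(
  org = sprintf("org_%02d", 1:10),
  matrix(runif(40, min = 1, max = 7), nrow = 10,
         dimnames = list(NULL, paste0("emo", 1:4)))
)

# Integrity checks: item columns should be numeric, complete and on the 1-7 scale.
items <- grep("^emo", names(raw), value = TRUE)
stopifnot(all(sapply(raw[items], is.numeric)))
n_missing <- colSums(is.na(raw[items]))
out_of_range <- sapply(raw[items], function(x) any(x < 1 | x > 7, na.rm = TRUE))
print(n_missing)
print(out_of_range)

# Relabel anonymised codes with readable names via a (hypothetical) lookup vector.
lookup <- setNames(paste("Organisation", LETTERS[1:10]), sprintf("org_%02d", 1:10))
raw$label <- unname(lookup[raw$org])

# Numeric matrix with organisation labels as row names, ready for clustering.
emo <- as.matrix(raw[items])
rownames(emo) <- raw$label
print(head(emo, 3))
```

Keeping the labels as row names means they carry through to `dist()`, `hclust()` and `kmeans()` output without extra bookkeeping.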

For cluster analysis, variables are generally normalised or standardised, particularly when they use different measurement scales, because otherwise variables with larger ranges dominate the distance calculations. Standardisation was not strictly necessary in this instance, as all data were gathered on the same seven-point scale. Nevertheless, the data were normalised.
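The vignette does not state which normalisation was applied. The sketch below shows two common options on synthetic data: z-score standardisation with base R's `scale()` and min-max rescaling to [0, 1]. The matrix `emo`, its `Org01` to `Org10` row names and `emo1` to `emo4` columns are hypothetical.

```r
# Hypothetical organisation-level mean scores on a 1-7 scale (synthetic data,
# not the vignette's survey data): 10 organisations x 4 emotion items.
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))
emo <- pmin(pmax(emo, 1), 7)  # keep scores inside the seven-point range

# Option 1: standardise each column to mean 0 and standard deviation 1.
emo_scaled <- scale(emo)

# Option 2: min-max normalise each column to the range [0, 1].
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
emo_minmax <- apply(emo, 2, minmax)

print(round(colMeans(emo_scaled), 10))
print(apply(emo_scaled, 2, sd))
```

When every item already shares one scale, either option mainly changes the units of the distances; the relative ordering of organisations on each item is unchanged.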

The next important step in cluster analysis was choosing the segmentation variables upon which to build the clusters. For this vignette, the segmentation variable was employee emotional attributes towards organisational change.

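When conducting cluster analysis, it is beneficial to estimate the optimal number of clusters in the data. Two statistical methods were applied and compared for this purpose: the elbow method, which looks for the point where adding clusters stops substantially reducing the total within-cluster sum of squares, and the silhouette method, which selects the number of clusters with the highest average silhouette width.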
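The session information lists factoextra, whose `fviz_nbclust()` offers `method = "wss"` (elbow) and `method = "silhouette"`. The sketch below computes the same two criteria for k-means with base R and `cluster::silhouette()` instead, so the logic is visible. The data are synthetic and the object names are hypothetical.

```r
library(cluster)  # silhouette(); ships with standard R installations

# Synthetic organisation-level scores (hypothetical): three loose groups.
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))
emo_scaled <- scale(emo)
d <- dist(emo_scaled)  # Euclidean distance matrix

ks <- 1:5
# Elbow method: total within-cluster sum of squares (WSS) for each k.
wss <- sapply(ks, function(k) {
  kmeans(emo_scaled, centers = k, nstart = 25)$tot.withinss
})

# Silhouette method: average silhouette width for each k >= 2.
avg_sil <- sapply(ks[-1], function(k) {
  cl <- kmeans(emo_scaled, centers = k, nstart = 25)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})

print(data.frame(k = ks, wss = round(wss, 2),
                 avg_silhouette = round(c(NA, avg_sil), 3)))
best_k <- ks[-1][which.max(avg_sil)]
print(best_k)
```

The silhouette is undefined for a single cluster, hence the `NA` at k = 1. The elbow is read by eye from a plot of `wss` against `ks`, whereas the silhouette choice is the explicit maximum.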

Finally, the outcomes of both the hierarchical and k-means clustering algorithms were visualised.

Results

1. Hierarchical clustering

The optimal number of clusters was calculated, and the results were visualised with hierarchical cluster dendrograms.

1.1 Optimal hierarchical clusters

The optimal number of hierarchical clusters for this data set using the elbow and silhouette methods was three (3), shown in Charts 1 and 2.
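For hierarchical clustering, both criteria can be evaluated by cutting one tree at different heights. The sketch below assumes Euclidean distance and Ward's linkage (`method = "ward.D2"`), neither of which is stated in the vignette. It uses synthetic data and a hypothetical helper `within_ss()`.

```r
library(cluster)  # silhouette()

# Synthetic organisation-level scores (hypothetical).
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))
emo_scaled <- scale(emo)
d <- dist(emo_scaled)

# Agglomerative clustering; Ward's linkage is an assumption.
hc <- hclust(d, method = "ward.D2")

# Hypothetical helper: total within-cluster sum of squares for given labels.
within_ss <- function(x, cl) {
  sum(sapply(unique(cl), function(g) {
    xg <- x[cl == g, , drop = FALSE]
    sum(sweep(xg, 2, colMeans(xg))^2)
  }))
}

ks <- 2:5
wss_hc <- sapply(ks, function(k) within_ss(emo_scaled, cutree(hc, k = k)))
sil_hc <- sapply(ks, function(k) {
  mean(silhouette(cutree(hc, k = k), d)[, "sil_width"])
})
print(data.frame(k = ks, wss = round(wss_hc, 2), avg_silhouette = round(sil_hc, 3)))
```

Because cuts of the same tree are nested, the WSS can only fall or stay level as k grows, so the elbow is the point where that fall flattens out.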

1.2 Visualise dendrograms

Example 1 Dendrogram

Example 2 Dendrogram
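A dendrogram like the examples above can be drawn with base R alone. dendextend (`color_branches()`) and factoextra (`fviz_dend()`), both listed in the session information, offer coloured variants. The sketch below uses synthetic data and assumes Ward's linkage.

```r
# Synthetic organisation-level scores (hypothetical).
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))

hc <- hclust(dist(scale(emo)), method = "ward.D2")  # Ward linkage assumed
groups <- cutree(hc, k = 3)

# Base-R dendrogram with a rectangle drawn around each of the three clusters.
plot(hc, main = "Cluster dendrogram (synthetic data)", xlab = "", sub = "")
rect.hclust(hc, k = 3, border = c("red", "green", "blue"))
print(table(groups))
```

The rectangle colours follow the left-to-right order of clusters in the dendrogram, so they need not match the colour naming used in the Summary.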

2. K-means clustering

The optimal number of clusters was calculated, and the result was visualised with a k-means cluster plot.

2.1 Optimal K-means clusters

The optimal number of k-means clusters for this data set using the elbow and silhouette methods was also three (3), shown in Charts 5 and 6.
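With k fixed at three, the k-means fit itself is a single call to `stats::kmeans()`. The sketch below uses synthetic data. The `nstart = 25` value is an assumption, chosen because several random starts reduce the risk of a poor local optimum.

```r
# Synthetic organisation-level scores (hypothetical).
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))
emo_scaled <- scale(emo)

# k = 3, as indicated by the elbow and silhouette methods.
km <- kmeans(emo_scaled, centers = 3, nstart = 25)
print(km$size)
print(km$cluster)

# Profile each cluster on the original 1-7 scale for easier interpretation.
print(aggregate(as.data.frame(emo), by = list(cluster = km$cluster), FUN = mean))
```

Profiling clusters on the original scale, rather than in standardised units, makes it easier to describe a cluster as broadly positive, neutral or negative.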

2.2 Visualise clusters
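factoextra's `fviz_cluster()` shows clusters of more than two variables on the first two principal components. A base-R equivalent on synthetic data is sketched below. It is an approximation of that kind of plot, not the vignette's own chart.

```r
# Synthetic organisation-level scores (hypothetical).
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))
emo_scaled <- scale(emo)
km <- kmeans(emo_scaled, centers = 3, nstart = 25)

# Project onto the first two principal components for a 2-D view.
pc <- prcomp(emo_scaled)
scores <- pc$x[, 1:2]
var_pct <- round(100 * pc$sdev^2 / sum(pc$sdev^2), 1)  # % variance per component

plot(scores, col = km$cluster, pch = 19,
     xlab = paste0("PC1 (", var_pct[1], "%)"),
     ylab = paste0("PC2 (", var_pct[2], "%)"),
     main = "k-means clusters (synthetic data)")
text(scores, labels = rownames(scores), pos = 3, cex = 0.7)
```

The axis labels report how much variance each component retains, which indicates how faithfully the 2-D picture reflects the full distances.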

3. Summary

The above analysis shows consistent results for both hierarchical and k-means clustering algorithms. The analysis derived three clusters. The green cluster for both algorithms comprised three organisations (State Government Department 1, Corporation 1 and Local Government Authority 2). In this cluster, employee emotion was generally positive and supportive of organisational change. The red cluster comprised one organisation (Corporation 2). This organisation had employees mostly exhibiting negative emotion and opposition to organisational change. The blue cluster comprised the remaining six organisations. Employees within this cluster generally exhibited near-neutral emotional attributes towards organisational change.
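Cluster numbers are arbitrary labels, so agreement between the two algorithms is best checked by cross-tabulating memberships. A sketch on synthetic data, assuming Ward's linkage for the hierarchical tree:

```r
# Synthetic organisation-level scores (hypothetical).
set.seed(1)
centre <- rep(c(6, 2, 4), times = c(3, 2, 5))
emo <- matrix(centre + rnorm(10 * 4, sd = 0.4), nrow = 10,
              dimnames = list(sprintf("Org%02d", 1:10), paste0("emo", 1:4)))
emo_scaled <- scale(emo)

hc_groups <- cutree(hclust(dist(emo_scaled), method = "ward.D2"), k = 3)
km_groups <- kmeans(emo_scaled, centers = 3, nstart = 25)$cluster

# Cross-tabulate memberships; label numbers themselves need not match.
tab <- table(hierarchical = hc_groups, kmeans = km_groups)
print(tab)

# Identical partitions: each row and each column has exactly one non-zero cell.
same_partition <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
print(same_partition)
```

If `same_partition` is `TRUE`, the two algorithms grouped the organisations identically, even if one calls a cluster "1" and the other calls it "3".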

In addition to this example of supervised cluster analysis, see the vignette on unsupervised cluster analysis where the target variable is unknown.


Reference:

Emotion was measured using ‘A semantic differential mood scale’ by Lorr and Wunderlich, published in the Journal of Clinical Psychology.


Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-29
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  abind         1.4-5   2016-07-21 [1] CRAN (R 4.4.0)
##  backports     1.5.0   2024-05-23 [1] CRAN (R 4.4.0)
##  broom         1.0.6   2024-05-17 [1] CRAN (R 4.4.0)
##  bslib         0.7.0   2024-03-29 [1] CRAN (R 4.4.0)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
##  car           3.1-2   2023-03-30 [1] CRAN (R 4.4.0)
##  carData       3.0-5   2022-01-06 [1] CRAN (R 4.4.0)
##  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
##  cluster     * 2.1.6   2023-12-01 [2] CRAN (R 4.4.0)
##  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.4.1)
##  data.table  * 1.15.4  2024-03-30 [1] CRAN (R 4.4.0)
##  dendextend  * 1.17.1  2023-03-25 [1] CRAN (R 4.4.0)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
##  digest        0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
##  factoextra  * 1.0.7   2020-04-01 [1] CRAN (R 4.4.0)
##  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
##  farver        2.1.2   2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
##  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
##  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
##  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
##  ggplot2     * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
##  ggpubr        0.6.0   2023-02-10 [1] CRAN (R 4.4.0)
##  ggrepel     * 0.9.5   2024-01-10 [1] CRAN (R 4.4.0)
##  ggsignif      0.6.4   2022-10-13 [1] CRAN (R 4.4.0)
##  glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
##  gridExtra     2.3     2017-09-09 [1] CRAN (R 4.4.0)
##  gtable        0.3.5   2024-04-22 [1] CRAN (R 4.4.0)
##  here        * 1.0.1   2020-12-13 [1] CRAN (R 4.4.0)
##  highr         0.11    2024-05-26 [1] CRAN (R 4.4.0)
##  hms           1.1.3   2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
##  knitr         1.48    2024-07-07 [1] CRAN (R 4.4.1)
##  labeling      0.4.3   2023-08-29 [1] CRAN (R 4.4.0)
##  later         1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
##  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
##  mime          0.12    2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
##  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload       1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
##  plyr          1.8.9   2023-10-02 [1] CRAN (R 4.4.0)
##  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.4.0)
##  promises      1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
##  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
##  Rcpp          1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
##  readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
##  remotes       2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
##  reshape2      1.4.4   2020-04-09 [1] CRAN (R 4.4.0)
##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.4.0)
##  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.4.0)
##  rstatix       0.7.2   2023-02-01 [1] CRAN (R 4.4.0)
##  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
##  sass          0.4.9   2024-03-15 [1] CRAN (R 4.4.0)
##  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
##  shiny         1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
##  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
##  stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
##  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
##  tidyr       * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
##  timechange    0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
##  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
##  usethis       2.2.3   2024-02-19 [1] CRAN (R 4.4.0)
##  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
##  viridis       0.6.5   2024-01-29 [1] CRAN (R 4.4.0)
##  viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.4.0)
##  withr         3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
##  xfun          0.46    2024-07-18 [1] CRAN (R 4.4.1)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
##  yaml          2.3.9   2024-07-05 [1] CRAN (R 4.4.1)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────