Objective

Text analysis extracts insights from unstructured text data to inform business decisions. Typical applications include analysing customer feedback, staff surveys and social media posts.

This vignette tidies raw text, then calculates and visualises the frequency of keywords used by senior executives to describe a major change being implemented in their organisation. In this case, the Director-General (DG) of a large Queensland State Government Department was implementing a state-wide restructure and realignment, and wanted to gauge the level of support for the change among senior executives around the state. A total of 34 senior executives participated in the study.

Examining word frequency is the first step towards more advanced and informative text analysis.

Workflow

There are several steps to visualise the frequency of keywords in text data.

  1. The first step was importing raw text data for analysis. A small sample of comments by senior executives regarding the restructure follows.
Table 1: A sample of comments by senior executives about the restructure

| organisation | id | comments |
|---|---|---|
| State Govt Dept 1 | 617 | The changes are for the good of the organisation whilst individuals are not being ignored as a consequence. |
| State Govt Dept 1 | 618 | The change being undertaken recognises the risks to individual potential threats etc. and plans to assist people through the process. We have undertaken significant restructuring including relinquishing power bases etc. with support of all senior managers and staff and no apparent covert or overt resistance. This is not to say people have not argued points of view but rather been a part of the process to improvement - ownership. |
| State Govt Dept 1 | 619 | Change creates opportunities for committed, active, interested individuals. I can’t think of anything which has not changed over the last 10 years alone. I have always preferred to be part of a change and influence it rather than just sit back and watch it happen. |
| State Govt Dept 1 | 620 | The department can better deliver public services. |
| State Govt Dept 1 | 621 | I welcome change. The move to focus on building relationships as a management approach has clear benefits in improving staff morale, motivation and encouraging innovation and improvement. |
| State Govt Dept 1 | 624 | The majority of change is vitally important to our organisation. It ensures our continuing improvement. |
  2. The comments were then unnested into individual word tokens, and the tokens were counted. The most common word tokens in raw text are usually stop words, as shown in Table 2.
  3. Stop words are common terms such as “the”, “and” and “to”, as shown in Table 2. They add little meaning to a sentence, so they can be removed from the corpus without sacrificing meaningful word tokens. In this example, stop words were identified with the SMART lexicon and removed. Other superfluous characters and symbols were then removed to arrive at the list of clean word tokens in Table 3.
  4. The next important step in text analysis was word lemmatisation. According to Wikipedia, lemmatisation in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item. It serves a similar purpose to word stemming. For example, the terms improve, improved, improvement, improvements and improving were lemmatised to improvement. For this vignette, lemmatisation was applied to inflected forms of words that appeared multiple times. Table 4 shows the outcome of lemmatisation.
  5. With text cleaning and lemmatisation complete, compare the pre-processed comments in Table 5 with the raw text from the same senior executives in Table 1, repeated below.
Table 5: Sample of comments by senior executives about the restructure following cleaning and lemmatisation

| organisation | id | text |
|---|---|---|
| State Govt Dept 1 | 617 | changes good organisation whilst staff consequence |
| State Govt Dept 1 | 618 | change undertaken recognises risks staff potential threats plans assist staff process undertaken significant restructuring including relinquishing power base support senior management staff apparent covert overt resistance staff argued views process improvement ownership |
| State Govt Dept 1 | 619 | change creates opportunities committed active staff change preferred change influence sit watch happen |
| State Govt Dept 1 | 620 | department deliver public services |
| State Govt Dept 1 | 621 | change move focus building relationships management approach benefits improvement staff morale motivation encouraged innovation improvement |
| State Govt Dept 1 | 624 | majority change vitally organisation ensures continuing improvement |

Table 1: A sample of comments by senior executives about the restructure

| organisation | id | comments |
|---|---|---|
| State Govt Dept 1 | 617 | The changes are for the good of the organisation whilst individuals are not being ignored as a consequence. |
| State Govt Dept 1 | 618 | The change being undertaken recognises the risks to individual potential threats etc. and plans to assist people through the process. We have undertaken significant restructuring including relinquishing power bases etc. with support of all senior managers and staff and no apparent covert or overt resistance. This is not to say people have not argued points of view but rather been a part of the process to improvement - ownership. |
| State Govt Dept 1 | 619 | Change creates opportunities for committed, active, interested individuals. I can’t think of anything which has not changed over the last 10 years alone. I have always preferred to be part of a change and influence it rather than just sit back and watch it happen. |
| State Govt Dept 1 | 620 | The department can better deliver public services. |
| State Govt Dept 1 | 621 | I welcome change. The move to focus on building relationships as a management approach has clear benefits in improving staff morale, motivation and encouraging innovation and improvement. |
| State Govt Dept 1 | 624 | The majority of change is vitally important to our organisation. It ensures our continuing improvement. |
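
The workflow steps above can be sketched in R with the tidytext and dplyr packages listed in the session information. This is a minimal sketch rather than the code behind the vignette: the object names (`raw_text`, `tokens`, `clean_tokens`, `lemma_lookup`, `word_counts`) are hypothetical, the input is just three comments copied from Table 1, and the lookup-table approach to lemmatisation is an assumption based on the improve/improvement example in step 4.

```r
library(dplyr)
library(tidytext)

# Step 1 (stand-in): three raw comments copied from Table 1.
# The real import step and file are not shown in the vignette.
raw_text <- tibble::tibble(
  organisation = "State Govt Dept 1",
  id = c(620L, 621L, 624L),
  comments = c(
    "The department can better deliver public services.",
    "I welcome change. The move to focus on building relationships as a management approach has clear benefits in improving staff morale, motivation and encouraging innovation and improvement.",
    "The majority of change is vitally important to our organisation. It ensures our continuing improvement."
  )
)

# Step 2: unnest to one lower-cased word token per row, then count tokens.
tokens <- raw_text %>%
  unnest_tokens(output = word, input = comments)

raw_counts <- tokens %>%
  count(word, sort = TRUE)

# Step 3: remove stop words from the SMART lexicon, then strip any
# non-letter characters (an assumed reading of "superfluous characters").
smart_stop_words <- stop_words %>%
  filter(lexicon == "SMART")

clean_tokens <- tokens %>%
  anti_join(smart_stop_words, by = "word") %>%
  mutate(word = gsub("[^[:alpha:]]", "", word)) %>%
  filter(word != "")

# Step 4: lemmatise with a small hand-built lookup table (assumption),
# mapping inflected forms to the single form used in the vignette.
lemma_lookup <- tibble::tribble(
  ~word,          ~lemma,
  "improve",      "improvement",
  "improved",     "improvement",
  "improvements", "improvement",
  "improving",    "improvement"
)

word_counts <- clean_tokens %>%
  left_join(lemma_lookup, by = "word") %>%
  mutate(word = coalesce(lemma, word)) %>%
  select(-lemma) %>%
  count(word, sort = TRUE)
```

Printing `raw_counts` and `word_counts` side by side shows the effect of the cleaning steps: stop words dominate the first, while the second keeps only content words with inflected forms merged. The output will not match Table 5 exactly, because the vignette's full set of cleaning and lemmatisation rules is not shown.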

Results

Word frequency was visualised in two ways: with a bar chart and a word cloud. The bar chart shows the top 10 words used by senior executives to describe the restructure. In contrast, the word cloud includes every word token that occurred at least twice.
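
Both charts can be sketched with the ggplot2 and wordcloud packages listed in the session information. The example below is illustrative only: it counts words from three of the cleaned comments in Table 5 rather than the full set of 34 responses, so the resulting charts are not the vignette's figures, and the object names (`clean_text`, `top10`, `frequent`, `bar_chart`) are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(wordcloud)

# Cleaned comments copied from Table 5 (ids 619, 621 and 624 only).
clean_text <- c(
  "change creates opportunities committed active staff change preferred change influence sit watch happen",
  "change move focus building relationships management approach benefits improvement staff morale motivation encouraged innovation improvement",
  "majority change vitally organisation ensures continuing improvement"
)

# Split on spaces and count each word token.
word_counts <- tibble::tibble(word = unlist(strsplit(clean_text, " "))) %>%
  count(word, sort = TRUE)

# Bar chart: keep the 10 most frequent words (ties cut at 10 rows).
top10 <- word_counts %>%
  slice_max(order_by = n, n = 10, with_ties = FALSE)

bar_chart <- ggplot(top10, aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Frequency", y = NULL)

print(bar_chart)

# Word cloud: keep only words that occur at least twice.
frequent <- word_counts %>%
  filter(n >= 2)

set.seed(1)  # word placement is random; fix the seed for repeatability
wordcloud(
  words = frequent$word,
  freq = frequent$n,
  min.freq = 2,
  random.order = FALSE
)
```

Filtering to `n >= 2` before plotting mirrors the `min.freq = 2` argument; either alone would apply the "at least twice" rule, and doing both simply makes the threshold explicit in the data as well as in the plot call.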

The words senior executives used most frequently about the restructure show support for management and for the organisational change, which was driven by a changing business environment. There was a perception that the restructuring would lead to improved business processes and benefit staff through promotion opportunities.

For more detailed text analysis, see the vignettes on n-grams, word pairs and correlation, sentiment analysis, and topic modelling.


Reference:

Silge, J. & Robinson, D. (2017). Text Mining with R, O’Reilly Media Inc.


Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-30
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version date (UTC) lib source
##  bslib          0.7.0   2024-03-29 [1] CRAN (R 4.4.0)
##  cachem         1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
##  cli            3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
##  colorspace     2.1-0   2023-01-23 [1] CRAN (R 4.4.1)
##  crayon         1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
##  crosstalk      1.2.1   2023-11-23 [1] CRAN (R 4.4.0)
##  data.table   * 1.15.4  2024-03-30 [1] CRAN (R 4.4.0)
##  devtools       2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
##  digest         0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr        * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
##  DT           * 0.33    2024-04-04 [1] CRAN (R 4.4.0)
##  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate       0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
##  fansi          1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
##  farver         2.1.2   2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
##  forcats      * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
##  fs             1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
##  generics       0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
##  ggplot2      * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
##  glue           1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
##  gtable         0.3.5   2024-04-22 [1] CRAN (R 4.4.0)
##  here         * 1.0.1   2020-12-13 [1] CRAN (R 4.4.0)
##  highr          0.11    2024-05-26 [1] CRAN (R 4.4.0)
##  hms            1.1.3   2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools      0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets    1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv         1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
##  janeaustenr    1.0.0   2022-08-26 [1] CRAN (R 4.4.0)
##  jquerylib      0.1.4   2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite       1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
##  kableExtra   * 1.4.0   2024-01-24 [1] CRAN (R 4.4.0)
##  knitr          1.48    2024-07-07 [1] CRAN (R 4.4.1)
##  later          1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
##  lattice        0.22-6  2024-03-20 [2] CRAN (R 4.4.0)
##  lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
##  lubridate    * 1.9.3   2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
##  Matrix         1.7-0   2024-03-22 [2] CRAN (R 4.4.0)
##  memoise        2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
##  mime           0.12    2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI         0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
##  munsell        0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
##  pillar         1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild       1.4.4   2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload        1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
##  profvis        0.3.8   2023-05-02 [1] CRAN (R 4.4.0)
##  promises       1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
##  purrr        * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
##  R6             2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
##  RColorBrewer * 1.1-3   2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp           1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
##  readr        * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
##  remotes        2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
##  rlang          1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown      2.27    2024-05-17 [1] CRAN (R 4.4.0)
##  rprojroot      2.0.4   2023-11-05 [1] CRAN (R 4.4.0)
##  rstudioapi     0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
##  sass           0.4.9   2024-03-15 [1] CRAN (R 4.4.0)
##  scales         1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
##  shiny          1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
##  SnowballC      0.7.1   2023-04-25 [1] CRAN (R 4.4.0)
##  stringi        1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
##  stringr      * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
##  svglite        2.1.3   2023-12-08 [1] CRAN (R 4.4.0)
##  systemfonts    1.1.0   2024-05-15 [1] CRAN (R 4.4.0)
##  tibble       * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
##  tidyr        * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect     1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
##  tidytext     * 0.4.2   2024-04-10 [1] CRAN (R 4.4.0)
##  tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
##  timechange     0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
##  tokenizers     0.3.0   2022-12-22 [1] CRAN (R 4.4.0)
##  tzdb           0.4.0   2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker     1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
##  usethis        2.2.3   2024-02-19 [1] CRAN (R 4.4.0)
##  utf8           1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
##  vctrs          0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
##  viridisLite    0.4.2   2023-05-02 [1] CRAN (R 4.4.0)
##  withr          3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
##  wordcloud    * 2.6     2018-08-24 [1] CRAN (R 4.4.0)
##  xfun           0.46    2024-07-18 [1] CRAN (R 4.4.1)
##  xml2           1.3.6   2023-12-04 [1] CRAN (R 4.4.0)
##  xtable         1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
##  yaml           2.3.9   2024-07-05 [1] CRAN (R 4.4.1)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────