Sentiment Analysis
According to Wikipedia, sentiment analysis (also known as opinion mining) is applied to language and text to systematically identify, extract, quantify and study affective states and subjective information. Sentiment analysis is widely applied to customer reviews, survey responses and social media across many industries. Wikipedia adds that a fundamental task in sentiment analysis is classifying the polarity of language and text as positive, negative, or neutral. Advanced sentiment analysis goes beyond polarity to analyse emotional states such as enjoyment, anger, disgust, sadness, fear and surprise.
This vignette demonstrates both polar and advanced sentiment analysis. The data set consists of responses from 86 employees in a state government department. The following analysis explores employee sentiment and emotional states as they experienced a divisional restructure within their department.
One of the first steps in sentiment analysis is understanding the impact of negated words on sentiment. For example, individuals can express the opposite meaning of a particular word by inserting or using a negation. Negations are words like no, not, and never. In sentiment analysis, this becomes important because a word may be interpreted positively when the expressed sentiment is the exact opposite. For example, the word “good” has a positive sentiment; however, when negated by “not good”, the sentiment for “good” becomes negative.
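As a rough illustration of how negated words can be located, the sketch below pairs each token with the one that follows it and keeps the pairs that start with a negation. It is plain Python rather than the tooling used for this vignette, and the `NEGATIONS` set, the sample comments and the helper name `negated_bigrams` are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical negation list; the vignette's own list of negated terms
# is not reproduced here.
NEGATIONS = {"no", "not", "never"}

# Toy comments used only for illustration.
comments = [
    "Change for political reasons is not helpful.",
    "The new process is good but the timing is not good.",
]

def negated_bigrams(text):
    """Return (negation, following_word) pairs found in one comment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if a in NEGATIONS]

# Count how often each negated pair occurs across all comments.
negated_counts = Counter(pair for c in comments for pair in negated_bigrams(c))
print(negated_counts.most_common())
```

Counting the pairs in this way gives a quick sense of whether negation is frequent enough in a corpus to distort word-level sentiment scores.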
After addressing the potential impact of negated words on sentiment, the raw text was tidied, stop words were removed, and similar words were lemmatised. A bar chart visualises the frequency of keywords used by government employees to describe the restructure.
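The preprocessing steps can be sketched in a few lines. This is a minimal example in plain Python, not the pipeline used for this vignette: the stop-word set, the lemma map and the sample comments are small hypothetical stand-ins for the full resources a real analysis would use.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list and lemma map.
STOP_WORDS = {"the", "is", "and", "of", "to", "a", "was", "were", "by"}
LEMMAS = {"changes": "change", "changed": "change", "managers": "manager"}

def tidy(text):
    """Lower-case and tokenise a comment, drop stop words, then lemmatise."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in kept]

# Toy comments used only for illustration.
comments = [
    "The changes were rushed.",
    "Staff changed roles and the change was rushed.",
]

# Word frequencies across the corpus, ready to be plotted as a bar chart.
word_freq = Counter(w for c in comments for w in tidy(c))
print(word_freq.most_common(3))
```

Lemmatising before counting means that "changes", "changed" and "change" contribute to a single bar rather than three.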
For brevity, this vignette focuses on implementing one sentiment dictionary from several options available. Popular sentiment lexicons include AFINN, Bing, Loughran McDonald and NRC. In this example, NRC will be implemented. Before doing so, all four sentiment lexicons are compared based on the order of words and the timeline of respondent comments.
NRC assigns words to the sentiment categories of positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise and trust. A word can fit into more than one category. For example, the word “improvement”, apart from being “positive”, is also categorised as “trust” and “joy”.
Another important issue in sentiment analysis is understanding the context in which words are used. This is specific to each data set. For example, the word “tender” is categorised as “positive” and “joy”. In contrast, in a medical environment, the word “tender” is “negative” when associated with pain. Potential conflicts in sentiment specific to this corpus were considered during implementation.
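The one-to-many mapping between words and NRC categories can be pictured as a table of (word, category) rows. The sketch below is plain Python with a toy lexicon containing only the two words discussed above; the function names `categories_for` and `tag_tokens` are assumptions for the example and are not part of any lexicon package.

```python
from collections import Counter, defaultdict

# Toy NRC-style rows limited to the two words discussed in the text.
NRC_ROWS = [
    ("improvement", "positive"),
    ("improvement", "trust"),
    ("improvement", "joy"),
    ("tender", "positive"),
    ("tender", "joy"),
]

# Index the rows so that each word maps to all of its categories.
lexicon = defaultdict(list)
for word, category in NRC_ROWS:
    lexicon[word].append(category)

def categories_for(word):
    """Return every category recorded for a word (empty list if unknown)."""
    return list(lexicon.get(word, []))

def tag_tokens(tokens):
    """Count categories over a list of tokens; one token may count several times."""
    return Counter(cat for t in tokens for cat in categories_for(t))

print(categories_for("improvement"))
print(tag_tokens(["improvement", "tender", "restructure"]))
```

Because one token can add to several categories at once, category totals should not be expected to sum to the number of words in the text.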
This vignette aims to explore employee sentiment from two perspectives. First, from a macro perspective on the corpus (all documents), and second, from a micro perspective on individual documents to gain greater insight.
The potential impact of 10 negated terms on sentiment was assessed. Chart 1 shows the prevalence of six negated terms in the corpus. Due to the low frequencies and, in some cases, adjoining stop-words, the impact of negation on sentiment in this data set was considered low.
The choice of the lexicon is another consideration when handling negations in sentiment analysis. For example, the AFINN lexicon assigns each word a ranking between -5 and +5, indicating the degree of negative or positive sentiment, respectively. Chart 2 shows the results of analysing negated terms with AFINN to arrive at a weighting by frequency score.
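One way to read a "weighting by frequency" score is as the lexicon value of a negated word multiplied by the number of times it follows a negation. The sketch below assumes that reading; it is plain Python, and the scores in `AFINN_TOY` and the sample comments are illustrative values, not taken from the vignette's data.

```python
import re
from collections import Counter

# Illustrative AFINN-style scores on the -5..+5 scale (assumed values).
AFINN_TOY = {"helpful": 2, "support": 2, "good": 3}

# Toy comments used only for illustration.
comments = [
    "Change for political reasons is not helpful.",
    "Took no notice if did not support management's position.",
    "The consultation was not helpful and not good.",
]

def words_after_not(text):
    """Return the words that immediately follow 'not' in one comment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [b for a, b in zip(tokens, tokens[1:]) if a == "not"]

# How often each word appears directly after "not".
negated_freq = Counter(w for c in comments for w in words_after_not(c))

# Score x frequency: how much sentiment each negated word would be
# contributing in the wrong direction if negation were ignored.
contribution = {
    w: AFINN_TOY[w] * n for w, n in negated_freq.items() if w in AFINN_TOY
}
ranked = sorted(contribution.items(), key=lambda kv: abs(kv[1]), reverse=True)
print(ranked)
```

Words near the top of the ranking are the ones most worth correcting before scoring the corpus.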
For demonstration purposes, the words “support” and “helpful” preceded by “not” were reviewed. Output 1 shows examples of respondent comments in raw text where words ranked as positive carry the opposite, negative sentiment.
Output 1 Examples of word negation
[1] "Unfairness, hypocrisy of alleged openness of new way, contradicted by top-down change decision. Asked for feedback and took no notice if did not support management's position. Also staffing issues unfair on staff and does not correct past wrongs at all."
[2] "Change for political reasons is not helpful. Change of government equals organisational change does not equal improved service."
The negated phrases “not support” and “not helpful” were replaced with the synonyms “oppose” and “useless”, respectively, in the following analysis.
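The substitution can be sketched as a simple phrase replacement applied to the raw text before tokenising. The example below is plain Python and uses only the two phrase-to-synonym pairs named above; the function name `replace_negations` is an assumption for illustration.

```python
import re

# The two substitutions described in the text.
REPLACEMENTS = {
    "not support": "oppose",
    "not helpful": "useless",
}

def replace_negations(text):
    """Swap each negated phrase for its single-word synonym."""
    for phrase, synonym in REPLACEMENTS.items():
        # Allow any run of whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        text = re.sub(pattern, synonym, text, flags=re.IGNORECASE)
    return text

print(replace_negations("Change for political reasons is not helpful."))
```

Running the replacement on raw text, before stop words are removed, matters because "not" is commonly treated as a stop word and would otherwise disappear before the phrase could be matched.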
After preprocessing and tidying raw text, Chart 3 illustrates the most common words documented by staff to describe the divisional restructure.
Four popular lexicons for assessing sentiment are AFINN, Bing, Loughran McDonald and NRC. Chart 4 compares sentiment across these four lexicons based on the respondents’ order of words and documents. For this specific corpus, all lexicons, except for Loughran McDonald, recorded wide-ranging sentiments across the narrative or timeline.
Earlier, Chart 3 showed that “management” was the third most frequent word in the corpus. The NRC lexicon assigns the word “management” the sentiments “positive” and “trust”. In the context of this corpus, more than 80 per cent of the sentiment directed towards “management” was the opposite, as reflected in the following sample of comments.
| id | comments by respondents |
|---|---|
| 8 | Unfairness, hypocrisy of alleged openness of new way, contradicted by top-down change decision. Asked for feedback and took no notice if did not support management’s position. Also staffing issues unfair on staff and does not correct past wrongs at all. |
| 15 | I have no personal investment in the organisation. I agree with many of the ideas which are put forward but my observation of how these are enacted indicate the change is disruptive, not because of the change itself but due to some of the management who use it as a tool for workplace violence and to obtain more power-control. Field workers are left isolated with poor role models and support. |
| 42 | Change is made up of processes, outcomes and consequences. Although the outcome of the restructure will give stability, the processes being followed and the consequences for individuals are not appropriate or just. There are shady deals being done and management is turning a blind eye to the consequences of their actions. |
| 60 | My reason is due to the fact the organisational change is because of mismanagement. Therefore, staff who have been affected should be considered prior to other departmental staff for positions they have been working in for long periods and working effectively. The process has been developed with little consultation with staff. Senior management can abuse the system by bypassing the rules in place for other staff, and appoint their favourites. The public service especially the division in their supposed merit selection process rort the system and us the employee pay for it. No equity exists. |
| 84 | The change came too quickly after the previous change. I personally do not feel my ability and dedication to the job is recognised and valued. I do not feel accepted as a member of the team. My future is uncertain and quite gloomy. Senior management recognise my work unit is understaffed but do nothing to move staff from under-utilised areas. |
| 87 | My role is likely to move to an area that is more supportive/understanding of the duties. I am also likely to have more and different opportunities. I also believe I have been in the current area long enough and my cynicism towards the area and management are growing. |
| 203 | Change will happen. When change similar to what is happening has happened before occurs and a person feels that their career moves and where they are employed are stripped from them. These people feel very wary of change and management’s action plans to change (i.e. don’t believe them). Do not want to be hurt again. |
Consequently, in the following analysis, sentiment for “management” was changed from “positive” and “trust” to “negative” and “anger” to align with the overall sentiment expressed in this corpus. There were no other modifications to the sentiment dictionary.
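A lexicon override of this kind amounts to dropping the existing rows for a word and appending the replacement categories. The sketch below shows the idea in plain Python on a toy lexicon; the helper `override_word` and the extra "improvement" row are assumptions for illustration, while the "management" categories are those stated above.

```python
# Toy NRC-style lexicon as (word, category) rows.
nrc_rows = [
    ("management", "positive"),
    ("management", "trust"),
    ("improvement", "positive"),
]

def override_word(rows, word, new_categories):
    """Return a new lexicon with one word's categories replaced."""
    kept = [(w, c) for w, c in rows if w != word]
    return kept + [(word, c) for c in new_categories]

# Re-label "management" to match how it is used in this corpus.
custom_rows = override_word(nrc_rows, "management", ["negative", "anger"])
print(custom_rows)
```

Returning a new list rather than editing the original keeps the unmodified lexicon available for comparison.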
Overall respondent sentiment is presented in two ways: first by polarity, and second by emotional states encompassing fear, trust, anticipation, joy, sadness, anger, surprise and disgust.
To provide an overview, Chart 5 shows how the level of sentiment evolved during the data collection period determined by order of employee responses.
Chart 6 visualises the proportion of positive and negative sentiment towards the organisational restructure documented by employees within the division. In terms of polarity, the sentiment was mostly positive.
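The proportions behind a chart of this kind can be computed by counting matched words per polarity and dividing by the total number of matches. The sketch below is plain Python with a hypothetical polarity lexicon and token list; it does not reproduce the vignette's figures.

```python
from collections import Counter

# Hypothetical polarity lexicon and tokens, for illustration only.
POLARITY = {
    "support": "positive",
    "opportunity": "positive",
    "stability": "positive",
    "unfair": "negative",
    "gloomy": "negative",
}
tokens = ["support", "unfair", "opportunity", "stability",
          "restructure", "gloomy", "support"]

# Only tokens found in the lexicon contribute to the counts.
counts = Counter(POLARITY[t] for t in tokens if t in POLARITY)
total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}
print(proportions)
```

Note that the denominator is the number of words matched in the lexicon, not the number of words in the corpus, so unmatched words such as "restructure" do not dilute either share.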
Chart 7 provides more context and understanding by drilling down into the words contributing to positive and negative sentiment.
An alternative way to visualise the most common positive and negative words is by a comparison word cloud, as illustrated in Chart 8.
Chart 9 shows the emotional state of employees towards restructuring their division. Employee fear about the restructure was evident.
While not shown here, it is possible to drill down into the words contributing to each of the eight emotional states, as Chart 7 does for positive and negative sentiment.
Chart 10 presents an alternative visualisation of the emotional state of employees about the restructuring in their division calculated on frequency.
This final section on sentiment analysis reviews the sentiment of individual respondents in two ways: polarity and emotional states encompassing fear, trust, anticipation, joy, sadness, anger, surprise and disgust.
Chart 11 illustrates the degree of positive, neutral and negative sentiment expressed by individual respondents about the restructure of their division.
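One simple way to assign a respondent-level label is to take the net of positive and negative word matches and treat a net of zero as neutral. The sketch below assumes that rule, which may differ from the calculation behind Chart 11; it is plain Python, and the scores and responses are hypothetical.

```python
# Hypothetical polarity scores: +1 for positive words, -1 for negative.
POLARITY_SCORE = {
    "support": 1, "opportunity": 1, "valued": 1,
    "unfair": -1, "gloomy": -1, "uncertain": -1,
}

# Toy tokenised responses keyed by a made-up respondent id.
responses = {
    1: ["support", "opportunity", "uncertain"],
    2: ["unfair", "gloomy", "valued"],
    3: ["support", "unfair", "restructure"],
}

def classify(tokens):
    """Label a response by the sign of its net polarity score."""
    net = sum(POLARITY_SCORE.get(t, 0) for t in tokens)
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"

labels = {rid: classify(toks) for rid, toks in responses.items()}
print(labels)
```

Under this rule a response with no lexicon matches at all is also labelled neutral, which is worth keeping in mind when interpreting the neutral group.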
Chart 12 provides an alternative way to visualise each respondent’s positive and negative sentiments.
The following chart provides more information by drilling into the words contributing to positive and negative sentiment. Rather than providing all respondents, Chart 13 presents results for a small sample of employees.
The emotional states of individual employees towards the restructuring can also be examined. While not shown here, as in Chart 7, it is possible to drill down into the words contributing to each of the eight emotional states. To conclude this vignette, the emotional state of individual respondents about the restructure is presented. Once again, rather than presenting the results for all respondents, Chart 14 displays a small sample of employees. It shows a range of emotional states about the restructuring, with a sentiment of fear quite widespread.
For more examples of text analysis, look at the vignettes on word frequency, n-grams word pairs and correlation, and topic modelling.
Reference:
Silge, J. & Robinson, D. (2017). Text Mining with R, O’Reilly Media Inc.
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.4.0 (2024-04-24 ucrt)
## os Windows 11 x64 (build 22631)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_Australia.utf8
## ctype English_Australia.utf8
## tz Australia/Brisbane
## date 2024-07-30
## pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## bslib 0.7.0 2024-03-29 [1] CRAN (R 4.4.0)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
## cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.1)
## data.table * 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
## digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.1)
## dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
## evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
## fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
## farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
## fmsb * 0.7.6 2024-01-19 [1] CRAN (R 4.4.1)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
## fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
## ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
## glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
## gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.4.0)
## highr 0.11 2024-05-26 [1] CRAN (R 4.4.0)
## hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
## htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
## httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
## janeaustenr 1.0.0 2022-08-26 [1] CRAN (R 4.4.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
## jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
## kableExtra * 1.4.0 2024-01-24 [1] CRAN (R 4.4.0)
## knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1)
## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
## later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
## lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.0)
## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
## lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
## magrittr * 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
## Matrix 1.7-0 2024-03-22 [2] CRAN (R 4.4.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
## mgcv 1.9-1 2023-12-21 [2] CRAN (R 4.4.0)
## mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
## munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
## nlme 3.1-164 2023-11-27 [2] CRAN (R 4.4.0)
## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
## pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
## pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.1)
## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
## promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
## radarchart * 0.3.1 2016-12-20 [1] CRAN (R 4.4.0)
## rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.4.0)
## RColorBrewer * 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
## Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.1)
## readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
## remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
## reshape2 * 1.4.4 2020-04-09 [1] CRAN (R 4.4.0)
## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
## rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.0)
## rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
## sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.0)
## scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
## shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
## SnowballC 0.7.1 2023-04-25 [1] CRAN (R 4.4.0)
## stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
## stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
## svglite 2.1.3 2023-12-08 [1] CRAN (R 4.4.0)
## systemfonts 1.1.0 2024-05-15 [1] CRAN (R 4.4.0)
## textdata * 0.4.5 2024-05-28 [1] CRAN (R 4.4.0)
## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
## tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
## tidytext * 0.4.2 2024-04-10 [1] CRAN (R 4.4.0)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
## timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
## tokenizers 0.3.0 2022-12-22 [1] CRAN (R 4.4.0)
## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
## usethis 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
## viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.4.0)
## withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
## wordcloud * 2.6 2018-08-24 [1] CRAN (R 4.4.0)
## xfun 0.46 2024-07-18 [1] CRAN (R 4.4.1)
## xml2 1.3.6 2023-12-04 [1] CRAN (R 4.4.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
## yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.1)
##
## [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
## [2] C:/Program Files/R/R-4.4.0/library
##
## ──────────────────────────────────────────────────────────────────────────────