Objective

Wikipedia describes exploratory data analysis (EDA) as analysing datasets to summarise their main characteristics, often using statistical graphics and other data visualisation methods. In R for Data Science, Wickham and Grolemund pose two questions that help make discoveries within data: what type of variation occurs within variables, and what type of covariation occurs between variables?

This vignette explores two categorical variables: respondents’ age group and job role. The dataset comprised 616 respondents from 10 public and private sector organisations experiencing organisational change.

Workflow

The raw dataset was tidied prior to exploration. This included renaming variables, updating data types and checking for anomalies. Cases with missing values were removed. Apart from collapsing the original age group levels, no other wrangling was required.
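The tidying steps above can be sketched in a few lines of tidyverse code. This is only an illustration under stated assumptions: the raw data is not shown in this vignette, so the column names (`Age Group`, `Job Role`), the raw age levels and the seven example rows are all hypothetical.

```r
library(dplyr)
library(forcats)
library(tidyr)

# Hypothetical raw data: column names, raw age levels and rows are invented
raw <- tibble(
  `Age Group` = c("<20 yrs", "20-29 yrs", "30-39 yrs", "40-49 yrs",
                  "50-59 yrs", ">59 yrs", NA),
  `Job Role`  = c("Employee", "Employee", "Supervisor", "Middle management",
                  "Executive/Senior management", "Employee", "Supervisor")
)

survey <- raw |>
  rename(age_group = `Age Group`, job_role = `Job Role`) |>  # rename variables
  mutate(across(c(age_group, job_role), as.factor)) |>       # update data types
  drop_na() |>                                               # remove cases with missing values
  mutate(
    # collapse the (hypothetical) raw age levels into four groups,
    # then put the groups in age order
    age_group = fct_collapse(
      age_group,
      "<30 yrs" = c("<20 yrs", "20-29 yrs"),
      ">49 yrs" = c("50-59 yrs", ">59 yrs")
    ) |>
      fct_relevel("<30 yrs", "30-39 yrs", "40-49 yrs", ">49 yrs")
  )
```

Storing both variables as factors keeps the age groups in a meaningful order in later tables and charts rather than in alphabetical order.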

Both age group and job role are summarised with frequency tables and then visualised with an appropriate chart. Finally, the covariation between these two categorical variables is explored.

Results

1. Explore age group

Table 1 summarises the distribution of age group by count and proportion. Chart 1 visualises the distribution of age group.

Table 1 Age group distribution summary (sorted by frequency)

| age group | n | percent |
|---|---|---|
| 40-49 yrs | 203 | 33% |
| 30-39 yrs | 180 | 30% |
| <30 yrs | 126 | 21% |
| >49 yrs | 100 | 16% |

N = 609
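A frequency summary like Table 1 can be produced with `dplyr::count()`. The sketch below is not necessarily the code behind the table: it rebuilds one row per respondent from the published counts, and the column name `age_group` is an assumption.

```r
library(dplyr)

# Rebuild one row per respondent from the Table 1 counts
# (the raw data is not shown; `age_group` is an assumed column name)
age <- tibble(
  age_group = rep(c("<30 yrs", "30-39 yrs", "40-49 yrs", ">49 yrs"),
                  times = c(126, 180, 203, 100))
)

age_summary <- age |>
  count(age_group, sort = TRUE) |>            # n per level, largest first
  mutate(percent = round(100 * n / sum(n)))   # share of all respondents, in percent

age_summary
```

Each percentage is the level's count divided by the total of 609, for example 203 / 609 ≈ 33% for the 40-49 yrs group.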

2. Explore job role

Table 2 summarises the distribution of job role by count and proportion. Chart 2 visualises the distribution of job role.

Table 2 Job role distribution summary (sorted by frequency)

| job role | n | percent |
|---|---|---|
| Employee | 358 | 59% |
| Middle management | 121 | 20% |
| Supervisor | 92 | 15% |
| Executive/Senior management | 40 | 7% |

N = 611
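For a single categorical variable, a bar chart of the counts is a common choice. The following is a minimal ggplot2 sketch built from the Table 2 counts; the original chart is not reproduced here, so the ordering and labels are assumptions rather than the author's design.

```r
library(ggplot2)

# Counts copied from Table 2
job <- data.frame(
  job_role = c("Employee", "Middle management", "Supervisor",
               "Executive/Senior management"),
  n = c(358, 121, 92, 40)
)

chart <- ggplot(job, aes(x = reorder(job_role, n), y = n)) +
  geom_col() +        # bar height taken directly from the pre-computed counts
  coord_flip() +      # horizontal bars leave room for the long role labels
  labs(x = "Job role", y = "Respondents")

chart
```

With one row per respondent instead of pre-computed counts, `geom_bar()` would do the counting itself.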

3. Explore covariation

Table 3 shows the frequency of job role by age group.

Table 3 Contingency table of job role by age group

| Age group | Employee | Supervisor | Middle management | Executive/Senior management |
|---|---|---|---|---|
| <30 yrs | 107 | 9 | 9 | 0 |
| 30-39 yrs | 107 | 29 | 34 | 10 |
| 40-49 yrs | 98 | 36 | 51 | 16 |
| >49 yrs | 43 | 17 | 26 | 14 |

N = 606
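A two-way table of this kind can be built by counting both variables together and spreading one of them across columns. The sketch below rebuilds respondent-level data from the Table 3 counts; the column names `age_group` and `job_role` are assumptions, and this is one possible approach rather than the code used for the table.

```r
library(dplyr)
library(tidyr)

ages  <- c("<30 yrs", "30-39 yrs", "40-49 yrs", ">49 yrs")
roles <- c("Employee", "Supervisor", "Middle management",
           "Executive/Senior management")

# Cell counts copied from Table 3, one row per age group / job role pair
cells <- expand_grid(age_group = ages, job_role = roles) |>
  mutate(n = c(107,  9,  9,  0,
               107, 29, 34, 10,
                98, 36, 51, 16,
                43, 17, 26, 14))

# Expand to one row per respondent, with factors to fix the display order
respondents <- uncount(cells, n) |>
  mutate(age_group = factor(age_group, levels = ages),
         job_role  = factor(job_role, levels = roles))

counts <- respondents |>
  count(age_group, job_role, .drop = FALSE) |>          # keep the empty cell as 0
  pivot_wider(names_from = job_role, values_from = n)   # job roles become columns

counts
```

Setting `.drop = FALSE` matters here because no respondent under 30 holds an executive role; without it that combination would be missing from the result instead of showing 0.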

Table 4 shows the proportion of each job role within each age group, so every row sums to 100%.

Table 4 Proportion of job role within each age group (row percentages)

| Age group | Employee | Supervisor | Middle management | Executive/Senior management |
|---|---|---|---|---|
| <30 yrs | 86% | 7% | 7% | 0% |
| 30-39 yrs | 59% | 16% | 19% | 6% |
| 40-49 yrs | 49% | 18% | 25% | 8% |
| >49 yrs | 43% | 17% | 26% | 14% |
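The row percentages follow directly from the counts in Table 3: each cell is divided by its row total. A minimal base R sketch using `prop.table()`, with the counts typed in from Table 3 (not necessarily how the table was originally computed):

```r
# Counts copied from Table 3: rows are age groups, columns are job roles
counts <- matrix(
  c(107,  9,  9,  0,
    107, 29, 34, 10,
     98, 36, 51, 16,
     43, 17, 26, 14),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    age_group = c("<30 yrs", "30-39 yrs", "40-49 yrs", ">49 yrs"),
    job_role  = c("Employee", "Supervisor", "Middle management",
                  "Executive/Senior management")
  )
)

# margin = 1 divides each cell by its row total (the age group size)
row_pct <- round(100 * prop.table(counts, margin = 1))

row_pct
```

For example, 107 of the 125 respondents under 30 are employees, which gives 107 / 125 ≈ 86%. Using `margin = 2` instead would give the age profile within each job role, which answers a different question.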

Charts 3 and 4 visualise the frequency of job role across age groups using stacked and dodged bar charts.

Charts 5 and 6 illustrate alternative ways to visualise frequency across two categorical variables.

Finally, Chart 7 is a stacked bar chart showing the proportion of job role within each age group. This chart highlights the covariation between these two categorical variables.
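A proportional stacked bar chart of this kind can be drawn in ggplot2 with `position = "fill"`. The sketch below uses the counts from Table 3; colours, labels and other styling of the original charts are unknown, so this should be read as an approximation.

```r
library(ggplot2)

ages  <- c("<30 yrs", "30-39 yrs", "40-49 yrs", ">49 yrs")
roles <- c("Employee", "Supervisor", "Middle management",
           "Executive/Senior management")

# Cell counts copied from Table 3
cells <- data.frame(
  age_group = factor(rep(ages, each = 4), levels = ages),
  job_role  = factor(rep(roles, times = 4), levels = roles),
  n = c(107,  9,  9,  0,
        107, 29, 34, 10,
         98, 36, 51, 16,
         43, 17, 26, 14)
)

chart <- ggplot(cells, aes(x = age_group, y = n, fill = job_role)) +
  geom_col(position = "fill") +                           # rescale each bar to 100%
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Age group", y = "Proportion of respondents", fill = "Job role")

chart
```

Changing `position` to `"stack"` or `"dodge"` gives frequency charts in the style of the stacked and dodged versions mentioned earlier.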

This vignette explored the distribution within and across two categorical variables. See the vignette on categorical hypothesis testing to determine if there is a statistically significant relationship between age group and job role.


Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-29
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package           * version date (UTC) lib source
##  askpass             1.2.0   2023-09-03 [1] CRAN (R 4.4.0)
##  bslib               0.7.0   2024-03-29 [1] CRAN (R 4.4.0)
##  cachem              1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
##  cli                 3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
##  colorspace          2.1-0   2023-01-23 [1] CRAN (R 4.4.1)
##  crayon              1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
##  crul                1.5.0   2024-07-19 [1] CRAN (R 4.4.1)
##  curl                5.2.1   2024-03-01 [1] CRAN (R 4.4.0)
##  data.table        * 1.15.4  2024-03-30 [1] CRAN (R 4.4.0)
##  devtools            2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
##  digest              0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr             * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis            0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate            0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
##  fansi               1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
##  farver              2.1.2   2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap             1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
##  flextable         * 0.9.6   2024-05-05 [1] CRAN (R 4.4.0)
##  fontBitstreamVera   0.1.1   2017-02-01 [1] CRAN (R 4.4.0)
##  fontLiberation      0.1.0   2016-10-15 [1] CRAN (R 4.4.0)
##  fontquiver          0.2.1   2017-02-01 [1] CRAN (R 4.4.0)
##  forcats           * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
##  fs                  1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
##  gdtools             0.3.7   2024-03-05 [1] CRAN (R 4.4.0)
##  generics            0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
##  gfonts              0.2.0   2023-01-08 [1] CRAN (R 4.4.0)
##  ggplot2           * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
##  glue                1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
##  gtable              0.3.5   2024-04-22 [1] CRAN (R 4.4.0)
##  here              * 1.0.1   2020-12-13 [1] CRAN (R 4.4.0)
##  highr               0.11    2024-05-26 [1] CRAN (R 4.4.0)
##  hms                 1.1.3   2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools           0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets         1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
##  httpcode            0.3.0   2020-04-10 [1] CRAN (R 4.4.0)
##  httpuv              1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
##  jquerylib           0.1.4   2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite            1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
##  knitr               1.48    2024-07-07 [1] CRAN (R 4.4.1)
##  labeling            0.4.3   2023-08-29 [1] CRAN (R 4.4.0)
##  later               1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
##  lifecycle           1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
##  lubridate         * 1.9.3   2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr            2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
##  memoise             2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
##  mime                0.12    2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI              0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
##  munsell             0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
##  officer             0.6.6   2024-05-05 [1] CRAN (R 4.4.0)
##  openssl             2.2.0   2024-05-16 [1] CRAN (R 4.4.0)
##  pillar              1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild            1.4.4   2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig           2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload             1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
##  profvis             0.3.8   2023-05-02 [1] CRAN (R 4.4.0)
##  promises            1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
##  purrr             * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
##  R6                  2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
##  ragg                1.3.2   2024-05-15 [1] CRAN (R 4.4.0)
##  Rcpp                1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
##  readr             * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
##  remotes             2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
##  rlang               1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown           2.27    2024-05-17 [1] CRAN (R 4.4.0)
##  rprojroot           2.0.4   2023-11-05 [1] CRAN (R 4.4.0)
##  rstudioapi          0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
##  sass                0.4.9   2024-03-15 [1] CRAN (R 4.4.0)
##  scales              1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo         1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
##  shiny               1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
##  stringi             1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
##  stringr           * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
##  systemfonts         1.1.0   2024-05-15 [1] CRAN (R 4.4.0)
##  textshaping         0.4.0   2024-05-24 [1] CRAN (R 4.4.0)
##  tibble            * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
##  tidyr             * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect          1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse         * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
##  timechange          0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
##  tzdb                0.4.0   2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker          1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
##  usethis             2.2.3   2024-02-19 [1] CRAN (R 4.4.0)
##  utf8                1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
##  uuid                1.2-0   2024-01-14 [1] CRAN (R 4.4.0)
##  vctrs               0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
##  withr               3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
##  xfun                0.46    2024-07-18 [1] CRAN (R 4.4.1)
##  xml2                1.3.6   2023-12-04 [1] CRAN (R 4.4.0)
##  xtable              1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
##  yaml                2.3.9   2024-07-05 [1] CRAN (R 4.4.1)
##  zip                 2.3.1   2024-01-27 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────