Hypothesis Testing - Categorical and Numerical Variables

Objective

According to Wikipedia, statistical hypothesis testing is a method of statistical inference used to determine whether the data sufficiently supports a particular hypothesis.

This vignette presents three examples of hypothesis testing between categorical and continuous variables, applying a t-test or ANOVA as appropriate. The dataset comprises 616 respondents from 10 public and private sector organisations.

The following hypothesis tests examine the relationship between:

a two-level categorical variable (gender) and a continuous variable (emotion) by applying a t-test
a multi-level categorical variable (job role) and a continuous variable (irrational ideas) by applying a one-way ANOVA
two multi-level categorical variables (job role and education) and a continuous variable (irrational ideas) by applying a two-way ANOVA.

Workflow

Tidied the raw dataset and derived hypotheses for testing. For each test, the null hypothesis (H₀) states that the variables are independent, while the alternative hypothesis (H_a) states that they are associated. Following the general convention, a p-value below 0.05 was used as the cut-off for statistical significance.

The following table, originally sourced from the Journal of Clinical and Preventive Cardiology, provides a straightforward interpretation of p-values for testing.

Before implementing each test, conducted a brief exploratory analysis to understand the variables. Hypothesis tests rest on several assumptions, including normality, so each test reports the results of a normality check. Prepared visualisations highlighting and comparing differences in the mean, then applied the appropriate hypothesis test. Performed a post hoc analysis to verify the findings. A table at the end of the vignette summarises the testing results.

Results

Test 1: T-Test

Test 1 examines the relationship between a two-level categorical variable (gender) and a continuous variable (emotion).

Hypothesis
H₀: Gender and emotion towards organisational change are independent; there is no association between these two variables.
H_a: Gender and emotion towards organisational change are dependent; a relationship exists between these variables.

1.1 Explore data

Before testing the hypothesis, conducted a brief exploratory analysis of the variables. Chart 1 visualises the distribution of the categorical variable gender.

Chart 2 shows the distribution of the continuous variable emotion towards organisational change.

Chart 3 checks whether emotion meets the assumption of normality. The survey data track the normal distribution closely: the points in Chart 3 generally fall along a line and lie within, or close to, the 0.95 confidence band.
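
The visual check described above can be backed by two numbers. The following is a minimal sketch in base R, not the code behind Chart 3: the `emotion` vector is simulated stand-in data, and the Shapiro-Wilk test is an addition the vignette itself does not report.

```r
# Simulated stand-in for the survey's emotion scores (hypothetical data).
set.seed(42)
emotion <- rnorm(300, mean = 4.1, sd = 1)

# qqnorm() pairs each observation with its theoretical normal quantile;
# plot.it = FALSE returns the pairs without drawing the chart.
qq <- qqnorm(emotion, plot.it = FALSE)

# Correlation between theoretical and sample quantiles: a value near 1
# means the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)

# Shapiro-Wilk test as a formal companion to the visual check
# (H0: the data come from a normal distribution).
sw <- shapiro.test(emotion)
print(sw$p.value)
```

With large samples the Shapiro-Wilk test can flag trivial departures from normality, so it is best read alongside the chart rather than instead of it.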

Chart 4 compares the distribution of emotion based on gender. The notched box plots suggest similar distributions of emotion by gender.

1.2 Apply t-test

The t-test assesses whether mean emotion differs significantly by gender. The Welch Two Sample t-test, shown in Output 1, returns a p-value of 0.6361, above the significance level of 0.05. Therefore, fail to reject the null hypothesis: there is insufficient evidence of an association between emotion towards organisational change and gender.

Output 1


    Welch Two Sample t-test

data:  emotion by gender
t = -0.47343, df = 575.92, p-value = 0.6361
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -0.2095663  0.1281593
sample estimates:
mean in group Female   mean in group Male 
            4.095161             4.135865
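
A result of the same form as Output 1 can be produced with base R's `t.test()`. This is a minimal sketch on simulated data: the `survey` data frame, its group sizes and its means are hypothetical and will not reproduce the figures above.

```r
# Simulated stand-in for the survey data (hypothetical sizes and means).
set.seed(1)
survey <- data.frame(
  gender  = factor(rep(c("Female", "Male"), times = c(310, 290))),
  emotion = c(rnorm(310, mean = 4.10, sd = 1.0),
              rnorm(290, mean = 4.14, sd = 1.1))
)

# t.test() defaults to var.equal = FALSE, which gives the Welch test:
# it does not assume equal variances in the two groups.
welch <- t.test(emotion ~ gender, data = survey)
print(welch)

# Apply the 0.05 cut-off used throughout this vignette.
alpha <- 0.05
decision <- if (welch$p.value < alpha) "reject H0" else "fail to reject H0"
print(decision)
```

The Welch variant is a reasonable default when group variances may differ. Its fractional degrees of freedom (575.92 in Output 1) come from the Welch-Satterthwaite approximation.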

To verify this result, Cohen’s d was calculated. Cohen’s d expresses effect size as the difference between two means standardised by their standard deviation. Output 2 shows that the effect size for emotion by gender is “very small” (d = -0.04, 95% CI [-0.20, 0.12]).

Output 2

Cohen's d |        95% CI
-------------------------
-0.04     | [-0.20, 0.12]

- Estimated using pooled SD.

[1] "very small"
(Rules: cohen1988)
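
Output 2 notes that d was estimated using the pooled standard deviation. The calculation can be sketched by hand in base R as follows. The helper name `cohens_d_pooled` is hypothetical, and the two small vectors are made-up values chosen so the arithmetic is easy to follow.

```r
# Cohen's d with a pooled standard deviation:
#   d = (mean(x) - mean(y)) / s_pooled
#   s_pooled = sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
cohens_d_pooled <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Made-up groups: means 2 and 3, both with variance 1,
# so the pooled SD is 1 and d = (2 - 3) / 1 = -1.
group_a <- c(1, 2, 3)
group_b <- c(2, 3, 4)
d <- cohens_d_pooled(group_a, group_b)
print(d)
```

The sign of d only reflects the order of the groups. Its magnitude is what is interpreted.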

Test 2: One-Way ANOVA

This test examines the relationship between a categorical variable with three or more levels (job role) and a continuous variable (irrational ideas).

Hypothesis
H₀: Job role and level of irrational ideas are independent; there is no association between these two variables.
H_a: Job role and level of irrational ideas are dependent; a relationship exists between these two variables.

2.1 Explore data

Before testing the hypothesis, conducted a brief exploratory analysis of the variables. Chart 5 visualises the distribution of the categorical variable job role.

Chart 6 visualises the distribution of the continuous variable irrational ideas.

Charts 7 and 8 test the normality of the residuals. Both charts show that the residuals generally follow a normal distribution, thereby satisfying the normality assumption for one-way ANOVA hypothesis testing.

Chart 9 compares the distribution of irrational ideas based on job role.

Chart 10 highlights differences in the mean and standard error of the mean for irrational ideas across job roles.

2.2 Apply one-way ANOVA test

The one-way ANOVA tests whether mean irrational ideas differs significantly by job role. Output 3 shows a Pr(>F) value of 1.31e-06, which is less than the significance level of 0.05. Based on this result, reject the null hypothesis, concluding that the mean level of irrational ideas differs significantly between job roles.

Output 3

             Df Sum Sq Mean Sq F value   Pr(>F)    
job_role      3   17.4   5.785   10.28 1.31e-06 ***
Residuals   607  341.7   0.563                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
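
A table with the same layout as Output 3 comes from base R's `aov()`. The sketch below uses simulated data: the `staff` data frame, its group sizes and its group means are hypothetical. Only the formula `irrational_ideas ~ job_role` matches the fit reported in this vignette.

```r
# Simulated stand-in data with four job roles (hypothetical sizes and means).
set.seed(2)
roles <- c("Employee", "Supervisor", "Middle management",
           "Executive/Senior management")
staff <- data.frame(
  job_role = factor(rep(roles, times = c(60, 40, 40, 20)), levels = roles),
  irrational_ideas = c(rnorm(60, mean = 3.4, sd = 0.75),
                       rnorm(40, mean = 3.4, sd = 0.75),
                       rnorm(40, mean = 3.2, sd = 0.75),
                       rnorm(20, mean = 2.8, sd = 0.75))
)

# Fit the one-way ANOVA: one continuous outcome, one categorical factor.
fit <- aov(irrational_ideas ~ job_role, data = staff)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
print(anova_table)

# The first row is the factor, the second row is the residuals.
p_value <- anova_table[["Pr(>F)"]][1]
print(p_value)
```

The degrees of freedom follow from the design: number of groups minus one for the factor, and number of observations minus number of groups for the residuals.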

To delve deeper into this finding, carried out post hoc testing using multiple pairwise comparisons of means. The fitted one-way ANOVA was passed to the Tukey HSD test, with results summarised in Output 4. The differences in irrational ideas between Executive/Senior management and Employee, and between Executive/Senior management and Supervisor, are the most strongly significant (adjusted p < 0.001 for both). The comparisons of Middle management with Employee (p = 0.0184) and of Executive/Senior management with Middle management (p = 0.0162) are also significant at the 0.05 cut-off. The Supervisor and Employee (p = 0.9879) and Middle management and Supervisor (p = 0.2045) comparisons are not significant.

Output 4


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = irrational_ideas ~ job_role, data = role_ii_df2)

Linear Hypotheses:
                                                     Estimate Std. Error
Supervisor - Employee == 0                           -0.02819    0.08770
Middle management - Employee == 0                    -0.22894    0.07889
Executive/Senior management - Employee == 0          -0.63243    0.12508
Middle management - Supervisor == 0                  -0.20075    0.10378
Executive/Senior management - Supervisor == 0        -0.60424    0.14209
Executive/Senior management - Middle management == 0 -0.40349    0.13683
                                                     t value Pr(>|t|)    
Supervisor - Employee == 0                            -0.321   0.9879    
Middle management - Employee == 0                     -2.902   0.0184 *  
Executive/Senior management - Employee == 0           -5.056   <0.001 ***
Middle management - Supervisor == 0                   -1.934   0.2045    
Executive/Senior management - Supervisor == 0         -4.252   <0.001 ***
Executive/Senior management - Middle management == 0  -2.949   0.0162 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
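
The layout of Output 4 suggests it was produced with the multcomp package. To stay dependency-free, the sketch below swaps in base R's `TukeyHSD()`, which runs the same kind of all-pairs comparison but prints its results differently. The `staff` data are simulated and hypothetical.

```r
# Simulated stand-in data with four job roles (hypothetical sizes and means).
set.seed(3)
roles <- c("Employee", "Supervisor", "Middle management",
           "Executive/Senior management")
staff <- data.frame(
  job_role = factor(rep(roles, times = c(60, 40, 40, 20)), levels = roles),
  irrational_ideas = c(rnorm(60, mean = 3.4, sd = 0.75),
                       rnorm(40, mean = 3.4, sd = 0.75),
                       rnorm(40, mean = 3.2, sd = 0.75),
                       rnorm(20, mean = 2.8, sd = 0.75))
)
fit <- aov(irrational_ideas ~ job_role, data = staff)

# All pairwise differences in means with family-wise adjusted p-values.
tukey <- TukeyHSD(fit, which = "job_role", conf.level = 0.95)
print(tukey)

# Each row is one pair; columns are diff, lwr, upr and "p adj".
p_adj <- tukey$job_role[, "p adj"]
significant_pairs <- names(p_adj)[p_adj < 0.05]
print(significant_pairs)
```

Four job roles give six pairwise comparisons, matching the six rows in Output 4.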

Finally, Output 5 assesses the practical significance of this result: the effect size of job role on irrational ideas (eta squared = 0.05) is interpreted as small.

Output 5

# Effect Size for ANOVA

Parameter | Eta2 |       95% CI
-------------------------------
job_role  | 0.05 | [0.00, 0.08]

- One-sided CIs: lower bound fixed at [0.00].

[1] "small"
(Rules: cohen1992)
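
For a one-way design, eta squared is the share of the total sum of squares attributable to the factor. The sketch below first recomputes the value from the sums of squares printed in Output 3, then wraps the same arithmetic in a helper. The helper name `eta_squared_by_hand` is hypothetical, and R's built-in `PlantGrowth` dataset is used purely for illustration.

```r
# Sums of squares copied from Output 3.
ss_job_role  <- 17.4
ss_residuals <- 341.7

# Eta squared = SS_effect / SS_total.
eta2 <- ss_job_role / (ss_job_role + ss_residuals)
print(round(eta2, 2))

# The same calculation for any fitted aov object: each term's share of
# the total sum of squares (classical, not partial, eta squared).
eta_squared_by_hand <- function(fit) {
  tab <- summary(fit)[[1]]
  ss <- tab[["Sum Sq"]]
  setNames(ss / sum(ss), trimws(rownames(tab)))
}

# Illustration on a dataset that ships with R.
plant_fit <- aov(weight ~ group, data = PlantGrowth)
plant_eta <- eta_squared_by_hand(plant_fit)
print(plant_eta)
```

The first calculation rounds to 0.05, agreeing with Output 5. In a one-way design, classical and partial eta squared coincide.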

Test 3: Two-Way ANOVA

This test builds upon Test 2 and examines the relationship between two categorical variables with three or more levels (job role and education) and a continuous variable (irrational ideas).

Hypothesis
H₀: There is no interaction between job role and education; the effect of job role on irrational ideas does not depend on education.
H_a: Job role and education interact; together they produce a synergistic effect on irrational ideas.

3.1 Explore data

A brief exploratory analysis of job role and irrational ideas was provided in Test 2. A brief overview of the categorical variable education follows.

Chart 11 visualises the distribution of the categorical variable education.

Charts 12 and 13 show that the residuals generally follow a normal distribution, satisfying the normality assumption for two-way ANOVA hypothesis testing.

Chart 14 compares irrational ideas based on job role and education.

Chart 15 highlights differences in the mean and standard error of the mean across job roles and education levels.

3.2 Apply two-way ANOVA test

The two-way ANOVA tests whether mean irrational ideas differs significantly by job role and education. The vignette on categorical hypothesis testing showed a statistically significant relationship between the categorical variables job role and education. Therefore, the two-way ANOVA in Output 6 was fitted with an interaction term (job_role * education, the multiplicative model) rather than as an additive model (job_role + education).

Output 6

                    Df Sum Sq Mean Sq F value   Pr(>F)    
job_role             3   17.1   5.708  10.039 1.85e-06 ***
education            3    0.1   0.043   0.076    0.973    
job_role:education   9    2.0   0.223   0.392    0.939    
Residuals          589  334.9   0.569                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As in Test 2, Output 6 shows a statistically significant relationship between job role and irrational ideas, with a Pr(>F) value of 1.85e-06. However, neither the main effect of education (Pr(>F) = 0.973) nor the job role:education interaction (Pr(>F) = 0.939) is statistically significant, so there is no evidence of a synergistic effect on irrational ideas. Based on these results, fail to reject the null hypothesis: there is insufficient evidence of an interaction between job role and education on the level of irrational ideas.
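
The difference between the interaction (multiplicative) and additive models can be sketched in base R as follows. The data are simulated with a job role effect only, so all values are hypothetical. Only the formula `irrational_ideas ~ job_role * education` matches the fit reported in this vignette.

```r
# Simulated balanced design: 4 job roles x 4 education levels x 10 replicates.
set.seed(4)
roles <- c("Employee", "Supervisor", "Middle management",
           "Executive/Senior management")
edu <- c("High school", "Certificate/Diploma",
         "Undergraduate degree", "Postgraduate degree")
design <- expand.grid(job_role = roles, education = edu, replicate = 1:10)

# Hypothetical outcome: job role shifts the mean, education does not.
role_shift <- c(0, 0, -0.2, -0.6)[as.integer(design$job_role)]
design$irrational_ideas <- 3.4 + role_shift + rnorm(nrow(design), sd = 0.75)

# job_role * education expands to job_role + education + job_role:education.
fit_interaction <- aov(irrational_ideas ~ job_role * education, data = design)
fit_additive    <- aov(irrational_ideas ~ job_role + education, data = design)

two_way_table <- summary(fit_interaction)[[1]]
print(two_way_table)

# Comparing the nested models tests the interaction term directly.
comparison <- anova(fit_additive, fit_interaction)
print(comparison)
```

The p-value from the model comparison equals the `job_role:education` row of the interaction model's table, because the interaction is the last term entered.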

To explore this result further, carried out post hoc testing using multiple pairwise comparisons of means. The fitted two-way ANOVA was passed to the Tukey HSD test. The results in Output 7 confirm that no pairwise comparison of education levels is statistically significant, with every adjusted p-value at or above 0.983.

Output 7


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = irrational_ideas ~ job_role * education, data = role_ed_ii_df2)

Linear Hypotheses:
                                                 Estimate Std. Error t value
Certificate/Diploma - High school == 0          -0.021033   0.151567  -0.139
Undergraduate degree - High school == 0         -0.003806   0.127550  -0.030
Postgraduate degree - High school == 0           0.036966   0.130677   0.283
Undergraduate degree - Certificate/Diploma == 0  0.017227   0.154494   0.112
Postgraduate degree - Certificate/Diploma == 0   0.057999   0.157085   0.369
Postgraduate degree - Undergraduate degree == 0  0.040772   0.134061   0.304
                                                Pr(>|t|)
Certificate/Diploma - High school == 0             0.999
Undergraduate degree - High school == 0            1.000
Postgraduate degree - High school == 0             0.992
Undergraduate degree - Certificate/Diploma == 0    0.999
Postgraduate degree - Certificate/Diploma == 0     0.983
Postgraduate degree - Undergraduate degree == 0    0.990
(Adjusted p values reported -- single-step method)

Summary

Table 1 summarises the testing results. Variables are listed with respective p-values and outcomes from the t-test, one-way ANOVA and two-way ANOVA.

Table 1 Summary of categorical and continuous variable hypothesis testing
test	variables	p_value	H0
T-Test	gender and emotion	0.6361	fail to reject
One-way ANOVA	irrational ideas and job role	1.31e-06	reject
Two-way ANOVA	irrational ideas, job role and education	0.939	fail to reject
Significance level: p < 0.05

References:

Emotion was measured using ‘A semantic differential mood scale’ by Lorr and Wunderlich, published in the Journal of Clinical Psychology.
Irrational ideas were measured using the ‘Irrational belief scale’ developed by Malouff and Schutte, published in the Sourcebook of Adult Assessment Strategies, based on Ellis and Harper’s work, published in A New Guide to Rational Living.

Session information and package update

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.0 (2024-04-24 ucrt)
##  os       Windows 11 x64 (build 22631)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_Australia.utf8
##  ctype    English_Australia.utf8
##  tz       Australia/Brisbane
##  date     2024-07-29
##  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package           * version  date (UTC) lib source
##  abind               1.4-5    2016-07-21 [1] CRAN (R 4.4.0)
##  askpass             1.2.0    2023-09-03 [1] CRAN (R 4.4.0)
##  backports           1.5.0    2024-05-23 [1] CRAN (R 4.4.0)
##  bayestestR          0.14.0   2024-07-24 [1] CRAN (R 4.4.1)
##  broom               1.0.6    2024-05-17 [1] CRAN (R 4.4.0)
##  bslib               0.7.0    2024-03-29 [1] CRAN (R 4.4.0)
##  cachem              1.1.0    2024-05-16 [1] CRAN (R 4.4.0)
##  car               * 3.1-2    2023-03-30 [1] CRAN (R 4.4.0)
##  carData           * 3.0-5    2022-01-06 [1] CRAN (R 4.4.0)
##  cli                 3.6.3    2024-06-21 [1] CRAN (R 4.4.1)
##  coda                0.19-4.1 2024-01-31 [1] CRAN (R 4.4.0)
##  codetools           0.2-20   2024-03-31 [2] CRAN (R 4.4.0)
##  colorspace          2.1-0    2023-01-23 [1] CRAN (R 4.4.1)
##  crayon              1.5.3    2024-06-20 [1] CRAN (R 4.4.1)
##  crul                1.5.0    2024-07-19 [1] CRAN (R 4.4.1)
##  curl                5.2.1    2024-03-01 [1] CRAN (R 4.4.0)
##  data.table        * 1.15.4   2024-03-30 [1] CRAN (R 4.4.0)
##  datawizard          0.12.2   2024-07-21 [1] CRAN (R 4.4.1)
##  devtools            2.4.5    2022-10-11 [1] CRAN (R 4.4.0)
##  digest              0.6.36   2024-06-23 [1] CRAN (R 4.4.1)
##  dplyr             * 1.1.4    2023-11-17 [1] CRAN (R 4.4.0)
##  effectsize        * 0.8.9    2024-07-03 [1] CRAN (R 4.4.1)
##  ellipsis            0.3.2    2021-04-29 [1] CRAN (R 4.4.0)
##  emmeans             1.10.3   2024-07-01 [1] CRAN (R 4.4.1)
##  estimability        1.5.1    2024-05-12 [1] CRAN (R 4.4.0)
##  evaluate            0.24.0   2024-06-10 [1] CRAN (R 4.4.0)
##  fansi               1.0.6    2023-12-08 [1] CRAN (R 4.4.0)
##  farver              2.1.2    2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap             1.2.0    2024-05-15 [1] CRAN (R 4.4.0)
##  flextable         * 0.9.6    2024-05-05 [1] CRAN (R 4.4.0)
##  fontBitstreamVera   0.1.1    2017-02-01 [1] CRAN (R 4.4.0)
##  fontLiberation      0.1.0    2016-10-15 [1] CRAN (R 4.4.0)
##  fontquiver          0.2.1    2017-02-01 [1] CRAN (R 4.4.0)
##  forcats           * 1.0.0    2023-01-29 [1] CRAN (R 4.4.0)
##  fs                  1.6.4    2024-04-25 [1] CRAN (R 4.4.0)
##  gdtools             0.3.7    2024-03-05 [1] CRAN (R 4.4.0)
##  generics            0.1.3    2022-07-05 [1] CRAN (R 4.4.0)
##  gfonts              0.2.0    2023-01-08 [1] CRAN (R 4.4.0)
##  GGally              2.2.1    2024-02-14 [1] CRAN (R 4.4.0)
##  ggplot2           * 3.5.1    2024-04-23 [1] CRAN (R 4.4.0)
##  ggpubr            * 0.6.0    2023-02-10 [1] CRAN (R 4.4.0)
##  ggsignif            0.6.4    2022-10-13 [1] CRAN (R 4.4.0)
##  ggstats             0.6.0    2024-04-05 [1] CRAN (R 4.4.0)
##  glue                1.7.0    2024-01-09 [1] CRAN (R 4.4.0)
##  gridExtra           2.3      2017-09-09 [1] CRAN (R 4.4.0)
##  gtable              0.3.5    2024-04-22 [1] CRAN (R 4.4.0)
##  here              * 1.0.1    2020-12-13 [1] CRAN (R 4.4.0)
##  highr               0.11     2024-05-26 [1] CRAN (R 4.4.0)
##  hms                 1.1.3    2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools           0.5.8.1  2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets         1.6.4    2023-12-06 [1] CRAN (R 4.4.0)
##  httpcode            0.3.0    2020-04-10 [1] CRAN (R 4.4.0)
##  httpuv              1.6.15   2024-03-26 [1] CRAN (R 4.4.0)
##  insight             0.20.2   2024-07-13 [1] CRAN (R 4.4.0)
##  ISLR                1.4      2021-09-15 [1] CRAN (R 4.4.0)
##  jquerylib           0.1.4    2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite            1.8.8    2023-12-04 [1] CRAN (R 4.4.0)
##  knitr               1.48     2024-07-07 [1] CRAN (R 4.4.1)
##  labeling            0.4.3    2023-08-29 [1] CRAN (R 4.4.0)
##  later               1.3.2    2023-12-06 [1] CRAN (R 4.4.0)
##  lattice             0.22-6   2024-03-20 [2] CRAN (R 4.4.0)
##  lifecycle           1.0.4    2023-11-07 [1] CRAN (R 4.4.0)
##  lpSolve             5.6.20   2023-12-10 [1] CRAN (R 4.4.0)
##  lubridate         * 1.9.3    2023-09-27 [1] CRAN (R 4.4.0)
##  magrittr            2.0.3    2022-03-30 [1] CRAN (R 4.4.0)
##  MASS              * 7.3-60.2 2024-04-24 [2] local
##  Matrix              1.7-0    2024-03-22 [2] CRAN (R 4.4.0)
##  memoise             2.0.1    2021-11-26 [1] CRAN (R 4.4.0)
##  mime                0.12     2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI              0.1.1.1  2018-05-18 [1] CRAN (R 4.4.0)
##  multcomp          * 1.4-26   2024-07-18 [1] CRAN (R 4.4.1)
##  munsell             0.5.1    2024-04-01 [1] CRAN (R 4.4.0)
##  mvtnorm           * 1.2-5    2024-05-21 [1] CRAN (R 4.4.0)
##  officer             0.6.6    2024-05-05 [1] CRAN (R 4.4.0)
##  openssl             2.2.0    2024-05-16 [1] CRAN (R 4.4.0)
##  parameters          0.22.1   2024-07-21 [1] CRAN (R 4.4.1)
##  pillar              1.9.0    2023-03-22 [1] CRAN (R 4.4.0)
##  pkgbuild            1.4.4    2024-03-17 [1] CRAN (R 4.4.0)
##  pkgconfig           2.0.3    2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload             1.4.0    2024-06-28 [1] CRAN (R 4.4.1)
##  plyr                1.8.9    2023-10-02 [1] CRAN (R 4.4.0)
##  profvis             0.3.8    2023-05-02 [1] CRAN (R 4.4.0)
##  promises            1.3.0    2024-04-05 [1] CRAN (R 4.4.0)
##  purrr             * 1.0.2    2023-08-10 [1] CRAN (R 4.4.0)
##  R6                  2.5.1    2021-08-19 [1] CRAN (R 4.4.0)
##  ragg                1.3.2    2024-05-15 [1] CRAN (R 4.4.0)
##  RColorBrewer        1.1-3    2022-04-03 [1] CRAN (R 4.4.0)
##  Rcpp                1.0.13   2024-07-17 [1] CRAN (R 4.4.1)
##  readr             * 2.1.5    2024-01-10 [1] CRAN (R 4.4.0)
##  remotes             2.5.0    2024-03-17 [1] CRAN (R 4.4.0)
##  rlang               1.1.4    2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown           2.27     2024-05-17 [1] CRAN (R 4.4.0)
##  rprojroot           2.0.4    2023-11-05 [1] CRAN (R 4.4.0)
##  rstatix           * 0.7.2    2023-02-01 [1] CRAN (R 4.4.0)
##  rstudioapi          0.16.0   2024-03-24 [1] CRAN (R 4.4.0)
##  sampling            2.10     2023-10-29 [1] CRAN (R 4.4.0)
##  sandwich            3.1-0    2023-12-11 [1] CRAN (R 4.4.0)
##  sass                0.4.9    2024-03-15 [1] CRAN (R 4.4.0)
##  scales              1.3.0    2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo         1.2.2    2021-12-06 [1] CRAN (R 4.4.0)
##  shiny               1.8.1.1  2024-04-02 [1] CRAN (R 4.4.0)
##  SmartEDA          * 0.3.10   2024-01-30 [1] CRAN (R 4.4.0)
##  stringi             1.8.4    2024-05-06 [1] CRAN (R 4.4.0)
##  stringr           * 1.5.1    2023-11-14 [1] CRAN (R 4.4.0)
##  survival          * 3.5-8    2024-02-14 [2] CRAN (R 4.4.0)
##  systemfonts         1.1.0    2024-05-15 [1] CRAN (R 4.4.0)
##  textshaping         0.4.0    2024-05-24 [1] CRAN (R 4.4.0)
##  TH.data           * 1.1-2    2023-04-17 [1] CRAN (R 4.4.0)
##  tibble            * 3.2.1    2023-03-20 [1] CRAN (R 4.4.0)
##  tidyr             * 1.3.1    2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect          1.2.1    2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse         * 2.0.0    2023-02-22 [1] CRAN (R 4.4.0)
##  timechange          0.3.0    2024-01-18 [1] CRAN (R 4.4.0)
##  tzdb                0.4.0    2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker          1.0.1    2021-11-30 [1] CRAN (R 4.4.0)
##  usethis             2.2.3    2024-02-19 [1] CRAN (R 4.4.0)
##  utf8                1.2.4    2023-10-22 [1] CRAN (R 4.4.0)
##  uuid                1.2-0    2024-01-14 [1] CRAN (R 4.4.0)
##  vctrs               0.6.5    2023-12-01 [1] CRAN (R 4.4.0)
##  withr               3.0.0    2024-01-16 [1] CRAN (R 4.4.0)
##  xfun                0.46     2024-07-18 [1] CRAN (R 4.4.1)
##  xml2                1.3.6    2023-12-04 [1] CRAN (R 4.4.0)
##  xtable              1.8-4    2019-04-21 [1] CRAN (R 4.4.0)
##  yaml                2.3.9    2024-07-05 [1] CRAN (R 4.4.1)
##  zip                 2.3.1    2024-01-27 [1] CRAN (R 4.4.0)
##  zoo                 1.8-12   2023-04-13 [1] CRAN (R 4.4.0)
## 
##  [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.0/library
## 
## ──────────────────────────────────────────────────────────────────────────────