Hypothesis Testing - Categorical and Numerical Variables
According to Wikipedia, statistical hypothesis testing is a method of statistical inference used to determine whether the data sufficiently supports a particular hypothesis.
This vignette presents three examples of hypothesis testing between categorical and continuous variables, using a t-test and ANOVA where appropriate. The dataset comprised 616 respondents from 10 public and private sector organisations.
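The hypothesis tests examine the relationship between gender and emotion towards organisational change (Test 1, t-test), job role and irrational ideas (Test 2, one-way ANOVA), and job role, education and irrational ideas (Test 3, two-way ANOVA).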
The raw dataset was tidied and hypotheses were derived for testing. The null hypothesis (H0) states that the variables are independent, while the alternative hypothesis (Ha) states that they are associated or related. Following general convention, a significance level of 0.05 was used: a p-value below 0.05 was treated as statistically significant.
The following table, originally sourced from the Journal of Clinical and Preventive Cardiology, provides a straightforward interpretation of p-values for testing.
Before each test, a brief exploratory analysis was conducted to understand the variables. Hypothesis tests rest on several assumptions, including normality, and the results of these checks are provided. Visualisations were prepared to highlight and compare differences in the mean. The appropriate hypothesis test was then applied, followed by a post hoc analysis to verify the findings. A table at the end of the vignette summarises the testing results.
Test 1 examines the relationship between a two-level categorical variable (gender) and a continuous variable (emotion).
Hypothesis
H0: Gender and emotion towards organisational change are independent; there is no association between these two variables.
Ha: Gender and emotion towards organisational change are dependent; a relationship exists between these variables.
Before testing the hypothesis, a brief exploratory analysis of the variables was conducted. Chart 1 visualises the distribution of the categorical variable gender.
Chart 2 shows the distribution of the continuous variable emotion towards organisational change.
Chart 3 checks whether emotion meets the assumption of normality by plotting the survey data against a normal distribution. The points generally fall along the reference line, within or close to the 95% confidence band, suggesting emotion is approximately normally distributed.
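The straightness of a normal Q-Q plot can also be summarised as a single number: the correlation between the sorted data and the theoretical normal quantiles. The sketch below is a minimal illustration, assuming simulated values in a hypothetical `emotion_demo` vector rather than the survey data; it is not the code used to draw Chart 3.

```r
# Sketch: a numeric companion to a normal Q-Q plot.
# `emotion_demo` is simulated, hypothetical data, not the survey responses.
set.seed(42)
emotion_demo <- rnorm(600, mean = 4.1, sd = 1)

# Theoretical normal quantiles at standard plotting positions
theoretical_q <- qnorm(ppoints(length(emotion_demo)))

# Q-Q correlation: values close to 1 mean the points lie near a straight
# line, i.e. the sample is approximately normal
qq_cor <- cor(sort(emotion_demo), theoretical_q)
round(qq_cor, 3)
```

In base R, `qqnorm(emotion_demo)` followed by `qqline(emotion_demo)` draws the corresponding plot.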
Chart 4 compares the distribution of emotion based on gender. The notched box plots suggest similar distributions of emotion by gender.
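Box-plot notches follow a rule of thumb: they span the median plus or minus 1.58 × IQR / √n, and notches that do not overlap give rough evidence that two medians differ. The ggplot2 documentation describes the same rule for notched box plots; assuming Chart 4 was drawn that way, the sketch below reproduces the idea with base R's `boxplot.stats()`. The two groups are simulated and hypothetical.

```r
# Sketch: computing and comparing box-plot notch limits.
# Both samples are simulated, hypothetical stand-ins for emotion by gender.
set.seed(1)
female_demo <- rnorm(300, mean = 4.10, sd = 1)
male_demo   <- rnorm(300, mean = 4.14, sd = 1)

notch_bounds <- function(x) {
  # boxplot.stats() returns the five-number summary and, in $conf,
  # the notch limits: median +/- 1.58 * IQR / sqrt(n)
  boxplot.stats(x)$conf
}

female_notch <- notch_bounds(female_demo)
male_notch   <- notch_bounds(male_demo)

# TRUE when the two notch intervals overlap (no clear median difference)
notches_overlap <- female_notch[1] <= male_notch[2] &&
                   male_notch[1] <= female_notch[2]
notches_overlap
```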
The t-test assesses whether there is a statistically significant difference in mean emotion between genders. The Welch Two Sample t-test, shown in Output 1, returns a p-value of 0.6361, above the significance level of 0.05. The null hypothesis is therefore not rejected: there is no evidence of an association between emotion towards organisational change and gender.
Output 1
Welch Two Sample t-test
data: emotion by gender
t = -0.47343, df = 575.92, p-value = 0.6361
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-0.2095663 0.1281593
sample estimates:
mean in group Female mean in group Male
4.095161 4.135865
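A Welch test like Output 1 can be run in base R with `t.test()` and a formula. The sketch below assumes a hypothetical data frame, `survey_demo`, filled with simulated values; it illustrates the call and the 0.05 decision rule, and is not the vignette's own code or data.

```r
# Sketch: Welch two-sample t-test with a formula interface.
# `survey_demo` is a hypothetical, simulated stand-in for the survey data.
set.seed(2024)
survey_demo <- data.frame(
  gender  = factor(rep(c("Female", "Male"), times = c(310, 290))),
  emotion = c(rnorm(310, 4.10, 1.0), rnorm(290, 4.14, 1.1))
)

# var.equal = FALSE (the default) gives the Welch test, which does not
# assume equal variances in the two groups
welch <- t.test(emotion ~ gender, data = survey_demo, var.equal = FALSE)

# Decision at the 5% significance level used throughout the vignette
alpha <- 0.05
decision <- if (welch$p.value < alpha) "reject H0" else "fail to reject H0"
welch$method
decision
```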
To check the practical size of this difference, Cohen’s d was calculated. Cohen’s d measures effect size as the standardised difference between two means. Output 2 shows that the effect size for emotion by gender (d = -0.04) is “very small”.
Output 2
Cohen's d | 95% CI
-------------------------
-0.04 | [-0.20, 0.12]
- Estimated using pooled SD.
[1] "very small"
(Rules: cohen1988)
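Output 2 notes that d was estimated using the pooled SD. A minimal sketch of that calculation follows, together with rule-of-thumb labels; the cut-offs (0.2, 0.5, 0.8) are assumed to match the cohen1988 rules reported in Output 2, and the helper names `cohens_d_pooled` and `label_d` are hypothetical.

```r
# Sketch: Cohen's d with a pooled standard deviation.
cohens_d_pooled <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Pooled SD weights each group's variance by its degrees of freedom
  sd_pooled <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Assumed Cohen (1988) cut-offs on |d|:
# < 0.2 very small, < 0.5 small, < 0.8 medium, otherwise large
label_d <- function(d) {
  as.character(cut(abs(d), breaks = c(0, 0.2, 0.5, 0.8, Inf),
                   labels = c("very small", "small", "medium", "large"),
                   right = FALSE, include.lowest = TRUE))
}

d_example <- cohens_d_pooled(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6))
round(d_example, 4)  # -0.6325
label_d(d_example)   # "medium"
label_d(-0.04)       # "very small", as in Output 2
```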
Test 2 examines the relationship between a categorical variable with three or more levels (job role) and a continuous variable (irrational ideas).
Hypothesis
H0: Job role and level of irrational ideas are independent; there is no association between these two variables.
Ha: Job role and level of irrational ideas are dependent; a relationship exists between these two variables.
Before testing the hypothesis, a brief exploratory analysis of the variables was conducted. Chart 5 visualises the distribution of the categorical variable job role.
Chart 6 visualises the distribution of the continuous variable irrational ideas.
Charts 7 and 8 check the normality of the residuals. Both show that the residuals generally follow a normal distribution, suggesting the normality assumption for one-way ANOVA is reasonably met.
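The normality assumption for ANOVA applies to the residuals of the fitted model, not to the raw outcome. One common numeric complement to residual plots is the Shapiro-Wilk test; the sketch below assumes a hypothetical, simulated data frame `role_demo` and is not taken from the vignette.

```r
# Sketch: checking one-way ANOVA residuals for normality.
# `role_demo` is a hypothetical, simulated stand-in for the survey data.
set.seed(7)
role_demo <- data.frame(
  job_role = factor(rep(c("Employee", "Supervisor", "Middle management",
                          "Executive/Senior management"),
                        times = c(300, 120, 150, 40))),
  irrational_ideas = rnorm(610, mean = 3, sd = 0.75)
)

fit <- aov(irrational_ideas ~ job_role, data = role_demo)
res <- residuals(fit)

# Shapiro-Wilk: a small p-value signals departure from normality.
# With large samples it flags trivial departures, so read it alongside
# residual plots rather than on its own.
sw <- shapiro.test(res)
sw$p.value
```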
Chart 9 compares the distribution of irrational ideas based on job role.
Chart 10 highlights differences in the mean and standard error of the mean for irrational ideas across job roles.
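The quantities behind a mean-and-error-bar chart are the group mean and the standard error of the mean, SE = sd / √n. A minimal sketch with simulated, hypothetical scores for two job roles:

```r
# Sketch: per-group mean and standard error of the mean.
# `irrational_demo` and `job_role_demo` are simulated, hypothetical data.
set.seed(11)
job_role_demo   <- factor(rep(c("Employee", "Supervisor"), each = 20))
irrational_demo <- rnorm(40, mean = 3, sd = 0.8)

mean_se <- function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x)))

# One row per group, columns "mean" and "se"
summary_tab <- t(sapply(split(irrational_demo, job_role_demo), mean_se))
summary_tab
```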
The one-way ANOVA assesses whether there is a statistically significant difference in mean irrational ideas across job roles. Output 3 shows a Pr(>F) value of 1.31e-06, well below the significance level of 0.05. The null hypothesis is therefore rejected: the mean level of irrational ideas differs significantly by job role.
Output 3
Df Sum Sq Mean Sq F value Pr(>F)
job_role 3 17.4 5.785 10.28 1.31e-06 ***
Residuals 607 341.7 0.563
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
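The F value in Output 3 is the ratio of the job role mean square to the residual mean square, and Pr(>F) is the upper tail of an F distribution with 3 and 607 degrees of freedom. The worked check below uses only the values printed in Output 3; because those values are rounded, the recomputed p-value matches the reported 1.31e-06 only approximately.

```r
# Worked check of Output 3 using its printed (rounded) mean squares
ms_job_role  <- 5.785
ms_residuals <- 0.563

f_value <- ms_job_role / ms_residuals
p_value <- pf(f_value, df1 = 3, df2 = 607, lower.tail = FALSE)

round(f_value, 2)  # 10.28
signif(p_value, 3)
```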
To explore this finding further, post hoc testing was carried out using multiple pairwise comparisons of means. The fitted one-way ANOVA was passed to the Tukey HSD test, with results summarised in Output 4. The differences in irrational ideas between Executive/Senior Management and Employee, and between Executive/Senior Management and Supervisor, are the most statistically significant (adjusted p < 0.001). The comparisons of Middle Management with Employee (p = 0.0184) and of Executive/Senior Management with Middle Management (p = 0.0162) are also significant at the 0.05 cut-off. Neither Supervisor vs Employee nor Middle Management vs Supervisor differs significantly.
Output 4
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = irrational_ideas ~ job_role, data = role_ii_df2)
Linear Hypotheses:
Estimate Std. Error
Supervisor - Employee == 0 -0.02819 0.08770
Middle management - Employee == 0 -0.22894 0.07889
Executive/Senior management - Employee == 0 -0.63243 0.12508
Middle management - Supervisor == 0 -0.20075 0.10378
Executive/Senior management - Supervisor == 0 -0.60424 0.14209
Executive/Senior management - Middle management == 0 -0.40349 0.13683
t value Pr(>|t|)
Supervisor - Employee == 0 -0.321 0.9879
Middle management - Employee == 0 -2.902 0.0184 *
Executive/Senior management - Employee == 0 -5.056 <0.001 ***
Middle management - Supervisor == 0 -1.934 0.2045
Executive/Senior management - Supervisor == 0 -4.252 <0.001 ***
Executive/Senior management - Middle management == 0 -2.949 0.0162 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
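The header of Output 4 ("Simultaneous Tests for General Linear Hypotheses") suggests it came from multcomp's `glht()` with Tukey contrasts. Base R's `TukeyHSD()` is an alternative that needs no extra package; its adjusted p-values should be close to, but may differ slightly from, the multcomp single-step values when group sizes are unequal. The sketch below uses a hypothetical, simulated `role_demo` data frame.

```r
# Sketch: Tukey-adjusted pairwise comparisons with base R.
# `role_demo` is a hypothetical, simulated stand-in for the survey data.
set.seed(3)
role_demo <- data.frame(
  job_role = factor(rep(c("Employee", "Supervisor", "Middle management"),
                        times = c(60, 30, 40))),
  irrational_ideas = c(rnorm(60, 3.2, 0.7),
                       rnorm(30, 3.2, 0.7),
                       rnorm(40, 2.9, 0.7))
)

fit <- aov(irrational_ideas ~ job_role, data = role_demo)

# `which` selects the term; each row is one pairwise difference with its
# confidence interval (lwr, upr) and family-wise adjusted p-value
tukey <- TukeyHSD(fit, which = "job_role", conf.level = 0.95)
tukey$job_role[, "p adj"]
```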
Finally, Output 5 assesses the practical significance of this result: the effect size of job role on irrational ideas (eta squared = 0.05) is classed as small.
Output 5
# Effect Size for ANOVA
Parameter | Eta2 | 95% CI
-------------------------------
job_role | 0.05 | [0.00, 0.08]
- One-sided CIs: lower bound fixed at [0.00].
[1] "small"
(Rules: cohen1992)
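For a one-way ANOVA, eta squared is the share of total variation explained by the factor, SS_job_role / (SS_job_role + SS_residuals). The worked check below uses the sums of squares from Output 3. The labelling cut-offs (0.02, 0.13, 0.26) are an assumption about the cohen1992 rules named in Output 5.

```r
# Worked check of Output 5 from the sums of squares in Output 3
ss_job_role  <- 17.4
ss_residuals <- 341.7
eta2 <- ss_job_role / (ss_job_role + ss_residuals)

# Assumed cohen1992 cut-offs: < 0.02 very small, < 0.13 small,
# < 0.26 medium, otherwise large
eta2_label <- as.character(cut(eta2, breaks = c(0, 0.02, 0.13, 0.26, Inf),
                               labels = c("very small", "small", "medium", "large"),
                               right = FALSE, include.lowest = TRUE))

round(eta2, 2)  # 0.05
eta2_label      # "small"
```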
Test 3 builds on Test 2 and examines the relationship between two categorical variables with three or more levels (job role and education) and a continuous variable (irrational ideas).
Hypothesis
H0: Job role and education do not interact; there is no interaction effect of these variables on irrational ideas.
Ha: Job role and education interact; their combined effect on irrational ideas differs from what their separate effects would predict.
A brief exploratory analysis of job role and irrational ideas was provided in Test 2. A brief overview of the categorical variable education follows.
Chart 11 visualises the distribution of the categorical variable education.
Charts 12 and 13 show that the residuals generally follow a normal distribution, suggesting the normality assumption for two-way ANOVA is reasonably met.
Chart 14 compares irrational ideas based on job role and education.
Chart 15 highlights differences in the mean and standard error of the mean across job roles and education.
The two-way ANOVA assesses whether mean irrational ideas differ by job role, by education, and by their interaction. The companion vignette on categorical hypothesis testing found a statistically significant relationship between job role and education. Output 6 was therefore derived from a multiplicative (interaction) model rather than an additive one; including the interaction term is also what allows the interaction hypothesis above to be tested.
Output 6
Df Sum Sq Mean Sq F value Pr(>F)
job_role 3 17.1 5.708 10.039 1.85e-06 ***
education 3 0.1 0.043 0.076 0.973
job_role:education 9 2.0 0.223 0.392 0.939
Residuals 589 334.9 0.569
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As in Test 2, Output 6 shows a statistically significant relationship between job role and irrational ideas (Pr(>F) = 1.85e-06). However, neither education (Pr(>F) = 0.973) nor the job_role:education interaction (Pr(>F) = 0.939) is statistically significant, so there is no evidence of a synergistic effect on irrational ideas. The null hypothesis is therefore not rejected: there is no evidence of an interaction between job role and education on the level of irrational ideas.
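In an R model formula, `a * b` expands to `a + b + a:b`, so the multiplicative model adds the interaction row seen in Output 6. The sketch below contrasts the additive and multiplicative fits and compares them with an F-test; `demo` and its values are hypothetical and simulated, not the vignette's `role_ed_ii_df2`.

```r
# Sketch: additive vs multiplicative (interaction) two-way ANOVA.
# `demo` is a hypothetical, simulated data frame.
set.seed(5)
n <- 400
demo <- data.frame(
  job_role  = factor(sample(c("Employee", "Supervisor", "Middle management"),
                            n, replace = TRUE)),
  education = factor(sample(c("High school", "Undergraduate degree",
                              "Postgraduate degree"), n, replace = TRUE))
)
demo$irrational_ideas <- rnorm(n, mean = 3, sd = 0.75)

additive       <- aov(irrational_ideas ~ job_role + education, data = demo)
multiplicative <- aov(irrational_ideas ~ job_role * education, data = demo)

# Only the multiplicative table has a "job_role:education" row
summary(multiplicative)

# F-test of the nested models: does adding the interaction improve the fit?
cmp <- anova(additive, multiplicative)
cmp
```

One design point to keep in mind: `summary()` on an `aov` fit reports sequential (Type I) sums of squares, so when the factors are associated, as reported for job role and education, the main-effect rows can change with the order of terms in the formula. `car::Anova(fit, type = 2)` is one alternative that does not depend on term order for main effects.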
To explore this result further, post hoc testing was carried out using multiple pairwise comparisons of means. The fitted two-way ANOVA was passed to the Tukey HSD test. Output 7 confirms that none of the pairwise comparisons between education levels is statistically significant (all adjusted p-values are 0.98 or above).
Output 7
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = irrational_ideas ~ job_role * education, data = role_ed_ii_df2)
Linear Hypotheses:
Estimate Std. Error t value
Certificate/Diploma - High school == 0 -0.021033 0.151567 -0.139
Undergraduate degree - High school == 0 -0.003806 0.127550 -0.030
Postgraduate degree - High school == 0 0.036966 0.130677 0.283
Undergraduate degree - Certificate/Diploma == 0 0.017227 0.154494 0.112
Postgraduate degree - Certificate/Diploma == 0 0.057999 0.157085 0.369
Postgraduate degree - Undergraduate degree == 0 0.040772 0.134061 0.304
Pr(>|t|)
Certificate/Diploma - High school == 0 0.999
Undergraduate degree - High school == 0 1.000
Postgraduate degree - High school == 0 0.992
Undergraduate degree - Certificate/Diploma == 0 0.999
Postgraduate degree - Certificate/Diploma == 0 0.983
Postgraduate degree - Undergraduate degree == 0 0.990
(Adjusted p values reported -- single-step method)
Table 1 summarises the testing results. Variables are listed with respective p-values and outcomes from the t-test, one-way ANOVA and two-way ANOVA.
Table 1: Summary of categorical and continuous variable hypothesis testing

| Test | Variables | p-value | H0 |
|---|---|---|---|
| t-test (Welch) | gender and emotion | 0.6361 | fail to reject |
| One-way ANOVA | irrational ideas and job role | 1.31e-06 | reject |
| Two-way ANOVA | irrational ideas, job role and education (interaction term) | 0.939 | fail to reject |

Significance level: p < 0.05
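The H0 column follows mechanically from the reported p-values and the 0.05 cut-off. A minimal sketch that rebuilds those decisions, using only the p-values listed in Table 1:

```r
# Sketch: reproducing the decisions in Table 1 from the reported p-values
alpha <- 0.05
results <- data.frame(
  test      = c("t-test (Welch)", "One-way ANOVA", "Two-way ANOVA"),
  variables = c("gender and emotion",
                "irrational ideas and job role",
                "irrational ideas, job role and education (interaction term)"),
  p_value   = c(0.6361, 1.31e-06, 0.939)
)

# Reject H0 only when the p-value falls below the significance level
results$H0 <- ifelse(results$p_value < alpha, "reject", "fail to reject")
results
```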
References:
Emotion was measured using ‘A semantic differential mood scale’ by Lorr and Wunderlich, published in the Journal of Clinical Psychology.
Irrational ideas were measured using the ‘Irrational belief scale’ developed by Malouff and Schutte, published in the Sourcebook of Adult Assessment Strategies, based on Ellis and Harper’s work, published in A New Guide to Rational Living.
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.4.0 (2024-04-24 ucrt)
## os Windows 11 x64 (build 22631)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_Australia.utf8
## ctype English_Australia.utf8
## tz Australia/Brisbane
## date 2024-07-29
## pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.4.0)
## askpass 1.2.0 2023-09-03 [1] CRAN (R 4.4.0)
## backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
## bayestestR 0.14.0 2024-07-24 [1] CRAN (R 4.4.1)
## broom 1.0.6 2024-05-17 [1] CRAN (R 4.4.0)
## bslib 0.7.0 2024-03-29 [1] CRAN (R 4.4.0)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
## car * 3.1-2 2023-03-30 [1] CRAN (R 4.4.0)
## carData * 3.0-5 2022-01-06 [1] CRAN (R 4.4.0)
## cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
## coda 0.19-4.1 2024-01-31 [1] CRAN (R 4.4.0)
## codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.0)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.1)
## crayon 1.5.3 2024-06-20 [1] CRAN (R 4.4.1)
## crul 1.5.0 2024-07-19 [1] CRAN (R 4.4.1)
## curl 5.2.1 2024-03-01 [1] CRAN (R 4.4.0)
## data.table * 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
## datawizard 0.12.2 2024-07-21 [1] CRAN (R 4.4.1)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
## digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.1)
## dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
## effectsize * 0.8.9 2024-07-03 [1] CRAN (R 4.4.1)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
## emmeans 1.10.3 2024-07-01 [1] CRAN (R 4.4.1)
## estimability 1.5.1 2024-05-12 [1] CRAN (R 4.4.0)
## evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
## fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
## farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.0)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
## flextable * 0.9.6 2024-05-05 [1] CRAN (R 4.4.0)
## fontBitstreamVera 0.1.1 2017-02-01 [1] CRAN (R 4.4.0)
## fontLiberation 0.1.0 2016-10-15 [1] CRAN (R 4.4.0)
## fontquiver 0.2.1 2017-02-01 [1] CRAN (R 4.4.0)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
## fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
## gdtools 0.3.7 2024-03-05 [1] CRAN (R 4.4.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
## gfonts 0.2.0 2023-01-08 [1] CRAN (R 4.4.0)
## GGally 2.2.1 2024-02-14 [1] CRAN (R 4.4.0)
## ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
## ggpubr * 0.6.0 2023-02-10 [1] CRAN (R 4.4.0)
## ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.4.0)
## ggstats 0.6.0 2024-04-05 [1] CRAN (R 4.4.0)
## glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.4.0)
## gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.4.0)
## highr 0.11 2024-05-26 [1] CRAN (R 4.4.0)
## hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
## htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
## httpcode 0.3.0 2020-04-10 [1] CRAN (R 4.4.0)
## httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
## insight 0.20.2 2024-07-13 [1] CRAN (R 4.4.0)
## ISLR 1.4 2021-09-15 [1] CRAN (R 4.4.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
## jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
## knitr 1.48 2024-07-07 [1] CRAN (R 4.4.1)
## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
## later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
## lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.0)
## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
## lpSolve 5.6.20 2023-12-10 [1] CRAN (R 4.4.0)
## lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
## MASS * 7.3-60.2 2024-04-24 [2] local
## Matrix 1.7-0 2024-03-22 [2] CRAN (R 4.4.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
## mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
## multcomp * 1.4-26 2024-07-18 [1] CRAN (R 4.4.1)
## munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
## mvtnorm * 1.2-5 2024-05-21 [1] CRAN (R 4.4.0)
## officer 0.6.6 2024-05-05 [1] CRAN (R 4.4.0)
## openssl 2.2.0 2024-05-16 [1] CRAN (R 4.4.0)
## parameters 0.22.1 2024-07-21 [1] CRAN (R 4.4.1)
## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
## pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
## pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.1)
## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
## promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
## ragg 1.3.2 2024-05-15 [1] CRAN (R 4.4.0)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
## Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.1)
## readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
## remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
## rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.0)
## rstatix * 0.7.2 2023-02-01 [1] CRAN (R 4.4.0)
## rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
## sampling 2.10 2023-10-29 [1] CRAN (R 4.4.0)
## sandwich 3.1-0 2023-12-11 [1] CRAN (R 4.4.0)
## sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.0)
## scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
## shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
## SmartEDA * 0.3.10 2024-01-30 [1] CRAN (R 4.4.0)
## stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
## stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
## survival * 3.5-8 2024-02-14 [2] CRAN (R 4.4.0)
## systemfonts 1.1.0 2024-05-15 [1] CRAN (R 4.4.0)
## textshaping 0.4.0 2024-05-24 [1] CRAN (R 4.4.0)
## TH.data * 1.1-2 2023-04-17 [1] CRAN (R 4.4.0)
## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
## tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
## timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
## usethis 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
## uuid 1.2-0 2024-01-14 [1] CRAN (R 4.4.0)
## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
## withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
## xfun 0.46 2024-07-18 [1] CRAN (R 4.4.1)
## xml2 1.3.6 2023-12-04 [1] CRAN (R 4.4.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
## yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.1)
## zip 2.3.1 2024-01-27 [1] CRAN (R 4.4.0)
## zoo 1.8-12 2023-04-13 [1] CRAN (R 4.4.0)
##
## [1] C:/Users/wayne/AppData/Local/R/win-library/4.4
## [2] C:/Program Files/R/R-4.4.0/library
##
## ──────────────────────────────────────────────────────────────────────────────