---
title: "5. Statistical Inference: Two-sample t-test"
author: "Lieven Clement"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
<a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/4.0"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
```{r setup, include=FALSE, cache=FALSE}
knitr::opts_chunk$set(
include = TRUE, comment = NA, echo = TRUE,
message = FALSE, warning = FALSE, cache = TRUE
)
library(tidyverse)
```
# Smelly armpit example
- Smelly armpits are not caused by sweat itself. The smell is caused by specific micro-organisms belonging to the group of *Corynebacterium spp.* that metabolise sweat.
Another group of abundant bacteria are the *Staphylococcus spp.*; these bacteria do not metabolise sweat into smelly compounds.
- The CMET-group at Ghent University does research on transplanting the armpit microbiome to help people with smelly armpits.
- Proposed therapy:
    1. Remove the armpit microbiome with antibiotics
    2. Influence the armpit microbiome with a microbial transplant (https://youtu.be/9RIFyqLXdVw)
- Experiment:
    - 20 subjects with smelly armpits are assigned to one of two treatment groups:
        - placebo (only antibiotics)
        - transplant (antibiotics followed by a microbial transplant)
    - The microbiome is sampled 6 weeks after the treatment
    - The abundance of *Staphylococcus spp.* relative to the total of *Corynebacterium spp.* and *Staphylococcus spp.* in the microbiome is measured via DGGE (*Denaturing Gradient Gel Electrophoresis*).
---
## Import the data
```{r}
ap <- read_csv("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/armpit.csv")
ap
```
## Data exploration
We plot the relative abundances as a function of the treatment group. With the ggplot2 package we can easily build plots by adding layers.
```{r}
ap %>% ggplot(aes(x = trt, y = rel)) +
geom_boxplot(outlier.shape = NA) +
geom_point(position = "jitter")
ap %>% ggplot(aes(sample = rel)) +
geom_qq() +
geom_qq_line() +
facet_wrap(~trt)
```
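The plots can be complemented with numerical summaries per group. A minimal sketch using dplyr (loaded with tidyverse in the setup chunk):
```{r}
# Sample size, mean and standard deviation of the relative abundance per group
ap %>%
  group_by(trt) %>%
  summarize(n = n(), mean = mean(rel), sd = sd(rel))
```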
---
# Two-sample t-test
## Notation
Suppose that $Y_{ij}$ is the response for subjects $i=1,\ldots, n_j$ from population $j=1,2$.
We often use the term **treatment** or **group** instead of population.
Here, treatment $j=1$ is the microbial transplant and $j=2$ the placebo.
We assume
$$Y_{ij}\text{ i.i.d. } N(\mu_j,\sigma^2),\quad i=1,\ldots,n_j,\; j=1,2.$$
Note that we assume equal variances (**homoscedasticity**); unequal variances are referred to as **heteroscedasticity**.
---
## Hypotheses
Test $$ H_0: \mu_1 = \mu_2 $$
against
$$ H_1: \mu_1 \neq \mu_2 .$$
$H_1$ is again the research hypothesis: the average relative abundance of *Staphylococcus spp.* is different upon microbial transplant than upon placebo treatment.
$H_0$ and $H_1$ can also be specified in terms of the effect size between the two treatments, $\mu_1-\mu_2$
$$H_0: \mu_1-\mu_2 = 0,$$
$$H_1: \mu_1-\mu_2 \neq 0.$$
We can estimate the effect size using the difference in sample means:
$$\hat \mu_1-\hat \mu_2=\bar Y_1 -\bar Y_2.$$
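For the armpit data, a sketch of this estimate (the ordering of the groups follows the coding of `trt`, so the sign of the difference depends on which group comes first):
```{r}
# Group means of the relative abundance and their difference,
# i.e. the estimated effect size
groupMeans <- tapply(ap$rel, ap$trt, mean)
groupMeans
diff(groupMeans)  # second group mean minus first group mean
```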
---
## Variance estimator
The experimental units are independent, so the sample means are also independent, and the variance of the difference is
$$\text{Var}_{\bar Y_1 -\bar Y_2}=\frac{\sigma^2}{n_1}+\frac{\sigma^2}{n_2}=\sigma^2 \left(\frac{1}{n_1}+\frac{1}{n_2}\right).$$
And the standard error becomes
$$\sigma_{\bar Y_1 -\bar Y_2}=\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}.$$
The variance can be estimated within each group using the sample variance:
$$S_1^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1} (Y_{i1}-\bar{Y}_1)^2.$$
$$S_2^2 = \frac{1}{n_2-1}\sum_{i=1}^{n_2} (Y_{i2}-\bar{Y}_2)^2.$$
But, if we assume equal variances $\sigma_1^2=\sigma_2^2=\sigma^2$, then we can estimate the variance more precisely by using all observations in both groups. This variance estimator is also referred to as the *pooled variance estimator* $S^2_p$.
So $S_1^2$ and $S_2^2$ are estimators of the same parameter $\sigma^2$.
And we can combine them into one estimator based on all $n_1+n_2$ observations:
$$ S_p^2 = \frac{n_1-1}{n_1+n_2-2} S_1^2 + \frac{n_2-1}{n_1+n_2-2} S_2^2 = \frac{1}{n_1+n_2-2}\sum_{j=1}^2\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2.$$
The pooled variance estimator uses the squared deviations of the observations from their group mean and has $n_1+n_2-2$ degrees of freedom.
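A sketch that computes $S_p^2$ and the standard error of the mean difference by hand, assuming the groups are coded `placebo` and `transplant` in the `trt` column:
```{r}
# Pooled variance estimator and standard error of the difference in means
y1 <- ap$rel[ap$trt == "transplant"]
y2 <- ap$rel[ap$trt == "placebo"]
n1 <- length(y1); n2 <- length(y2)
sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
se <- sqrt(sp2 * (1 / n1 + 1 / n2))
c(sp2 = sp2, se = se)
```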
---
## Test statistic
The two-sample $t$-test statistic:
$$T = \frac{\bar{Y}_1-\bar{Y}_2}{\sqrt{\frac{S_p^2}{n_1}+\frac{S_p^2}{n_2}}} =
\frac{\bar{Y}_1 - \bar{Y}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}.$$
Under $H_0$, the statistic $T$ follows a $t$-distribution with $n_1+n_2-2$ degrees of freedom if all data are independent, normally distributed, and have equal variances.
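A sketch computing $T$ and its two-sided p-value by hand (same group-coding assumption as above); the result should match the `t.test` output below, up to the sign of $T$, which depends on the group ordering:
```{r}
# Two-sample t-statistic with pooled variance and its two-sided p-value
y1 <- ap$rel[ap$trt == "transplant"]
y2 <- ap$rel[ap$trt == "placebo"]
n1 <- length(y1); n2 <- length(y2)
sp <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
tStat <- (mean(y1) - mean(y2)) / (sp * sqrt(1 / n1 + 1 / n2))
pValue <- 2 * pt(-abs(tStat), df = n1 + n2 - 2)
c(t = tStat, p = pValue)
```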
---
## Armpit example
We can implement the test in R:
```{r}
t.test(rel ~ trt, data = ap, var.equal = TRUE)
```
On the $5\%$ significance level we reject the null hypothesis in favor of the alternative hypothesis and conclude that the relative abundance of *Staphylococcus spp.* is on average extremely significantly larger in the transplant group than in the placebo group.
If there were no effect of the transplant, the probability of observing a test statistic in a random sample at least as extreme as the one in the armpit experiment would be less than 9 in $100\,000$.
This is extremely rare under $H_0$.
If $H_1$ is correct, we expect the test statistic to be large in absolute value and the p-value to be small. Hence we conclude that there is a lot of evidence against $H_0$ in favour of $H_1$.
**Good statistical practice** is to report not only the $p$-value but also the effect size along with its confidence interval, so that we can judge both the statistical significance and the biological relevance.
### Conclusion
On average the relative abundance of *Staphylococcus spp.* in the microbiome of the armpit in the transplant group is extremely significantly different from that in the placebo group ($p \ll 0.001$). The relative abundance of *Staphylococcus spp.* is on average `r round(diff(t.test(rel~trt,data=ap,var.equal=TRUE)$estimate),1)`% larger in the transplant group than in the placebo group (95\% CI [`r paste(format(-t.test(rel~trt,data=ap,var.equal=TRUE)$conf.int[2:1],digits=2,nsmall=1),collapse=",")`]%).
---
# Assumptions
The validity of the t-test depends on the following assumptions:
- Independence (design)
- One-sample t-test: normality of the observations
- Paired t-test: normality of the difference
- Two-sample t-test: normality of the observations in both groups, and equal variances.
If the assumptions are not met, the null distribution does not follow a t-distribution, and the p-values and critical values are incorrect.
To construct confidence intervals we also rely on these assumptions.
- We used quantiles from the t-distribution to calculate the lower and upper limit.
- The correct coverage of the CI depends on these assumptions.
---
## Evaluate normality
- Boxplots and histograms: shape of distribution and outliers
- QQ-plots
Hypothesis tests (goodness-of-fit tests) also exist, but their null hypothesis is that the data are normally distributed, so failing to reject only allows a weak conclusion!
- Kolmogorov-Smirnov, Shapiro-Wilk and Anderson-Darling.
- In small samples they have low power.
- In large samples they often flag very small deviations as significant.
Recommendations
- Start with a graphical exploration of the data and keep the sample size in mind to avoid overinterpretation of the plots.
- If you have doubts, use simulation: simulate data with the same sample size from a normal distribution with the same mean and variance as observed in the sample, and compare the plots to those of the real data (see the sketch after this list).
- If you observe deviations from normality, check in the literature how sensitive your method is to such deviations. (t-tests, for instance, are rather insensitive to deviations as long as the distribution of the data is symmetric.)
- In large samples you can resort to the central limit theorem.
- You might resort to transformations of the response.
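A sketch of such a simulation for one group of the armpit data (assuming the `transplant` coding in `trt`): nine normal samples with the same size, mean, and standard deviation, each shown as a QQ-plot. Deviations of this magnitude are what sampling variability alone can produce.
```{r}
# Simulate 9 normal datasets matching one group's size, mean and sd,
# and inspect how much their QQ-plots deviate from the line by chance
set.seed(115)
y <- ap$rel[ap$trt == "transplant"]
simData <- tibble(
  sim = rep(1:9, each = length(y)),
  y = rnorm(9 * length(y), mean = mean(y), sd = sd(y))
)
simData %>%
  ggplot(aes(sample = y)) +
  geom_qq() +
  geom_qq_line() +
  facet_wrap(~sim)
```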
---
## Homoscedasticity
- Boxplots: the box height is the interquartile range (IQR), a robust measure of the spread.
- If the differences between the boxes are not large $\rightarrow$ homoscedasticity.
- Again, you can use simulation to get insight into the differences you can expect under equal variances.
- A formal F-test can be used to compare the variances (see the sketch after this list), but again the null hypothesis is that the variances are equal, so the same criticism as for the normality tests applies here.
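For reference, this F-test is available in base R as `var.test`; keep the caveat above in mind when interpreting a non-significant result:
```{r}
# F-test comparing the two group variances
var.test(rel ~ trt, data = ap)
```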
---
## Welch modified t-test
If the data are heteroscedastic, you can use the Welch two-sample t-test, which no longer uses the pooled variance estimator.
$$T = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{S^2_1}{n_1}+\frac{S^2_2}{n_2}}}$$
with $S^2_1$ and $S^2_2$ the sample variances in both groups.
This statistic follows approximately a t-distribution with a number of degrees of freedom between $\text{min}(n_1-1,n_2-1)$ and $n_1+n_2-2$.
In R the degrees of freedom are estimated using the Welch-Satterthwaite approximation. You can do this by using the `t.test` function with the argument `var.equal = FALSE`.
```{r}
t.test(rel ~ trt, data = ap, var.equal = FALSE)
```
Note that the title of the output indicates that the Welch t-test was used. The adjusted degrees of freedom, $df = 17.876$, are close to those of the conventional t-test ($n_1+n_2-2=18$) because the sample variances are approximately equal.
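A sketch of the Welch-Satterthwaite approximation computed by hand (same group-coding assumption as before); it should reproduce the $df = 17.876$ above:
```{r}
# Welch-Satterthwaite approximation of the degrees of freedom
y1 <- ap$rel[ap$trt == "transplant"]
y2 <- ap$rel[ap$trt == "placebo"]
v1 <- var(y1) / length(y1)
v2 <- var(y2) / length(y2)
(v1 + v2)^2 / (v1^2 / (length(y1) - 1) + v2^2 / (length(y2) - 1))
```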
---
# How to report?
- In the scientific literature, too much attention is paid to p-values.
- It is much more informative to combine an estimate with its confidence interval.
**Rule of thumb**:
Report an estimate together with its confidence interval (and its p-value)
1. The result of the test can be derived from the confidence interval.
2. It allows the reader to judge **scientific relevance**.
```{r}
t.test(rel ~ trt, data = ap)
```
The result of an $\alpha$-level t-test is equivalent to checking whether the effect size under $H_0$ (here, 0) lies within the $(1-\alpha)$ CI.
An effect can be extremely statistically significant, but scientifically irrelevant. With a CI you will spot this.
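A sketch of that equivalence for the armpit data: at the 5% level, rejecting $H_0: \mu_1-\mu_2=0$ corresponds exactly to 0 falling outside the 95% CI.
```{r}
# Reject H0 at level alpha <=> 0 lies outside the (1 - alpha) CI
res <- t.test(rel ~ trt, data = ap)
res$conf.int
0 < res$conf.int[1] | 0 > res$conf.int[2]  # TRUE: 0 outside the CI, reject H0
```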