| 1 | +--- |
| 2 | +title: "Correlated test statistics" |
| 3 | +author: "Chenguang Zhang, Yujie Zhao" |
| 4 | +output: |
| 5 | + rmarkdown::html_document: |
| 6 | + toc: true |
| 7 | + toc_float: true |
| 8 | + toc_depth: 2 |
| 9 | + number_sections: true |
| 10 | + highlight: "textmate" |
| 11 | + css: "custom.css" |
| 12 | + code_fold: hide |
| 13 | +vignette: > |
| 14 | + %\VignetteEngine{knitr::rmarkdown} |
| 15 | + %\VignetteIndexEntry{Correlated test statistics} |
| 16 | +bibliography: wpgsd.bib |
| 17 | +--- |
| 18 | + |
| 19 | +The weighted parametric group sequential design (WPGSD) approach [@anderson2022unified] uses the known correlation structure among test statistics to construct efficacy bounds that control the family-wise error rate (FWER) in a group sequential design. The correlation may arise from common observations in nested populations, in overlapping populations, or in a shared control arm. |
| 20 | + |
| 21 | +# Methodologies to calculate correlations |
| 22 | + |
| 23 | +Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I=\{1,\ldots,m\}$, and $K$ analyses. Let $k=1,2,\ldots,K$ index the interim and final analyses. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis by $H_J=\cap_{j \in J}H_j$. We note that $H_I$ is the global null hypothesis. |
| 24 | + |
| 25 | +We assume the plan is for all hypotheses to be tested at each of the $K$ planned analyses if the trial continues to the end for all hypotheses. We further assume that the joint distribution of the $m \times K$ test statistics, for the $m$ individual hypotheses at all $K$ analyses, is multivariate normal with a completely known correlation matrix. |
| 26 | + |
| 27 | +Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$ at analysis $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through stage $k$ for hypothesis $i$. Then $n_{i \wedge i',k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $Z_{i'k'}$, for $i, i' \in I$ and $1 \le k, k' \le K$. The key to the parametric tests is to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is |
| 28 | +$$\text{Corr}(Z_{ik},Z_{i'k'})=\frac{n_{i \wedge i',k \wedge k'}}{\sqrt{n_{ik} \cdot n_{i'k'}}}.$$ |
| 29 | + |
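The formula translates directly into a one-line computation. The chunk below is a minimal sketch for illustration only: `corr_from_events()` and its argument names are hypothetical and are not part of `wpgsd`.

```{r}
# Hypothetical helper (not part of wpgsd): correlation between two test
# statistics, given their own event counts and the events they share.
corr_from_events <- function(n_common, n_a, n_b) {
  # Shared events cannot be negative or exceed either statistic's own count
  stopifnot(n_common >= 0, n_common <= pmin(n_a, n_b))
  n_common / sqrt(n_a * n_b)
}

# Arbitrary illustrative counts: 50 shared events out of 100 and 100
corr_from_events(n_common = 50, n_a = 100, n_b = 100)
```

The input check reflects the definition of $n_{i \wedge i',k \wedge k'}$: it counts events contained in both statistics, so it can never exceed $n_{ik}$ or $n_{i'k'}$.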
| 30 | +# Examples |
| 31 | + |
| 32 | +We borrow Example 1 from Section 2 (Motivating Examples) of Anderson et al. [@anderson2022unified] as the basis here. The setting is as follows. |
| 33 | + |
| 34 | +In a two-arm controlled clinical trial with one primary endpoint, there are three patient populations defined by the status of two biomarkers, A and B: |
| 35 | + |
| 36 | +* Biomarker A positive (population 1), |
| 37 | +* Biomarker B positive (population 2), |
| 38 | +* Overall population. |
| 39 | + |
| 40 | +The 3 primary elementary hypotheses are: |
| 41 | + |
| 42 | +* **H1**: the experimental treatment is superior to the control in population 1 |
| 43 | +* **H2**: the experimental treatment is superior to the control in population 2 |
| 44 | +* **H3**: the experimental treatment is superior to the control in the overall population |
| 45 | + |
| 46 | +Assume an interim analysis (IA) and a final analysis (FA) are planned for the study. The numbers of events are listed below. |
| 47 | +```{r,message=FALSE} |
| 48 | +library(dplyr) |
| 49 | +library(tibble) |
| 50 | +library(gt) |
| 51 | +``` |
| 52 | + |
| 53 | +```{r} |
| 54 | +event_tb <- tribble( |
| 55 | + ~Population, ~"Number of Event in IA", ~"Number of Event in FA", |
| 56 | + "Population 1", 100, 200, |
| 57 | + "Population 2", 110, 220, |
| 58 | + "Overlap of Population 1 and 2", 80, 160, |
| 59 | + "Overall Population", 225, 450 |
| 60 | +) |
| 61 | +event_tb %>% |
| 62 | + gt() %>% |
| 63 | + tab_header(title = "Number of events at each population") |
| 64 | +``` |
| 65 | + |
| 66 | +## Correlation of different populations within the same analysis |
| 67 | +Consider first a simple case: the correlation between the test statistics for population 1 and population 2 at the interim analysis only. Here $k=k'=1$, and for $H_{1}$ and $H_{2}$ we have $i=1$ and $i'=2$. |
| 68 | +The correlation is |
| 69 | +$$\text{Corr}(Z_{11},Z_{21})=\frac{n_{1 \wedge 2,1 \wedge 1}}{\sqrt{n_{11} \cdot n_{21}}}$$ |
| 70 | +The numbers of events are listed below. |
| 71 | +```{r} |
| 72 | +event_tbl <- tribble( |
| 73 | + ~Population, ~"Number of Event in IA", |
| 74 | + "Population 1", 100, |
| 75 | + "Population 2", 110, |
| 76 | + "Overlap in population 1 and 2", 80 |
| 77 | +) |
| 78 | +event_tbl %>% |
| 79 | + gt() %>% |
| 80 | + tab_header(title = "Number of events at each population in example 1") |
| 81 | +``` |
| 82 | +The correlation can then be calculated as |
| 83 | +$$\text{Corr}(Z_{11},Z_{21})=\frac{80}{\sqrt{100 \cdot 110}} \approx 0.76$$ |
| 84 | +```{r} |
| 85 | +Corr1 <- 80 / sqrt(100 * 110) |
| 86 | +round(Corr1, 2) |
| 87 | +``` |
| 88 | + |
| 89 | +## Correlation of different analyses within the same population |
| 90 | +Next, consider a single population, say population 1, at two different analyses: the interim and the final analysis. Here $i=i'=1$, with $k=1$ for the interim analysis and $k'=2$ for the final analysis. |
| 91 | +The correlation is |
| 92 | +$$\text{Corr}(Z_{11},Z_{12})=\frac{n_{1 \wedge 1,1 \wedge 2}}{\sqrt{n_{11} \cdot n_{12}}}$$ |
| 93 | +The numbers of events are listed below. |
| 94 | +```{r} |
| 95 | +event_tb2 <- tribble( |
| 96 | + ~Population, ~"Number of Event in IA", ~"Number of Event in FA", |
| 97 | + "Population 1", 100, 200 |
| 98 | +) |
| 99 | +event_tb2 %>% |
| 100 | + gt() %>% |
| 101 | + tab_header(title = "Number of events at each analysis in example 2") |
| 102 | +``` |
| 103 | +The correlation can then be calculated as |
| 104 | +$$\text{Corr}(Z_{11},Z_{12})=\frac{100}{\sqrt{100 \cdot 200}} \approx 0.71$$ |
| 105 | +The 100 in the numerator is the number of events in population 1 that are included in both the interim analysis and the final analysis. |
| 106 | +```{r} |
| 107 | +Corr1 <- 100 / sqrt(100 * 200) |
| 108 | +round(Corr1, 2) |
| 109 | +``` |
| 110 | + |
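A short derivation, following directly from the general formula, shows why this case is especially simple. For a single hypothesis, every event counted at the earlier analysis is also counted at the later one, so $n_{i \wedge i,k \wedge k'} = n_{ik}$ for $k \le k'$, and the correlation reduces to the square root of the ratio of event counts:

$$\text{Corr}(Z_{ik},Z_{ik'})=\frac{n_{ik}}{\sqrt{n_{ik} \cdot n_{ik'}}}=\sqrt{\frac{n_{ik}}{n_{ik'}}}, \quad k \le k'.$$

With the counts for population 1, $\sqrt{100/200} \approx 0.71$, which agrees with the value computed above.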
| 111 | +## Correlation of different analyses and different populations |
| 112 | +Finally, consider population 1 at the interim analysis and population 2 at the final analysis. Here $i=1$ and $i'=2$, with $k=1$ and $k'=2$. |
| 113 | +The correlation is |
| 114 | +$$\text{Corr}(Z_{11},Z_{22})=\frac{n_{1 \wedge 2,1 \wedge 2}}{\sqrt{n_{11} \cdot n_{22}}}$$ |
| 115 | +The numbers of events are listed below. |
| 116 | +```{r} |
| 117 | +event_tb3 <- tribble( |
| 118 | + ~Population, ~"Number of Event in IA", ~"Number of Event in FA", |
| 119 | + "Population 1", 100, 200, |
| 120 | + "Population 2", 110, 220, |
| 121 | + "Overlap in population 1 and 2", 80, 160 |
| 122 | +) |
| 123 | +event_tb3 %>% |
| 124 | + gt() %>% |
| 125 | + tab_header(title = "Number of events at each population & analyses in example 3") |
| 126 | +``` |
| 127 | + |
| 128 | +The correlation can then be calculated as |
| 129 | +$$\text{Corr}(Z_{11},Z_{22})=\frac{80}{\sqrt{100 \cdot 220}} \approx 0.54$$ |
| 130 | +The 80 in the numerator is the number of events common to population 1 at the interim analysis and population 2 at the final analysis. |
| 131 | +```{r} |
| 132 | +Corr1 <- 80 / sqrt(100 * 220) |
| 133 | +round(Corr1, 2) |
| 134 | +``` |
| 135 | + |
| 136 | +# Generate the correlation matrix by `generate_corr()` |
| 137 | +The calculations above cover every type of pair: different populations at the same analysis, the same population at different analyses, and different populations at different analyses. The `generate_corr()` function is built on this logic and computes the correlation for every such pair directly. |
| 138 | + |
| 139 | +First, we need an event table that describes the study. |
| 140 | + |
| 141 | +- `H1` and `H2` index the two hypotheses in a pair, both of which are included in the multiplicity testing. For example, `H1 = 1` may refer to the hypothesis that the experimental treatment is superior to the control in population 1 (or for experimental arm 1), and `H2 = 2` to the hypothesis that it is superior in population 2 (or for experimental arm 2); |
| 142 | +- `Analysis` is the analysis stage, for example, 1 for the interim analysis and 2 for the final analysis; |
| 143 | +- `Event` is the number of events common to the hypotheses given in `H1` and `H2`; when `H1` and `H2` are equal, it is the number of events for that hypothesis. |
| 144 | + |
| 145 | +For example, `H1=1`, `H2=1`, `Analysis=1`, `Event=100` indicates that 100 events are observed in population 1 at the interim analysis. |
| 146 | + |
| 147 | +As another example, `H1=1`, `H2=2`, `Analysis=2`, `Event=160` indicates that 160 events are common to populations 1 and 2 at the final analysis. |
| 148 | + |
| 149 | +Note that the column names expected by this function are fixed: `H1`, `H2`, `Analysis`, and `Event`. |
| 150 | +```{r, message=FALSE} |
| 151 | +library(wpgsd) |
| 152 | +# The event table |
| 153 | +event <- tibble::tribble( |
| 154 | + ~H1, ~H2, ~Analysis, ~Event, |
| 155 | + 1, 1, 1, 100, |
| 156 | + 2, 2, 1, 110, |
| 157 | + 3, 3, 1, 225, |
| 158 | + 1, 2, 1, 80, |
| 159 | + 1, 3, 1, 100, |
| 160 | + 2, 3, 1, 110, |
| 161 | + 1, 1, 2, 200, |
| 162 | + 2, 2, 2, 220, |
| 163 | + 3, 3, 2, 450, |
| 164 | + 1, 2, 2, 160, |
| 165 | + 1, 3, 2, 200, |
| 166 | + 2, 3, 2, 220 |
| 167 | +) |
| 168 | + |
| 169 | +event %>% |
| 170 | + gt() %>% |
| 171 | + tab_header(title = "Number of events at each population & analyses") |
| 172 | +``` |
| 173 | + |
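Because every correlation is a ratio of event counts, it can help to check an event table for internal consistency before using it. The chunk below is a sketch under stated assumptions: `check_event_table()` is a hypothetical helper, not a `wpgsd` function, and it checks only that the four fixed columns exist, that each hypothesis has its own count at every analysis where it appears, and that common events never exceed either hypothesis's own count. It is shown on a plain data frame holding the interim-analysis rows of the example.

```{r}
# Hypothetical helper (not part of wpgsd): basic consistency checks on an
# event table with columns H1, H2, Analysis, Event.
check_event_table <- function(event) {
  stopifnot(all(c("H1", "H2", "Analysis", "Event") %in% names(event)))
  # Rows with H1 == H2 hold each hypothesis's own event count
  own <- event[event$H1 == event$H2, ]
  own_key <- paste(own$H1, own$Analysis)
  n1 <- own$Event[match(paste(event$H1, event$Analysis), own_key)]
  n2 <- own$Event[match(paste(event$H2, event$Analysis), own_key)]
  # Both own counts must exist, and common events cannot exceed either one
  !anyNA(n1) && !anyNA(n2) && all(event$Event <= pmin(n1, n2))
}

# Interim-analysis rows of the example, rebuilt as a plain data frame
event_ia <- data.frame(
  H1 = c(1, 2, 3, 1, 1, 2),
  H2 = c(1, 2, 3, 2, 3, 3),
  Analysis = 1,
  Event = c(100, 110, 225, 80, 100, 110)
)
check_event_table(event_ia)
```

A table that fails this check would imply a correlation greater than 1 for at least one pair, which usually points to a data-entry error.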
| 174 | +We then pass the event table to `generate_corr()` to obtain the correlation matrix as follows. |
| 175 | +```{r} |
| 176 | +generate_corr(event) |
| 177 | +``` |
| 178 | + |
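As a cross-check, the same correlations can be assembled by hand from the formula in the first section. The chunk below is a base R illustration, not the implementation of `generate_corr()`. The object names (`n_common`, `idx`, `corr_manual`) and the row and column labels are chosen here for illustration, and the ordering of rows and columns may differ from the function's output.

```{r}
# Illustrative cross-check in base R; not the implementation of generate_corr().
# n_common[[k]][i, j]: events shared by hypotheses i and j at analysis k
# (the diagonal holds each hypothesis's own event count).
n_common <- list(
  matrix(c(100,  80, 100,
            80, 110, 110,
           100, 110, 225), nrow = 3, byrow = TRUE),
  matrix(c(200, 160, 200,
           160, 220, 220,
           200, 220, 450), nrow = 3, byrow = TRUE)
)

# All hypothesis-analysis pairs; the hypothesis index varies fastest
idx <- expand.grid(i = 1:3, k = 1:2)
corr_manual <- matrix(NA_real_, nrow(idx), nrow(idx))
for (a in seq_len(nrow(idx))) {
  for (b in seq_len(nrow(idx))) {
    # Shared events are counted at the earlier of the two analyses
    shared <- n_common[[min(idx$k[a], idx$k[b])]][idx$i[a], idx$i[b]]
    n_a <- n_common[[idx$k[a]]][idx$i[a], idx$i[a]]
    n_b <- n_common[[idx$k[b]]][idx$i[b], idx$i[b]]
    corr_manual[a, b] <- shared / sqrt(n_a * n_b)
  }
}
# Labels are illustrative: hypothesis index, then analysis index
lab <- paste0("H", idx$i, "_A", idx$k)
dimnames(corr_manual) <- list(lab, lab)
round(corr_manual, 2)
```

The entries for (H1, analysis 1) paired with (H2, analysis 1), (H1, analysis 2), and (H2, analysis 2) reproduce the three values 0.76, 0.71, and 0.54 worked out in the Examples section.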
| 179 | +# References |
| 180 | + |