-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNotebook_causal_survival.qmd
4037 lines (3258 loc) · 205 KB
/
Notebook_causal_survival.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Causal survival analysis"
subtitle: "Estimation of the Average Treatment Effect (ATE): Practical Recommendations"
author:
- name: Charlotte Voinot
corresponding: true
email: [email protected]
url: https://chvoinot.github.io/
affiliations:
- name: Sanofi R&D
department: CMEI
url: https://www.sanofi.fr/fr/
- name: INRIA
department: Premedical
url: https://www.inria.fr/fr/premedical
- name: INSERM
url: https://www.inserm.fr/
- name: Université de Montpellier
url: https://www.umontpellier.fr/
- name: Clément Berenfeld
corresponding: true
email: [email protected]
url: https://cberenfeld.github.io
affiliations:
- name: Universität Potsdam, Potsdam, Germany
departement: Institut für Mathematik
url: https://www.uni-potsdam.de/en/university-of-potsdam
- name: Imke Mayer
corresponding: true
email: [email protected], now affiliated with Owkin
affiliations:
- name: Charité -- Universität Berlin, Berlin, Germany
departement: Institut für Public Health
url: https://www.charite.de/
- name: Bernard Sebastien
corresponding: true
email: [email protected]
affiliations:
- name: Sanofi R&D
department: CMEI
url: https://www.sanofi.fr/fr/
- name: Julie Josse
corresponding: true
email: [email protected]
url: https://juliejosse.com/
affiliations:
- name: INRIA
department: Premedical
url: https://www.inria.fr/fr/premedical
- name: INSERM
url: https://www.inserm.fr/
- name: Université de Montpellier
url: https://www.umontpellier.fr/
date: last-modified
date-modified: last-modified
abstract: Causal survival analysis combines survival analysis and causal inference to evaluate the effect of a treatment or intervention on a time-to-event outcome, such as survival time. It offers an alternative to relying solely on Cox models for assessing these effects. In this paper, we present a comprehensive review of estimators for the average treatment effect measured with the restricted mean survival time, including regression-based methods, weighting approaches, and hybrid techniques. We investigate their theoretical properties and compare their performance through extensive numerical experiments. Our analysis focuses on the finite-sample behavior of these estimators, the influence of nuisance parameter selection, and their robustness and stability under model misspecification. By bridging theoretical insights with practical evaluation, we aim to equip practitioners with both state-of-the-art implementations of these methods and practical guidelines for selecting appropriate estimators for treatment effect estimation. Among the approaches considered, G-formula two-learners, AIPCW-AIPTW, Buckley-James estimators, and causal survival forests emerge as particularly promising.
keywords: [Restricted Mean Survival Time, Randomized Control Trial, Observational Study, Censoring]
bibliography: references.bib
github-user: chvoinot
repo: "Simple_simulation_causal_survival"
draft: false # set to false once the build is running
published: false # will be set to true once accepted
execute:
echo: false
format:
computo-html: default
computo-pdf:
header-includes:
- |
\newcommand{\bbR}{\mathbb{R}}
\newcommand{\bbE}{\mathbb{E}}
\newcommand{\bbP}{\mathbb{P}}
\newcommand\wt{\widetilde}
\newcommand\wh{\widehat}
\newcommand{\ve}{\varepsilon}
\renewcommand{\epsilon}{\varepsilon}
\renewcommand\leq{\leqslant}
\renewcommand\geq{\geqslant}
\newcommand{\mand}{\quad\text{and}\quad}
\newcommand{\where}{\quad\text{where}\quad}
\newcommand{\with}{\quad\text{with}\quad}
\newcommand{\Var}{\mathrm{Var}}
\renewcommand{\(}{$}
\renewcommand{\)}{$}
\newcommand*\diff{\mathop{}\!\mathrm{d}}
\newcommand\cF{\mathcal{F}}
\newcommand\cX{\mathcal{X}}
\usepackage{amsmath}
\usepackage{amssymb}
editor:
markdown:
wrap: 7
---
::: {.content-hidden}
$$
{{< include _macro.tex >}}
$$
:::
# Introduction {#sec-intro}
## Context and motivations {#sec-context}
Causal survival analysis is a growing field that integrates causal inference [@rubin_estimating_1974; @hernan2010causal] with survival analysis [@kalbfleisch2002statistical] to evaluate the impact of treatments on time-to-event outcomes, while accounting for censoring situations where only partial information about an event's occurrence is available.
Being a relatively new domain, the existing literature, though vast, remains fragmented. As a result, a clear understanding of the theoretical properties of various estimators is challenging to obtain. Moreover, the implementation of proposed methods is limited, leaving researchers confronted with a range of available estimators and the need to make numerous methodological decisions. There is a pressing need for a comprehensive survey that organizes the available methods, outlines the underlying assumptions, and provides an evaluation of estimator performance --- particularly in finite sample settings. Such a survey also has the potential to help identify remaining methodological challenges that need to be addressed.
This need becomes increasingly urgent as causal survival analysis gains traction in both theoretical and applied domains. For instance, its applications to external control arm analyses are particularly relevant in the context of single-arm clinical trials, where traditional comparator arms are unavailable. Regulatory guidelines have begun to acknowledge and support such semi-experimental approaches, reflecting the broader evolution of trial design and therapeutic innovation in precision medicine, see for instance [@EMA_RWD].
By synthesizing the theoretical foundations, assumptions, and performance of various estimators, a survey on existing causal survival analysis methods would provide researchers and practitioners with the necessary tools to make informed methodological choices. This is crucial for fostering robust and reliable applications of causal survival analysis in both academic research and practical settings, where precise and valid results are paramount.
In this paper, we focus our attention to the estimation of the Restricted Mean Survival Time (RMST), a popular causal measure in survival analysis which offers an intuitive interpretation of the average survival time over a specified period. In particular, we decided to not cover the estimation of Hazard Ratio (HR), which has been prominently used but often questioned due to its potential non-causal nature [@HR_Martinussen]. Additionally, unlike the Hazard Ratio, the RMST has the desirable property of being a collapsible measure, meaning that the population effect can be expressed as a weighted average of subgroup effects, with positive weights that sum to one [@huitfeldt2019collapsibility].
## Definition of the estimand: the RMST {#sec-notations}
We set the analysis in the potential outcome framework, where a patient, described by a vector of covariates $X \in \mathbb{R}^p$, either receives a treatment ($A=1$) or is in the control group ($A=0$). The patient will then survive up to a certain time $T(0) \in \bbR^+$ in the control group, or up to a time $T(1)\in \bbR^+$ in the treatment group. In practice, we cannot simultaneously have access to $T(0)$ and $T(1)$, as one patient is either treated or control, but only to $T$ defined as follows:
::: {.assumption}
**Assumption.** (Stable Unit
Treatment Value Assumption: SUTVA)
$$
T = AT(1) + (1-A)T(0).
$$ {#eq-sutva}
:::
Due to potential censoring, the outcome $T$ is not completely observed. The most common form of censoring is right-censoring (also known as type II censoring), which occurs when the event of interest has not taken place by the end of the observation period, indicating that it may have occurred later if the observation had continued [@censoring_effect]. We focus in this study on this type of censoring only and we assume that we observe $\tilde T= T \wedge C = \min(T,C)$ for some censoring time $C \in \mathbb{R}^+$. When an observation is censored, the observed time is equal to the censoring time.
We also assume that we know whether an outcome is censored or not. In other words, we observe the censoring status variable $\Delta = \mathbb{I}\{T \leq C\}$, where $\mathbb{I}\{\cdot\}$ is the indicator function. $\Delta$ is $1$ if the true outcome is observed, and $0$ if it is censored.
We assume observing a $n$-sample of variables $(X,A,\wt T,\Delta)$ stemming from an $n$-sample of the partially unobservable variables $(X,A,T(0),T(1),C)$. A toy exemple of such data is given in @tbl-exemple_data.
| ID | Covariates | | | Treatment | Censoring | Status | Potential outcomes | | True outcome | Observed outcome |
|--|--|--|--|--|--|--|--|--|--|--|
| i | $X_{1}$ | $X_{2}$ | $X_{3}$ | $A$ | $C$ | $\Delta$ | $T(0)$ | $T(1)$ | $T$ | $\tilde{T}$ |
| 1 | 1 | 1.5 | 4 | 1 | ? | 1 | ? | 200 | 200 | 200 |
| 2 | 5 | 1 | 2 | 0 | ? | 1 | 100 | ? | 100 | 100 |
| 3 | 9 | 0.5 | 3 | 1 | 200 | 0 | ? | ? | ? | 200 |
: Example of a survival dataset. In practice, only $X,A,\wt T$ and $\Delta$ are observed. {#tbl-exemple_data}
Our aim is to estimate the Average Treatment Effect (ATE) defined as the difference between the Restricted Mean Survival Time of the treated and controls [@royston2013restricted].
::: {#def-ATE}
(Causal effect: Difference between Restricted Mean Survival Time)
$$
\theta_{\mathrm{RMST}} = \mathbb{E}\left[T(1) \wedge \tau - T(0) \wedge \tau\right],
$$
where $a \wedge b := \min(a,b)$ for $a,b \in \bbR$.
:::
We define the survival functions $S^{(a)}(t):=\mathbb{P}(T(a) > t)$ for $a \in \{0,1\}$, i.e., the probability that a
treated or non-treated individual will survive beyond a given time $t$. Likewise, we let $S(t) := \bbP(T >t)$, and $S_C(t) := \bbP(C > t)$. We also let $G(t) := \bbP(C \geq t)$ be the left-limit of the survival function $S_C$.
Because $T(a) \wedge \tau$ are non-negative random variables, one can easily express the restricted mean survival time using the survival functions:
$$
\mathbb{E}(T(a) \wedge \tau) = \int_{0}^{\infty} \mathbb{P}(T(a)\wedge \tau > t)dt = \int_{0}^{\tau}S^{(a)}(t)dt.
$$ {#eq-rmstsurv}
Consequently, $\theta_{\mathrm{RMST}}$ can be interpreted as the mean
difference between the survival function of treated and control until a
fixed time horizon $\tau$. A
difference in RMST $\theta_{\mathrm{RMST}} = 10$ days with $\tau=200$
means that, on average, the treatment increases the survival time by 10
days at 200 days. We give a visual interpretation of RMST in @fig-RMST.
{#fig-RMST}
Although the present work focuses on the estimation of the difference in RMST, we would like to stress that the causal effect can be assessed through other measures, such as for instance the difference of the survival functions
$$
\theta_{\mathrm{SP}} := S^{(1)}(\tau) - S^{(0)}(\tau)
$$
for some time $\tau$, see for instance [@Ozenne20_AIPTW_AIPCW]. As mentionned in @sec-context, another widely used measure (though not necessarily causal) is the hazards ratio, defined as
$$
\theta_{\mathrm{HR}} := \frac{\lambda^{(1)}(\tau)}{\lambda^{(0)}(\tau)},
$$
where the hazard function $\lambda^{(a)}$ is defined as
$$
\lambda^{(a)}(t) := \lim_{h \to 0^+}\frac{\bbP(T(a) \in [t,t+j)|T(a) \geq t)}{h}.
$$
in a continuous setting, or as $\lambda^{(a)}(t) := \bbP(T(a) = t|T(a) \geq t)$ when the survival times are discrete. The hazard functions and the survival functions are linked through the identities
$$
S^{(a)}(t) = \exp\left(-\Lambda^{(a)}(t)\right) \quad \text{where} \quad \Lambda^{(a)}(t) := \int_0^t \lambda^{(a)}(s)\diff s,
$$ {#eq-lambdacont}
in the continuous case. The functions $\Lambda^{(a)}$ are call the cumulative hazard functions. In the discrete case, we have in turn
$$
S^{(a)}(t) = \prod_{t_k \leq t} \left(1-\lambda^{(a)}(t_k)\right),
$$ {#eq-lambdadisc}
where $\{t_1,\dots,t_K\}$ are the atoms of $T^{(a)}$. These hazard functions are classically used to model the survival times and the censoring times, see @sec-Gformula.
## Organisation of the paper
In this paper, we detail the minimal theoretical framework that allows the analysis of established RMST estimators in the context of both Randomized Controlled Trials (@sec-theoryRCT) and observational data (@sec-theoryOBS). We give their statistical properties (consistency, asymptotic normality) along with proofs when possible.
We then conduct in @sec-simulation a numerical study of these estimators through simulations under various experimental conditions, including independent and conditionally independent censoring and correct and incorrect model specifications. We conclude in @sec-conclusion with practical recommendations on estimator selection, based on criteria such as asymptotic behavior, computational complexity, and efficiency.
## Notations
We provide in @tbl-reminder_notations a summary of the notation used throughout the paper.
| Symbol | Description |
|---------------|---------------------------------------------------------|
| $X$ | Covariates |
| $A$ | Treatment indicator $(A=1$ for treatment, $A=0$ for control$)$ |
| $T$ | Survival time |
| $T(a), a \in \{0,1\}$ | Potential survival time respectively with and without treatment |
| $S^{(a)},a \in \{0,1\}$ | Survival function $S^{(a)}(t) =\bbP(T(a) > t)$ of the potential survival times |
| $\lambda^{(a)},a \in \{0,1\}$ | Hazard function $\lambda^{(a)}(t) =\lim_{h \to 0^+}\mathbb{P}(T(a) \in [t,t+h) |T(a)\geq t)/h$ of the potential survival times |
| $\Lambda^{(a)},a \in \{0,1\}$ | Cumulative hazard function of the potential survival times |
| $C$ | Censoring time |
| $S_C$ | Survival function $S_C(t) =\mathbb{P}(C > t)$ of the censoring time |
| $G$ | Left-limit of the survival function $G(t) =\mathbb{P}(C \geq t)$ of the censoring time |
| $\widetilde{T}$ | Observed time ($T \wedge C$) |
| $\Delta$ | Censoring status $\mathbb{I}\{T \leq C \}$ |
| $\Delta^{\tau}$ | Censoring status of the restricted time $\Delta^{\tau} = \max\{\Delta, \mathbb{I}\{\widetilde{T} \geq \tau\}\}$ |
| $\{t_{1},t_{2},\dots,t_{K}\}$ | Discrete times |
| $e(x)$ | Propensity score $\mathbb{E} [A| X = x]$ |
| $\mu(x,a), a \in \{0,1\}$ | $\mathbb{E}[T \wedge \tau \mid X=x,A=a ]$ |
| $S(t|x,a), a \in \{0,1\}$ | Conditional survival function, $\bbP(T > t | X=x, A =a)$. |
| $\lambda^{(a)}(t|x), a \in \{0,1\}$ | Conditional hazard functions of the potential survival times |
| $G(t|x,a), a \in \{0,1\}$ | left-limit of the conditional survival function of the censoring $\bbP(C\geq t|X=x,A=a)$ |
| $Q_{S}(t|x,a), a \in \{0,1\}$ | $\mathbb{E}[T \wedge \tau \mid X=x,A=a, T \wedge \tau>t]$ |
: Summary of the notations. {#tbl-reminder_notations}
# Causal survival analysis in Randomized Controlled Trials {#sec-theoryRCT}
Randomized Controlled Trials (RCTs) are the gold standard for establishing
the effect of a treatment on an outcome, because treatment allocation is controlled through randomization, which ensures (asymptotically) the balance of covariates
between treated and controls, and thus avoids problems of confounding
between treatment groups. The core assumption in a classical RCT is the
random assignment of the treatment [@rubin_estimating_1974].
::: {.assumption}
**Assumption.**
(Random treatment assignment) There holds:
$$
A \perp\mkern-9.5mu\perp T(0),T(1),X
$$ {#eq-rta}
:::
We also assume that there is a positive probability of receiving the treatment, which we rephrase under the following assumption.
::: {.assumption}
**Assumption.** (Trial positivity)
$$
0 < \mathbb{P}(A=1) < 1
$$ {#eq-postrial}
:::
Under Assumptions [-@eq-rta] and [-@eq-postrial], classical causal identifiability equations can be written to express $\theta_{\mathrm{RMST}}$ without potential outcomes.
$$
\begin{aligned}
\theta_{\mathrm{RMST}} &= \mathbb{E}[T(1) \wedge \tau - T(0) \wedge \tau]\\
&= \mathbb{E}[T(1) \wedge \tau | A = 1] - \mathbb{E}[ T(0) \wedge \tau| A= 0] && \tiny\text{(Random treatment assignment)} \\
&= \mathbb{E}[T \wedge \tau | A = 1] - \mathbb{E}[ T \wedge \tau| A= 0]. && \tiny\text{(SUTVA)}
\end{aligned}
$$ {#eq-RMSTkm}
However, @eq-RMSTkm still depends on $T$, which remains only partially observed due to censoring. To ensure that censoring does not compromise the identifiability of treatment effects, we must impose certain assumptions on the censoring process,
standards in survival analysis, namely, independent censoring and conditionally independent censoring. These assumptions lead to different estimation approaches. We focus on two strategies: those that aim to directly estimate $\mathbb{E}[T \wedge \tau | A = a]$ directly (e.g., through censoring-unbiased transformations, see @sec-condcens), and those that first estimate the survival curves to derive RMST via @eq-rmstsurv (such as the Kaplan-Meier estimator and all its variants, see the next Section).
## Independent censoring: the Kaplan-Meier estimator {#sec-theoryRCT_indc}
In a first approach, one might assume that the censoring times are independent from the rest of the variables.
::: {.assumption}
**Assumption.** (Independent censoring)
$$
C \perp\mkern-9.5mu\perp T(0),T(1),X,A.
$$ {#eq-independantcensoring}
:::
Under @eq-independantcensoring, subjects censored at time $t$ are representative of all subjects who remain at risk at time $t$.
@fig-RCT_ind_causalgraph represents the causal graph when the
study is randomized and outcomes are observed under independent censoring.
{#fig-RCT_ind_causalgraph fig-align="center" width="40%"}
We also assume that there is no almost-sure upper bound on the censoring time before $\tau$, which we rephrase under the following assumption.
::: {.assumption}
**Assumption.** (Positivity of the censoring process) There exists $\ve > 0$ such that
$$
G(t) \geq \ve \quad \text{for all} \quad t \in [0,\tau).
$$ {#eq-poscen}
:::
If indeed it was the case that $\bbP(C < t) = 1$ for some $t < \tau$, then we would not be able to infer anything on the survival function on the interval $[t,\tau]$ as all observation times $\wt T_i$ would be in $[0,t]$ almost surely. In practice, adjusting the threshold time $\tau$
can help satisfy the positivity assumption. For instance, in a clinical study, if a subgroup of patients has zero probability of remaining uncensored at a given time, $\tau$ can be modified to ensure that participants have a feasible chance of remaining uncensored up to the revised threshold.
The two Assumptions [-@eq-independantcensoring] and [-@eq-poscen] together allow the distributions of $T(a)$ to be identifiable, in the sense that there exists an identity that expresses $S^{(a)}$ as a function of the joint distribution of $(\wt T,\Delta,A=a)$, see for instance @ebrahimi2003identifiability for such a formula in a non-causal framework. This enables several estimation strategies, the most well-known being the Kaplan-Meier product-limit estimator.
To motivate the definition of the latter and explicit the identifiability identity, we set the analysis in the discrete case. We let $\{t_k\}_{k \geq 1}$ be a set of positive and increasing times and assume that $T \in \{t_k\}_{k \geq 1}$ almost surely. Then for any $t \in [0,\tau]$, it holds, letting $t_0 = 0$ by convention, thanks to @eq-lambdadisc,
\begin{align*}
S(t| A=a) &= \bbP(T > t|A=a) = \prod_{t_k \leq t} \left(1 - \bbP(T = t_k | T > t_{k-1}, A=a)\right) \\
&= \prod_{t_k \leq t} \left(1 - \frac{\bbP(T = t_{k}, A=a)}{\bbP(T \geq t_{k},A=a)}\right).
\end{align*}
Using Assumptions [-@eq-independantcensoring,-@eq-postrial] and [-@eq-poscen], we find that
$$
\frac{\bbP(T = t_{k}, A=a)}{\bbP(T \geq t_{k},A=a)} = \frac{\bbP(T = t_{k},C \geq t_k,A=a)}{\bbP(T \geq t_{k}, C\geq t_k,A=a)} = \frac{\bbP(\wt T = t_{k}, \Delta = 1,A=a)}{\bbP( \wt T \geq t_{k},A=a)},
$$ {#eq-kmindep}
yielding the final identity
$$
S(t|A=a) = \prod_{t_k \leq t} \left(1-\frac{\bbP(\wt T = t_{k}, \Delta = 1,A=a)}{\bbP( \wt T \geq t_{k},A=a)}\right).
$${#eq-kmi}
Notice that the right hand side only depends on the distribution of the observed tuple $(A,\wt T,\Delta)$. This last equation suggests in turn to introduce the quantities
$$
D_k(a) := \sum_{i=1}^n \mathbb{I}(\wt T_i = t_k, \Delta_i = 1, A=a) \mand N_k(a) := \sum_{i=1}^n \mathbb{I}(\wt T_i \geq t_k, A=a),
$$ {#eq-dknk}
which correspond respectively to the number of deaths $D_k(a)$ and of individuals at risk $N_k(a)$ at time $t_k$ in the treated group (a=1) or in the control group (a=0).
::: {#def-km}
(Kaplan-Meier estimator, @kaplan)
With $D_k(a)$ and $N_k(a)$ defined in @eq-dknk, we let
$$
\wh{S}_{\mathrm{KM}}(t|A=a) := \prod_{t_k \leq t}\left(1-\frac{D_k(a)}{N_k(a)}\right).
$$ {#eq-unadjKM}
:::
The assiociated RMST estimator is then simply defined as
$$
\wh{\theta}_{\mathrm{KM}} = \int_{0}^{\tau}\wh{S}_{KM}(t|A=1)-\wh{S}_{KM}(t|A=0)dt.
$$ {#eq-thetaunadjKM}
The Kaplan-Meier estimator is the Maximum Likelihood Estimator (MLE) of the survival functions, see for instance @kaplan. Furthermore, because $D_k(a)$ and $N_k(a)$ are sums of i.i.d. random variables, the Kaplan-Meier estimator inherits some convenient statistical properties.
::: {#prp-km}
Under Assumptions [-@eq-sutva; -@eq-rta; -@eq-postrial; -@eq-independantcensoring] and [-@eq-poscen], and for all $t \in [0,\tau]$, the estimator $\wh S_{\mathrm{KM}}(t|A=a)$ of $S^{(a)}(t)$ is strongly consistent and admits the following bounds for its bias:
$$
0 \leq S^{(a)}(t) - \bbE[\wh S_\mathrm{KM}(t|A=a)] \leq O(\bbP(N_k(a) = 0)),
$$
where $k$ is the greatest time $t_k$ such that $t \geq t_k$.
:::
@gill1983large gives a more precise lower-bound on the bias in the case of continuous distributions, which was subsequently refined by @zhou1988two. The bound we give, although slightly looser, still exhibits the same asymptotic regime. In particular, as soon as $S^{(a)}(t) > 0$ (and Assumption [-@eq-poscen] holds), then the bias decays exponentially fast towards $0$. We give in @sec-proof21 a simple proof of our bound is our context.
::: {#prp-varkm}
Under Assumptions [-@eq-sutva; -@eq-rta; -@eq-postrial; -@eq-independantcensoring] and [-@eq-poscen], and for all $t \in [0,\tau]$, $\wh S_{\mathrm{KM}}(t|A=a)$ is asymptotically normal and $\sqrt{n}\left(\wh S_{\mathrm{KM}}(t|A=a) - S^{(a)}(t)\right)$ converges in distribution towards a centered Gaussian of variance
$$
V_{\mathrm{KM}}(t|A=a) := S^{(a)}(t)^2 \sum_{t_k \leq t} \frac{1-s_k(a)}{s_k(a) r_k(a)},
$$
where $s_k(a) = S^{(a)}(t_k)/S^{(a)}(t_{k-1})$ and $r_k(a) = \bbP(\wt T \geq t_k, A=a)$.
:::
The proof of @prp-varkm can be found in @sec-proof21. Because $D_k(a)/N_k(a)$ is a natural estimator of $1-s_k(a)$ and, $\frac{1}{n} N_k(a)$ a natural estimator for $r_k(a)$, the asymptotic variance of the Kaplan-Meier estimator can be estimated with the so-called Greenwood formula, as already derived heuristically in @kaplan:
$$
\wh{\Var} \left(\wh{S}_{\mathrm{KM}}(t|A=a)\right) := \wh{S}_{\mathrm{KM}}(t|A=a)^2 \sum_{t_k \leq t} \frac{D_k(a)}{N_k(a)(N_k(a)-D_k(a))}.
$$ {#eq-varkm}
We finally mention that the KM estimator as defined in @def-km still makes sense in a non-discrete setting, and one only needs to replace the fixed grid $\{t_k\}$ by the values at which we observed an event $(\wt T_i = t_k, \Delta_i =1)$. We refer to @Kaplan_consistency_breslow74 for a study of this estimator in the continuous case and to @Aalen2008, Sec 3.2 for a general study of the KM estimator through the prism of point processes.
## Conditionally independent censoring {#sec-condcens}
An alternative hypothesis in survival analysis that relaxes the assumption of independent censoring is conditionally independent censoring, also refered sometimes as _informative censoring_. It allows to model more realistic censoring processes, in particular in situations where there are reasons to believe that $C$ may be dependent from $A$ and $X$ (for instance, if patient is more like to leave the study when treated because of side-effects of the treatment).
::: {.assumption}
**Assumption.** (Conditionally independent censoring)
$$
C \perp\mkern-9.5mu\perp T(0),T(1) ~ | ~X,A
$$ {#eq-condindepcensoring}
:::
Under @eq-condindepcensoring, subjects within a same stratum defined by $X=x$ and $A=a$ have equal probability of censoring at time $t$, for all $t$.
In case of conditionally independent censoring, we also need to assume that all subjects have a positive probability to remain uncensored at their time-to-event.
::: {.assumption}
**Assumption.** (Positivity / Overlap for censoring)
There exists $\ve > 0$ such that for all $t\in [0,\tau)$, it almost surely holds
$$
G(t|A,X) \geq \epsilon.
$$ {#eq-positivitycensoring}
:::
@fig-RCT_dep_causalgraph represents the causal graph when the
study is randomized with conditionally independent censoring.
{#fig-RCT_dep_causalgraph fig-align="center" width="40%"}
Under dependent censoring, the
Kaplan-Meier estimator as defined in @def-km can fail to estimate survival, in particular because @eq-kmindep does not hold anymore. Alternatives include plug-in G-formula estimators (@sec-Gformula) and unbiased transformations (@sec-unbiasedtransfo).
### The G-formula and the Cox Model {#sec-Gformula}
Because the censoring is now independent from the potential outcome conditionally to the covariates, it would seem natural to model the response of the survival time conditionally to these covariates too:
$$
\mu(x,a) := \mathbb{E}[T \wedge \tau |X = x, A = a].
$$
Building on @eq-RMSTkm, one can express the RMST as a function of $\mu$:
\begin{align*}
\theta_{\mathrm{RMST}} = \mathbb{E}\left[\mathbb{E}[T \wedge \tau |X, A = 1]\right] - \mathbb{E}\left[ T \wedge \tau|X, A= 0]\right] = \bbE[\mu(X,1)-\mu(X,0)].
\end{align*}
An estimator $\wh \mu$ of $\mu$ would then straightforwardly yield an estimator of the difference in RMST through the so-called _G-formula_ plug-in estimator:
$$ \widehat{\theta}_{\mathrm{G-formula}} = \frac{1}{n} \sum_{i=1}^n \wh \mu\left(X_i, 1\right)-\wh \mu\left(X_i, 0\right). $${#eq-gformula}
We would like to stress that a G-formula approach works also in a observational context as the one introduced in @sec-obscondcens. However, because the estimation strategies presented in the next sections relies on estimating nuisance parameters, and that this latter task is often done in the same way as we estimate the conditional response $\mu$, we decided to not delay the introduction of the G-formula any further, and we present below a few estimation methods for $\mu$. These methods are sub-divised in two categories: _$T$-learners_, where $\mu(\cdot,1)$ is estimated separately from $\mu(\cdot,0)$, and _$S$-learners, where $\wh\mu$ is obtained by fitting a single model based on covariates $(X,A)$.
**Cox's Model**. There are many ways to model $\mu$ in a survival context, the most notorious of which being the Cox proportional hazards model [@Cox_1972]. It relies on a semi-parametric modelling the conditional hazard functions $\lambda^{(a)}(t|X)$ as
$$
\lambda^{(a)}(t|X) = \lambda_0^{(a)}(t) \exp(X^\top\beta^{(a)}),
$$
where $\lambda^{(a)}_0$ is a baseline hazard function and $\beta^{(a)}$ has the same dimension as the vector of covariate $X$. The conditional survival function then take the simple form (in the continuous case)
$$
S^{(a)}(t|X) = S^{(a)}_0(t)^{\exp(X^\top\beta^{(a)})},
$$
where $S^{(a)}_0(t)$ is the survival function associated with $\lambda_0^{(a)}$. The vector $\beta^{(a)}$ is classically estimated by maximizing the so-called _partial likelihood_ function as introduced in the original paper of @Cox_1972:
$$
\mathcal{L}(\beta) := \prod_{\Delta_i = 1} \frac{\exp(X_i^\top \beta)}{\displaystyle\sum_{\wt T_j \geq \wt T_i} \exp(X_j^\top \beta)},
$$
while the cumulative baseline hazard function can be estimated through the Breslow's estimator [@Breslow_approx1974]:
$$
\wh \Lambda_0^{(a)}(t) = \sum_{\Delta_i = 1, \wt T_i \leq t} \frac{1}{\displaystyle\sum_{\wt T_j \geq \wt T_i} \exp(X_j^\top \wh \beta^{(a)})}
$$
where $\wh \beta^{(a)}$ is a partial likelihood maximizer. One can show that $(\wh\beta^{(a)},\wh \Lambda_0^{(a)})$ is the MLE of the true likelihood, when $\wh \Lambda_0^{(a)}$ is optimized over all step fonctions of the form
$$
\Lambda_0(t) := \sum_{\Delta_i=1} h_i, \quad h_i \in \mathbb{R}^+.
$$
This fact was already hinted in the original paper by Cox and made rigorous in many subsequent papers, see for instance @fan2010high. Furthermore, if the true distribution follows a Cox model, then both $\wh\beta^{(a)}$ and $\wh \Lambda_0^{(a)}$ are strongly consistent and asymptotically normal estimator of the true parameters $\beta^{(a)}$ and $\Lambda^{(a)}$, see @kalbfleisch2002statistical, Sec 5.7.
When using a $T$-learner approach, one simply finds $(\wh\beta^{(a)},\wh \Lambda_0^{(a)})$ for $a \in \{0,1\}$ by considering the control group and the treated group separately. When using a $S$-learner approach, the treatment status $A$ becomes a covariate a the model becomes
$$
\lambda(t|X,A) = \lambda_0(t) \exp(X^\top\beta+\alpha A).
$$ {#eq-coxslearner}
for some $\alpha \in \mathbb{R}$. One main advantage of Cox's model is that it makes it very easy to interpret the effect of a covariate on the survival time. If indeed $\alpha > 0$, then the treatment has a negative effect of the survival times. Likewise, if $\beta_i >0$, then the $i$-th coordinate of $X$ has a negative effect as well. We finally mention that the hazard ratio takes a particularly simple form under the later model since
$$
\theta_{\mathrm{HR}} = e^{\alpha}.
$$
In particular, it does not depends on the time horizon $\tau$, and is thus sometimes refered to as _proportional hazard_. @fig-gf illustrates the estimation of the difference in Restricted Mean Survival Time using G-formula with Cox models.
{#fig-gf width="110%"}
**Weibull Model**. A popular parametric model for survival is the Weibull Model, which amounts to assume that
$$
\lambda^{(a)}(t|X) = \lambda_0^{(a)}(t) \exp(X^\top\beta)
$$
where $\lambda_0^{(a)}(t)$ is the instant hazard function of a Weibull distribution, that is to say a function proportional to $t^{\gamma}$ for some shape parameter $\gamma >0$. We refer to @zhang2016parametric for a study of this model.
**Survival Forests**. On the non-parametric front, we mention the existence of survival forests [@ishwaran2008random].
### Censoring unbiased transformations {#sec-unbiasedtransfo}
Censoring unbiased transformations involve applying a transformation to $T$. Specifically, we compute a new time $T^*$ of the form
$$
T^* := T^*(\wt T,X,A,\Delta) = \begin{cases}
\phi_0(\wt T \wedge \tau,X,A) \quad &\text{if} \quad \Delta^\tau = 0, \\
\phi_1(\wt T \wedge \tau,X,A) \quad &\text{if} \quad \Delta^\tau = 1.
\end{cases}
$$ {#eq-defcut}
for two wisely chosen transformations $\phi_0$ and $\phi_1$, and where
$$\Delta^{\tau}:=\mathbb{I}\{T \wedge \tau \leq C\} = \Delta+(1-\Delta)\mathbb{I}(\wt T \geq \tau)
$$ {#eq-defdeltatau}
is the indicator of the event where the individual is either uncensored or censored after time $\tau$. The idea behind the introduction of $\Delta^\tau$ is that because we are only interested in computed the expectation of the survival time thresholded by $\tau$, any censored observation coming after time $\tau$ can in fact be considered as uncensored ($\Delta^\tau = 1$).
A censoring unbiased transformation $T^*$ shall satisfy: for $a \in \{0,1\}$, it holds
$$
\bbE[T^*|A=a,X] = \bbE[T(a) \wedge \tau |X] \quad\text{almost surely.}
$${#eq-cut}
A notable advantage of this
approach is that it enables the use of the full transformed dataset $(X_i,A_i,T^*_i)$ as if no censoring occured. Because it holds
$$
\bbE[\bbE[T^*|A=a,X]] = \bbE\left[\frac{\mathbb{I}\{A=a\}}{\bbP(A=a)} T^*\right],
$$ {#eq-cutcond}
there is a very natural way to derive an estimator of the difference in RMST from any censoring unbiased transformation $T^*$ as:
$$
\wh\theta = \frac1n\sum_{i=1}^n \left(\frac{A_i}{\pi}-\frac{1-A_i}{1-\pi} \right) T^*_i
$$ {#eq-cutest}
where $\pi = \bbP(A=1) \in (0,1)$ by Assumption [-@eq-postrial] and where $T^*_i = T^*(\wt T_i,X_i,A_i,\Delta_i)$. We easily get the following result.
::: {#prp-cutest}
Under Assumptions [-@eq-rta] and [-@eq-postrial], the estimator $\wh\theta$ derived as in @eq-cutest from a square integrable censoring unbiased transformations satisfying @eq-cut is an unbiased, strongly consistent, and asymptotically normal estimator of the difference in RMST.
:::
Square integrability will be ensured any time the transformnation is bounded, which will always be the case of the ones considered in this work. It is natural in a RCT setting to assume that probability of being treated $\pi$ is known. If not, it is usual to replace $\pi$ by its empirical counterpart $\wh \pi = n_1/n$ where $n_a = \sum_{i} \mathbb{1}\{A=a\}$. The resulting estimator takes the form
$$
\wh\theta = \frac1{n_1}\sum_{A_i = 1} T^*_i - \frac1{n_0}\sum_{A_i = 0} T^*_i.
$$ {#eq-cutestnopi}
Note however that this estimator is slighlty biased due to the estimation of $\pi$ (see for instance @colnet2022reweighting, Lemma 2), but it is still strongly consistent and asymptotically normal, and its biased is exponentially small in $n$.
::: {#prp-cutestnopi}
Under Assumptions [-@eq-rta] and [-@eq-postrial], the estimator $\wh\theta$ derived as in @eq-cutestnopi from a square integrable censoring unbiased transformations satisfying @eq-cut is a strongly consistent, and asymptotically normal estimator of the difference in RMST.
:::
The two most popular transformations are Inverse-Probability-of-Censoring Weighting (@IPCtransKoul81) and Buckley-James (@Buckley_james_79), both illustrated in @fig-trans and detailed below. In the former, only non-censored observations are considered and they are weighted while in the latter, censored observations are imputed with an estimated survival time.
{#fig-trans}
**The Inverse-Probability-of-Censoring Weighted transformation**
The Inverse-Probability-of-Censoring Weighted (IPCW) transformation, introduced by (@IPCtransKoul81) in the context of censored linear regression, involves discarding censored observations and applying weights to uncensored data. More precisely, we let
$$
T^*_{\mathrm{IPCW}}=\frac{\Delta^\tau}{G(\widetilde{T}\wedge \tau|X,A)} \widetilde{T} \wedge \tau,
$$ {#eq-defipcw}
where we recall that $G(t|X,A) :=\mathbb{P}(C \geq t|X,A)$ is the left limit of the conditional survival function of the censoring.
This estimator assigns higher weights to uncensored subjects within a covariate group that is highly prone to censoring, thereby correcting for conditionally independent censoring and reducing selection bias [@Howe2016SelectionBD].
::: {#prp-ipcw}
Under Assumptions [-@eq-sutva;-@eq-rta;-@eq-postrial,-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], the IPCW transform [-@eq-defipcw] is a censoring unbiased transformation in the sense of @eq-cut.
:::
The proof of @prp-ipcw is in @sec-proof22. The IPCW depends on the unknown conditional survival function of the censoring $G(\cdot|X,A)$, which thus needs to be estimated. Estimating conditional censoring or the conditional survival function can be approached similarly, as both involve estimating a time—whether for survival or censoring. Consequently, we can use semi-parametric methods, such as the Cox model, or non-parametric approaches like survival forests, and we refer to @sec-Gformula for a development on these approaches. Once an estimator $\wh G(\cdot|A,X)$ of the later is provided, one can construct an estimator of the difference in RMST based on @eq-cutest or @eq-cutestnopi
$$
\wh \theta_{\mathrm{IPCW}} = \frac1n \sum_{i=1}^n \left(\frac{A_i}{\pi}-\frac{1-A_i}{1-\pi}\right) T^*_{\mathrm{IPCW},i},
$$ {#eq-ipcwdef}
or
$$
\wh\theta_{\mathrm{IPCW}} = \frac1{n_1}\sum_{A_i = 1} T^*_{\mathrm{IPCW},i} - \frac1{n_0}\sum_{A_i = 0} T^*_{\mathrm{IPCW},i}.
$$ {#eq-ipcwdefnopi}
where we recall that $n_a := \#\{i \in [n]~|~A_i=a\}$. By @prp-cutest, @prp-cutestnopi and @prp-ipcw, we easily deduce that $\wh \theta_{\mathrm{IPCW}}$ is asymptotically consistent as soon as $\wh G$ is.
::: {#cor-ipcwcons}
Under Assumptions[-@eq-sutva;-@eq-rta;-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], if $\wh G$ is uniformly weakly (resp. strongly) consistent then so is $\wh \theta_{\mathrm{IPCW}}$, either as in defined in @eq-ipcwdef or in @eq-ipcwdefnopi.
:::
This result simply comes from the fact that $\wh \theta_{\mathrm{IPCW}}$ depends continuously on $\wh G$ and that $G$ is lower-bounded (Assumption [-@eq-positivitycensoring]). Surprisingly, we found limited use of this estimator in the literature, beside its first introduction in @IPCtransKoul81. This could potentially be explained by the fact that, empirically, we observed that this estimator is highly variable. Consequently, we do not explore its properties further and will not include it in the numerical experiments. A related and more popular estimator is the IPCW-Kaplan-Meier, defined as follows.
::: {#def-ipcwkm}
(IPCW-Kaplan-Meier)
We let again $\wh G(\cdot|X,A)$ be an estimator of the (left limit of) the conditional censoring survival function and we introduce
\begin{align*}
D_k^{\mathrm{IPCW}}(a) &:= \sum_{i=1}^n \frac{\Delta_i^\tau}{\wh G(\wt T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\wt T_i = t_k, A_i=a) \\
\mand N^{\mathrm{IPCW}}_k(a) &:= \sum_{i=1}^n \frac{\Delta_i^\tau}{\wh G(\wt T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\wt T_i \geq t_k, A_i=a),
\end{align*}
be the weight-corrected numbers of deaths and of individual at risk at time $t_k$. The IPCW version of the KM estimator is then defined as:
$$
\begin{aligned}
\wh{S}_{\mathrm{IPCW}}(t | A=a) &= \prod_{t_k \leq t}\left(1-\frac{D_k^{\mathrm{IPCW}}(a)}{N_k^{\mathrm{IPCW}}(a)}\right).
\end{aligned}
$$
:::
Note that the quantity $\pi$ is not present in the definition of $D_k^{\mathrm{IPCW}}(a)$ and $N_k^{\mathrm{IPCW}}(a)$ because it would simply disappear in the ratio $D_k^{\mathrm{IPCW}}(a)/N_k^{\mathrm{IPCW}}(a)$. The subsequent RMST estimator then take the form
$$
\wh{\theta}_{\mathrm{IPCW-KM}} = \int_{0}^{\tau}\wh{S}_{\mathrm{IPCW}}(t|A=1)-\wh{S}_{\mathrm{IPCW}}(t|A=0)dt.
$$ {#eq-thetaIPCWKM}
Like before for the classical KM estimator, this new reweighted KM estimator enjoys good statistical properties.
::: {#prp-ipcwkm}
Under Assumptions [-@eq-sutva;-@eq-rta;-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], and for all $t \in [0,\tau]$, the oracle estimator $S^*_{\mathrm{IPCW}}(t|A=a)$ defined as in @def-ipcwkm with $\wh G = G$ is a stronlgy consistent and asymptotically normal estimator of $S^{(a)}(t)$ .
:::
The proof of @prp-ipcwkm can be found in @sec-proof22. Because the evaluation of $N_k^{\textrm{IPCW}}(a)$ now depends on information gathered after time $t_k$ (through the computation of the weights), the previous proofs on the absence of bias and on the derivation of the asymptotic variance unfortunately do not carry over in this case. Whether its bias is exponentially small and whether the asymptotic variance can be derived in a closed form are questions left open. We are also not aware of any estimation schemes for the asymptotic variance in this context. In the case where we do not have access to the oracle survival function $G$, we can again still achieve consistency if the estimator $\wh G(\cdot|A,X)$ that we provide is consistent.
::: {#cor-ipcwkmcons}
Under Assumptions [-@eq-sutva;-@eq-rta,-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], if $\wh G$ is uniformly weakly (resp. strongly) consistent then so is $\wh S_{\mathrm{IPCW}}(t|A=a)$.
:::
This corollary ensures that the corresponding RMST estimator defined in @eq-thetaIPCWKM will be consistent as well.
**The Buckley-James transformation**
One weakness of the IPCW transformation is that it discards all censored data. The Buckley-James (BJ) transformation, introduced by (@Buckley_james_79), takes a different path by leaving all uncensored values untouched, while replacing the censored ones by an extrapolated value. Formally, it is defined as follows:
$$
\begin{aligned}
T^*_{\mathrm{BJ}} &= \Delta^\tau (\widetilde{T}\wedge\tau) + (1-\Delta^\tau) Q_S(\widetilde T \wedge \tau|X,A),
\end{aligned}
$$ {#eq-defbj}
where, for $t \leq \tau$,
$$Q_S(t|X,A) :=\mathbb{E}[T \wedge \tau | X,A,T \wedge \tau > t]= \frac{1}{S(t|X,A)}\int_{t}^{\tau} S(u|X,A) \diff u$$ where $S(t|X,A=a) := \bbP(T(a) > t|X)$ is the conditional survival function. We refer again to @fig-trans for a diagram of this transformation.
::: {#prp-bj}
Under Assumptions [-@eq-sutva;-@eq-rta,-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], the BJ transform [-@eq-defbj] is a censoring unbiased transformation in the sense of @eq-cut.
:::
The proof of @prp-bj can be found in @sec-proof22. Again, the BJ transformation depends on a nuisance parameter (here $Q_S(\cdot|X,A)$) that needs to be estimated. We again refer to @sec-Gformula for a brief overview of possible estimation strategies for $Q_S$.
Once provided with an estimator $\wh Q_S(\cdot|A,X)$, a very natural estimator of the RMST based on the BJ transformation and on @eq-cutest or @eq-cutestnopi would be
$$
\wh \theta_{\mathrm{BJ}} = \frac1n \sum_{i=1}^n \left(\frac{A_i}{\pi}-\frac{1-A_i}{1-\pi}\right) T^*_{\mathrm{BJ},i},
$${#eq-BJ}
or
$$
\wh \theta_{\mathrm{BJ}} = \frac1{n_1} \sum_{A_i=1} T^*_{\mathrm{BJ},i}-\frac1{n_0} \sum_{A_0=1} T^*_{\mathrm{BJ},i}.
$${#eq-BJnopi}
Like for the IPCW transformation, the BJ transformation yields a consistent estimate of the RMST as soon as the model is well-specified.
::: {#cor-bjcons}
Under Assumptions [-@eq-sutva;-@eq-rta,-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], if $\wh Q_S$ is uniformly weakly (resp. strongly) consistent then so is $\wh \theta_{\mathrm{BJ}}$ defined as in @eq-BJ or @eq-BJnopi.
:::
The proof is again a mere application of Propositions [-@prp-cutest;-@prp-cutestnopi] and [-@prp-bj], and relies on the continuity of $S \mapsto Q_S$. The BJ transformation is considered as the best censoring transformation of the original response in the following sense.
::: {#thm-bj}
For any transformation $T^*$ of the form [-@eq-defcut], it holds
$$
\bbE[(T^*_{\mathrm{BJ}}-T \wedge \tau)^2] \leq \bbE[(T^*-T \wedge \tau)^2].
$$
:::
This result is stated in @LocalLinear_Fan94 but without a proof. We detail it in @sec-proof22 for completeness.
### Augmented corrections {#sec-tdr}
The main disadvantage of the two previous transformations is that they heavily rely on the specification of good estimator $\wh G$ (for IPCW) or $\wh S$ (for BJ). In order to circumvent this limitation, @DoublyR_transformation proposed the following transformations, whose expression is based on theory of semi-parametric estimation developed in @vanderLaan2003,
$$
T^*_\mathrm{DR} = \frac{\Delta^\tau \wt T\wedge \tau}{G(\wt T \wedge \tau|X,A)} + \frac{(1-\Delta^\tau)Q_S(\wt T \wedge \tau |X,A)}{G(\wt T \wedge \tau |X,A)}- \int_0^{\wt T \wedge \tau} \frac{Q_S(t|X,A)}{G(t|X,A)^2} \diff \bbP_C(t|X,A),
$$ {#eq-TDR}
where $\diff \bbP_C(t|X,A)$ is the distribution of $C$ conditional on the covariates $X$ and $A$. We stress that this distribution is entirely determined by the $G(\cdot|X,A)$, so that this transformation only depends on the knowledge of both conditional survival functions $G$ and $S$, will be thus sometimes denoted $T^*_\mathrm{DR}(G,S)$ to underline this dependency. This transformation is not only a censoring unbiased transform in the sense of @eq-cut, but is also doubly robust in the following sense.
:::{#prp-tdr}
We let $F,R$ be two conditional survival functions. Under Assumptions [-@eq-sutva;-@eq-rta;-@eq-postrial;-@eq-condindepcensoring] and [-@eq-positivitycensoring], if $F$ also satisfies Assumption [-@eq-positivitycensoring], and if $F(\cdot|X,A)$ is absolutely continuous wrt $G(\cdot|X,A)$, then the transformation $T^*_\mathrm{DR} = T^*_\mathrm{DR}(F,R)$ is a censoring unbiased transformation in the sense of @eq-cut whenever $F = G$ or $R=S$.
:::
The statement and proof of this results is found in @DoublyR_transformation in the case where $C$ and $T$ are continuous. A careful examination of the proofs show that the proof translates straight away to our discrete setting.
# Causal survival analysis in observational studies {#sec-theoryOBS}
Unlike RCT, observational data --- such as from registries, electronic health records, or national healthcare systems --- are collected without controlled randomized treatment allocation. In such cases, treated and control groups are likely unbalanced due to the non-randomized design, which results in the treatment effect being confounded by variables influencing both the survival outcome $T$ and the treatment allocation $A$. To enable identifiability of the causal
effect, additional standard assumptions are needed.
::: {.assumption}
**Assumption.** (Conditional exchangeability / Unconfoundedness) It holds
$$
A \perp\mkern-9.5mu\perp T(0),T(1) | X
$$ {#eq-unconf}
:::
Under @eq-unconf, the treatment assignment is randomly
assigned conditionally on the covariates $X$. This assumption ensures that there are no
unmeasured confounder as the latter would make it impossible to
distinguish correlation from causality.
::: assumption
**Assumption.** (Positivity / Overlap for treatment) Letting $e(X) := \bbP(A=1|X)$ be the _propensity score_, there holds
$$
0 < e(X) <1 \quad \text{almost surely.}
$$ {#eq-positivitytreat}
:::
The @eq-positivitytreat assumption requires adequate overlap in
covariate distributions between treatment groups, meaning every
observation must have a non-zero probability of being treated. Because Assumption [-@eq-rta] does not hold anymore, neither does the previous idenfiability @eq-RMSTkm. In this new context, we can write
$$
\begin{aligned}
\theta_{\mathrm{RMST}} &= \mathbb{E}[T(1) \wedge \tau - T(0) \wedge \tau]\\
&= \mathbb{E}\left[\bbE[T(1) \wedge \tau|X] - \bbE[T(0) \wedge \tau|X] \right] && \\
&=\mathbb{E}\left[\bbE[T(1) \wedge \tau|X, A=1] - \bbE[T(0) \wedge \tau|X, A=0]\right] && \tiny\text{(unconfoundness)} \\
&= \mathbb{E}\left[\bbE[T \wedge \tau|X, A=1] - \bbE[T \wedge \tau|X, A=0]\right]. && \tiny\text{(SUTVA)}
\end{aligned}
$$ {#eq-identcond}
In another direction, one could wish to identify the treatment effect through the survival curve as in @eq-rmstsurv:
$$
S^{(a)}(t) = \bbP(T(a) > t) = \bbE\left[\bbP(T > t | X, A=a)\right].
$$ {#eq-kmcond}
Again, both identities still depend on the unknown quantity $T$ and suggest two different estimation strategies. These strategies differ according to the censoring assumptions and are detailed below.
## Independent censoring {#sec-obs_indcen}
@fig-causalgraph_obs_ind illustrates a causal graph in observational survival data with independent censoring (Assumption [-@eq-independantcensoring]).
{#fig-causalgraph_obs_ind
alt="Illustration of a causal graph in observational survival data with independent censoring."
fig-align="center" width="40%"}
Under Assumption [-@eq-independantcensoring], we saw in @sec-theoryRCT_indc that the Kaplan-Meier estimator could conveniently handle censoring. Building on @eq-kmcond, we can write
$$
S^{(1)}(t) = \bbE\left[\frac{\bbE[\mathbb{I}\{A=1,T > t\}|X]}{\bbE[\mathbb{I}\{A=1\}|X]} \right]=\bbE\left[\frac{A\mathbb{I}\{T > t\}}{e(X)} \right],
$$
which suggests to adapt the classical KM estimator by reweighting it by the propensity score. The use of propensity score in causal
inference has been initially introduced by @Propensity_causality and further developed in @hirano2003efficient. It was extended to
survival analysis by @IPTW through the adjusted Kaplan-Meier estimator as defined below.
::: {#def-iptwkm}
(IPTW Kaplan-Meier estimator)
We let $\wh e(\cdot)$ be an estimator of the propensity score $e(\cdot)$. We introduce
\begin{align*}
D_k^{\mathrm{IPTW}}(a) &:= \sum_{i=1}^n \left(\frac{a}{\wh e(X_i)}+\frac{1-a}{1- \wh e(X_i)}\right)\mathbb{I}(\wt T_i = t_k, \Delta_i = 1, A_i=a) \\
\mand N^{\mathrm{IPTW}}_k(a) &:= \sum_{i=1}^n \left(\frac{a}{\wh e(X_i)}+\frac{1-a}{1- \wh e(X_i)}\right) \mathbb{I}(\wt T_i \geq t_k, A_i=a),
\end{align*}
be the numbers of deaths and of individual at risk at time $t_k$, reweighted by the propensity score. The Inverse-Probability-of-Treatment Weighting (IPTW) version of the KM estimator is then defined as:
$$
\begin{aligned}
\wh{S}_{\mathrm{IPTW}}(t | A=a) &= \prod_{t_k \leq t}\left(1-\frac{D_k^{\mathrm{IPTW}}(a)}{N_k^{\mathrm{IPTW}}(a)}\right).
\end{aligned}
$${#eq-IPTWKM}
:::
We let $S^*_{\mathrm{IPTW}}(t | A=a)$ be the oracle KM-estimator defined as above with $\wh e(\cdot) = e(\cdot)$. Although the reweighting slightly changes the analysis, the good properties of the classical KM carry on to this setting.
::: {#prp-iptwkm}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-independantcensoring] and [-@eq-poscen] The oracle IPTW Kaplan-Meier estimator $S^*_{\mathrm{IPTW}}(t | A=a)$ is a strongly consistent and asymptotically normal estimator of $S^{(a)}(t)$.
:::
The proof of this result simply relies again on the law of large number and the $\delta$-method and can be found in @sec-proof31. Because now $S^*_{\mathrm{IPTW}}$ is a continuous function of $e(\cdot)$, and because $e$ and $1-e$ are lower-bounded as per Assumptions [-@eq-positivitytreat], we easily derive the following corollary.
::: {#cor-iptwkm}
Under the same assumptions as @prp-iptwkm, if $\wh e(\cdot)$ satisfies $\|\wh e-e\|_{\infty} \to 0$ almost surely (resp. in probability), then the IPTW Kaplan-Meier estimator $\hat S_{\mathrm{IPTW}}(t | A=a)$ is a strongly (resp. weakly) consistent estimator of $S^{(a)}(t)$.
:::
The resulting RMST estimator simply takes the form:
$$
\wh{\theta}_{\mathrm{IPTW-KM}} = \int_{0}^{\tau}\wh{S}_{\mathrm{IPTW}}(t|A=1)-\wh{S}_{\mathrm{IPTW}}(t|A=0)dt.
$$ {#eq-RMST_IPTWKM}
which will be consistent under the same Assumptions as the previous Corollary. Note that, we are not aware of any formal results concerning the bias and the asymptotic variance of the oracle estimator $S^*_{\mathrm{IPTW}}(t | A=a)$, and we refer to @IPTW for heuristics concerning these questions.
## Conditional independent censoring {#sec-obscondcens}
Under Assumptions [-@eq-unconf] (uncounfoundedness) and [-@eq-condindepcensoring]
(conditional independent censoring), the causal effect is affected both
by confounding variables and by conditional
censoring. The associated causal graph is depicted in @fig-causalgraph_obs_dep. In this setting, one can still use the $G$-formula exactly as in @sec-Gformula.
A natural alternative approach is to weight the IPCW and BJ transformations from @sec-unbiasedtransfo by the propensity score to disentangle both confounding effects and censoring at the same time.
{#fig-causalgraph_obs_dep
alt="Illustration of a simple causal graph in observational survival data with independent censoring.](causal_survival_obs_dep.pdf)"
fig-align="center" width="40%"}
We mention that the G-formula approach developed in @sec-Gformula does work in that context. In particular, @RMST prove consistency and asymptotic normality results for Cox estimators in a observational study, and they give an explicit formulation of the asymptotic variance as a function of the parameters of the Cox model. In the non-parametric world, @Foster_2011 and @Soren_2019 empirically study this estimator using survival forests, with the former employing it as a T-learner (referred to as _Virtual Twins_) and the latter as an S-learner.
### IPTW-IPCW transformations
One can check that the IPCW transformation as introduced in @eq-defipcw is also a censoring unbiased transformation in that context.
::: {#prp-iptwipcw}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], the IPTW-IPCW transform [-@eq-defipcw] is a censoring unbiased transformation in the sense of @eq-cut.
:::
The proof of @prp-iptwipcw can be found in @sec-proof32. Deriving an estimator of the difference in RMST is however slightly different in that context. In particular, @eq-cutcond rewrites
$$
\bbE[\bbE[T^*|X,A=1]] = \bbE\left[\frac{A}{e(X)} T^*\right],
$$
Which in turn suggests to define
$$
\wh \theta_{\mathrm{IPTW-IPCW}} = \frac1n\sum_{i=1}^n \left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)} \right) T^*_{\mathrm{IPCW},i}.
$$ {#eq-iptwipcw}
This transformation now depends on two nuisance parameters, namely the conditional survival function of the censoring (through $T^*_{\mathrm{IPCW}}$) and the propensity score. Once estimators of these quantities are provided, one could look at the corresponding quantity computed with these estimators.
::: {#prp-consiptwipcw}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], and if $\wh G(\cdot|X,A)$ and $\wh e (\cdot)$ are uniformly weakly (resp. strongly) consistent estimators, then estimator [-@eq-iptwipcw] defined with $\wh e$ and $\wh G$ is a weakly (resp. strongly) consistent estimator of the difference in RMST.
:::
The proof of @prp-consiptwipcw can be found in @sec-proof32. We can also use the same strategy as for the IPCW transformation and incorporate the new weights into a Kaplan-Meier estimator.
::: {#def-iptwipcwkm}
(IPTW-IPCW-Kaplan-Meier)
We let again $\wh G(\cdot|X,A)$ and $\wh e(\cdot)$ be estimators of the conditional censoring survival function and of the propensity score. We introduce
\begin{align*}
D_k^{\mathrm{IPTW-IPCW}}(a) &:= \sum_{i=1}^n \left(\frac{A_i}{\wh e(X_i)}+\frac{1-A_i}{1-\wh e(X_i)} \right)\frac{\Delta_i^\tau}{\wh G(\wt T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\wt T_i = t_k, A_i=a) \\
\mand N^{\mathrm{IPTW-IPCW}}_k(a) &:= \sum_{i=1}^n \left(\frac{A_i}{\wh e(X_i)}+\frac{1-A_i}{1-\wh e(X_i)} \right)\frac{\Delta_i^\tau}{\wh G(\wt T_i \wedge\tau | X_i,A=a)} \mathbb{I}(\wt T_i \geq t_k, A_i=a),
\end{align*}
be the weight-corrected numbers of deaths and of individual at risk at time $t_k$. The IPTW-IPCW version of the KM estimator is then defined as:
$$
\begin{aligned}
\wh{S}_{\mathrm{IPTW-IPCW}}(t | A=a) &= \prod_{t_k \leq t}\left(1-\frac{D_k^{\mathrm{IPTW-IPCW}}(a)}{N_k^{\mathrm{IPTW-IPCW}}(a)}\right).
\end{aligned}
$$
:::
The difference in RMST estimated with IPTW-IPCW-Kaplan-Meier survival curves is then simply as
$$
\wh{\theta}_{\mathrm{IPTW-IPCW-KM}} = \int_{0}^{\tau}\wh{S}_{\mathrm{IPTW-IPCW}}(t|A=1)-\wh{S}_{\mathrm{IPTW-IPCW}}(t|A=0)dt.
$$ {#eq-RMST_IPTW_IPCWKM}
::: {#prp-iptwipcwkm}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], and for all $t \in [0,\tau]$, if the oracle estimator $S^*_{\mathrm{IPTW-IPCW}}(t | A=a)$ defined as in @def-iptwipcwkm with $\wh G(\cdot|A,X) = G(\cdot|A,X)$ and $\wh e = e$ is a strongly consistent and asymptotically normal estimator of $S^{(a)}(t)$ .
:::
The proof of @prp-iptwipcwkm can be found in @sec-proof32. Under consistency of the estimators of the nuisance parameters, the previous proposition implies that this reweighted Kaplan-Meier is a consistent estimator of the survival curve, which in turn implies consistency of $\wh{\theta}_{\mathrm{IPTW-IPCW-KM}}$.
::: {#cor-consiptwipcwkm}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], and for all $t \in [0,\tau]$, if the nuisance estimators $\wh G(\cdot|A,X)$ and $\wh e$ are weakly (resp. strongly) uniformly consistent, then $\wh{S}_{\mathrm{IPTW-IPCW}}(t | A=a)$ is a weakly (resp. strongly) consistent estimator of $S^{(a)}(t)$.
:::
We are not aware of general formula for the asymptotic variances in this context. We mention nonetheless that @Schaubel2011 have been able to derive asymptotic laws in this framework in the particular case of Cox-models.
### IPTW-BJ transformations
Like IPCW tranformation, BJ transformation is again a censoring unbiased transformation in an observational context.
::: {#prp-iptwbj}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], the IPTW-BJ transform [-@eq-defbj] is a censoring unbiased transformation in the sense of @eq-cut.
:::
The proof of @prp-iptwbj can be found in @sec-proof32. The corresponding estimator of the difference in RMST is
$$
\hat \theta_{\mathrm{IPTW-BJ}} = \frac1n\sum_{i=1}^n \left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)} \right) T^*_{\mathrm{BJ},i}.
$$ {#eq-iptwbj}
This transformation depends on the conditional survival function $S$ (through $T^*_{\mathrm{BJ}}$) and the propensity score. Consistency of the nuisance parameter estimators implies again consistency of the RMST estimator.
::: {#prp-consiptwbj}
Under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], and if $\wh S(\cdot|X,A)$ and $\wh e (\cdot)$ are uniformly weakly (resp. strongly) consistent estimators, then $\wh \theta_{\mathrm{IPTW-BJ}}$ defined with $\wh S$ and $\wh e$ is a weakly (resp. strongly) consistent estimator of the RMST.
:::
The proof of @prp-consiptwbj can be found in @sec-proof32.
### Double augmented corrections {#sec-AIPTW_AIPCW}
Building on the classical doubly-robust AIPTW estimator from causal inference [@robins1994estimation], we could incorporate the doubly-robust transformations of @sec-tdr to obtain a _quadruply robust_ tranformation
$$
\Delta^*_{\mathrm{QR}} = \Delta^*_{\mathrm{QR}}(G,S,\mu,e) := \left(\frac{A}{e(X)}-\frac{1-A}{1-e(X)}\right)(T^*_{\mathrm{DR}}(G,S)-\mu(X,A))+\mu(X,1)-\mu(X,0),
$$
where we recall that $T^*_{\mathrm{DR}}$ is defined in @sec-tdr. This transformations depends on four nuisance parameters: $G$ and $S$ through $T^*_{\mathrm{DR}}$, and now the propensity score $e$ and the conditional response $\mu$. This transformation doesn't really fall into the scope of censoring unbiased transform, but it is easy to show that $\Delta^*_{\mathrm{QR}}$ is quadruply robust in the following sense.
::: {#prp-tqr}
Let $F,R$ be two conditional survival function, $p$ be a propensity score, and $\nu$ be a conditional response. Then, under the same assumption on $F,R$ as in @prp-tdr, and under Assumptions [-@eq-sutva;-@eq-unconf;-@eq-positivitytreat;-@eq-condindepcensoring] and [-@eq-positivitycensoring], the transformations $\Delta^*_{\mathrm{QR}} = \Delta^*_{\mathrm{QR}}(F,R,p,\nu)$ satisfies, fo $a \in {0,1}$,
$$
\bbE[ \Delta^*_\mathrm{QR} |X] = \bbE[T(1)\wedge \tau-T(0)\wedge \tau |X]\quad\text{if}\quad
\begin{cases} F = G \quad &\text{or}\quad R=S \quad \text{and} \\
p=e \quad &\text{or}\quad \nu=\mu.
\end{cases}
$$
:::
This result is similar to @Ozenne20_AIPTW_AIPCW, Thm 1, and its proof can be found in @sec-proof32. Based on estimators $(\wh G, \wh S, \wh \mu, \wh e)$ of $(G,S,\mu,e)$, one can then propose the following estimator of the RMST, coined the AIPTW-AIPCW estimator in @Ozenne20_AIPTW_AIPCW:
$$
\begin{aligned}
\wh \theta_{\mathrm{AIPTW-AIPCW}} &:= \frac1n \sum_{i=1}^n \Delta_{\mathrm{QR},i}^*(\wh G, \wh S, \wh \mu, \wh e)
\\
&=\frac1n \sum_{i=1}^n \left(\frac{A_i}{\wh e(X_i)}-\frac{1-A_i}{1-\wh e(X_i)}\right)(T^*_{\mathrm{DR}}(\wh G,\wh S)_i-\wh \mu(X_i,A_i)) + \wh \mu(X_i,1)-\wh\mu(X_i,0).
\end{aligned}
$${#eq-AIPTW_AIPCW}
This estimator enjoys good asymptotic properties under parametric models, as detailed in @Ozenne20_AIPTW_AIPCW.
# Implementation
In this section, we first summarize the various estimators and their properties. We then provide custom implementations for all estimators, even those already available in existing packages. These manual implementations serve two purposes: first, to make the methods accessible to the community when no existing implementation is available; and second, to facilitate a deeper understanding of the methods by detailing each step, even when a package solution exists. Finally, we present the packages available for directly computing $\theta_{\mathrm{RMST}}$.
## Summary of the estimators {#sec-summary}
@tbl-nuisance provides an overview of the estimators introduced in this paper, along with the corresponding nuisance parameters needed for their estimation and an overview of their statistical properties in particular regarding their sensitivity to misspecification of the nuisance parameters.
| Estimator | RCT | Obs | Ind Cens | Dep Cens | Outcome model | Censoring model | Treatment model | Robustness |
|--------|--|--|--|--|----|----|----|----|
| _Unadjusted KM_ | $\color{green}X$ | | $\color{green}X$ | | | | | |
| IPCW-KM | $\color{green}X$ | | $\color{green}X$ | $\color{green}X$ | | $G$ | | |
| BJ | $\color{green}X$ | | $\color{green}X$ | $\color{green}X$ | $S$ | | | |
| _IPTW-KM_ | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | | | | $e$ | |
| IPCW-IPTW-KM | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | | $G$ | $e$ | |
| IPTW-BJ | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | $S$ | | $e$ | |
| _G-formula_ | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ | $\mu$ | | | |
| AIPTW-AIPCW | $\color{green}X$ | $\color{green}X$ | $\color{green}X$ |${\color{green}X}$ | $S,\mu$ | $G$ | $e$ | $\color{green}X$ (Prp [-@prp-tqr]) |
: Estimators of the difference in RMST and nuisance parameters needed to compute each estimator. Empty boxes indicate that the nuisance parameter is not needed in the estimator thus misspecification has no impact. Estimators in italic are those that are already implemented in available packages. {#tbl-nuisance}
## Implementation of the estimators
Across different implementations, we use the following shared functions provided in the $\texttt{utilitary.R}$ file.
- $\texttt{estimate\_propensity\_score}$: function to estimate propensity scores $e(X)$ using either parametric (i.e. logistic regression with the argument $\texttt{type\_of\_model = "glm"}$) or non-parametric methods (i.e. probability forest with the argument $\texttt{type\_of\_model ="probability forest"}$ based on the $\texttt{probability\_forest}$ function from the [grf ](https:%20//cran.r-project.org/web/packages/grf/index.html) [@Tibshirani_Athey_Sverdrup_Wager_2017] package). This latter can include cross-fitting ($\texttt{n.folds\>1}$).
- $\texttt{estimate\_survival\_function}$: function to estimate the conditional survival model, which supports either Cox models (argument $\texttt{type\_of\_model = "cox"}$) or survival forests (argument $\texttt{type\_of\_model = "survival forest"}$) which uses the function $\texttt{survival\_forest}$ from the [grf ](https:%20//cran.r-project.org/web/packages/grf/index.html) [@Tibshirani_Athey_Sverdrup_Wager_2017] package. This latter can include cross-fitting ($\texttt{n.folds\>1}$). The estimation can be done with a single learner (argument $\texttt{learner = "S-learner"}$) or two learners (argument $\texttt{learner = "T-learner"}$).
- $\texttt{estimate\_hazard\_function}$: function to estimate the instantaneous hazard function by deriving the cumulative hazard function at each time point. This cumulative hazard function is estimated from the negative logarithm of the survival function.
- $\texttt{Q\_t\_hat}$: function to estimate the remaining survival function at all time points and for all individuals which uses the former $\texttt{estimate\_survival\_function}$.
- $\texttt{Q\_Y}$: function to select the value of the remaining survival function from $\texttt{Q\_t\_hat}$ at the specific time-to-event.
- $\texttt{integral\_rectangles}$: function to estimate the integral of a decreasing step function using the rectangle method.
- $\texttt{expected\_survival}$: function to estimate the integral with $x,y$ coordinate (estimated survival function) using the trapezoidal method.
- $\texttt{integrate}$: function to estimate the integral at specific time points $\texttt{Y.grid}$ of a given $\texttt{integrand}$ function which takes initially its values on $\texttt{times}$ using the trapezoidal method.
**Unadjusted Kaplan-Meier**
Although Kaplan-Meier is implemented in the $\texttt{survival}$ package [@Therneau_2001], we provide a custom implementation, $\texttt{Kaplan\_meier\_handmade}$, for completeness. The difference in Restricted Mean Survival Time, estimated using Kaplan-Meier as in @eq-thetaunadjKM can then be calculated with the $\texttt{RMST\_1}$ function.
```{r echo=FALSE, message=FALSE, warning=FALSE}
source("utilitary.R")
# Kaplan-Meier estimator handmade implementation
# The database 'data' must be in the same form as that shown in
# notation (Table 1) and with the same variable names (status, T_obs)
Kaplan_meier_handmade <- function(data,
status = data$status,
T_obs = data$T_obs) {
# Sort unique observed times
Y.grid <- sort(unique(T_obs))
# Initialize vectors for number of events, number at risk, and survival
# probability
d <- rep(NA, length(Y.grid)) # Number of events at time Y.grid[i]
n <- rep(NA, length(Y.grid)) # Number at risk just before time Y.grid[i]
S <- rep(NA, length(Y.grid)) # Survival probability at time Y.grid[i]
# Loop over each unique observed time
for (i in 1:length(Y.grid)) {
d[i] <- sum(T_obs == Y.grid[i] & status == 1, na.rm = TRUE) # Count events
n[i] <- sum(T_obs >= Y.grid[i]) # Count at risk
# Calculate survival probability
S[i] <- cumprod(1 - d / n)[i]
}
# Create a dataframe with the results
df <- data.frame(d = d, n = n, S = S, T = Y.grid)
return(df)
}
# Function to calculate RMST (Restricted Mean Survival Time):
# Two possibilities for computing RMST:
# - in using directly S_A1 and S_A0 (survival function of treated and control)
# - in using the dataframe and the function computes the survival functions
RMST_1 <- function(data = NULL, A1 = 1, A0 = 0, tau, S_A1 = NULL, S_A0 = NULL) {
if (is.null(S_A1) & is.null(S_A0)) {
# Subset data for treatment groups
data1 <- data[data$A == A1,]
data0 <- data[data$A == A0,]
# Calculate Kaplan-Meier survival estimates
S_A1 <- Kaplan_meier_handmade(data1, status = data1$status,
T_obs = data1$T_obs)
S_A0 <- Kaplan_meier_handmade(data0, status = data0$status,
T_obs = data0$T_obs)
# Restrict observations to those less than or equal to tau
Y.grid1 <- data1$T_obs[data1$T_obs <= tau]
Y.grid0 <- data0$T_obs[data0$T_obs <= tau]
} else {
# Restrict observations to those less than or equal to tau
Y.grid1 <- S_A1$T[S_A1$T <= tau]
Y.grid0 <- S_A0$T[S_A0$T <= tau]
}
# Filter survival estimates to restricted observations
S_A1 <- S_A1 %>%
dplyr::filter(T %in% Y.grid1)
S_A0 <- S_A0 %>%
dplyr::filter(T %in% Y.grid0)
# Check if there is any event at tau for S_A1
if (!any(S_A1$T == tau)) {
new_row <- tibble(T = tau, S = S_A1$S[nrow(S_A1)])
S_A1 <- dplyr::bind_rows(S_A1, new_row)
}
# Check if there is any event at tau for S_A0
if (!any(S_A0$T == tau)) {
new_row <- tibble(T = tau, S = S_A0$S[nrow(S_A0)])
S_A0 <- dplyr::bind_rows(S_A0, new_row)
}
# Calculate integrals from 0 to tau of survival probabilities
intA1 <- integral_rectangles(S_A1$T, S_A1$S)
intA0 <- integral_rectangles(S_A0$T, S_A0$S)
RMST1 <- intA1 - intA0
return(list(RMST=RMST1, intA1=intA1,intA0=intA0))
}
```
As an alternative, one can also use the $\texttt{survfit}$ function in the survival package [@Therneau_2001] for Kaplan-Meier and specify the $\texttt{rmean}$ argument equal to $\tau$ in the corresponding summary function: