Generating exponential survival data
Suppose we are interested in generating exponential survival times with rate parameter \(\lambda\) (so the mean lifetime is \(1/\lambda\), which is the parameterisation the formulas below assume), where each observation has probability \(\alpha\) of being censored, \(0 \le \alpha < 1\). This is, at least from what I tried, a non-trivial problem. I've derived two algorithms:
Algorithm 1
- Generate \(T \sim \text{Exp}( \lambda )\). If \(\alpha = 0\), return \((T, 1)\).
- Solve \(\frac{ \lambda h }{ \exp (\lambda h) -1 } = \alpha \) for \(h\).
- Generate \(E \sim \text{TruncExp}( \lambda, h )\), where \(\text{TruncExp}\) is the truncated exponential distribution with max value \(h\).
- \(C=(T+E)<h\)
- \(T = \min ( h - E, T )\)
- return \((T,C)\)
Yes, it is actually that hard (unless I am missing something and there is a super simple solution).
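A minimal Python sketch of Algorithm 1, assuming \(\lambda\) is a rate (so `random.expovariate(lam)` applies directly). The names `solve_h` and `sample_algorithm_1` are made up for illustration, bisection is just one way to solve for \(h\), and the truncated exponential is drawn by inverting its CDF:

```python
import math
import random


def solve_h(lam, alpha):
    """Solve lam*h / (exp(lam*h) - 1) = alpha for h by bisection (0 < alpha < 1)."""
    def f(x):  # x = lam * h; f falls from 1 towards 0 as x grows
        return x / math.expm1(x)

    lo, hi = 1e-12, 1.0
    while f(hi) > alpha:       # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):       # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if f(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / lam


def sample_algorithm_1(lam, alpha, n, rng=random):
    """Return n pairs (duration, observed); observed is 1 for a death, 0 if censored."""
    if alpha == 0:
        return [(rng.expovariate(lam), 1) for _ in range(n)]
    h = solve_h(lam, alpha)
    samples = []
    for _ in range(n):
        t = rng.expovariate(lam)
        # E ~ TruncExp(lam, h): an exponential conditioned on E < h, via inverse CDF
        e = -math.log1p(rng.random() * math.expm1(-lam * h)) / lam
        observed = int(t + e < h)  # flag uses the uncensored lifetime t
        samples.append((min(h - e, t), observed))
    return samples
```

Solving for \(h\) once per batch, rather than once per draw, keeps the root-finding cost out of the sampling loop.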
Algorithm 2
- Generate \(T \sim \text{Exp}(\lambda)\). If \(\alpha = 0\), return \((T, 1)\).
- Generate \( T_c \sim \text{Exp}(\frac{\alpha \lambda}{1 - \alpha}) \)
- \(C = (T \le T_c)\), so \(C = 1\) means the death was observed
- \(T = \min(T_c, T)\)
- return \( (T, C) \)
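Algorithm 2 can be sketched in a few lines of Python, again assuming \(\lambda\) is a rate. The function name `sample_algorithm_2` is hypothetical. The flag is computed from the uncensored lifetime, with \(C = 1\) meaning the death was observed, which matches the \(\alpha = 0\) case:

```python
import random


def sample_algorithm_2(lam, alpha, rng=random):
    """Return (duration, observed); observed is 1 for a death, 0 if censored."""
    t = rng.expovariate(lam)          # latent lifetime, rate lam
    if alpha == 0:
        return t, 1
    # censoring time, exponential with rate alpha * lam / (1 - alpha)
    t_c = rng.expovariate(alpha * lam / (1 - alpha))
    observed = int(t <= t_c)          # compare before taking the minimum
    return min(t, t_c), observed
```

Because the censoring time is drawn independently of the lifetime, this scheme satisfies the independent-censoring assumption by construction.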
The long
Here's what doesn't work, as I rudely found out today. (Why? It violates the independence assumption between lifetimes and censoring times that Kaplan-Meier relies on.)
- Generate exponentials,
- randomly pick a fraction \(\alpha\) of them, and scale their magnitude by a \(\text{Uni}(0,1)\) draw
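One way to see the problem numerically, without going as far as Kaplan-Meier, is the standard censored-exponential estimate of the rate (observed deaths divided by total observed time), which also relies on censoring being independent of the lifetimes. A rough sketch, assuming \(\lambda\) is a rate and that "pick \(\alpha\) of them" means each draw is picked independently with probability \(\alpha\); both function names are made up for illustration:

```python
import random


def naive_censoring(lam, alpha, n, rng=random):
    """The scheme that does NOT work: shrink a random subset of lifetimes."""
    data = []
    for _ in range(n):
        t = rng.expovariate(lam)
        if rng.random() < alpha:
            # "censoring time" is a uniform fraction of the lifetime itself,
            # so it is not independent of the lifetime
            data.append((rng.random() * t, 0))
        else:
            data.append((t, 1))
    return data


def exponential_rate_mle(data):
    """Deaths divided by total observed time (valid under independent censoring)."""
    deaths = sum(observed for _, observed in data)
    exposure = sum(duration for duration, _ in data)
    return deaths / exposure
```

Under this scheme the estimate settles near \(\lambda \frac{1 - \alpha}{1 - \alpha/2}\) rather than \(\lambda\), since the picked lifetimes contribute half their length on average but no deaths.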
Details on Algorithm 1
Instead, I visualised the problem as a mini real-world situation. That is, given a randomly staggered birth and an individual with an exponential lifetime, at what time should I observe the individual so that there is an \(\alpha\) probability that I would have censored them (that is, they have not died yet)? To make things mathematically easier, I assumed that the staggered birth times are independent of the lifetimes and drawn from the same distribution. Call the time before birth \(S\) and the lifetime of an individual \(L\) (so \(S\) and \(L\) are iid exponentials). I am looking for the time at which I should observe the individual, call it \(h\), so that there is an \(\alpha\) probability they have not died yet. I also need \(S<h\), so that I at least see the birth of the individual. Thus, I need to solve:
$$P( S+ L > h | S < h ) = \alpha$$
for \(h\). The left-hand side involved lots of straightforward integrals, and actually reduced to the amazingly simple formula:
$$\frac{ \lambda h }{ \exp (\lambda h) -1 }$$
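As a function of \(h\), this expression falls monotonically from 1 (as \(h \to 0\)) towards 0 (as \(h \to \infty\)), so every \(0 < \alpha < 1\) has exactly one solution:
[Figure: \(\frac{ \lambda h }{ \exp (\lambda h) -1 }\) plotted against \(h\)]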
After solving for \(h\), the next step was simple simulation: generate \(S \;|\; S < h\) and \(L\), and determine if witnessing the individual at time \(h\) would be a censorship or not. This last step was just some inequalities and algebra.
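For reference, the integrals behind the closed form can be sketched as follows, assuming \(\lambda\) is a rate, so that \(S\) and \(L\) each have density \(\lambda e^{-\lambda t}\) and \(P(L > t) = e^{-\lambda t}\):
$$
\begin{aligned}
P(S < h,\; S + L > h) &= \int_0^h \lambda e^{-\lambda s}\, P(L > h - s)\, ds = \int_0^h \lambda e^{-\lambda s} e^{-\lambda (h - s)}\, ds = \lambda h e^{-\lambda h}, \\
P(S < h) &= 1 - e^{-\lambda h}, \\
P(S + L > h \mid S < h) &= \frac{\lambda h e^{-\lambda h}}{1 - e^{-\lambda h}} = \frac{\lambda h}{\exp(\lambda h) - 1}.
\end{aligned}
$$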
Details on Algorithm 2
This one is pretty simple: suppose censorship times follow an exponential too, but with what parameter? Writing \(T_c\) for the censorship time and \(T\) for the lifetime, as in the algorithm, we want:
$$ P( T_c < T ) = \alpha $$
Computing this integral and solving for the rate of \(T_c\) implies that the correct parameter should be \(\frac{ \alpha \lambda } { 1 - \alpha}\).
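Spelled out, with \(\lambda_c\) as a label introduced here for the unknown censoring rate, \(T_c\) for the censorship time, \(T\) for the lifetime, and both exponentials parameterised by rate, the computation can be sketched as:
$$
\begin{aligned}
P(T_c < T) &= \int_0^\infty \lambda_c e^{-\lambda_c t}\, P(T > t)\, dt = \int_0^\infty \lambda_c e^{-(\lambda_c + \lambda) t}\, dt = \frac{\lambda_c}{\lambda_c + \lambda}, \\
\frac{\lambda_c}{\lambda_c + \lambda} = \alpha \;&\Longrightarrow\; \lambda_c (1 - \alpha) = \alpha \lambda \;\Longrightarrow\; \lambda_c = \frac{\alpha \lambda}{1 - \alpha}.
\end{aligned}
$$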