Generating exponential survival data

Posted by Cameron Davidson-Pilon at

Suppose we are interested in generating exponential survival times with rate parameter \(\lambda\), where each observation has probability \(\alpha\) of being censored, \(0 \le \alpha < 1\). This is, at least from what I tried, a non-trivial problem. I've derived two algorithms:

Algorithm 1

  1.  Generate \(T \sim \text{Exp}( \lambda )\). If \(\alpha = 0\), return \((T, 1)\). 
  2.  Solve \(\frac{  \lambda h }{ \exp (\lambda h) -1 } = \alpha \) for \(h\).
  3.  Generate \(E \sim  \text{TruncExp}( \lambda, h )\), where \(\text{TruncExp}\) is the truncated exponential distribution with max value \(h\).
  4. \(C=(T+E)<h\)
  5. \(T = \min ( h - E, T )\)
  6. return \((T,C)\)

  Yes, it is actually that hard (unless I am missing something and there is a super simple solution).
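A minimal Python sketch of the steps in Algorithm 1 follows. It is an illustration under stated assumptions, not the original listing: \(\lambda\) is treated as a rate, the function names `solve_h`, `trunc_exp` and `sample_alg1` are made up for this example, the equation for \(h\) is solved by plain bisection, and the truncated exponential is drawn by inverse-CDF sampling.

```python
import math
import random


def solve_h(lam, alpha, tol=1e-12):
    """Solve lam*h / (exp(lam*h) - 1) = alpha for h by bisection (0 < alpha < 1)."""
    def g(x):
        # x / (e^x - 1); decreases from 1 (x -> 0) towards 0 (x -> infinity)
        return x / math.expm1(x)

    lo, hi = 1e-12, 1.0
    while g(hi) > alpha:          # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / lam  # the root was found in x = lam * h


def trunc_exp(lam, h, rng):
    """Draw from Exp(lam) conditioned on being below h (inverse-CDF method)."""
    p = -math.expm1(-lam * h)     # P(Exp(lam) < h) = 1 - exp(-lam * h)
    return -math.log1p(-rng.random() * p) / lam


def sample_alg1(lam, alpha, n, seed=None):
    """Return n pairs (duration, observed); observed is 1 for a death, 0 if censored."""
    rng = random.Random(seed)
    if alpha == 0:
        return [(rng.expovariate(lam), 1) for _ in range(n)]
    h = solve_h(lam, alpha)       # h depends only on lam and alpha, so solve once
    out = []
    for _ in range(n):
        t = rng.expovariate(lam)  # step 1: lifetime (expovariate takes a rate)
        e = trunc_exp(lam, h, rng)        # step 3
        observed = int(t + e < h)         # step 4
        out.append((min(h - e, t), observed))  # step 5
    return out
```

Calling `sample_alg1(2.0, 0.3, 10000, seed=0)` should give a censored fraction close to 0.3. Note that every recorded duration is bounded above by \(h\).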

Algorithm 2

  1. Generate \(T \sim \text{Exp}(\lambda)\). If \(\alpha = 0\), return \((T, 1)\). 
  2. Generate \( T_c \sim \text{Exp}(\frac{\alpha \lambda}{1 - \alpha}) \)
  3. \(C = (T \le T_c)\)
  4. \(T = \min(T_c, T)\)
  5. return \( (T, C) \)
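Algorithm 2 can be sketched in a few lines of Python. This is an illustrative sketch, assuming \(\lambda\) is a rate and that \(C = 1\) marks an observed death (consistent with the \(\alpha = 0\) case returning \((T, 1)\)); the function name `sample_alg2` is hypothetical. The indicator is computed from the original lifetime, before it is replaced by the minimum.

```python
import random


def sample_alg2(lam, alpha, n, seed=None):
    """Return n pairs (duration, observed); observed is 1 for a death, 0 if censored."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = rng.expovariate(lam)              # lifetime with rate lam
        if alpha == 0:
            out.append((t, 1))                # no censoring at all
            continue
        # independent censoring time with rate alpha * lam / (1 - alpha)
        t_c = rng.expovariate(alpha * lam / (1 - alpha))
        observed = int(t <= t_c)              # compare before overwriting t
        out.append((min(t, t_c), observed))
    return out
```

For example, `sample_alg2(2.0, 0.3, 10000, seed=0)` should produce roughly 30% censored observations. Unlike Algorithm 1, the recorded durations have no upper bound.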


Here's what doesn't work, as I rudely found out today. (Why? It violates the assumption that censoring is independent of the lifetimes, which the Kaplan-Meier estimator relies on.)

  1. Generate exponentials.
  2. Randomly pick a fraction \(\alpha\) of them, and scale each one by an independent \(\text{Uni}(0,1)\) draw.

Details on Algorithm 1

Instead, I visualised the problem as a mini real-world situation: given a randomly staggered birth and an individual with an exponential lifetime, at what time should I observe the individual so that there is an \(\alpha\) probability that they are censored (that is, they have not died yet)? To make things mathematically easier, I assumed that the time until birth is independent of the lifetime and follows the same distribution. Call the time before birth \(S\) and the lifetime of the individual \(L\) (so \(S\) and \(L\) are iid exponentials). I want the observation time, call it \(h\), at which there is an \(\alpha\) probability that the individual has not died yet. I also need \(S<h\), so that I at least see the birth of the individual. Thus, I need to solve:

$$P( S+ L > h | S < h ) = \alpha$$

for \(h\). The left-hand side involves lots of straightforward integrals, and reduces to the amazingly simple formula:
$$\frac{ \lambda h }{ \exp (\lambda h) -1 }$$
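For readers who want the intermediate steps, here is one way to get there, assuming \(S\) and \(L\) are independent exponentials with rate \(\lambda\) (density \(\lambda e^{-\lambda t}\)). The conditioning event has probability

$$P(S < h) = 1 - e^{-\lambda h}$$

and the joint event is obtained by integrating over the birth time \(s\):

$$P(S + L > h,\; S < h) = \int_0^h \lambda e^{-\lambda s} \, e^{-\lambda (h - s)} \, ds = \lambda h \, e^{-\lambda h}$$

Dividing the second expression by the first gives

$$P(S + L > h \mid S < h) = \frac{\lambda h \, e^{-\lambda h}}{1 - e^{-\lambda h}} = \frac{\lambda h}{\exp(\lambda h) - 1}$$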

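As a function of \(h\), this expression decreases monotonically from 1 (as \(h \to 0\)) towards 0 (as \(h \to \infty\)), so there is exactly one solution for any \(0 < \alpha < 1\).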

After solving for \(h\), the next step was simple simulation: generate \(S \;|\; S < h\) and \(L\), and determine if witnessing the individual at time \(h\) would be a censorship or not. This last step was just some inequalities and algebra.
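One way to draw \(S \mid S < h\) is inverse-CDF sampling, sketched here with \(U \sim \text{Uni}(0,1)\), a symbol introduced only for this illustration. The conditional distribution function on \(0 \le s \le h\) is

$$F(s) = \frac{1 - e^{-\lambda s}}{1 - e^{-\lambda h}}$$

and setting \(F(S) = U\) and solving for \(S\) gives

$$S = -\frac{1}{\lambda} \ln\left( 1 - U \left( 1 - e^{-\lambda h} \right) \right)$$

The individual dies at time \(S + L\), so the death is observed when \(S + L < h\) and censored otherwise. In either case the recorded duration is \(\min(h - S, L)\), which is what steps 4 and 5 of Algorithm 1 compute, with \(E\) playing the role of \(S\) and \(T\) the role of \(L\).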

Details on Algorithm 2

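This one is pretty simple: suppose the censorship times \(T_c\) follow an exponential distribution too, independent of the lifetimes \(T\). What should the parameter be? Well, we want the censorship time to arrive before the death with probability \(\alpha\):
$$ P( T_c < T ) = \alpha $$
Computing this integral and solving for the rate of \(T_c\) shows that the correct parameter is \(\frac{ \alpha \lambda } { 1 - \alpha}\).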
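The integral is short enough to write out. As a sketch, write \(\mu\) for the unknown rate of the censorship time \(T_c\) (a symbol introduced here for illustration) and \(T\) for the lifetime with rate \(\lambda\), as in Algorithm 2. Censoring happens when \(T_c\) arrives before \(T\):

$$P(T_c < T) = \int_0^\infty \mu e^{-\mu t} \, e^{-\lambda t} \, dt = \frac{\mu}{\mu + \lambda}$$

Setting this equal to \(\alpha\) and rearranging:

$$\frac{\mu}{\mu + \lambda} = \alpha \quad \Longrightarrow \quad \mu = \frac{\alpha \lambda}{1 - \alpha}$$

As a sanity check, \(\alpha = 1/2\) gives \(\mu = \lambda\): when both clocks run at the same rate, each is equally likely to ring first.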
