# Generating exponential survival data

Posted by Cameron Davidson-Pilon at

Suppose we are interested in generating exponential survival times with rate parameter $$\lambda$$, each having probability $$\alpha$$ of being censored, $$0 \le \alpha < 1$$. This is actually, at least from what I tried, a non-trivial problem. I've derived a few algorithms:

### Algorithm 1

1.  Generate $$T \sim \text{Exp}( \lambda )$$. If $$\alpha = 0$$, return $$(T, 1)$$.
2.  Solve $$\frac{ \lambda h }{ \exp (\lambda h) -1 } = \alpha$$ for $$h$$.
3.  Generate $$E \sim \text{TruncExp}( \lambda, h )$$, where $$\text{TruncExp}$$ is the truncated exponential distribution with max value $$h$$.
4. $$C=(T+E)<h$$
5. $$T = \min ( h - E, T )$$
6. return $$(T,C)$$, where $$C = 1$$ means the death was observed and $$C = 0$$ means the observation was censored

Yes, it is actually that hard (unless I am missing something and there is a super simple solution).  Here's the Python:
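What follows is a minimal sketch of the six steps above rather than a tuned implementation. It uses only the standard library and assumes $$\lambda$$ is a rate, which matches `random.expovariate`. The names `solve_h`, `truncated_exponential` and `exponential_survival_data` are illustrative. Bisection stands in for a proper root finder, and the truncated exponential is drawn through its inverse CDF.

```python
import math
import random


def solve_h(alpha, lam, tol=1e-12):
    """Solve lam*h / (exp(lam*h) - 1) = alpha for h by bisection (0 < alpha < 1)."""
    def f(x):
        # x = lam * h; x / (e^x - 1) decreases from 1 towards 0 as x grows
        return x / math.expm1(x) - alpha

    lo, hi = 1e-12, 1.0
    while f(hi) > 0:          # widen the bracket until the root is inside it
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / lam


def truncated_exponential(lam, h, rng):
    """Draw from Exp(rate=lam) conditioned on being below h (inverse CDF)."""
    u = rng.random()
    mass_below_h = -math.expm1(-lam * h)      # 1 - exp(-lam * h)
    return -math.log1p(-u * mass_below_h) / lam


def exponential_survival_data(n, alpha, lam=1.0, seed=None):
    """Return (durations, observed); observed[i] is False when i is censored."""
    rng = random.Random(seed)
    lifetimes = [rng.expovariate(lam) for _ in range(n)]     # step 1
    if alpha == 0:
        return lifetimes, [True] * n
    h = solve_h(alpha, lam)                                  # step 2
    durations, observed = [], []
    for t in lifetimes:
        e = truncated_exponential(lam, h, rng)               # step 3
        observed.append(t + e < h)                           # step 4
        durations.append(min(h - e, t))                      # step 5
    return durations, observed                               # step 6
```

Calling `exponential_survival_data(10000, 0.3, lam=2.0)` should give a censored fraction close to 0.3, up to sampling noise.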

### Algorithm 2

1. Generate $$T \sim \text{Exp}(\lambda)$$. If $$\alpha = 0$$, return $$(T, 1)$$.
2. Generate $$T_c \sim \text{Exp}(\frac{\alpha \lambda}{1 - \alpha})$$
3. $$C = (T < T_c)$$
4. $$T = \min(T_c, T)$$
5. return $$(T, C)$$
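A minimal standard-library sketch of Algorithm 2, again assuming $$\lambda$$ is a rate. The function name is illustrative. The indicator is computed from the raw draws before the minimum is taken, and `True` means the death was observed, which matches the $$(T, 1)$$ returned when $$\alpha = 0$$.

```python
import random


def exponential_survival_data_v2(n, alpha, lam=1.0, seed=None):
    """Return (durations, observed) using an independent exponential censoring time."""
    rng = random.Random(seed)
    durations, observed = [], []
    for _ in range(n):
        t = rng.expovariate(lam)                 # lifetime
        if alpha == 0:
            durations.append(t)                  # no censoring at all
            observed.append(True)
            continue
        # censoring time with rate alpha * lam / (1 - alpha)
        t_c = rng.expovariate(alpha * lam / (1.0 - alpha))
        observed.append(t < t_c)                 # death seen before censoring
        durations.append(min(t, t_c))
    return durations, observed
```

Because the censoring time is independent of the lifetime, data produced this way satisfies the independence assumption that Kaplan-Meier relies on.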

### The long

Here's what doesn't work, as I rudely found out today. (Why? The censoring time ends up depending on the lifetime, which violates the independence assumption behind Kaplan-Meier.)

1. Generate exponentials.
2. Randomly pick a fraction $$\alpha$$ of them, and scale the magnitude of each by a $$\text{Uni}(0,1)$$.

#### Details on Algorithm 1

Instead, I visualised the problem as a mini real-world situation. That is, given a randomly staggered birth and an individual with an exponential lifetime, at what time should I observe the individual so that there is an $$\alpha$$ probability that I would have censored them (that is, they have not died yet)? To make things mathematically easier, I assumed that the time before birth is independent of the lifetime and has the same distribution. Call the time before birth $$S$$ and the lifetime of an individual $$L$$ (so $$S$$ and $$L$$ are iid exponentials). I am curious about the time at which I should observe the individual, call this time $$h$$, so that there is an $$\alpha$$ probability they have not died yet. I also need $$S<h$$, so that I at least see the birth of the individual. Thus, I need to solve:

$$P( S+ L > h | S < h ) = \alpha$$

for $$h$$. The left-hand side involved lots of straightforward integrals, and actually reduced to the amazingly simple formula:
$$\frac{ \lambda h }{ \exp (\lambda h) -1 }$$
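
For readers who want the intermediate steps, here is a sketch of the calculation, assuming $$\lambda$$ is the rate of both $$S$$ and $$L$$, so each has density $$\lambda e^{-\lambda t}$$. First the joint probability, conditioning on the birth time:

$$P(S + L > h,\; S < h) = \int_0^h \lambda e^{-\lambda s}\, P(L > h - s)\, ds = \int_0^h \lambda e^{-\lambda s}\, e^{-\lambda (h - s)}\, ds = \lambda h\, e^{-\lambda h}$$

Then the probability of seeing the birth at all:

$$P(S < h) = 1 - e^{-\lambda h}$$

Dividing the two, and multiplying top and bottom by $$e^{\lambda h}$$, gives:

$$P(S + L > h \mid S < h) = \frac{\lambda h\, e^{-\lambda h}}{1 - e^{-\lambda h}} = \frac{\lambda h}{\exp(\lambda h) - 1}$$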

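This expression, as a function of $$h$$, decreases monotonically from 1 (as $$h \to 0$$) towards 0 (as $$h \to \infty$$), so there is exactly one solution for every $$0 < \alpha < 1$$. It looks like this: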

After solving for $$h$$, the next step was simple simulation: generate $$S \;|\; S < h$$ and $$L$$, and determine if witnessing the individual at time $$h$$ would be a censorship or not. This last step was just some inequalities and algebra.
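
The closed form for $$P(S + L > h \mid S < h)$$ can be sanity-checked by brute force. The sketch below is only a check, not part of either algorithm. It assumes $$\lambda$$ is a rate, uses illustrative function names, and handles the condition $$S < h$$ by discarding draws where the birth is not seen.

```python
import math
import random


def censoring_probability_closed_form(lam, h):
    """lam*h / (exp(lam*h) - 1), the formula derived for P(S + L > h | S < h)."""
    x = lam * h
    return x / math.expm1(x)


def censoring_probability_monte_carlo(lam, h, n, seed=None):
    """Estimate P(S + L > h | S < h) from n accepted draws with S < h."""
    rng = random.Random(seed)
    kept = 0
    censored = 0
    while kept < n:
        s = rng.expovariate(lam)         # time before birth
        if s >= h:
            continue                     # birth not seen by time h: discard
        kept += 1
        life = rng.expovariate(lam)      # lifetime, independent of s
        if s + life > h:
            censored += 1                # still alive when observed at h
    return censored / n
```

With enough draws, the two functions should agree to within Monte Carlo error for any positive `lam` and `h`.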

#### Details on Algorithm 2

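This one is pretty simple: suppose censorship times follow an exponential too, independent of the lifetimes, but what is the parameter? With $$T_c$$ the censorship time and $$T$$ the lifetime, as in Algorithm 2, an individual is censored when $$T_c < T$$, so we want:
$$P( T_c < T ) = \alpha$$
Computing this integral and solving for the unknown parameter implies that the correct parameter should be $$\frac{ \alpha \lambda } { 1 - \alpha}$$.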
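
The integral can be sketched as follows, assuming the lifetime $$T$$ has rate $$\lambda$$ and the censorship time $$T_c$$ is independent of it, with an unknown rate that I will call $$\mu$$ (a symbol introduced here only for this calculation). The probability that the censorship time comes first is:

$$P(T_c < T) = \int_0^\infty \mu e^{-\mu c}\, P(T > c)\, dc = \int_0^\infty \mu e^{-\mu c}\, e^{-\lambda c}\, dc = \frac{\mu}{\mu + \lambda}$$

Setting this equal to $$\alpha$$ and solving for $$\mu$$:

$$\frac{\mu}{\mu + \lambda} = \alpha \quad\Longrightarrow\quad \mu = \frac{\alpha \lambda}{1 - \alpha}$$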
