Menu
Cart

Distribution of the last value in a sum of Uniforms that exceeds 1

Posted by Cameron Davidson-Pilon at

While working on a problem, I derived an interesting result around sums of uniforms random variables. I wanted to record it here so I don't forget it (I haven't solved the more general problem yet!). Here's the summary of the result:

Let \(S_n = \sum_{i=1}^n U_i \) be the sum of \(n\) Uniform random variables. Let \(N\) be the index of the first time the sum exceeds 1 (so \(S_{N-1} < 1\) and \(S_{N} \ge 1\)). The distribution of \(U_N\) is \(e - e^{1-u}\).
This result is related to the really astounding result: \(N\) (the same \(N\) defined above) has expected value equal to \(e\). Yes, that's right, the average number of uniform random variables needed to exceed 1 is exactely \(e\). I just love this surprising result, but have still not found a good, intuitive reason as to why this is true. Of course it is true mathematically, and the proof is not too hard, but I just can't see why. Anyways, below is the proof of the first result.

Break the problem into smaller pieces

We'll first use the total law of probability to break the distribution into smaller pieces: $$P(U_N \in du) = \sum_{n=2}^{\infty} P(U_N \in du \;|\; N=n)P(N=n) $$

Numerical results first

For the below part, we'll be conditioning on a given \(N\), so let's first look at the distibution of \(U_N \;|\; N=n\). To do this, we'll just use Monte Carlo to sample from this distribution. Here are the results: So this is our target. If we can model these distributions, we can plug them into the above formula.

Focusing on a particular \(N\)

Let's put our attention on the case \(N=2\). We want to solve: $$ \begin{align*} P(U_2 \in du \;|\; N=2) & = P(U_2 \in du \;|\; U_1 < 1, U_1 + U_2 \ge 1) \\ & = \frac{P(U_2 \in du, U_1 < 1, U_1 + U_2 \ge 1)}{P(U_1 < 1, U_1 + U_2 \ge 1)} \\ & = \frac{P(U_1 < 1, U_1 + U_2 \ge 1 \;|\; U_2 \in du)P(U_2 \in du)}{P(U_1 < 1, U_1 + U_2 \ge 1)} \\ \end{align*} $$ Because \(U_2\) is uniform, \(P(U_2 \in du) = du \), so $$ \frac{P(U_1 < 1, U_1 + U_2 \ge 1 \;|\; U_2 \in du) du}{P(U_1 < 1, U_1 + U_2 \ge 1)} $$ The denominator, while it can be worked out, is equal to the term \(P(N=n)\) in the first equation, so those will cancel out later. It's equal to \(\frac{1}{n! + (n-1)!}\), and I'll keep it in to compare our results to the distributions above. So we just have the numerator term to compute. I'll use a slight change in the problem statement here, so as to make the algebra more clear: $$ \begin{align} P(U_1 < 1, U_1 + u \ge 1 \;|\; U_2 = u) & \\ & = P(U_1 < 1, U_1 \ge 1 - u \;|\; U_2 = u)\\ & = u \end{align} $$ So we arrive at \(f_{U_2 \; | \; N=2}(u) = \frac{u}{1/2} = 2u\), which nice aligns with our empirical distribution above.

Slightly harder problem

Let's try \(N=3\), and it gets more interesting here. We'll skip ahead to solving: $$ \begin{align*} P(U_1 + U_2 < 1, U_1 + U_2 + u \ge 1 \;|\; U_3 = u) = \\ P(U_1 + U_2 < 1, U_1 + U_2 \ge 1 - u \;|\; U_3 = u) \end{align*} $$ To solve this, we'll take geometric point of view: the two variables (\(U_1, U_2)\) can be represented as the unit square in the first quadrant. The inequality \(U_1 + U_2 < 1\) is an area in this square, bordered by the unit square and the line \(U_1 + U_2 = 1\). Ditto with \(U_1 + U_2 \ge 1 - u\). The probability is then just the area of the space between the lines. After computing triangles, the area comes out to \(\frac{1}{2}(1 - (1-u)^2)\).

In general

This geometric argument can be extended, with the help of the Irwin–Hall distribution, and in general we get: $$ \begin{align*} f_{U_n \;|\; N = n}(u) &= \frac{1}{(n-1)! + (n-2)!} \int_{1-u}^{1} \frac{1}{(n-2)!}z^{n-2}dz \\ & =\frac{(n-1)! + (n-2)!}{(n-1)!}(1 - (1-u)^{n-1}) \end{align*} $$ And the results are spot on:

Near the finish line

Going back to our original equation: $$ \begin{align*} f_{U_N}(u) &= \sum_{n=2}^{\infty} f_{U_N | N=n}(u)P(N=n)\\ &= \sum_{n=2}^{\infty} \frac{(n-1)! + (n-2)!}{(n-1)!}(1 - (1-u)^{n-1}) \frac{1}{(n-1)! + (n-2)!} \\ &= \sum_{n=2}^{\infty} \frac{1}{(n-1)!}(1 - (1-u)^{n-1})\\ &= \sum_{n=2}^{\infty} \frac{1}{(n-1)!} - \sum_{n=2}^{\infty} \frac{(1-u)^{n-1}}{(n-1)!}\\ &= \sum_{n=1}^{\infty} \frac{1}{n!} - \sum_{n=1}^{\infty} \frac{(1-u)^{n}}{n!}\\ &= e - e^{1-u}\\ \end{align*} $$ Does this empirically line up? Again, I created samples via MC and plotted our derived curve over it: Terrific! This confirms that no mistake was made in the derivation.

Conclusion

This worked out so nicely mostly because the probabilities were monomoials with simple coefficients - then we could easily work out the solution to the infinite series. However, if I wanted a similar result for exceeding 2 (instead of 1), we get polynomials and much more complex coefficients. I haven't made much progress on this part. I think a solution is to condition the sum on some value, \(N'\), such that the sum at \(N'\) exceeds 1, and then I can subtract 1 (+ a smaller term) from 2, and I arrive back at a sum less than 1.

Related Posts


Latest Data Science screencasts available


Comments

Leave a comment

Please note: comments will be approved before they are published