# Distribution of the last value in a sum of Uniforms that exceeds 1

Posted by **Cameron Davidson-Pilon** at

While working on a problem, I derived an interesting result around sums of uniforms random variables. I wanted to record it here so I don't forget it (I haven't solved the more general problem yet!). Here's the summary of the result:

Let \(S_n = \sum_{i=1}^n U_i \) be the sum of \(n\) Uniform random variables. Let \(N\) be the index of the first time the sum exceeds 1 (so \(S_{N-1} < 1\) and \(S_{N} \ge 1\)). The distribution of \(U_N\) is \(e - e^{1-u}\).This result is related to the really astounding result: \(N\) (the same \(N\) defined above) has expected value equal to \(e\). Yes, that's right, the average number of uniform random variables needed to exceed 1 is exactely \(e\). I just love this surprising result, but have still not found a good, intuitive reason as to why this is true. Of course it is true mathematically, and the proof is not too hard, but I just can't

*see why*. Anyways, below is the proof of the first result.

### Break the problem into smaller pieces

We'll first use the total law of probability to break the distribution into smaller pieces: $$P(U_N \in du) = \sum_{n=2}^{\infty} P(U_N \in du \;|\; N=n)P(N=n) $$### Numerical results first

For the below part, we'll be conditioning on a given \(N\), so let's first look at the distibution of \(U_N \;|\; N=n\). To do this, we'll just use Monte Carlo to sample from this distribution. Here are the results: So this is our target. If we can model these distributions, we can plug them into the above formula.### Focusing on a particular \(N\)

Let's put our attention on the case \(N=2\). We want to solve: $$ \begin{align*} P(U_2 \in du \;|\; N=2) & = P(U_2 \in du \;|\; U_1 < 1, U_1 + U_2 \ge 1) \\ & = \frac{P(U_2 \in du, U_1 < 1, U_1 + U_2 \ge 1)}{P(U_1 < 1, U_1 + U_2 \ge 1)} \\ & = \frac{P(U_1 < 1, U_1 + U_2 \ge 1 \;|\; U_2 \in du)P(U_2 \in du)}{P(U_1 < 1, U_1 + U_2 \ge 1)} \\ \end{align*} $$ Because \(U_2\) is uniform, \(P(U_2 \in du) = du \), so $$ \frac{P(U_1 < 1, U_1 + U_2 \ge 1 \;|\; U_2 \in du) du}{P(U_1 < 1, U_1 + U_2 \ge 1)} $$ The denominator, while it can be worked out, is equal to the term \(P(N=n)\) in the first equation, so those will cancel out later. It's equal to \(\frac{1}{n! + (n-1)!}\), and I'll keep it in to compare our results to the distributions above. So we just have the numerator term to compute. I'll use a slight change in the problem statement here, so as to make the algebra more clear: $$ \begin{align} P(U_1 < 1, U_1 + u \ge 1 \;|\; U_2 = u) & \\ & = P(U_1 < 1, U_1 \ge 1 - u \;|\; U_2 = u)\\ & = u \end{align} $$ So we arrive at \(f_{U_2 \; | \; N=2}(u) = \frac{u}{1/2} = 2u\), which nice aligns with our empirical distribution above.### Slightly harder problem

Let's try \(N=3\), and it gets more interesting here. We'll skip ahead to solving: $$ \begin{align*} P(U_1 + U_2 < 1, U_1 + U_2 + u \ge 1 \;|\; U_3 = u) = \\ P(U_1 + U_2 < 1, U_1 + U_2 \ge 1 - u \;|\; U_3 = u) \end{align*} $$ To solve this, we'll take geometric point of view: the two variables (\(U_1, U_2)\) can be represented as the unit square in the first quadrant. The inequality \(U_1 + U_2 < 1\) is an area in this square, bordered by the unit square and the line \(U_1 + U_2 = 1\). Ditto with \(U_1 + U_2 \ge 1 - u\). The probability is then just the area of the space between the lines. After computing triangles, the area comes out to \(\frac{1}{2}(1 - (1-u)^2)\).### In general

This geometric argument can be extended, with the help of the Irwin–Hall distribution, and in general we get: $$ \begin{align*} f_{U_n \;|\; N = n}(u) &= \frac{1}{(n-1)! + (n-2)!} \int_{1-u}^{1} \frac{1}{(n-2)!}z^{n-2}dz \\ & =\frac{(n-1)! + (n-2)!}{(n-1)!}(1 - (1-u)^{n-1}) \end{align*} $$ And the results are spot on:### Near the finish line

Going back to our original equation: $$ \begin{align*} f_{U_N}(u) &= \sum_{n=2}^{\infty} f_{U_N | N=n}(u)P(N=n)\\ &= \sum_{n=2}^{\infty} \frac{(n-1)! + (n-2)!}{(n-1)!}(1 - (1-u)^{n-1}) \frac{1}{(n-1)! + (n-2)!} \\ &= \sum_{n=2}^{\infty} \frac{1}{(n-1)!}(1 - (1-u)^{n-1})\\ &= \sum_{n=2}^{\infty} \frac{1}{(n-1)!} - \sum_{n=2}^{\infty} \frac{(1-u)^{n-1}}{(n-1)!}\\ &= \sum_{n=1}^{\infty} \frac{1}{n!} - \sum_{n=1}^{\infty} \frac{(1-u)^{n}}{n!}\\ &= e - e^{1-u}\\ \end{align*} $$ Does this empirically line up? Again, I created samples via MC and plotted our derived curve over it: Terrific! This confirms that no mistake was made in the derivation.### Conclusion

This worked out so nicely mostly because the probabilities were monomoials with simple coefficients - then we could easily work out the solution to the infinite series. However, if I wanted a similar result for exceeding 2 (instead of 1), we get*polynomials*and much more complex coefficients. I haven't made much progress on this part. I think a solution is to condition the sum on some value, \(N'\), such that the sum at \(N'\) exceeds 1, and then I can subtract 1 (+ a smaller term) from 2, and I arrive back at a sum less than 1.

- Tags: probability, python, statistics

## Latest Data Science screencasts available

Comments