Poissonization of Multinomials

Posted by Cameron Davidson-Pilon on


I've seen some really interesting probability & numerical solutions using a strategy called Poissonization, but Googling for it revealed very few resources (just some references in some textbooks that I don't have quick access to). Below are my notes and repository for PoissonizationAfter we introduce the theory, we'll do some examples.

The technique relies on the following theorem:

Theorem: Let \(N \sim \text{Poi}(\lambda)\) and suppose \(N=n, (X_1, X_2, ... X_k) \sim \text{Multi}(n, p_1, p_2, ..., p_k)\). Then, marginally, \(X_1, X_2, ..., X_k\) are are independent Poisson, with \(X_i \sim \text{Poi}(p_i \lambda)\). [1]

The proof of this theorem is as follows. First, by the law of total probability:

P(X_1=x_1, ..., X_k = x_k\;|\; n, \mathbf{p})
& = \sum_{n=0}^{\infty} P(X_1=x_1, ..., X_k = x_k \;|\; N=n) \frac{e^{-\lambda}\lambda^n}{n!} \\
& = \sum_{n=0}^{\infty} \frac{(x_1 + ... + x_k)!}{x_1!x_2! \cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}\frac{e^{-\lambda}\lambda^n}{n!}I_{x_1 + ... + x_k = n} \\
& = e^{-\lambda}\lambda^{x_1} \cdots \lambda^{x_k}p_1^{x_1} \cdots p_k^{x_k}\frac{1}{x_1!x_2! \cdots x_k!}\\
& = e^{-\lambda}(\lambda p_1)^{x_1} \cdots (\lambda p_k)^{x_k}\frac{1}{x_1!x_2! \cdots x_k!}\\
& = \prod_{i=1}^k \frac{e^{-\lambda p_i}(\lambda p_i)^{x_i}}{x_i!}\;\;\;\;\blacksquare

Let's see how we can use this to solve probability problems. Let \((Y_1, Y_2, ...Y_k)\) be \(\text{Multi}(n, p_1, p_2, ..., p_k)\), and we are interested in when the realized values fall into some set \(A\). Let \(X_i\) be defined as above, that is, \(X_i \sim \text{Poi}(p_i \lambda)\). Then, from above, we can write:  

P((X_1, ... X_k) \in A) & = \sum_{i=0}^\infty \frac{e^{-\lambda}\lambda^n}{n!} P((Y_1, ... Y_k) \in A)\\
& = e^{-\lambda} \sum_{i=0}^\infty \frac{\lambda^n}{n!} P((Y_1, ... Y_k) \in A)\\
& e^{\lambda}P((X_1, ... X_k) \in A) = \sum_{i=0}^\infty \frac{\lambda^n}{n!} P((Y_1, ... Y_k) \in A) \;\;\;(★)

Both sides of the last line are infinite series of \(\lambda^n\), and thus their coefficients must be equal. Thus, as \(N=n\), \(P((Y_1, ... Y_k) \in A)\) is equal to  \(n! c(n)\) where \(c(n)\) is the coefficient of \(\lambda^n\) in the expansion of \(e^{\lambda}P((X_1, ... X_k) \in A)\). 

Example 1

Let's try an example: 

n defects are randomly distributed amongst k integrated-circuit chips produced by a factory (any number of defects may be found on a chip and each defect is independent of the other defects). If n = 4, k=7, find the probability that there is a chip with at least 3 defects. 

This is a classic case where it's easier to compute the complementary probability: what is the probability that all chips have less than 3 defects (and then we subtract this from 1 to get the original probability).  

Since each defect is randomly distributed, \(p_i = p = \frac{1}{7}\). Below, we invoke Poissonization by letting \(n\) be randomly distributed as \( \text{Poi}(\lambda)\) (later we will condition on its actual value of 4). That means that we can assume the number of defects on a chip is a Poisson random variable with rate \(\lambda p = \frac{\lambda}{7}\). 

P(\text{All chips have $\le$ 2 defects})
& = \Big(P(\text{Chip has 0 defect}) + P(\text{Chip has 1 defect}) + P(\text{Chip has 2 defect})\Big)^7 \\
& = \Big(e^{-\lambda/7} + \frac{e^{-\lambda/7}(\lambda/7)^1}{1!} + \frac{e^{-\lambda/7}(\lambda/7)^2}{2!}\Big)^7\\
& = e^{-\lambda}\Big(1 + \frac{\lambda}{7} + \frac{\lambda^2}{2!7^2}\Big)^7\\

When we multiply the above by \(e^{\lambda}\), and imagine the right-hand side is expanded, we get a form that is familiar to equation (). This means that we are interested in the coefficient of \(\lambda^4\). We could expand this by hand, but instead let's expand this computationally. For this, it's useful to know a bit about multinomial expansions. But basically you are looking for solutions to 

k_0 + k_1 + k_2 = 7\\
0 k_0 + 1 k_2 + 2 k_2 = 4

The first equation comes from the constraint for the powers of a multinomial expansion to sum to the outside exponent. The second equation comes from our need to find powers of \(\lambda\) that sum up to 4 (the 0,1,2 come from the existing powers of \(\lambda\)). Two equations, three unknowns, solving for the values:

k_2 &= k_0 - 3 \\
k_1 &= 10 - 2k_0\\
3 &\le k_0 \le 5

Computationally, we can write this in Python:

We get the answer 0.92711, so the complement to this is 0.07289 and is the probability of the original question.

Example 2

This comes from FiveThiryEight's Riddler series: Santa Needs Some Help With Math. I'll quote the problem here:

In Santa’s workshop, elves make toys during a shift each day. On the overhead radio, Christmas music plays, with a program randomly selecting songs from a large playlist.


During any given shift, the elves hear 100 songs. A cranky elf named Cranky has taken to throwing snowballs at everyone if he hears the same song twice. This has happened during about half of the shifts. One day, a mathematically inclined elf named Mathy tires of Cranky’s sodden outbursts. So Mathy decides to use what he knows to figure out how large Santa’s playlist actually is.


Help Mathy out: How large is Santa’s playlist?

Okay, so let's translate this to our set up. We have \(k\) songs to choose from, and we sample 100 of them. Turns out that P(any song is played more than once) = 0.5. So in this case, we know the probability, but we don't know the \(k\). So, similarly to how we started above:

P(\text{songs have 1 or less plays}) 
& = \Big(P(\text{songs has 0 plays}) + P(\text{songs has 1 plays})\Big)^k \\
& = \Big(e^{-\lambda/k} + \frac{e^{-\lambda/k}(\lambda/k)^1}{1!}\Big)^k\\
& = e^{-\lambda}\Big(1 + \frac{\lambda}{k}\Big)^k\\

Yay we have a binomial! They are much easier to work with. Now we need to answer:

What is the coefficient of \(λ^{100}\)?

Call this coefficient \(c_{100}(k)\). Since we have a binomial, expanding the above is easy:

c_{100}(k) = \frac{C(k, 100)}{k^{100}}

Using the other piece of information above, that the probability of an "event" is 0.5, we can write an equation down:

& = 100! c_{100}(k) \\
& = 100! \frac{C(k, 100)}{k^{100}} \\
& = \frac{k!}{(k-100)! k^{100}} \\

2\frac{k!}{(k-100)!} - k^{100} = 0

We can use Python (!!) to numerically solve this equation for \(k\). 


Because the sign flips at 7175, we conclude that either 7174 or 7175 is the closest answer.  


Looking for more examples?


[1] DasGupta, A. Probability for Statistics and Machine Learning.


Related Posts

Latest Data Science screencasts available

  • Just above example 1, you’ve used i as the index of summation, but lambda has exponent n. I think that’s a mistake. (In DasGupta’s book he uses n as the index of summation, which I believe is double-using n; it’s already the parameter for the multinomial.)

    Eric Auld on
  • The example can be worked by hand. There are 175 ways out of 2401 to have a chip with at least 3 defects, giving a probability of 0.07288.

    Q. Iqbal on

Leave a comment

Please note: comments will be approved before they are published