Menu
Cart

Bayesian M&M Problem in PyMC 2

Posted by Cameron Davidson-Pilon at

This Bayesian problem is from Allen Downey's Think Bayes book. I'll quote the problem here: 

M&M’s are small candy-coated chocolates that come in a variety of colors. Mars, Inc., which makes M&M’s, changes the mixture of colors from time to time.
In 1995, they introduced blue M&M’s. Before then, the color mix in a bag of plain M&M’s was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan. Afterward it was 24% Blue , 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.
Suppose a friend of mine has two bags of M&M’s, and he tells me that one is from 1994 and one from 1996. He won’t tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

 

This is a very simple Bayesian problem, on paper! See Allen's book for the solution (link above). What if we wanted to use PyMC to solve it? Below is the solution (originally from CrossValidated). 

Caution: Please use with PyMC 2.3.2

It looks like there is a bug in 2.3.4 (the most recent version) that is causing the wrong inference. This took me a while to discover, but it was solved when I downgraded to PyMC 2.3.2.

Model:

import pymc as pm

p = [
 #brown, yellow, red, green, orange, tan, blue
 [.3,  .2,  .2,  .1, .1,  .1, .0 ], # 1994 bag
 [.13, .14, .13, .2, .16, .0, .24]  # 1996 bag
]


bag = pm.Categorical('which_bag', [0.5, 0.5]) # prior on bag selected

@pm.deterministic
def first_bag_selection(p=p, bag=bag):
    return p[bag]

@pm.deterministic
def second_bag_selection(p=p, bag=bag):
    return p[1-bag]

# observe yellow in bag 1
obs_1 = pm.Categorical('bag_1', first_bag_selection, value=1, observed=True)

#observe green in bag 2
obs_2 = pm.Categorical('bag_2', second_bag_selection, value=3, observed=True)

Inference

mcmc = pm.MCMC([bag, p, first_bag_selection, second_bag_selection, obs_1, obs_2])
mcmc.sample(15000,3000)

bag_trace = mcmc.trace('which_bag')[:]


# what is the probability the bag chosen is from 1996?
print (bag_trace == 1).sum()*1./bag_trace.shape[0]
# should be about .26

Related Posts


Latest Data Science screencasts available


  • Wil, the story says I draw a yellow and a green M&M respectively. The value=1 says “I observe a yellow M&M” (note that order is important in the `p` variable.

    Cam on
  • Hey Cam,

    What is the significance of `value=1` and `value=3` in the Categorical variables?

    Will on

Leave a comment

Please note: comments will be approved before they are published