(originally posted on camdp.com)

Preface: This example is a (greatly modified) excerpt from the open-source book Bayesian Methods for Hackers, currently being developed on Github ;)

Why is sorting from "best" to "worst" so difficult?

Consider ratings on online products: how often do you trust an average 5-star rating if there is only 1 reviewer? 2 reviewers? 3 reviewers? We implicitly understand that with such few reviewers that the average rating is not a good reflection of the true value of the product.

This has created flaws in how we sort items. Many people have realized that sorting online search results by their rating, whether the objects be books, videos, or online comments, return poor results. Often the seemingly top videos or comments have perfect ratings only from a few enthusiastic fans, and truly more quality videos or comments are hidden in later pages with falsely-substandard ratings of around 4.8. How can we correct this?

Consider the popular site Reddit (purposefully did not link to the website as you would never come back). The site hosts links to stories or images, and a very popular part of the site are the comments associated with each link. Redditors can vote up or down on each comment (called upvotes and downvotes). Reddit, by default, will sort comments by Top, that is, the best comments. How would you determine which comments are the best? There are a number of ways to achieve this:

  1. Popularity: A comment is considered good if it has many upvotes. A problem with this model is to consider a comment with hundreds of upvotes, but thousands of downvotes. While being very popular, the comment is likely more controversial than best.
  2. Difference: Using the difference of upvotes and downvotes. This solves the above problem, but fails when we consider the temporal nature of comments. Comments can be posted many hours after the original link submission. The difference method will bias the Top comments to be the oldest comments, which have accumulated more upvotes than newer comments, but are not necessarily the best.
  3. Time adjusted: Consider using Difference divided by the age of the comment. This creates a rate, something like difference per second, or per minute. An immediate counter example is, if we use per second, a 1 second old comment with 1 upvote would be better than a 100 second old comment with 99 upvotes. One can avoid this by only considering at least \(t\) second old comments. But what is a good \(t\) value? Does this mean no comment younger than \(t\) is good? We end up comparing unstable quantities with stable quantities (young vs. old comments).
  4. Ratio: Rank comments by the ratio of upvotes to total number of votes (upvotes plus downvotes). This solves the temporal issue, such that new comments who score well can be considered Top just as likely as older comments, provided they have many upvotes to total votes. The problem here is that a comment with a single upvote (ratio = 1.0) will beat a comment with 999 upvotes and 1 downvote (ratio = 0.999), but clearly the latter comment is more likely to be better.

I used the phrase more likely for good reason. It is possible that the former comment, with a single upvote, is in fact a better comment than the later with 999 upvotes. The hesitation to agree with this is because we have not seen the other 999 potential votes the former comment might get. Perhaps it will achieve an additional 999 upvotes and 0 downvotes and be considered better than the latter, though not likely.

What we really want is an estimate of the true upvote ratio. Note that the true upvote ratio is not the same as the observed upvote ratio: the true upvote ratio is hidden, and we only observe upvotes vs. downvotes (one can think of the true upvote ratio as "what is the underlying probability someone gives this comment a upvote, versus a downvote"). So the 999 upvote/1 downvote comment probably has a true upvote ratio close to 1, which we can assert with confidence thanks to the Law of Large Numbers, but on the other hand we are much less certain about the true upvote ratio of the comment with only a single upvote. Sounds like a Bayesian problem to me.

What is our prior distribution though? (Prior? If you are unfamiliar with Bayesian methods, you should totally read the book, or check out this quick intro). One way to determine a prior on the upvote ratio is that look at the historical distribution of upvote ratios. This can be accomplished by scrapping Reddit's comments and determining a distribution. There are a few problems with this idea though:

  1. Skewed data: The vast majority of comments have very few votes, hence there will be many comments with ratios near the extremes of 1.0 and 0, effectively skewing our distribution to the extremes.
  2. Biased data: Reddit is composed of different subpages, called subreddits. Two examples are r/aww, which posts pics of cute animals, and r/politics. It is very likely that the user behaviour towards comments of these two subreddits are very different: visitors are likely friendly and affectionate in the former, and would therefore upvote comments more, compared to the latter, where comments are likely to be controversial and disagreed upon. Therefore not all comments are the same.

In light of these, I think it is better to use a Uniform prior for the true upvote ratio.

Below is an image that was trending Reddit last weekend. We have scrapped all the comments, plus their upvotes and downvotes. Editor's Note: The book, Bayesian Methods for Hackers, scrapes Reddit in real-time, and returns the currently most popular image + comments.

Title of submission: 
The Golden Tortoise Beatle. 
Some Comments (out of 69 total) 
"That's badass as fuck"
upvotes/downvotes:  [34  4]

"What kind of dog is this? "
upvotes/downvotes:  [407  39]

upvotes/downvotes:  [2 0]

"I'm ready for the perpetual downvotes into extreme negative karma, but it's "beetle""
upvotes/downvotes:  [13  2]

For a given true upvote ratio \(p\) and \(N\) total votes, the number of upvotes will look like a Binomial random variable with parameters \(p\) and \(N\). (This is because of the equiviliance between upvote ratio and probability of upvoting versus downvoting, out of \(N\) possible votes/trials). We create a function that performs Bayesian inference on \(p\), for a particular comment's upvote/downvote pair. We are going to use PyMC to determine the posterior distribution of the true upvote ratio for each comment:

import pymc as mc

def posterior_upvote_ratio( upvotes, downvotes, samples = 20000):
    N = upvotes + downvotes
    upvote_ratio = mc.Uniform( "upvote_ratio", 0, 1 )
    observations = mc.Binomial( "obs",  N, upvote_ratio, value = upvotes, observed = True)
    model = mc.Model( [upvote_ratio, observations ]) 
    map_ = mc.MAP(model).fit()
    mcmc = mc.MCMC(model)
    mcmc.sample(samples, samples/4)
    return mcmc.trace("upvote_ratio")[:]

Below are the resulting posterior distributions.

Some distributions are very tight, others have very long tails (relatively speaking), expressing our uncertainity with what the true upvote ratio might be.


We have been ignoring the goal of this exercise: how do we sort the comments from best to worst? Of course, we cannot sort distributions, we must sort scalar numbers. There are many ways to distill a distribution down to a scalar: expressing the distribution through its expected value, or mean, is one way. Choosing the mean bad choice though. This is because the mean does not take into account the uncertainity of distributions.

I suggest using the 95% least plausible value, defined as the value such that there is only a 5% chance the true parameter is lower (think of the lower bound on the 95% credible region). Below are the posterior distributions with the 95% least-plausible value plotted:

The best comments, according to our procedure, are the comments that are most-likely to score a high percentage of upvotes. Visually those are the comments with the 95% least plausible value close to 1.

Why is sorting based on this quantity a good idea? By ordering by the 95% least plausible value, we are being the most conservative with what we think is best. That is, even in the worst case scenario, when we have severely overestimated the upvote ratio, we can be sure the best comments are still on top. Under this ordering, we impose the following very natural properties:

  1. given two comments with the same observed upvote ratio, we will assign the comment with more votes as better (since we are more confident it has a higher ratio).
  2. given two comments with the same number of votes, we still assign the comment with more upvotes as better

But this is too slow for real-time!

I agree, computing the posterior of every comment takes a long time, and by the time you have computed it, likely the data has changed. I delay the mathematics to the appendix, but I suggest using the following formula to compute the lower bound very fast.

$$ \frac{a}{a + b} - 1.65\sqrt{ \frac{ab}{ (a+b)^2(a + b +1 ) } }$$

where \begin{align} & a = 1 + u \\ & b = 1 + d \\ \end{align}

\(u\) is the number of upvotes, and \(d\) is the number of downvotes. The formula is a shortcut in Bayesian inference, which will be further explained in Chapter 6 when we discuss priors in more detail.

Approximate lower bounds:
[ 0.83190291  0.87696185  0.85967479  0.82587042  0.78000822  0.80326236
  0.85773646  0.82101369  0.74537828  0.76867393  0.73158407  0.72491057
  0.75202517  0.70806405  0.6530083   0.66278557  0.60091591  0.6530083
  0.60091591  0.60091591  0.6530083   0.60091591  0.60091591  0.60091591
  0.60091591  0.67526961  0.67526961  0.5997376   0.53055613  0.53055613
  0.53055613  0.60091591  0.53055613  0.53055613  0.53055613  0.53055613
  0.53055613  0.5772799   0.535       0.43047887  0.43047887  0.43047887
  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887
  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887
  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887
  0.43047887  0.43047887  0.43047887  0.43047887  0.43047887  0.45074913
  0.45074913  0.45074913  0.45074913]

Sorted according to approximate lower bounds:

405 44 What kind of dog is this? 
222 26 [I posted this a while back, if you are interested.](http://thewonderfulwild.blogspot.co.uk/2013/01/golden-tortoise-beetle.html)
16 0 If you look really carefully you can actually see 4 tiny elephants... 
1248 224 [Other views](http://imgur.com/a/PLRl0). 

Edit: just now realizing I spelled "beetle" wrong in the title. Awesome. 
116 16 http://en.wikipedia.org/wiki/The_Gold-Bug

This short story by Poe was one of my favorites as a kid.  I was so excited when I learned that there really was a gold beetle.

tl;dr Go read this book by Edgar Allen Poe
101 14 The 1% bug.
80 12 [First thing I thought of] (http://www.retrogameage.com/wp-content/gallery/aladdin/aladdin-u-079.png)
27 3 It always amazes me how beautiful nature can be. Such cool patterns and colors.
44 7 Seriously, what kind of evolutionary advantages are there to looking like a golden tortoise? Especially when you're a beetle? Nature, you crazy.
32 5 I recognize that as the 5th Beatle.
35 6 That's badass as fuck
21 3 It would be nice if you gave credit to the original source and photographer Chime Tsetan: http://www.projectnoah.org/spottings/7201707
7 0 That's a fancy fuckin' exoskeleton you've got there for just spending all your time hanging out on plants and shit, Mr. Golden Tortoise Beetle. Go out and have some adventures while you're all dressed up.
11 1 I remember the first time the first time I saw this in person being really surprised when it moved. I had never (until now) seen or heard about this bug. I saw this in India...Do you know where these bugs are from and why they are so rare?

13 2 I'm ready for the perpetual downvotes into extreme negative karma, but it's "beetle"
9 1 Give it to a bug collector for a rupee bag upgrade
5 0 Wow, I have never wanted this bad to almost maybe see a bug in person... From far away....

^wearing ^a ^protective ^suit

^^and ^^disinfectant 
5 0 It's like a trippy take on the [Great A'tuin](http://www.lspace.org/books/whos-who/atuin.html)
5 0 I'd actually feel a little bad crushing it.
4 0 These things are pretty rare/cool but if you can manage to catch one, the Quidditch game is over and your team gets 150 points.
4 0 I wonder why they call it that.
4 0 this is like that beetle from that really weird point and click kid's game about insects with that strange kid and there's like a dog and a barn, can't remember the details but it was a thing back in the day

it's been found, here's a link if you want to relive it: http://www.youtube.com/watch?v=NpeO491QOEs
4 0 Came here for absurd amounts of bad puns involving the Beatles. Was really disappointed. Then uplifted because of it. 

Then sad again, because my life kind of sucks.
4 0 I could really use a Golden Thief Bug Card...
4 0 [This is a great find! I heard that if you catch one on Booster Hill, you can trade it in at Seaside Town for a Frog Coin!](http://www.youtube.com/watch?v=Q0Pemr2pM-o&feature=player_detailpage#t=21s)
4 0 This photo is from a great app/organization called Project Noah. http://www.projectnoah.org/spottings/7201707
4 0 I think there's something on your contact lens...
12 3 That's gorgeous!  Where was this picture taken?
21 8 Still more talented than Ringo.
13 5 Holy Crap, the golden snitch from harry potter is real?
3 0 What's that transparent lid? That's the amazing part which makes the little thing looks not real...
3 0 Once you find 23 more you should take them to Agitha.
3 0 What's that clear stuff around it then?

3 0 I never thought I would see a bug and think I browsing r/aww
3 0 life is amazing...
3 0 And this will definitely be reposted on [r/lounge](http://www.reddit.com/r/lounge)
3 0 I think i saw one of these on Gintama 
3 0 This is Aspidimorpha sanctaecrucis (aka Golden Tortoise Beetle), spotted in southern India by chimetsetan. Info: http://www.projectnoah.org/spottings/7201707
4 1 That's some Zelda shit right there.
4 1 looks like reddit gold
4 1 Will you accept your payment in beetles?
4 1 thats a funny looking "beatle"
2 0 I found the poo-colored cousin of these in my invertebrate zoology class. Here's what it looked like http://richardpeters.typepad.com/.a/6a0120a7aae27b970b015432749d57970c-800wi 
2 0 More pictures and info here:  http://entomology.ifas.ufl.edu/creatures/veg/potato/golden_tortoise_beetle.htm
2 0 What an unfortunate, beautiful evolution. 

I'm surprised it's still around being that so many animals fucking *love* shiny things. 

2 0 I want to make buttons out of them!
2 0 Any good evolutionary explanations for its appearance? Where are these native to?
2 0 There is an intelligent designer....and he's FABULOUS 
2 0 Golden Tortoise - I just had a flash back to The Dark Tower Series - S.King... I'm sure if you read it you will remember
2 0 My God. Incredible. What a wondrous world we have. 
2 0 Beautiful, isn't it? 
2 0 Can it actually fly with those wings? Nature, you cray.
2 0 Absolutely wonderful 
2 0 How much is it worth?  
2 0 This insect is awesome. 
2 0 That is awesome
2 0 That is the most beautiful bug I have ever seen


Derivation of sorting comments formula

Basically what we are doing is using a Beta prior (with parameters \(a=1, b=1\), which is a uniform distribution), and using a Binomial likelihood with observations \(u, N = u+d\). This means our posterior is a Beta distribution with parameters \(a' = 1 + u, b' = 1 + (N - u) = 1+d\). We then need to find the value, \(x\), such that 0.05 probability is less than \(x\). This is usually done by inverting the CDF, but the CDF of the beta, for integer parameters, is known but is a large sum [3].

We instead using a Normal approximation. The mean of the Beta is \(\mu = a'/(a'+b')\) and the variance is

$$\sigma^2 = \frac{a'b'}{ (a' + b')^2(a'+b'+1) }$$

Hence we solve the following equation for \(x\) and have an approximate lower bound.

$$ 0.05 = \Phi\left( \frac{(x - \mu)}{\sigma}\right) $$


  1. Probabilistic Programming and Bayesian Methods for Hackers. 163rd . 2013. eBook. .
  2. Patil, A., D. Huard and C.J. Fonnesbeck. 2010. PyMC: Bayesian Stochastic Modelling in Python. Journal of Statistical Software, 35(4), pp. 1-81