8 great data blogs to follow

Posted by Cameron Davidson-Pilon

Below I've listed my favourite data analysis, data science, or otherwise technical blogs that I've learned a great deal from. Big +1's to the blogs' authors for providing all these ideas and intellectual property for public access. The list is in no particular order - and it's only blogs I remember, so if your blog isn't here, I may have just forgotten it ;)

1. Andrew Gelman's Statistical Modeling, Causal Inference, and Social Science

Gelman is probably the leader in modern Bayesian inference - he's the author of Bayesian Data Analysis, a book so popular that it can be referred to by its initials, BDA, and everyone knows what you are talking about. His blog is very active (and has an associated Twitter account too) and he has great discussions on modelling, exposing bad analysis, MCMC, and statistical inference. One time he mentioned Bayesian Methods for Hackers and my heart melted.

Selected articles by Gelman

2. Simply Statistics

Jeff Leek and co. are doing a great job with Simply Statistics. The blog is less technical than Gelman's, and focuses more on where statistics fits in with science, big data and data science. Recently, they have embarked on an amazing project: replicating the data analysis in Piketty's "Capital in the Twenty-First Century".

Selected articles from Simply Statistics

3. Evan Miller's Blog

Miller's blog articles, no matter how old, keep appearing on popular news sites, and it's well deserved. I still recall how excited I got reading "How Not To Run an A/B Test" for the first time. Recently, he's been playing around with Bayesian A/B testing and survival analysis too, so clearly he is awesome.

Selected articles from

4. Rasmus Bååth's Research Blog

Rasmus blew my mind with his terrific articles on Bayesian testing (below). His writing style is very clean with lots of custom graphics - you can tell he takes his time writing his articles. Rasmus is also the author of Bayesian First Aid, a Bayesian testing framework for R.

Selected articles from

5. Jake Vanderplas' Pythonic Perambulations

Vanderplas, who is likely a robot and doesn't sleep, has made great contributions to the Python ecosystem: he's the author of mpld3, which translates matplotlib figures to D3 for IPython notebooks, and of the amazing xkcd matplotlib styles, and he's been a leader in teaching Python data analysis through conferences and lectures. His blog is an extension of his work: amazing tutorials, projects, and all very readable. It's really, really difficult to pick only a sample to present:

Selected articles from Pythonic Perambulations

6. Allen Downey's Probably Overthinking It

Downey, probably the most prolific writer on this list, is the author of the "Think [Statistics, Python, Bayes, Complexity]" series. His blog is often his sketch pad before the book, and is full of fun articles. When learning survival analysis myself, I kept going back to his article (below) just to reinforce the application.

Selected articles from Probably Overthinking It

7. Abraham Flaxman's Healthy Algorithms

Without Flaxman's blog, I would probably not have understood the ideas behind Bayesian computation. During my 2-day seclusion to grok Bayesian methods, and later while I was developing my tools, I constantly read and reread his articles on PyMC. His blog is still very active, and the research he produces on it (and yes, it is research) is terrific.

Selected articles from Healthy Algorithms

8. Yhat's blog

The team at Yhat have a really good blog, mostly featuring guest bloggers doing really cool things with data. Yhat is also the author of the Python port of ggplot (it's pretty remarkable that it was done at all).

Selected articles from Yhat

Honourable Articles

