# Napkin Folding — survival analysis

## Controlling bacterial growth in fermentation with hurdle technology and survival analysis

Posted by **Cameron Davidson-Pilon**

This article is a nice intersection of some of the topics I've been thinking about lately: bacteria, food, and survival analysis. It is also part of a larger project I've been working on (stay tuned). The bacterium C. botulinum is responsible for creating one of the most dangerous chemicals known to man: botulinum toxin. If ingested, incredibly small amounts of this toxin can kill even a healthy person. Thankfully, food scientists and microbiologists have developed ways to control C. botulinum. Any of...

## L1 Penalty in Cox Regression

Posted by **Cameron Davidson-Pilon**

In the 2000s, L1 penalties were all the rage in statistics and machine learning. Because they induce sparsity in fitted parameters, they were used as a variable selection method. Today, with some advanced models having tens of billions of parameters, sparsity isn't as useful, and the L1 penalty has dropped out of fashion. However, most teams aren't using billion-parameter models, and smart data scientists work with simple models initially. Below is how we implemented an L1 penalty in the...
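As a generic illustration of why an L1 penalty induces sparsity (and not the implementation the post goes on to describe), here is a minimal sketch of the soft-thresholding operator, the proximal step associated with an L1 penalty. The function name and the example coefficients are hypothetical.

```python
def soft_threshold(beta, penalty):
    """Shrink a coefficient toward zero by `penalty`; small ones become exactly 0."""
    if beta > penalty:
        return beta - penalty
    if beta < -penalty:
        return beta + penalty
    return 0.0


# Hypothetical fitted coefficients: the weak ones are zeroed out entirely,
# which is what makes the L1 penalty usable for variable selection.
coefficients = [1.0, -0.5, 0.05, -0.02]
shrunk = [soft_threshold(b, 0.25) for b in coefficients]
selected = [i for i, b in enumerate(shrunk) if b != 0.0]
print(shrunk, selected)
```

An L2 penalty, by contrast, only rescales coefficients and never sets them exactly to zero, so it does not select variables in this way.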

## Non-parametric survival function prediction

Posted by **Cameron Davidson-Pilon**

As I was developing lifelines, I kept having a feeling that I was gradually moving the library towards prediction tasks. lifelines is great for regression models and fitting survival distributions, but as I was adding more and more flexible parametric models, I realized that I really wanted a model that would predict the survival function, and I didn't care how. This led me to the idea of using a neural net with \(n\) outputs, one output for each parameter...

## SaaS churn and piecewise regression survival models

Posted by **Cameron Davidson-Pilon**

A software-as-a-service (SaaS) company has a typical customer churn pattern. During periods of no billing, the churn is relatively low compared to periods of billing (typically every 30 or 365 days). This results in a distinct survival function for customers. See below:

`kmf = KaplanMeierFitter().fit(df['T'], df['E'])`

`kmf.plot(figsize=(11,6));`

To borrow a term from finance, we clearly have different regimes that a customer goes through: periods of low churn and periods of high churn, both of which are predictable. This predictability and...
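To make the idea of churn regimes concrete, here is a minimal sketch of a survival function built from a piecewise-constant hazard. It is an illustration under assumed numbers, not the model from the post: the function name, the breakpoints (days 28 and 32, bracketing a hypothetical billing date), and the hazard rates are all made up.

```python
import math


def piecewise_survival(t, breakpoints, hazards):
    """S(t) = exp(-H(t)), where the hazard is constant within each interval.

    breakpoints: increasing times at which the hazard changes, e.g. [28, 32]
    hazards: one rate per interval, so len(breakpoints) + 1 values
    """
    if len(hazards) != len(breakpoints) + 1:
        raise ValueError("need exactly one hazard per interval")
    cumulative_hazard = 0.0
    start = 0.0
    for end, rate in zip(list(breakpoints) + [math.inf], hazards):
        if t <= start:
            break
        # Accumulate hazard over the part of [start, end) that lies before t.
        cumulative_hazard += rate * (min(t, end) - start)
        start = end
    return math.exp(-cumulative_hazard)


# Hypothetical regimes: low churn, a spike around billing, then low churn again.
breakpoints = [28, 32]
hazards = [0.001, 0.05, 0.001]
curve = [piecewise_survival(day, breakpoints, hazards) for day in range(0, 61, 10)]
print(curve)
```

The resulting curve is nearly flat between billing dates and drops sharply across the billing window, which is the staircase shape a Kaplan-Meier plot of such customers tends to show.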

## The Delta-Method and Autograd

Posted by **Cameron Davidson-Pilon**

One of the reasons I’m really excited about autograd is that it lets me transform my abstract parameters into business logic. Let me explain with an example. Suppose I am modeling customer churn, and I have fitted a Weibull survival model using maximum likelihood estimation. I have two parameter estimates: lambda-hat and rho-hat. I also have their covariance matrix, which tells me how much uncertainty is present in the estimates (in lifelines, this is under the variance_matrix_...
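The delta method approximates the variance of a function of the estimates as \(\nabla g^\top \Sigma \, \nabla g\), where \(\Sigma\) is the covariance matrix of the estimates. Below is a minimal sketch of that calculation. To keep it dependency-free, a central finite difference stands in for autograd's gradient, the Weibull survival function is assumed to be parameterized as \(S(t) = \exp(-(t/\lambda)^\rho)\), and the estimates and covariance values are hypothetical.

```python
import math


def weibull_survival(params, t):
    """S(t) = exp(-(t / lambda) ** rho); this parameterization is an assumption."""
    lambda_, rho_ = params
    return math.exp(-((t / lambda_) ** rho_))


def numerical_gradient(f, params, eps=1e-6):
    """Central finite differences, standing in for an autograd gradient."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def delta_method_variance(f, params, cov):
    """Approximate Var[f(params)] as grad^T @ cov @ grad."""
    grad = numerical_gradient(f, params)
    n = len(params)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))


# Hypothetical estimates (lambda-hat, rho-hat) and their covariance matrix.
estimates = [10.0, 1.5]
covariance = [[0.25, 0.01], [0.01, 0.04]]

# Standard error of the survival probability at t = 5.
variance = delta_method_variance(lambda p: weibull_survival(p, 5.0), estimates, covariance)
print(weibull_survival(estimates, 5.0), math.sqrt(variance))
```

Any differentiable function of the parameters can be passed in place of the survival probability, which is how abstract parameter uncertainty gets turned into uncertainty on a business quantity.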

## Evolution of lifelines over the past few months

Posted by **Cameron Davidson-Pilon**

TLDR: upgrade lifelines for lots of improvements: `pip install -U lifelines`

During my time off, I’ve spent a lot of time improving my side projects so I’m at least kinda proud of them. I think lifelines, my survival analysis library, is in that spot. I’m actually kinda proud of it now. A lot has changed in lifelines in the past few months, and in this post I want to mention some of the biggest additions and the stories behind them....