Menu
Cart

Napkin Folding — lifelines

An accelerated lifetime spline model

Posted by Cameron Davidson-Pilon at

A paper came out recently with a novel accelerated lifetime (AFT) model with cubic splines. This should pique your interest for a few reasons: 1. It helps dethrone the Proportional Hazard (PH) model as the default survival model. People like the PH model because it doesn't make any distributional assumptions. However, like a Trojan horse, there are very strong implicit assumptions that are inherited. Suffice to say, I am not a strong proponent of the PH model.  2. A spline-based AFT...

Read more →

L1 Penalty in Cox Regression

Posted by Cameron Davidson-Pilon at

In the 00's, L1 penalties were all the rage in statistics and machine learning. Since they induced sparsity in fitted parameters, they were used as a variable selection method. Today, with some advanced models having tens of billions of parameters, sparsity isn't as useful, and the L1 penalty has dropped out of fashion. However, most teams aren't using billion parameter models, and smart data scientists work with simple models initially. Below is how we implemented an L1 penalty in the...

Read more →

Non-parametric survival function prediction

Posted by Cameron Davidson-Pilon at

As I was developing lifelines, I kept having a feeling that I was gradually moving the library towards prediction tasks. lifelines is great for regression models and fitting survival distributions, but as I was adding more and more flexible parametric models, I realized that I really wanted a model that would predict the survival function — and I didn't care how. This led me to the idea to use a neural net with \(n\) outputs, one output for each parameter...

Read more →

SaaS churn and piecewise regression survival models

Posted by Cameron Davidson-Pilon at

A software-as-a-service company (SaaS) has a typical customer churn pattern. During periods of no billing, the churn is relatively low compared to periods of billing (typically every 30 or 365 days). This results in a distinct survival function for customers. See below: kmf = KaplanMeierFitter().fit(df['T'], df['E']) kmf.plot(figsize=(11,6)); To borrow a term from finance, we clearly have different regimes that a customer goes through: periods of low churn and periods of high churn, both of which are predictable. This predictability and...

Read more →

Counting and interval censoring analysis

Posted by Cameron Davidson-Pilon at

Let’s say you have an initial population of (micro-)organisms, and you are curious about their survival rates. A common summary statistic of their survival is the half-life. How might you collect data to measure their survival? Since we are dealing with micro-organisms, we can’t track individual lifetimes. What we might do is periodically count the number of organisms still alive. Suppose our dataset looks like: T = [0, 2, 4, 7 ] # in hours N = [1000, 914, 568,...

Read more →

The Delta-Method and Autograd

Posted by Cameron Davidson-Pilon at

One of the reasons I’m really excited about autograd is because it enables me to be able to transform my abstract parameters into business-logic. Let me explain with an example. Suppose I am modeling customer churn, and I have fitted a Weibull survival model using maximum likelihood estimation. I have two parameter estimates: lambda-hat and rho-hat. I also have their covariance matrix, which tells me how much uncertainty is present in the estimates (in lifelines, this is under the variance_matrix_...

Read more →

Evolution of lifelines over the past few months

Posted by Cameron Davidson-Pilon at

TLDR: upgrade lifelines for lots of improvements pip install -U lifelines During my time off, I’ve spent a lot of time improving my side projects so I’m at least kinda proud of them. I think lifelines, my survival analysis library, is in that spot. I’m actually kinda proud of it now. A lot has changed in lifelines in the past few months, and in this post I want to mention some of the biggest additions and the stories behind them....

Read more →