The Delta-Method and Autograd

One of the reasons I’m really excited about autograd is because it enables me to be able to transform my abstract parameters into business-logic. Let me explain with an example. Suppose I am modeling customer churn, and I have fitted a Weibull survival model using maximum likelihood estimation. I have two parameter estimates: lambda-hat and rho-hat. I also have their covariance matrix, which tells me how much uncertainty is present in the estimates (in lifelines, this is under the variance_matrix_ property). From this, I can plot the survival curve, which is fine, but what I really want is a measure of lifetime value.

Suppose customers give us $10 each time period for the first 3 time periods, and then $30 each time period afterwards. For a single user, the average LTV (up to timeline) calculation might look like:

# create a Weibull model with fake data
wf = WeibullFitter().fit(np.arange(1, 100))

from autograd import numpy as np

def LTV(params, timeline):
    lambda_, rho_ = params
    sf = np.exp(-(timeline/lambda_) ** rho_)
    clv = 10 * (timeline <= 3) * sf + 30 * (timeline > 3) * sf
    return clv.sum()

timeline = np.arange(10)
params = np.array([wf.lambda_, wf.rho_])

LTV(params, timeline)
# 214.76

This gives me a point estimate. But I also want the variance, since there is uncertainty in lambda-hat and rho-hat. There is a technique called the delta-method that allows me to transform variance from one domain to another. But to use that, I need the gradient of LTV w.r.t to lambda and rho. For this problem, I could probably write out the gradient, but it would be messy and error-prone. I can use autograd instead to do this.

from autograd import grad
gradient_LTV = grad(LTV)

gradient_LTV(params, timeline)
# array([ 0.15527043, 10.78900217])

The output of the above is the gradient at the params value. Very easy. Now, if I want the variance of the LTV, the delta methods tells me to multiply gradient by the covariance matrix and the gradient’s transpose (check: this should return a scalar and not a vector/matrix):

var = gradient_LTV(params, timeline)\
          .dot(wf.variance_matrix_)\
          .dot(gradient_LTV(params, timeline))
# 3.127

So, my best estimate of LTV (at timeline=10) is 214.76 (3.127). From this, we can build confidence intervals, etc.