An L½ penalty in Cox Regression
Posted by Cameron Davidson-Pilon on
Following up on a previous blog post where we explored how to implement \(L_1\) and elastic-net penalties to induce sparsity, a paper by Xu Z. B., Zhang H., Wang Y., et al. explores what an \(L_{1/2}\) penalty is and how to implement it.
But first: we are familiar with an \(L_1\) penalty, but what is an \(L_0\) penalty? If you work out the math, it is a penalty that counts the number of nonzero coefficients, independent of their magnitudes:
$$ll^*(\theta, x) = \sum_i^N ll(\theta, x_i) - \lambda \sum_{k=0}^D 1_{\theta_k \ne 0}$$
where \(D\) is the number of potential parameters. Thinking about this for a moment: with \(\lambda = 1\), maximizing this penalized log-likelihood is equivalent to minimizing the AIC (just multiply by \(-2\)), since the AIC is:
$$\text{AIC} = -2 ll + 2D^*$$
where \(D^*\) is the number of parameters in the model. It turns out that \(L_0\) penalties encourage lots of sparsity in their solutions, much more than \(L_1\).
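To make the comparison concrete, here is a quick numeric sketch (the coefficient vector is made up purely for illustration) of the three penalty terms evaluated on the same coefficients:

```python
import numpy as np

theta = np.array([0.0, 2.5, -0.3, 0.0, 1.2])  # hypothetical fitted coefficients
lam = 1.0

l_zero = lam * np.count_nonzero(theta)           # counts nonzero coefficients: 3
l_one = lam * np.abs(theta).sum()                # sums magnitudes: 4.0
l_half = lam * np.sqrt(np.abs(theta)).sum()      # in between: ~3.22

print(l_zero, l_one, l_half)
```

Note how the \(L_0\) term is completely indifferent to shrinking the 2.5 coefficient toward zero, while the \(L_1\) term rewards any shrinkage; \(L_{1/2}\) sits between the two.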
Given that, the \(L_{1/2}\) penalty is a balance between penalizing the magnitudes of the coefficients and encouraging lots of sparsity. The paper linked above gives reasons why \(L_{1/2}\) is perhaps superior to both \(L_1\) and \(L_0\). Importantly, solving the \(L_0\) problem exactly is NP-hard, because it involves a combinatorial explosion of potential solutions and the penalty can't be attacked with gradient methods.
The authors provide a very simple iterative algorithm for solving the \(L_{1/2}\) problem (see Section 3 of the paper). It involves repeatedly solving a related \(L_1\) problem with updated coefficient-specific penalizer values. In lifelines, we recently introduced the ability to set coefficient-specific penalizer values, and we can solve \(L_1\) problems too. Let's see if we can solve \(L_{1/2}\) problems now:

```python
from lifelines import CoxPHFitter
import numpy as np

def l_one_half_cox(lambda_, df, T, E):
    EPSILON = 0.00001

    # one penalizer weight per regression coefficient
    # (the duration and event columns are not penalized)
    weights = lambda_ * np.ones(df.shape[1] - 2)
    cph = CoxPHFitter(l1_ratio=1.0, penalizer=weights)
    cph.fit(df, T, E)

    max_iter = 20
    i = 1
    while i < max_iter:
        # reweight each coefficient's penalizer using the current solution
        weights = lambda_ / (np.sqrt(cph.params_.abs().values) + EPSILON)
        cph = CoxPHFitter(l1_ratio=1.0, penalizer=weights)
        cph.fit(df, T, E)
        i += 1

    return cph.params_
```
In the above code, we repeatedly solve a new \(L_1\) problem with updated penalizer weights. This, according to the authors (and my own assumption that the method extends easily to the Cox model), gives us our \(L_{1/2}\) solution. Graphically, we can vary the `lambda_` parameter and see how the coefficient solutions change:
Compare this to our \(L_1\) solution:
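To see the mechanics of the reweighting scheme without fitting Cox models, here is a toy sketch (my own illustration, not code from the paper) applying the same weight update to a one-dimensional proximal problem, where each weighted \(L_1\) solve has a closed form, the soft-thresholding operator:

```python
import numpy as np

def soft_threshold(z, w):
    # elementwise proximal operator of the weighted L1 penalty:
    # argmin_t 0.5 * (t - z)**2 + w * |t|
    return np.sign(z) * np.maximum(np.abs(z) - w, 0.0)

z = np.array([3.0, 0.4, -2.0, 0.05])  # made-up "signal" values
lam, eps = 0.5, 1e-5

# initial plain L1 solve, then iterate the reweighting update
theta = soft_threshold(z, lam * np.ones_like(z))
for _ in range(20):
    weights = lam / (np.sqrt(np.abs(theta)) + eps)
    theta = soft_threshold(z, weights)

print(theta)
```

The small entries are driven exactly to zero (their weights blow up), while the large entries end up *less* shrunk than under plain \(L_1\): the same qualitative behavior we want from the \(L_{1/2}\) penalty in the Cox model.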