An L½ penalty in Cox Regression

Following up on a previous blog post where we explored how to implement $L_1$ and elastic net penalties to induce sparsity, this post looks at a paper by Xu Z B, Zhang H, Wang Y, et al. that explores what an $L_{1/2}$ penalty is and how to implement it.

But first: I think we are all familiar with an $L_1$ penalty, but what is an $L_0$ penalty? If you work out the math, it is a penalty that counts the number of non-zero coefficients, independent of the magnitude of the coefficients:

$$ll^*(\theta, x) = \sum_{i=1}^N ll(\theta, x_i) - \lambda \sum_{k=1}^D 1_{\theta_k \ne 0}$$

where $D$ is the number of potential parameters. Thinking about this for a moment, this means that with $\lambda = 1$, maximizing the $L_0$-penalized log-likelihood is the same as minimizing the AIC, since the AIC is:

$$AIC = -2 ll + 2D^*$$

where $D^*$ is the number of parameters in the model. It turns out that $L_0$ penalties encourage lots of sparsity in their solutions, much more than $L_1$.

Given that, the $L_{1/2}$ penalty is a balance between penalizing the magnitudes of the coefficients and encouraging lots of sparsity. The paper linked above gives reasons why $L_{1/2}$ is perhaps superior to both $L_1$ and $L_0$. Importantly, solving the $L_0$ problem is NP-hard: it involves a combinatorial explosion of potential solutions, and it can't be solved with gradient methods.
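To make the three penalties concrete, here is a minimal sketch that evaluates each one on two hand-picked coefficient vectors. The function names and the example vectors are my own, purely for illustration, and the $L_{1/2}$ penalty is taken to be $\sum_k |\theta_k|^{1/2}$.

```python
import math


def l0_penalty(theta):
    # Counts the non-zero coefficients; their magnitudes do not matter.
    return sum(1 for t in theta if t != 0)


def l1_penalty(theta):
    # Sum of absolute values: grows linearly with the magnitudes.
    return sum(abs(t) for t in theta)


def l_half_penalty(theta):
    # Sum of square roots of absolute values: grows, but more slowly than L1.
    return sum(math.sqrt(abs(t)) for t in theta)


# Two illustrative vectors with the same sparsity pattern;
# `large` is `small` scaled by 100.
small = [0.01, 0.0, -0.04]
large = [1.0, 0.0, -4.0]

for name, theta in [("small", small), ("large", large)]:
    print(name, l0_penalty(theta), l1_penalty(theta), l_half_penalty(theta))
```

Scaling the coefficients by 100 leaves the $L_0$ penalty unchanged at 2, multiplies the $L_1$ penalty by 100, and multiplies the $L_{1/2}$ penalty by only 10, which is one way to see it sitting between the other two.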

The authors provide a very simple algorithm for solving the $L_{1/2}$ problem (see Section 3 of the paper). It involves repeatedly solving a related $L_1$ problem with updated coefficient-specific penalizer values. In lifelines, we recently introduced the ability to set specific coefficient penalizer values, and we can solve $L_1$ problems too. Let's see if we can solve $L_{1/2}$ problems now:

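import numpy as np
from lifelines import CoxPHFitter


def l_one_half_cox(lambda_, df, T, E):
    # Small constant to avoid dividing by zero once a coefficient reaches 0.
    EPSILON = 0.00001
    max_iter = 20

    # Start from a plain L1 fit: one equal penalty per covariate
    # (every column of df except the duration and event columns).
    weights = lambda_ * np.ones(df.shape[1] - 2)
    cph = CoxPHFitter(l1_ratio=1.0, penalizer=weights)
    cph.fit(df, T, E)

    # Re-solve the L1 problem with coefficient-specific weights:
    # the smaller a coefficient, the larger its penalty on the next pass.
    for _ in range(1, max_iter):
        weights = lambda_ / (np.sqrt(cph.params_.abs().values) + EPSILON)
        cph = CoxPHFitter(l1_ratio=1.0, penalizer=weights)
        cph.fit(df, T, E)

    return cph.params_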

In the above code, we repeatedly solve a new $L_1$ problem with updated penalizer weights. This, according to the authors and my own assumption that it can be extended easily to the Cox model, gives us our $L_{1/2}$ solution. Graphically, we can vary the `lambda_` parameter and see how the coefficient solutions change:

Compare this to our $L_1$ solution:
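A one-coefficient toy problem also makes the contrast with $L_1$ concrete. The sketch below is not a Cox model: it swaps the partial likelihood for a squared-error loss, $\frac{1}{2}(\theta - z)^2$, so that each $L_1$ step has a closed form (soft-thresholding). The function names and the inputs are hypothetical; the weight update is the same one used in `l_one_half_cox`.

```python
def soft_threshold(z, w):
    """Minimizer of 0.5 * (theta - z)**2 + w * abs(theta)."""
    if z > w:
        return z - w
    if z < -w:
        return z + w
    return 0.0


def reweighted_l1(z, lambda_, max_iter=20, epsilon=1e-5):
    # Start from the plain L1 solution, as in l_one_half_cox.
    theta = soft_threshold(z, lambda_)
    for _ in range(1, max_iter):
        # Same update as above: small coefficients get large penalties.
        weight = lambda_ / (abs(theta) ** 0.5 + epsilon)
        theta = soft_threshold(z, weight)
    return theta


# Illustrative inputs: one small and one large unpenalized estimate.
for z in (0.8, 3.0):
    print(z, soft_threshold(z, 0.5), reweighted_l1(z, 0.5))
```

With `lambda_ = 0.5`, the small input (0.8) survives the plain $L_1$ step at roughly 0.3 but is driven to exactly zero by the reweighting, while the large input (3.0) ends up shrunk less than it is under $L_1$.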