Focal & Circle Loss

Last updated: 2022-07-05

Focal Loss

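Introduction

Focal Loss was introduced mainly to address the imbalance between easy and hard samples (note that this differs from the imbalance between positive and negative samples), and in practice it applies to a wide range of tasks.

The loss function comes from the paper Focal Loss for Dense Object Detection, whose authors used it to improve object detection in images. Focal Loss is nonetheless a fully general-purpose loss, since NLP also has many tasks with imbalanced classes.

The classic case is sequence labeling, where the classes are severely imbalanced. In named entity recognition, for example, a sentence clearly contains far fewer entity tokens than non-entity tokens, which is a case of severe class imbalance.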
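
To make the easy-versus-hard distinction concrete, the following minimal sketch evaluates the binary focal loss from the paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python. The function names and the sample probabilities are illustrative assumptions and are not part of Wenxin.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample; p is the predicted probability of class 1."""
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t balances positives (alpha) against negatives (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain cross entropy for one sample, used as the baseline."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


# Fraction of the cross-entropy loss that survives the focal weighting.
easy = binary_focal_loss(0.95, 1) / cross_entropy(0.95, 1)
hard = binary_focal_loss(0.30, 1) / cross_entropy(0.30, 1)
print(f"easy sample keeps {easy:.6f} of its cross-entropy loss")
print(f"hard sample keeps {hard:.6f} of its cross-entropy loss")
```

With gamma=2 and alpha=0.25, the confident sample (p=0.95) keeps 0.000625 of its cross-entropy loss while the uncertain one (p=0.30) keeps 0.1225, a 196-fold difference in relative weight.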

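Usage example

Here we provide a demo of using Focal Loss in Wenxin. The demo is the ERNIE-based fc model for classification tasks, wenxin/models/ernie_fc_classification_with_focal_loss.py. The Focal Loss part is in the forward network forward() of that model file, as shown below.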

        # Number of foreground (positive) labels, used to normalize the loss.
        fg_num = fluid.layers.reduce_sum(label)
        cast_res2 = fluid.layers.cast(x=fg_num, dtype="int32")
        cast_res2.stop_gradient = True
        label = fluid.layers.cast(x=label, dtype="int32")
        cost = fluid.layers.sigmoid_focal_loss(x=predictions,
                                               label=label,
                                               fg_num=cast_res2,
                                               gamma=2.,
                                               alpha=0.25)
        # Average the per-element loss into a scalar training cost.
        avg_cost = fluid.layers.mean(x=cost)
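
As a rough reference for what the call above is expected to compute, the sketch below re-implements the single-class case in plain Python: each element gets the alpha-balanced focal loss on sigmoid(x), and the result is divided by the foreground count fg_num. This follows one reading of the sigmoid_focal_loss description and is an assumption, not Paddle's code. `sigmoid_focal_loss_ref` is a hypothetical helper, and the guard against fg_num being 0 is a choice made only for this sketch.

```python
import math


def sigmoid_focal_loss_ref(logits, labels, fg_num, gamma=2.0, alpha=0.25):
    """Per-element focal loss on sigmoid(logit), normalized by fg_num.

    Hypothetical reference helper for the single-class case only:
    labels are 0 (background) or 1 (foreground).
    """
    norm = max(int(fg_num), 1)  # sketch-only guard against division by zero
    losses = []
    for x, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-x))
        if y == 1:
            loss = -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            loss = -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
        losses.append(loss / norm)
    return losses


logits = [2.0, -1.0, 0.5, -3.0]
labels = [1, 0, 0, 1]
fg_num = sum(labels)                  # mirrors reduce_sum(label)
cost = sigmoid_focal_loss_ref(logits, labels, fg_num)
avg_cost = sum(cost) / len(cost)      # mirrors fluid.layers.mean
print(cost, avg_cost)
```

In this toy batch the badly misclassified positive (logit -3.0) dominates the cost, while the confidently correct positive (logit 2.0) contributes almost nothing.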

Circle Loss

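Introduction

Deep feature learning has two basic paradigms: learning with class-level labels and learning with pair-wise labels (positive and negative sample pairs). With class labels, a classification loss (such as softmax + cross entropy) is typically used to optimize the similarity between samples and weight vectors; with pair labels, a metric loss (such as triplet loss) is typically used to optimize the similarity between samples. Both approaches, however, suffer from inflexible optimization and an ambiguous convergence status.

Circle Loss therefore designs a more flexible optimization path toward a more definite optimization target. The method comes from the paper Circle Loss: A Unified Perspective of Pair Similarity Optimization. The figure below compares the common optimization approach with the newly proposed one.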

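Circle Loss is very simple, yet its significance for deep feature learning is fundamental, in three respects:

(1) A unified (generalized) loss function. Starting from a unified view of pair similarity optimization, it provides a single loss function for the two basic learning paradigms (learning with class labels and learning with pair labels).

(2) Gradient magnitudes adjusted by weights. Similarity scores that are poorly optimized are assigned larger weighting factors and therefore receive larger update gradients. As the figure above shows, the states A, B, and C are each optimized differently under Circle Loss.

(3) A definite convergence status. On the circular decision boundary, Circle Loss favors a specific convergence state (T in the figure). This definite optimization target helps improve feature discriminability.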
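
The weighting and the circular decision boundary described above can be illustrated with a few lines of plain Python. The sketch assumes the paper's parameterisation with a single margin m (O_p = 1 + m, O_n = -m, Delta_p = 1 - m, Delta_n = m), under which the boundary is the circle sn^2 + (sp - 1)^2 = 2 * m^2. The helper names and sample scores are hypothetical.

```python
def circle_weights(sp, sn, margin):
    """Return (alpha_p, alpha_n) for one positive and one negative score."""
    alpha_p = max(0.0, (1.0 + margin) - sp)   # large when sp is far below O_p
    alpha_n = max(0.0, sn - (0.0 - margin))   # large when sn is far above O_n
    return alpha_p, alpha_n


def boundary_gap(sp, sn, margin):
    """Negative inside the circle sn^2 + (sp - 1)^2 = 2 * margin^2,
    zero on it, positive outside (pair not yet separated enough)."""
    return sn ** 2 + (sp - 1.0) ** 2 - 2.0 * margin ** 2


margin = 0.25
for sp, sn in [(0.2, 0.3), (0.8, 0.7), (0.9, 0.1)]:
    ap, an = circle_weights(sp, sn, margin)
    gap = boundary_gap(sp, sn, margin)
    print(sp, sn, round(ap, 3), round(an, 3), round(gap, 3))
```

A low positive score such as sp = 0.2 receives a much larger weight than sp = 0.9, so it is pushed harder, which is the per-score flexibility that a fixed-weight loss lacks.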

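Usage example

Here we provide a demo of using Circle Loss in Wenxin. The demo is the gru_pairwise model for matching tasks, wenxin/models/gru_matching_pairwise.py. The Circle Loss part is in the forward network forward() of that model file, as shown below.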

            scale = fluid.layers.fill_constant(shape=[1], value=80.0, dtype='float32')
            c_loss = simnet_circle_loss(sp=query_pos_title_score, sn=query_neg_title_score, margin=margin, scale=scale)
            avg_cost = fluid.layers.mean(x=c_loss)

Related OP

The OP used in Wenxin to compute Circle Loss is simnet_circle_loss(), located in wenxin/modules/wenxin_loss.py. Its implementation and parameters are shown below.

import numpy as np
import paddle.fluid as fluid


def simnet_circle_loss(sp, sn, margin, scale):
    """
    sp: score list of positive samples, shape [B * m]
    sn: score list of negative samples, shape [B * n]
    margin: relaxation factor in circle loss function
    scale:  scale factor in circle loss function

    return: circle loss value, shape [1]
    """
    op = 1. + margin
    on = 0. - margin
    delta_p = 1 - margin
    delta_n = margin

    # Weighting factors relu(op - sp) and relu(sn - on); no gradient flows through them.
    ap = fluid.layers.relu(fluid.layers.elementwise_sub(sp, op) * -1.0)
    ap.stop_gradient = True
    an = fluid.layers.relu(fluid.layers.elementwise_sub(sn, on))
    an.stop_gradient = True

    logit_p = ap * (sp - delta_p)
    logit_p = logit_p * scale * -1.0
    logit_p = fluid.layers.cast(x=logit_p, dtype=np.float64)
    loss_p = fluid.layers.reduce_sum(fluid.layers.exp(logit_p), dim=1, keep_dim=False)

    logit_n = an * (sn - delta_n)
    logit_n = logit_n * scale
    logit_n = fluid.layers.cast(x=logit_n, dtype=np.float64)
    loss_n = fluid.layers.reduce_sum(fluid.layers.exp(logit_n), dim=1, keep_dim=False)

    circle_loss = fluid.layers.log(1 + loss_n * loss_p)
    circle_loss = fluid.layers.cast(x=circle_loss, dtype=np.float32)
    return fluid.layers.mean(circle_loss)
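
The OP's logic can be mirrored in plain Python to sanity-check values on small inputs. The sketch below is a simplified re-implementation for a single row of scores (the OP itself reduces over dim=1 and then averages over the batch). `circle_loss_ref` and the sample scores are hypothetical and are not part of Wenxin.

```python
import math


def circle_loss_ref(sp, sn, margin, scale):
    """Pure-Python mirror of simnet_circle_loss for one row of scores.

    sp / sn: lists of positive / negative similarity scores.
    """
    op, on = 1.0 + margin, 0.0 - margin
    delta_p, delta_n = 1.0 - margin, margin

    # Weights are plain numbers here; the OP marks them stop_gradient.
    ap = [max(0.0, op - s) for s in sp]
    an = [max(0.0, s - on) for s in sn]

    loss_p = sum(math.exp(-scale * a * (s - delta_p)) for a, s in zip(ap, sp))
    loss_n = sum(math.exp(scale * a * (s - delta_n)) for a, s in zip(an, sn))
    return math.log(1.0 + loss_n * loss_p)


# Well-separated scores versus poorly separated scores.
good = circle_loss_ref(sp=[0.9, 0.95], sn=[0.05, 0.1], margin=0.25, scale=80.0)
bad = circle_loss_ref(sp=[0.3, 0.4], sn=[0.6, 0.7], margin=0.25, scale=80.0)
print(good, bad)
```

Well-separated scores give a loss close to zero, while poorly separated ones give a large loss. With scale = 80 the exponentials grow very quickly, which is presumably why the OP casts the logits to float64 before calling exp.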

Computation process

The computation of Circle Loss proceeds as follows:
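
Read off from simnet_circle_loss() above, the computation can be written in the notation of the Circle Loss paper as follows. This is a reconstruction from the code rather than an official formula listing: m stands for `margin`, \gamma for `scale`, [x]_+ for relu(x), and the index symbols i and j are chosen here for illustration.

```latex
\begin{aligned}
O_p &= 1 + m, \qquad O_n = -m, \qquad \Delta_p = 1 - m, \qquad \Delta_n = m \\
\alpha_p^i &= \bigl[\,O_p - s_p^i\,\bigr]_+, \qquad
\alpha_n^j = \bigl[\,s_n^j - O_n\,\bigr]_+ \\
\mathcal{L}_{circle} &= \log\Bigl[\,1
  + \sum_{j} \exp\bigl(\gamma\,\alpha_n^j\,(s_n^j - \Delta_n)\bigr)
    \sum_{i} \exp\bigl(-\gamma\,\alpha_p^i\,(s_p^i - \Delta_p)\bigr)\Bigr]
\end{aligned}
```

The OP evaluates this expression per row, treats \alpha_p and \alpha_n as constants during backpropagation (stop_gradient), and returns the mean over the batch.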