施工実績

Focal Loss was introduced by Lin et al

2022.06.23

Focal Loss was introduced by Lin et al

Per this case, the activation function does not depend in scores of other classes mediante \(C\) more than \(C_1 = C_i\). So the gradient respect esatto the each risultato \(s_i\) sopra \(s\) will only depend on the loss given by its binary problem.

Caffe: Sigmoid Cross-Entropy Loss Layer
Pytorch: BCEWithLogitsLoss
TensorFlow: sigmoid_cross_entropy.

Focal Loss

, from Facebook, mediante this paper. They claim onesto improve one-tirocinio object detectors using Focal Loss sicuro train a detector they name RetinaNet. Focal loss is per Ciclocampestre-Entropy Loss that weighs the contribution of each sample puro the loss based in the classification error. The pensiero is that, if a sample is already classified correctly by the CNN, its contribution puro the loss decreases. With this strategy, they claim puro solve the problem of class imbalance by making the loss implicitly focus per those problematic classes. Moreover, they also weight the contribution of each class puro the lose in per more explicit class balancing. They use Sigmoid activations, so Focal loss could also be considered verso Binary Ciclocross-Entropy Loss. We define https://datingranking.net/it/connecting-singles-review/ it for each binary problem as:

Where \((1 – s_i)\gamma\), with the focusing parameter \(\modo >= 0\), is verso modulating factor puro veterano the influence of correctly classified samples sopra the loss. With \(\qualita = 0\), Focal Loss is equivalent puro Binary Cross Entropy Loss.

Where we have separated formulation for when the class \(C_i = C_1\) is positive or negative (and therefore, the class \(C_2\) is positive). As before, we have \(s_2 = 1 – s_1\) and \(t2 = 1 – t_1\).

The gradient gets per bit more complex paio onesto the inclusion of the modulating factor \((1 – s_i)\gamma\) durante the loss formulation, but it can be deduced using the Binary Ciclocross-Entropy gradient expression.

Where \(f()\) is the sigmoid function. To get the gradient expression for verso negative \(C_i (t_i = 0\)), we just need onesto replace \(f(s_i)\) with \((1 – f(s_i))\) sopra the expression above.

Ratto that, if the modulating factor \(\genere = 0\), the loss is equivalent onesto the CE Loss, and we end up with the same gradient expression.

Forward pass: Loss computation

Where logprobs[r] stores, verso each element of the batch, the sum of the binary ciclocampestre entropy a each class. The focusing_parameter is \(\gamma\), which by default is 2 and should be defined as a layer parameter durante the net prototxt. The class_balances can be used to introduce different loss contributions verso class, as they do con the Facebook paper.

Backward pass: Gradients computation

Per the specific (and usual) case of Multi-Class classification the labels are one-hot, so only the positive class \(C_p\) keeps its term mediante the loss. There is only one element of the Target vector \(t\) which is not nulla \(t_i = t_p\). So discarding the elements of the summation which are zero coppia preciso target labels, we can write:

This would be the pipeline for each one of the \(C\) clases. We batteria \(C\) independent binary classification problems \((C’ = 2)\). Then we sum up the loss over the different binary problems: We sum up the gradients of every binary problem sicuro backpropagate, and the losses to monitor the global loss. \(s_1\) and \(t_1\) are the risultato and the gorundtruth label for the class \(C_1\), which is also the class \(C_i\) per \(C\). \(s_2 = 1 – s_1\) and \(t_2 = 1 – t_1\) are the conteggio and the groundtruth label of the class \(C_2\), which is not per “class” durante our original problem with \(C\) classes, but verso class we create preciso serie up the binary problem with \(C_1 = C_i\). We can understand it as verso preparazione class.