Does CrossEntropyLoss for multi-class classification actually need one-hot encoding?



sadwqwe · Published 2023-03-05 19:12:25

Personal homepage: https://yang1he.gitee.io
More practical content is on the way, so feel free to drop by.


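I ran into this question while reading papers today. My fundamentals turned out to be shaky, and it took an afternoon of digging to sort it out. Along the way I picked up several handy smaller points as well.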

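Problem description

While reading a paper, I came across this description:

The labels are one-hot encoded. Looking at my own source code, that seemed odd: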

criterion = nn.CrossEntropyLoss()
loss = criterion(input, label)  # label holds class indices, not one-hot vectors

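The label is simply a vector with one class index per sample (shape $N\times 1$ for $N$ samples), with no one-hot encoding. With one-hot encoding it would have to be an $N\times K$ matrix, where $K$ is the number of classes. Isn't CrossEntropyLoss the loss function for multi-class classification, with "class" referring to the "multi" in multi-class? Did I get something wrong?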

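The short answer: neither is wrong. The formula is written with one-hot labels, but in PyTorch nn.CrossEntropyLoss() does not need one-hot encoded targets; it takes class indices directly.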
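A minimal sketch of this conclusion, assuming PyTorch 1.10 or later (where nn.CrossEntropyLoss also accepts class-probability targets). The tensor values and variable names are made up for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 3)            # 4 samples, 3 classes (illustrative values)
labels = torch.tensor([2, 0, 1, 2])   # class indices, shape (4,), no one-hot

# Class-index targets: the usual way to call the loss
loss_from_indices = criterion(logits, labels)

# One-hot targets, passed as float class probabilities (assumes PyTorch >= 1.10)
one_hot = F.one_hot(labels, num_classes=3).float()
loss_from_one_hot = criterion(logits, one_hot)

# The two values should agree up to floating-point error
print(loss_from_indices.item(), loss_from_one_hot.item())
```

Both calls should print the same number, which is why the one-hot step can be left out when the labels are hard class indices.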

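Verification

If you only wanted the answer, the above is enough. What follows is a bit long-winded, but it collects quite a few useful small facts if you care to read on.

Official documentation

First, here is what the official documentation says: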

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

The input is expected to contain the unnormalized logits[^1] for each class (which do not need to be positive or sum to 1, in general). input has to be a Tensor of size (C) for unbatched input, (minibatch, C) or (minibatch, C, d_1, d_2, ..., d_K) with K ≥ 1 for the K-dimensional case. The last being useful for higher dimension inputs, such as computing cross entropy loss per-pixel for 2D images.

[The difference between logit, logistic and sigmoid - Zhihu (zhihu.com)](https://zhuanlan.zhihu.com/p/358223959#:~:text=The (logit) vector of raw (non-normalized) predictions that,typically become an input to the softmax function.)

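Working through the CrossEntropyLoss examples

The documentation says nothing about one-hot encoding, but the passage above did make one important detail clear: the input is raw, unnormalized logits. It also gives an example: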

import torch
import torch.nn as nn

# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()

# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()

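This example only shows that the label can also be given as class probabilities, which carry more information than a single class index.

I also picked up two ways to generate dummy variables (that is, one-hot encodings): F.one_hot, and building the matrix by hand with torch.full and scatter_.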

# Generate a one-hot encoding with F.one_hot
import torch
import torch.nn.functional as F
target = F.one_hot(torch.empty(3, dtype=torch.long).random_(5), num_classes=5)   # num_classes is the number of classes
# Build a one-hot encoding by hand with torch.full and scatter_
N, C = 3, 5  # number of samples, number of classes
target = torch.tensor([1, 0, 4])
labels = torch.full(size=(N, C), fill_value=0.0)  # float zeros, so the output prints as 0.
labels.scatter_(dim=1, index=torch.unsqueeze(target, dim=1), value=1)
print('labels is {}'.format(labels))
"""
labels is tensor([[0., 1., 0., 0., 0.],
                  [1., 0., 0., 0., 0.],
                  [0., 0., 0., 0., 1.]])
"""

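None of this resolved my confusion, probably because I did not understand CrossEntropyLoss well enough, so I made up a small example to compute by hand.

CrossEntropyLoss definition, and a hand-worked exercise

There are two samples, each with scores for two classes. $\left[\begin{array}{cc} 1& 0.5\\ 0.5&1 \end{array}\right]$ holds the raw logits (one row per sample, not yet passed through softmax), and the label is $[1, 0]$. Compute the loss with CrossEntropyLoss.

Solution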

$\text{CrossEntropyLoss} = \text{LogSoftmax} + \text{NLLLoss}$

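So, to make this easier to follow, let me first introduce LogSoftmax and NLLLoss.

LogSoftmax definition:
$f_i(x)=\log\left(\frac{e^{x_i}}{\sum_{j} e^{x_j}}\right)$
That is, take the softmax first and then the log. Next comes NLLLoss:
$f(x,y)=-\frac{1}{N}\sum_{n=1}^{N} x_n\cdot y_n$
NLLLoss yields the loss. Here $x_n$ is the LogSoftmax output for sample $n$, $y_n$ is that sample's one-hot label, and $N$ is the number of samples (the averaging comes from the default mean reduction).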
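The decomposition can be checked directly in code. This is a small sketch with made-up random logits; the variable names are illustrative only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)           # 3 samples, 5 classes (illustrative values)
target = torch.tensor([1, 0, 4])     # class indices

# Two steps: LogSoftmax over the class dimension, then NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
two_step = nn.NLLLoss()(log_probs, target)

# One step: CrossEntropyLoss applied to the raw logits
one_step = nn.CrossEntropyLoss()(logits, target)

# The two values should agree up to floating-point error
print(two_step.item(), one_step.item())
```

Note that nn.NLLLoss also takes class indices rather than one-hot vectors: it picks out the log-probability of the target class for each sample, which gives the same result as the dot product with a one-hot label.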

Really got it? Then work through the exercise stated above: logits $\left[\begin{array}{cc} 1& 0.5\\ 0.5&1 \end{array}\right]$, label $[1, 0]$.

Solution: plug into the formulas. Try it by hand first.

step1: compute LogSoftmax
$f_i(x)=\left[\begin{array}{cc} \log(\frac{e^{1}}{e^{1}+e^{0.5}}) & \log(\frac{e^{0.5}}{e^{1}+e^{0.5}})\\ \log(\frac{e^{0.5}}{e^{1}+e^{0.5}})&\log(\frac{e^{1}}{e^{1}+e^{0.5}}) \end{array}\right]=\left[\begin{array}{cc} -0.4741& -0.9741\\ -0.9741&-0.4741 \end{array}\right]$
step2: compute NLLLoss

First, the label is not given as one-hot. If we do the calculation in two steps, we have to one-hot encode it first, so the label becomes $\left[\begin{array}{cc} 0&1\\ 1&0 \end{array}\right]$. Then
$\begin{aligned} Loss&=-\frac{1}{2}\left([-0.4741,-0.9741]\begin{bmatrix} 0 \\ 1 \end{bmatrix}+[-0.9741,-0.4741]\begin{bmatrix} 1 \\ 0 \end{bmatrix} \right)\\&=0.9741 \end{aligned}$
Did you get it right?

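Notice that we one-hot encoded the label first.

So in essence, when you call the PyTorch code it handles class-index labels for you, with the same result as one-hot encoding them first, so you can skip this step.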
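The exercise can also be checked numerically. The sketch below repeats the two hand steps (LogSoftmax, then the one-hot dot product averaged over samples) and compares them with the built-in loss; the variable names are illustrative:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 0.5],
                       [0.5, 1.0]])   # one row per sample
label = torch.tensor([1, 0])          # class indices, as in the exercise

# step1: LogSoftmax over the class dimension
log_probs = F.log_softmax(logits, dim=1)

# step2: one-hot encode the label, take the dot product per sample, average
one_hot = F.one_hot(label, num_classes=2).float()
manual = -(log_probs * one_hot).sum(dim=1).mean()

# Built-in loss on the raw logits and class indices
builtin = F.cross_entropy(logits, label)

print(log_probs)
print(manual.item(), builtin.item())  # both approximately 0.9741
```

Each sample's target is the class with the lower logit (0.5), so both samples contribute $-\log\frac{e^{0.5}}{e^{1}+e^{0.5}}\approx 0.9741$ and the mean is the same value.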

Finally:

The full definition of CrossEntropyLoss (for a single sample):
$\text{loss}(x, y) = -\sum_{i=1}^{C} y_i \log(\text{softmax}(x)_i)$
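Here $C$ is the number of classes, $y_i$ is the probability that the true label of sample $x$ assigns to class $i$, and $\text{softmax}(x)_i$ is the predicted probability of class $i$ for sample $x$.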
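As a rough illustration of this per-sample formula, here is a plain-Python sketch that does not depend on PyTorch. The helper names softmax and cross_entropy are hypothetical, written only for this example:

```python
import math

def softmax(x):
    """Softmax of a list of logits, shifted by the max for numerical stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(x, y):
    """loss(x, y) = -sum_i y_i * log(softmax(x)_i) for a single sample."""
    probs = softmax(x)
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, probs))

# Hard (one-hot) label: only the target class contributes to the sum
hard = cross_entropy([1.0, 0.5], [0.0, 1.0])

# Soft label: every class contributes in proportion to its probability
soft = cross_entropy([1.0, 0.5], [0.3, 0.7])

print(round(hard, 4), round(soft, 4))  # 0.9741 0.8241
```

With a one-hot $y$ the sum collapses to a single term, the negative log-probability of the target class, which is why a class index carries all the information the loss needs in that case.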
