【pytorch】nn.CrossEntropyLoss & nn.BCELoss & nn.BCEWithLogitsLoss

交叉熵loss

呜哇哇

1347人浏览 · 2023-03-26 16:26:38

呜哇哇 · 2023-03-26 16:26:38 发布

nn.CrossEntropyLoss

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=- 100, reduce=None, reduction='mean', label_smoothing=0.0)

计算 logits 和 target 之间的交叉熵。

logits：不需要是标准化后的数据（也就是不需要和为 1 ），假设输入维度为 N x H x W x C
traget：对应输入维度应当是 N x H x W ，每个值表示了真值类别的 index。

请添加图片描述

patameters：

weight：类别的权重。如若需要，它应当是一个一维张量，表示了 C 个类别的权重。在训练集不平衡时很有效。
reduction：[‘none’, ‘mean’, ‘sum’]，默认为 ‘mean’
- none：直接输出 tensor 形式的 tensor
- mean：输出 loss 的平均值
- sum：输出 loss 的和
size_average、reduce：已弃用
ingore_index：在计算loss时，忽略某个或某些类。

举个例子

import torch
import torch.nn as nn

def crossEntropyLoss(output,target):
    criterion = nn.CrossEntropyLoss(reduction='mean')
    return criterion(output,target)

if __name__ == "__main__":
	output = torch.tensor([[5,5],[2,8]],dtype=torch.float32)
    target = torch.tensor([0,1]).long()
    print("cross:{}".format(crossEntropyLoss(output,target)))

nn.BCELoss

torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')

计算 input 和 target 之间的二元交叉熵。只能解决二分类问题。一般与nn.sigmoid()一起用。

input：维度不定，
target：维度应与 logit 相同，值应当在 [0, 1]

patameters

weight：一个 batch 的损失的权重。可以手动调整每个 batch 的权重
reduction：[‘none’, ‘mean’, ‘sum’]，默认为 ‘mean’
- none：直接输出 tensor 形式的 tensor
- mean：输出 loss 的平均值
- sum：输出 loss 的和
size_average、reduce：已弃用

举个例子

import torch
import torch.nn as nn

def bceLoss(output,target):
    criterion = nn.BCELoss(reduction='mean')
    output = torch.sigmoid(output)
    print("output: {}".format(output))
    return criterion(output,target)   


if __name__ == "__main__":
    output = torch.tensor([[3],[2]],dtype=torch.float32) 	# 每个样本属于该类的分数
    targer = torch.tensor([[1],[0]],dtype=torch.float32) 	# ground truth
    print(bceLoss(output,targer).item())

二元交叉熵作多标签分类问题（相当于多个二分类问题）

ATTENTION

当x_n=0/1时，会有精度问题，见下图：

解决方法：没有理解，下次理解

nn.CrossEntropyLoss & nn.BCELoss

在做二分类任务时，nn.CrossEntropyLoss = torch.softmax + nn.BCELoss。 但是在单个二分类任务中，torch.softmax + nn.BCELoss没有什么意义。

对于nn.BCELoss来说，在图像分割前景和后景时，每个样本对应的 target 为 1 或 0 ，前者表示属于前景，后者表示属于后景。（因为不属于前景的一定属于后景，所以可以用一个类别来表示）
对于nn.CrossEntropyLoss来说，在图像分割前景和后景时，每个样本对应的 target 为 [1,0] 或 [0,1] ，前者表示属于前景，后者表示属于后景。（虽然p(前景)+p(后景)=1，但是必须将前景和后景分为两类来做，因为如果只用一个类别的话，那么每个像素对应的 logits 就只有一个值，这样经过 softmax 后，值永远为 1 ，模型无法收敛！）

import torch
import torch.nn as nn

output = torch.tensor([[5,5],[2,8]],dtype=torch.float32) 	# 2x2 图像 作为输入
output = torch.softmax(output,dim=-1) 	# 按行进行 softmax
print(output)
target = torch.tensor([[[0,1],[1,0]]).float() 	# 表示个每个像素点的类别真值
bce=nn.BCELoss()
print("bce:{}".format(bce(output,target)))

# 如果用 nn.CrossEntropyLoss 计算：
'''
	由于上文中 softmax 是按行来计算的，所以输入应当转换为 [[5,5],[5,5],[8,2],[2,8]]
'''
output = torch.tensor([[5,5],[5,5],[8,2],[2,8]],dtype=torch.float32)
target = torch.tensor([0,1,1,0]).long()
print("cross:{}".format(crossEntropyLoss(output,target)))

# output
tensor([[0.5000, 0.5000],
        [0.0025, 0.9975]])
bce:3.347815990447998
cross:3.347811698913574

nn.BCEWithLogitsLoss

torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

nn.BCEWithLogitsLoss = nn.sigmoid + nn.BCELoss

这个版本在数值上比使用普通 Sigmoid + BCELoss 更稳定，因为通过将操作组合到一个层中，我们利用了 log-sum-exp 技巧来实现数值稳定性。
请添加图片描述

patameters:

weight：一个 batch 的损失的权重。可以手动调整每个 batch 的权重
reduction：[‘none’, ‘mean’, ‘sum’]，默认为 ‘mean’
- none：直接输出 tensor 形式的 tensor
- mean：输出 loss 的平均值
- sum：输出 loss 的和
size_average、reduce：已弃用
pos_weight：对正样本的权重，是一个与类别数等长的向量

举个例子

output = torch.tensor([[3],[2]],dtype=torch.float32)
    target = torch.tensor([[1],[0]],dtype=torch.float32)
    loss2=nn.BCEWithLogitsLoss()
    print("BCEWithLogitsLoss:",loss2(output,target)) 	# 不需要sigmoid

ATTENTION

为什么通过设置nn.BCEWithLogitsLoss的参数pos_weight可以影响recall和precision？
- 在二分类问题中，BCEWithLogitsLoss通常用于计算sigmoid输出的二进制交叉熵损失。这个损失函数的计算方式会受到正负样本不平衡的影响，这意味着在训练集中，某些类别的样本数可能比其他类别的样本数更多或更少。
  当正负样本不平衡时，模型可能会倾向于预测具有更多样本的类别，导致较少的类别的预测效果不佳。为了解决这个问题，可以使用pos_weight参数来平衡正负样本之间的权重差异，从而调整模型在预测不同类别时的重要性。
  设置pos_weight参数的值通常是通过计算正样本数和负样本数之间的比率来确定的。如果正样本的数量少于负样本，则可以将pos_weight设置为负样本数除以正样本数的比率，以加权损失函数的正样本部分，从而使模型更关注正样本的预测。反之亦然。
  通过调整pos_weight参数，模型可以更加关注某些类别的预测，从而提高这些类别的准确率和召回率。（来自chatgpt）

AtomGit 开源协作平台测评赛

瓜分20万奖金获得内推名额丰厚实物奖励易参与易上手

更多推荐

ADS1292R 使用过程心电图高精度ADC模块

文章目录1 Fundamentals ofPrecision ADC Noise Analysis 精密模数转换器噪声分析基础1 Fundamentals ofPrecision ADC Noise Analysis 精密模数转换器噪声分析基础https://www.ti.com.cn/cn/lit/wp/slyy192/slyy192.pdf?ts=1600659610730&ref_u

开放原子开发者工作坊

实现一个家庭安防与环境监测系统（一）

开放原子开发者工作坊

【cf】Codeforces Round #774 (Div. 2) 前4题

题目A. Square Counting 简单数学题目大意题解代码B. Quality vs Quantity 排序题目大意题解代码C. Factorials and Powers of Two 状态压缩dp+位运算题目大意题解代码D. Weight the Tree 树形dp+dfs题目大意题解代码E. Power Board 看起来像是数论？许多年没打cf了，偶尔打了一盘，恢复紫名了。A. S