I. Code Entry

At detection time, the parameters of certain structures are fused to speed up inference without losing accuracy; the prerequisite is that the model parameters have already been trained.

Source code: https://github.com/WongKinYiu/yolov7

This logic is reached from the detect.py / detect() function via

model = attempt_load(weights, map_location=device)

Inside attempt_load(), the line

model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())

calls fuse(). The function lives in models/yolo.py under class Model:

    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
        print('Fusing layers... ')
        for m in self.model.modules():
            if isinstance(m, RepConv):
                # print(f" fuse_repvgg_block")
                m.fuse_repvgg_block()
            elif isinstance(m, RepConv_OREPA):
                # print(f" switch_to_deploy")
                m.switch_to_deploy()
            elif type(m) is Conv and hasattr(m, 'bn'):
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                delattr(m, 'bn')  # remove batchnorm
                m.forward = m.fuseforward  # update forward
            elif isinstance(m, (IDetect, IAuxDetect)):
                m.fuse()
                m.forward = m.fuseforward
        self.info()
        return self

We focus on m.fuse_repvgg_block() and fuse_conv_and_bn(conv, bn).
(The fusion performed when m is an IDetect is the fusion of the implicit-knowledge parts and is not the focus of this article.)

II. Conv+BN

1. Principle

The idea is to merge a convolution and its BN into a single convolution. The BN formula is

$$\begin{split} \hat{x}_i&=\gamma \cdot \frac{x_i-\mu}{\sqrt{\sigma^2+\varepsilon}}+\beta \\ &=\frac{\gamma}{\sqrt{\sigma^2+\varepsilon}}\cdot x_i+\left(\beta-\frac{\gamma\cdot \mu}{\sqrt{\sigma^2+\varepsilon}}\right) \end{split}$$

which can be read in the form $y=wx+b$, with $w_{BN}=\frac{\gamma}{\sqrt{\sigma^2+\varepsilon}}$ and $b_{BN}=\beta-\frac{\gamma\cdot \mu}{\sqrt{\sigma^2+\varepsilon}}$.

The Conv+BN pipeline can then be written as

$$\begin{split} \hat{x}&=w_{BN}\cdot(w_{conv}\cdot x+b_{conv})+b_{BN}\\ &=(w_{BN}\cdot w_{conv})\cdot x+(w_{BN}\cdot b_{conv}+b_{BN}) \end{split}$$

which gives the merged convolution parameters:

$$\begin{cases}w=w_{BN}\cdot w_{conv}\\b=w_{BN}\cdot b_{conv}+b_{BN} \end{cases}$$


As an aside, BN in the form $\hat{x}_i=w_{BN}x_i+b_{BN}$ can be viewed as a special 1*1 convolution. A 1*1 convolution multiplies every pixel of a channel by a single value, treating all positions in the channel the same, so we can shrink one $w*h$ channel (one layer of the image) down to a single $\square$. Ignoring bias, for a $ch_{in}*w*h$ input, a 1*1 convolution "drawn" in matrix form is

$$\begin{pmatrix}\square\\\square\\\square\end{pmatrix}_{ch_{out}}=\underbrace{\begin{pmatrix} \square&\square&\square&\square\\\square&\square&\square&\square\\\square&\square&\square&\square\end{pmatrix}_{ch_{out}*ch_{in}}}_{weights}*\begin{pmatrix}\square\\\square\\\square\\\square\end{pmatrix}_{ch_{in}}$$

For example, with 4 input channels and 3 output channels, the 12 boxes in the weights can be seen as twelve 1*1 "kernel slices" (turning 4 layers into 3 layers involves 12 pairings). BN, by contrast, normalizes each channel independently, so its weights form a diagonal matrix: BN is equivalent to a 1*1 convolution whose input and output channel counts are equal and mapped one-to-one.

$$\begin{pmatrix}\square\\\square\\\square\\\square\end{pmatrix}_{after\,BN}=\underbrace{\begin{pmatrix} w_{BN}^{ch1}&0&0&0\\0&w_{BN}^{ch2}&0&0\\0&0&w_{BN}^{ch3}&0\\0&0&0&w_{BN}^{ch4}\end{pmatrix}_{ch_{in}*ch_{in}}}_{weights}*\begin{pmatrix}\square\\\square\\\square\\\square\end{pmatrix}_{before\,BN}$$
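To make this concrete, here is a minimal sketch of my own (not from the yolov7 repo) that builds the diagonal 1*1 convolution above and checks it against an eval-mode BatchNorm2d:

import torch
import torch.nn as nn

torch.manual_seed(0)
c = 4
bn = nn.BatchNorm2d(c).eval()          # eval mode: uses running statistics

with torch.no_grad():
    # give BN non-trivial statistics and affine parameters
    bn.weight.uniform_(0.5, 1.5)       # gamma
    bn.bias.uniform_(-0.5, 0.5)        # beta
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)

    # the equivalent 1*1 conv: diagonal weight matrix w_BN, bias b_BN
    w_bn = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    b_bn = bn.bias - bn.weight * bn.running_mean / torch.sqrt(bn.running_var + bn.eps)
    conv = nn.Conv2d(c, c, kernel_size=1, bias=True)
    conv.weight.copy_(torch.diag(w_bn).view(c, c, 1, 1))  # diagonal => per-channel scale
    conv.bias.copy_(b_bn)

    x = torch.randn(1, c, 8, 8)
    print(torch.allclose(bn(x), conv(x), atol=1e-5))  # True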

2. Code

The Conv+BN fusion code lives in utils/torch_utils.py, in the fuse_conv_and_bn(conv, bn) function:

def fuse_conv_and_bn(conv, bn):
    # Fuse convolution and batchnorm layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/
    # create the fused conv object; all sizes match the original conv
    fusedconv = nn.Conv2d(conv.in_channels,
                          conv.out_channels,
                          kernel_size=conv.kernel_size,
                          stride=conv.stride,
                          padding=conv.padding,
                          groups=conv.groups,
                          bias=True).requires_grad_(False).to(conv.weight.device)

    # update the parameters
    # prepare filters weights, w = w_bn * w_conv
    w_conv = conv.weight.clone().view(conv.out_channels, -1)  # ch_out*ch_in*ks*ks --> ch_out*(ch_in*ks*ks)
    w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))  # ch_out*ch_out
    fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))  # ch_out*ch_in*ks*ks

    # prepare spatial bias, b = w_bn * b_conv + b_bn
    b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias  # ch_out
    b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))  # ch_out
    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)  # ch_out

    return fusedconv

Here, the convolution's weights have the 4-D shape $ch_{out}*ch_{in}*ks*ks$, while BN carries one weight per channel (num_features), a 1-D vector whose length equals the convolution's $ch_{out}$ (BN's channel count is the conv's output-channel count). To multiply the two as matrices, $w_{conv}$ is reshaped to $ch_{out}*(ch_{in}*ks*ks)$ and $w_{BN}$ is expanded into a $ch_{out}*ch_{out}$ diagonal matrix; after the product, the result is reshaped back to the original 4-D $ch_{out}*ch_{in}*ks*ks$ shape.

Each bias can be viewed as a vector of length $ch_{out}$. When a convolution is followed by BN, its own bias has no effect (BN subtracts the mean anyway) and is set to zero, $b_{conv}=0$, so in the actual merge $b=b_{BN}$.
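As a sanity check, here is a small sketch of my own (the shapes are arbitrary) that feeds a random input through Conv2d followed by BatchNorm2d and through the fused convolution returned by the function above:

import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)  # bias=False, as when followed by BN
bn = nn.BatchNorm2d(8).eval()

with torch.no_grad():
    # pretend these statistics were accumulated during training
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)

    fused = fuse_conv_and_bn(conv, bn)

    x = torch.randn(1, 3, 16, 16)
    print(torch.allclose(bn(conv(x)), fused(x), atol=1e-5))  # True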

III. RepConv

1. Principle

The idea of RepConv is to re-parameterize the RepVGG block used in the detection head during training: the 1*1-convolution path and the do-nothing (identity) path are both converted into 3*3 convolutions, which are then merged with the original 3*3 convolution.
(Figure: re-parameterization of a RepVGG block. Source: RepVGG: Making VGG-style ConvNets Great Again)

Putting the parameter transformation into one formula:

$$\begin{split} \hat{x}_i&=(w_{fuse}^{3*3}x_i+b_{fuse}^{3*3})+(w_{fuse}^{1*1}x_i+b_{fuse}^{1*1})+(w_{BN}^{0*0}x_i+b_{BN}^{0*0})\\&=(w_{fuse}^{3*3}+w_{fuse}^{1*1}+w_{BN}^{0*0})\cdot x_i+(b_{fuse}^{3*3}+b_{fuse}^{1*1}+b_{BN}^{0*0}) \end{split}$$

where the subscript fuse denotes convolution parameters after Conv+BN fusion.

The biases depend only on the number of output channels; all three are 1-D vectors of length $ch_{out}$ and can be added directly. The interesting part is what happens to the weights:

(1) 1*1 to 3*3

The code generally uses auto-padding, i.e. padding = kernel_size // 2, so by default a 3*3 convolution pads the image by 1 while a 1*1 convolution pads by 0. Exploiting this, padding the 1*1 kernel itself with a ring of zeros to form a 3*3 kernel gives exactly the same result as the original 1*1 convolution.
(Figure: a 1*1 convolution, highlighted, compared with the same kernel zero-padded to 3*3; the results are identical.)

In matrix form this is

$$\underbrace{\begin{pmatrix} \square&\square&\square&\square\\\square&\square&\square&\square\\\square&\square&\square&\square\end{pmatrix}_{ch_{out}*ch_{in}}}_{weights}$$

where each $\square$ is a 1*1 kernel slice padded with a ring of zeros, as in the kernel shown above.
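A minimal sketch of my own verifying this equivalence with torch.nn.functional (the shapes are arbitrary):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8)        # ch_in = 4
w1x1 = torch.randn(3, 4, 1, 1)     # ch_out = 3, 1*1 kernels

y_1x1 = F.conv2d(x, w1x1, padding=0)            # plain 1*1 conv, padding = 0
w3x3 = F.pad(w1x1, [1, 1, 1, 1])                # ring of zeros -> shape (3, 4, 3, 3)
y_3x3 = F.conv2d(x, w3x3, padding=1)            # auto-padding for a 3*3 kernel

print(torch.allclose(y_1x1, y_3x3, atol=1e-6))  # True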

(2) 0*0 to 3*3

The 0*0 branch either applies no processing at all or applies a single BN. Doing nothing is equivalent to a 1*1 convolution whose weights form an identity matrix, and BN, as shown earlier, is also equivalent to a 1*1 convolution, so the weights in matrix form are

$$\underbrace{\begin{pmatrix} \square&0&0&0\\0&\square&0&0\\0&0&\square&0\\0&0&0&\square\end{pmatrix}_{ch_{in}*ch_{in}}}_{weights}$$

where each $\square$ is filled in one of the following two ways:
(Figure: the two fillings of $\square$: the do-nothing case and the BN case.)


Note, however, that the 0*0 branch's weights form a $ch_{in}*ch_{in}$ square matrix, which cannot be added directly to the other two branches' weights when $ch_{out}$ differs from $ch_{in}$; in yolov7 this module's input and output channel counts do differ. To handle this, when $ch_{out}\neq ch_{in}$ the code simply sets the 0*0 branch's weights to a $ch_{out}*ch_{in}$ zero matrix before adding the three branches; when $ch_{out}=ch_{in}$, the filling above is used.
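Putting the principle together, here is a toy check of my own (BN omitted for brevity, and $ch_{out}=ch_{in}$ so the identity branch exists): the sum of the three branches equals a single merged 3*3 convolution.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
c = 4                                    # ch_out == ch_in: identity branch exists
x = torch.randn(1, c, 8, 8)
w3 = torch.randn(c, c, 3, 3)             # 3*3 branch
w1 = torch.randn(c, c, 1, 1)             # 1*1 branch

# identity branch expressed as a 1*1 conv with identity-matrix weights
wid = torch.eye(c).view(c, c, 1, 1)

y_branches = (F.conv2d(x, w3, padding=1)
              + F.conv2d(x, w1, padding=0)
              + x)

# merge: pad the 1*1 and identity kernels to 3*3, then add the weights
w_merged = w3 + F.pad(w1, [1, 1, 1, 1]) + F.pad(wid, [1, 1, 1, 1])
y_merged = F.conv2d(x, w_merged, padding=1)

print(torch.allclose(y_branches, y_merged, atol=1e-5))  # True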

2. Code

This code lives in models/common.py, in the fuse_repvgg_block() method of class RepConv:

    def fuse_repvgg_block(self):
        if self.deploy:
            return
        print(f"RepConv.fuse_repvgg_block")

        self.rbr_dense = self.fuse_conv_bn(self.rbr_dense[0], self.rbr_dense[1])  # 3*3 conv branch

        self.rbr_1x1 = self.fuse_conv_bn(self.rbr_1x1[0], self.rbr_1x1[1])
        rbr_1x1_bias = self.rbr_1x1.bias  # ch_out
        # pad [1, 1, 1, 1] adds one unit on the left/right/top/bottom, i.e. dims 3 and 4 (h, w) each grow by 2
        weight_1x1_expanded = torch.nn.functional.pad(self.rbr_1x1.weight, [1, 1, 1, 1])  # co*ci*(ks+2)*(ks+2)

        # Fuse self.rbr_identity
        if (isinstance(self.rbr_identity, nn.BatchNorm2d) or isinstance(self.rbr_identity, nn.modules.batchnorm.SyncBatchNorm)):
            # the 0*0 branch is a BatchNorm2d/SyncBatchNorm only when out_channels == in_channels, as RepConv's __init__() shows
            identity_conv_1x1 = nn.Conv2d(
                in_channels=self.in_channels,
                out_channels=self.out_channels,
                kernel_size=1,
                stride=1,
                padding=0,
                groups=self.groups,
                bias=False)  # (co, ci, 1, 1)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.to(self.rbr_1x1.weight.data.device)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.squeeze().squeeze()  # (co, ci)
            identity_conv_1x1.weight.data.fill_(0.0)
            identity_conv_1x1.weight.data.fill_diagonal_(1.0)  # identity matrix; each element is a 1*1 kernel slice
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.unsqueeze(2).unsqueeze(3)  # (co, ci, 1, 1): the 0*0 branch expressed as a 1*1 conv

            identity_conv_1x1 = self.fuse_conv_bn(identity_conv_1x1, self.rbr_identity)  # fuse with the branch's BN

            bias_identity_expanded = identity_conv_1x1.bias
            weight_identity_expanded = torch.nn.functional.pad(identity_conv_1x1.weight, [1, 1, 1, 1])  # pad each 1*1 slice with a ring of zeros, i.e. dims 3 and 4 each grow by 2
        else:
            # out_channels != in_channels: no identity branch, so use zero matrices
            bias_identity_expanded = torch.nn.Parameter(torch.zeros_like(rbr_1x1_bias))
            weight_identity_expanded = torch.nn.Parameter(torch.zeros_like(weight_1x1_expanded))

        self.rbr_dense.weight = torch.nn.Parameter(
            self.rbr_dense.weight + weight_1x1_expanded + weight_identity_expanded)
        self.rbr_dense.bias = torch.nn.Parameter(self.rbr_dense.bias + rbr_1x1_bias + bias_identity_expanded)

        self.rbr_reparam = self.rbr_dense
        self.deploy = True

        if self.rbr_identity is not None:
            del self.rbr_identity
            self.rbr_identity = None

        if self.rbr_1x1 is not None:
            del self.rbr_1x1
            self.rbr_1x1 = None

        if self.rbr_dense is not None:
            del self.rbr_dense
            self.rbr_dense = None
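Finally, a usage sketch, assuming the yolov7 repo is on the Python path (the channel sizes and input shape here are my own choices): compare a RepConv's output before and after fuse_repvgg_block().

import torch
from models.common import RepConv  # from the yolov7 repo

torch.manual_seed(0)
m = RepConv(128, 256, k=3, s=1).eval()   # ch_in != ch_out here, so no identity branch
x = torch.randn(1, 128, 32, 32)

with torch.no_grad():
    y_train = m(x)                       # three-branch forward
    m.fuse_repvgg_block()                # re-parameterize in place
    y_deploy = m(x)                      # single fused 3*3 conv

print(torch.allclose(y_train, y_deploy, atol=1e-4))  # True, up to float error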
更多推荐