1. LeNet (1998)


    Figure.1 Data flow in LeNet. The input is a handwritten digit, the output a probability over 10 possible outcomes.
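
    The data flow in the figure can be sketched in PyTorch. This is a minimal sketch, assuming the commonly cited LeNet-5 layer sizes and a 28 x 28 grayscale input; the post does not list these values, and the names `lenet`, `X`, and `probs` are illustrative only.

```python
import torch
from torch import nn

# Assumed layer sizes (commonly cited LeNet-5 configuration, not from the post)
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

# A batch of two random "digits", assumed to be 28 x 28 grayscale images
X = torch.rand(2, 1, 28, 28)
logits = lenet(X)
# Softmax turns the 10 raw scores into a probability over the 10 outcomes
probs = torch.softmax(logits, dim=1)
print(probs.shape)
```

    Each row of `probs` sums to 1, matching the "probability over 10 possible outcomes" in the caption.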


  2. AlexNet (2012)


    Figure.2 From LeNet (left) to AlexNet (right).
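
    For comparison with LeNet, a possible AlexNet-style network is sketched below. The layer sizes are an assumption based on the usual single-channel, 224 x 224 teaching variant of AlexNet with 10 output classes; the post itself does not give them, and `alexnet` is an illustrative name.

```python
import torch
from torch import nn

# Assumed configuration: 1 input channel, 224 x 224 input, 10 classes
alexnet = nn.Sequential(
    # Large 11 x 11 window with stride 4 to shrink the image quickly
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Three consecutive 3 x 3 convolutions
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    # Wide fully connected layers, regularized with dropout
    nn.Linear(256 * 5 * 5, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 10))

alexnet.eval()  # disable dropout for the shape check
with torch.no_grad():
    out = alexnet(torch.rand(1, 1, 224, 224))
print(out.shape)
```

    Relative to LeNet, this sketch is deeper, uses ReLU instead of sigmoid, max pooling instead of average pooling, and adds dropout in the fully connected layers.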


  3. VGG (2014)


    Figure.3 From AlexNet to VGG, which is built from repeated blocks.


    import torch
    from torch import nn

    def vgg_block(num_convs, in_channels, out_channels):
        # Stack `num_convs` 3 x 3 convolutions, then halve the resolution
        layers = []
        for _ in range(num_convs):
            layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
            layers.append(nn.ReLU())  # nonlinearity after every convolution
            in_channels = out_channels
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)

    # (number of convolutions, output channels) for each of the five blocks
    conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

    def vgg(conv_arch):
        conv_blks = []
        in_channels = 1  # single-channel (grayscale) input
        for (num_convs, out_channels) in conv_arch:
            conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
            in_channels = out_channels
        return nn.Sequential(
            *conv_blks, nn.Flatten(),
            # 7 x 7 is the feature map size left after five halvings of a
            # 224 x 224 input
            nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 10))
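
    The `out_channels * 7 * 7` input size of the first linear layer can be derived by tracing the spatial size through the five blocks. The sketch below does that with plain arithmetic, assuming a 224 x 224 input (not stated in the post); `conv_out` is a hypothetical helper written for this illustration.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def conv_out(size, kernel_size, stride=1, padding=0):
    # Standard output-size formula for convolution and pooling layers
    return (size + 2 * padding - kernel_size) // stride + 1

size = 224  # assumed input height and width
for num_convs, out_channels in conv_arch:
    for _ in range(num_convs):
        # 3 x 3 convolution with padding 1 keeps the size unchanged
        size = conv_out(size, kernel_size=3, padding=1)
    # 2 x 2 max pooling with stride 2 halves the size
    size = conv_out(size, kernel_size=2, stride=2)

flat_features = out_channels * size * size
print(size, flat_features)
```

    With a different input resolution the final size changes, so the first `nn.Linear` would need a matching input dimension.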
  4. NiN (2014)


    Figure.4 Comparing architectures of VGG and NiN, and their blocks.

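    NiN block: one convolutional layer followed by two 1x1 convolutional layers. Each 1x1 convolution uses stride 1 and no padding, so its output has the same spatial shape as the output of the preceding convolutional layer. It can be viewed as a fully connected layer applied to each input pixel independently, and so it takes over the role of the fully connected layers.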
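
    The view of a 1x1 convolution as a per-pixel fully connected layer can be checked numerically. The sketch below copies the weights of a 1x1 `nn.Conv2d` into an `nn.Linear` and compares the outputs; the channel counts and tensor sizes are arbitrary choices made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 5, kernel_size=1)  # 1x1 convolution, stride 1, no padding
linear = nn.Linear(3, 5)

# Give the linear layer the same parameters as the convolution
with torch.no_grad():
    linear.weight.copy_(conv.weight.view(5, 3))
    linear.bias.copy_(conv.bias)

X = torch.rand(2, 3, 4, 4)
with torch.no_grad():
    y_conv = conv(X)
    # Move channels last so the linear layer acts on each pixel's channel
    # vector, then move channels back to dimension 1
    y_linear = linear(X.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(y_conv, y_linear, atol=1e-6)
print(y_conv.shape, same)
```

    The spatial shape (4 x 4 here) is unchanged; only the channel dimension is mixed.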


    import torch
    from torch import nn

    def nin_block(in_channels, out_channels, kernel_size, strides, padding):
        # One regular convolution followed by two 1x1 convolutions
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())

    nin_net = nn.Sequential(
        nin_block(1, 96, kernel_size=11, strides=4, padding=0),
        nn.MaxPool2d(3, stride=2),
        nin_block(96, 256, kernel_size=5, strides=1, padding=2),
        nn.MaxPool2d(3, stride=2),
        nin_block(256, 384, kernel_size=3, strides=1, padding=1),
        nn.MaxPool2d(3, stride=2), nn.Dropout(0.5),
        # The last block outputs one channel per class (10 classes)
        nin_block(384, 10, kernel_size=3, strides=1, padding=1),
        # Global average pooling reduces each channel to a single value
        nn.AdaptiveAvgPool2d((1, 1)),
        # Flatten (batch, 10, 1, 1) to (batch, 10)
        nn.Flatten())
  5. GoogLeNet (2014)



    Figure.5 Structure of the Inception block.
    import torch
    from torch import nn
    from torch.nn import functional as F
    class Inception(nn.Module):
        # `c1`--`c4` are the number of output channels for each path
        def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
            super(Inception, self).__init__(**kwargs)
            # Path 1 is a single 1 x 1 convolutional layer
            self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
            # Path 2 is a 1 x 1 convolutional layer followed by a 3 x 3
            # convolutional layer
            self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
            self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
            # Path 3 is a 1 x 1 convolutional layer followed by a 5 x 5
            # convolutional layer
            self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
            self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
            # Path 4 is a 3 x 3 maximum pooling layer followed by a 1 x 1
            # convolutional layer
            self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
            self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)
        def forward(self, x):
            p1 = F.relu(self.p1_1(x))
            p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
            p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
            p4 = F.relu(self.p4_2(self.p4_1(x)))
            # Concatenate the outputs on the channel dimension
            return torch.cat((p1, p2, p3, p4), dim=1)



    Figure.6 The GoogLeNet architecture.
    b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1),
                       nn.ReLU(),
                       nn.Conv2d(64, 192, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),
                       Inception(256, 128, (128, 192), (32, 96), 64),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    b4 = nn.Sequential(Inception(480, 192, (96, 208), (16, 48), 64),
                       Inception(512, 160, (112, 224), (24, 64), 64),
                       Inception(512, 128, (128, 256), (24, 64), 64),
                       Inception(512, 112, (144, 288), (32, 64), 64),
                       Inception(528, 256, (160, 320), (32, 128), 128),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
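    b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),
                       Inception(832, 384, (192, 384), (48, 128), 128),
                       # Global average pooling, then flatten to (batch, 1024)
                       nn.AdaptiveAvgPool2d((1, 1)),
                       nn.Flatten())
    google_net = nn.Sequential(b1, b2, b3, b4, b5, nn.Linear(1024, 10))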
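
    Because an Inception block concatenates its four paths along the channel dimension, its output has `c1 + c2[1] + c3[1] + c4` channels, and that sum must equal the `in_channels` of the next block. The sketch below checks this for the configurations used in `b3` to `b5`; `inception_out_channels` is a hypothetical helper written for this illustration.

```python
def inception_out_channels(c1, c2, c3, c4):
    # Paths 2 and 3 contribute the channels of their second convolution
    return c1 + c2[1] + c3[1] + c4

# (in_channels, c1, c2, c3, c4), copied from b3, b4, and b5 in order
configs = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]

outs = [inception_out_channels(*cfg[1:]) for cfg in configs]
# Each block's output channels must match the next block's input channels
chain_ok = all(out == nxt[0] for out, nxt in zip(outs, configs[1:]))
print(outs, chain_ok)
```

    The last block produces 1024 channels, which is the input size of the final `nn.Linear(1024, 10)`. The max pooling layers between blocks change only the spatial size, not the channel count.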
  6. ResNet (2015)

    • Residual block: $f(x) = x + g(x)$


      Figure.7 ResNet block with and without 1x1 convolution.
    • ResNet architecture: the overall layout resembles that of VGG and GoogLeNet, but the blocks are replaced with ResNet blocks.

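    • Residual blocks make very deep networks much easier to train, even networks with a thousand layers. Residual networks have had a profound influence on the design of later deep neural networks, both convolutional and fully connected.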
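
    A residual block computing f(x) = x + g(x) might look like the sketch below. It is not code from the post: the class name `Residual`, the `use_1x1conv` flag, and the choice of two 3 x 3 convolutions with batch normalization for g are assumptions that follow a common ResNet layout. The optional 1x1 convolution on the shortcut corresponds to the "with 1x1 convolution" variant in the figure and is needed whenever the channel count or resolution changes.

```python
import torch
from torch import nn
from torch.nn import functional as F

class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super().__init__()
        # g(x): two 3 x 3 convolutions, each followed by batch normalization
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Optional 1x1 convolution so the shortcut matches the shape of g(x)
        self.conv3 = (nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                stride=stride) if use_1x1conv else None)

    def forward(self, x):
        g = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        shortcut = self.conv3(x) if self.conv3 is not None else x
        # f(x) = x + g(x), followed by the final activation
        return F.relu(shortcut + g)

X = torch.rand(4, 3, 6, 6)
same_shape = Residual(3, 3)(X)
downsampled = Residual(3, 6, use_1x1conv=True, stride=2)(X)
print(same_shape.shape, downsampled.shape)
```

    Without the 1x1 convolution the input and output shapes must be identical, since `x` is added to `g(x)` directly.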


