Reproducing Classic Convolutional Neural Networks

Reproducing the classic convolutional neural network architectures with PyTorch.

CNN fundamentals

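The data moves from tabular data to the pixel data of images, and the network structure moves from the fully connected multilayer perceptron to convolutional structures.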

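A convolutional neural network systematizes the idea of spatial invariance; built on this idea, the model learns useful representations with fewer parameters.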

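Where's Waldo? (Waldo looks the same wherever he appears in the picture, so a detector for him should work at every position.)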
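To make "fewer parameters" concrete, here is a small sketch that compares a fully connected layer with a convolutional layer producing the same output shape. The sizes (a 1×32×32 input mapped to 6 feature maps of 28×28, as in LeNet-5's first layer) are chosen only for illustration.

```python
import torch
from torch import nn

# Illustrative comparison: map a 1x32x32 image to 6 feature maps of size 28x28.
# Fully connected: every output value has its own weight for every input pixel.
fc = nn.Linear(32 * 32, 6 * 28 * 28)
# Convolution: each output channel reuses one shared 5x5 filter at every position.
conv = nn.Conv2d(1, 6, kernel_size=5)

def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters (weights + biases)."""
    return sum(p.numel() for p in module.parameters())

n_fc = count_params(fc)
n_conv = count_params(conv)
print(n_fc, n_conv)  # 4821600 156
print(conv(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 6, 28, 28])
```

The convolution needs 6 × 5 × 5 weights plus 6 biases because the same filter slides over the whole image, which is exactly the spatial-invariance assumption.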

Input of a CNN: an n × n × 3 image tensor

Output of a CNN: a class label / class probability vector (after softmax normalization)

Channels: an image usually has 3 channels, one per primary color (RGB); each color is one color channel.

Receptive field: the region of a set size on the image that the CNN looks at in one step

Stride: how far the receptive field moves across the image at each step

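Padding: as the receptive field moves, it may extend beyond the image's pixel range; filling the out-of-range positions with values (usually zeros) is called padding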

Filter: a single set of parameters shared by every receptive field, which simplifies the model.

Convolutional layer: receptive field + parameter sharing
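Kernel size k, stride s and padding p together determine the output size. For an n × n input, each spatial side of the output is floor((n + 2p - k) / s) + 1. The sketch below checks this formula against nn.Conv2d; the helper name conv_out_size is only for illustration.

```python
import torch
from torch import nn

def conv_out_size(n: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output height/width of a k x k window with stride s and padding p on an n x n input."""
    return (n + 2 * p - k) // s + 1

# Compare the formula with PyTorch for a few (kernel, stride, padding) settings.
x = torch.randn(1, 3, 32, 32)
for k, s, p in [(5, 1, 0), (3, 1, 1), (3, 2, 1), (7, 2, 3)]:
    conv = nn.Conv2d(3, 8, kernel_size=k, stride=s, padding=p)
    actual = conv(x).shape[-1]
    expected = conv_out_size(32, k, s, p)
    print(f"k={k} s={s} p={p}: {actual}")
    assert actual == expected
```

With k = 3, s = 1, p = 1 the size is preserved, which is why "3×3 conv, padding 1" appears so often in the networks below.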

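Downsampling (subsampling): remove the even-numbered columns and the odd-numbered rows of an image; the image shrinks to 1/4 of its original size, but what it depicts does not change.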

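Pooling: pooling has no parameters; it is just an operation that groups the results produced by the filters and aggregates each group as specified (taking the maximum or the average).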

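Pooling layer (max pooling): take the maximum value in the window as the output, then slide the window, which reduces the spatial size.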
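A minimal sketch contrasting plain subsampling (keeping every other row and column by slicing) with 2×2 max and average pooling on a toy 4×4 input. All three shrink the image to 1/4 of its pixels, and the pooling layers have no learnable parameters.

```python
import torch
from torch import nn

x = torch.arange(16.0).reshape(1, 1, 4, 4)  # one 4x4 single-channel "image": values 0..15

# Subsampling: keep every other row and every other column, 4x4 -> 2x2
sub = x[:, :, ::2, ::2]

# Pooling aggregates each non-overlapping 2x2 window
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(sub.squeeze())          # tensor([[ 0.,  2.], [ 8., 10.]])
print(max_pool(x).squeeze())  # tensor([[ 5.,  7.], [13., 15.]])
print(avg_pool(x).squeeze())  # tensor([[ 2.5,  4.5], [10.5, 12.5]])
print(len(list(max_pool.parameters())))  # 0: pooling has no parameters
```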

LeNet-5

The pioneering work of convolutional neural networks, proposed by Yann LeCun et al. in 1998.

Network architecture diagram

[Figure: LeNet-5 architecture]

Reproduction code

import torch
from torch import nn

# LeNet-5 has 7 layers; input: a 32*32*1 image, output: conditional probabilities of 10 classes
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.layer1 = nn.Sequential(
            # input channels: 1; output channels: 6; kernel: 5*5
            # Conv2d(1, 6, 5, 1, 0) arguments: in_channels, out_channels, kernel_size, stride, padding
            nn.Conv2d(1, 6, 5, 1, 0),
            nn.ReLU(inplace=True),
            # 2*2 max-pooling window with stride 2: 28*28 -> 14*14
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            # input channels: 6; output channels: 16; kernel: 5*5
            nn.Conv2d(6, 16, 5, 1, 0),
            nn.ReLU(inplace=True),
            # 2*2 max-pooling window with stride 2: 10*10 -> 5*5
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),  # flatten the 16@5*5 feature maps into a 400-dim vector
            nn.Linear(400, 120),  # fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(120, 84),  # fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(84, 10),  # fully connected layer
            # Softmax turns scores into probabilities; drop it when training with
            # nn.CrossEntropyLoss, which expects raw logits
            nn.Softmax(dim=1)
        )

    # forward pass
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.classifier(x)
        return x
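Printing the shape after every layer is a quick way to confirm the 16@5×5 = 400 features that the first nn.Linear expects. The sketch below restates the same LeNet-5 layers as one flat nn.Sequential (done only so the snippet runs on its own) and feeds it a random 32×32 grayscale image.

```python
import torch
from torch import nn

# Same layers as LeNet5 above, flattened into one Sequential for shape tracing
net = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(400, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

x = torch.randn(1, 1, 32, 32)  # batch of one 32x32 grayscale image
shapes = []
for layer in net:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:10s} -> {tuple(x.shape)}")
```

The trace goes 32 → 28 → 14 → 10 → 5 spatially, ending in a (1, 10) score vector.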

AlexNet

Winner of the 2012 ImageNet competition (ILSVRC 2012)

Network architecture diagram

[Figure: AlexNet architecture]

Structural details

How is the kernel size chosen?

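Neural architecture search (NAS), which searches for network structures automatically, is currently a popular research direction. The networks it finds that reach a given accuracy on standard datasets contain kernels of all the common sizes (3 × 3, 5 × 5, 7 × 7), and these are arranged through the network with no apparent pattern. In other words, there is no fixed rule for choosing kernel sizes.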

What does nn.BatchNorm2d() do?

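Vanishing gradients are a common problem in deep neural networks. BatchNorm2d adds a normalization layer before the activation function, which mitigates vanishing gradients and thereby speeds up optimization.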

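Because BatchNorm2d normalizes the data, the influence of weight initialization is reduced: whatever the initial weights, normalization followed by a learnable affine transform (a scale γ and a shift β) tends to give good results.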
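A small sketch of what BatchNorm2d computes in training mode: each channel is normalized with the mean and variance taken over the batch and spatial dimensions, then scaled by γ and shifted by β (initialized to 1 and 0). The input values and the seed are arbitrary choices for the demo.

```python
import torch
from torch import nn

torch.manual_seed(0)
# 8 samples, 3 channels, 4x4 maps, deliberately shifted (mean ~10) and scaled (std ~5)
x = torch.randn(8, 3, 4, 4) * 5 + 10

bn = nn.BatchNorm2d(3)  # one (gamma, beta) pair per channel -> 6 learnable parameters
bn.train()              # training mode: normalize with the current batch statistics
y = bn(x)

# Per-channel statistics over (batch, height, width)
print(y.mean(dim=(0, 2, 3)))                 # close to 0 for every channel
print(y.std(dim=(0, 2, 3), unbiased=False))  # close to 1 for every channel
```

In eval mode (bn.eval()) the layer instead uses the running mean and variance collected during training.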

What does nn.Dropout(0.5) do?

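During training, nn.Dropout randomly switches off (zeroes) a fraction of the neurons to improve the model's ability to generalize. Dropout is usually applied after fully connected layers, to avoid destroying important spatial information in the convolutional layers.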

A dropout rate between 10% and 50% is generally recommended.
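A minimal sketch of Dropout's two modes. In training mode it zeroes each element with probability p and scales the survivors by 1/(1 - p), so the expected activation is unchanged; in evaluation mode it passes the input through untouched.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()        # training mode: random elements are zeroed
y_train = drop(x)   # surviving elements become 1 / (1 - 0.5) = 2.0
drop.eval()         # evaluation mode: dropout is the identity
y_eval = drop(x)

print(y_train)  # a mix of 0.0 and 2.0 (which positions depends on the random draw)
print(y_eval)   # all ones
```

This is why model.eval() must be called before evaluating a network that contains Dropout.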

Reproduction code

import torch
from torch import nn

# AlexNet input: a 227*227*3 image (the paper states 224*224, but with padding 0 in the
# first conv, 227*227 is what yields the 6*6 final feature map); output: 1000 class probabilities
class AlexNet(nn.Module):
    '''
    Neural network model consisting of layers proposed by AlexNet paper
    '''
    def __init__(self):
        super(AlexNet, self).__init__()
        self.layer1 = nn.Sequential(
            # input channels: 3; output channels: 96; kernel: 11*11, stride 4
            nn.Conv2d(3, 96, 11, 4, 0),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            # 3*3 max-pooling window with stride 2
            nn.MaxPool2d(3, 2)
        )
        self.layer2 = nn.Sequential(
            # input channels: 96; output channels: 256; kernel: 5*5
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # 3*3 max-pooling window with stride 2
            nn.MaxPool2d(3, 2)
        )
        self.layer3 = nn.Sequential(
            # input channels: 256; output channels: 384; kernel: 3*3
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.BatchNorm2d(384),
            nn.ReLU(inplace=True)
        )
        self.layer4 = nn.Sequential(
            # input channels: 384; output channels: 384; kernel: 3*3
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.BatchNorm2d(384),
            nn.ReLU(inplace=True)
        )
        self.layer5 = nn.Sequential(
            # input channels: 384; output channels: 256; kernel: 3*3
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # 3*3 max-pooling window with stride 2
            nn.MaxPool2d(3, 2)
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),  # 256@6*6 feature maps -> 9216-dim vector
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 1000),
            # drop Softmax when training with nn.CrossEntropyLoss, which expects raw logits
            nn.Softmax(dim=1)
        )

    # forward pass
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        output = self.classifier(x)
        return output
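Why does the input size matter? Applying the output-size formula floor((n + 2p - k) / s) + 1 to every convolution and pooling layer of the AlexNet code shows that, with padding 0 in the first convolution, only a 227×227 input ends at a 6×6 map matching nn.Linear(256*6*6, 4096); a 224×224 input ends at 5×5. The helper names here are illustrative.

```python
def out_size(n: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output side length of a k x k window with stride s and padding p."""
    return (n + 2 * p - k) // s + 1

def alexnet_sizes(n: int) -> list:
    """Spatial side length after each conv/pool layer of the AlexNet code above."""
    layers = [  # (kernel, stride, padding) in forward order
        (11, 4, 0),  # layer1 conv
        (3, 2, 0),   # layer1 max pool
        (5, 1, 2),   # layer2 conv
        (3, 2, 0),   # layer2 max pool
        (3, 1, 1),   # layer3 conv
        (3, 1, 1),   # layer4 conv
        (3, 1, 1),   # layer5 conv
        (3, 2, 0),   # layer5 max pool
    ]
    sizes = [n]
    for k, s, p in layers:
        n = out_size(n, k, s, p)
        sizes.append(n)
    return sizes

print(alexnet_sizes(224))  # [224, 54, 26, 26, 12, 12, 12, 12, 5]
print(alexnet_sizes(227))  # [227, 55, 27, 27, 13, 13, 13, 13, 6]
```

An alternative, if you want to keep 224×224 inputs, is to give the first convolution some padding or to insert nn.AdaptiveAvgPool2d((6, 6)) before the classifier.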

VGG

Runner-up of the 2014 ImageNet competition (ILSVRC 2014 classification task)

Network architecture diagram

[Figure: VGG architecture]

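Each cuboid "slab" in the diagram is the tensor of one image; when building the network, you only need to track how the shape changes from one slab to the next.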

Reproduction code

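import torch
from torch import nn

'''
VGG convolutional neural network built with PyTorch: the 16-layer configuration (VGG-16)
'''

class VGGmodel(nn.Module):
    def __init__(self, *args, **kwargs):
        super(VGGmodel, self).__init__(*args, **kwargs)
        # 13 conv layers in 5 blocks; each 2*2 max pool halves H and W: 224 -> 112 -> 56 -> 28 -> 14 -> 7
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(64, 64, 3, 1, 1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(128, 128, 3, 1, 1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 512, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2)
        )
        # 3 fully connected layers; a 224*224 input gives 512@7*7 feature maps
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 1000)
        )
        self._initialize_weights()

    def _initialize_weights(self):
        # Kaiming init for conv layers, small Gaussian for linear layers, zero biases
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 512, 7, 7) -> (N, 25088)
        x = self.classifier(x)
        return x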

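VGG mainly stacks more layers of one uniform block (3×3 convolutions followed by 2×2 max pooling) rather than introducing a new kind of structure, but it shows that increasing network depth does improve model performance.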
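Because VGG is just the same block stacked repeatedly, it can be generated from a small configuration list. The sketch below assumes a hypothetical helper vgg_block (in the style of Dive into Deep Learning) and builds the VGG-16 feature extractor with all channel widths divided by 8, only so the shape check runs cheaply.

```python
import torch
from torch import nn

def vgg_block(num_convs: int, in_channels: int, out_channels: int) -> nn.Sequential:
    """num_convs 3x3 convs (padding 1 keeps H and W), then a 2x2 max pool that halves H and W."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True)]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

# VGG-16: (number of convs, output channels) per block -> 2+2+3+3+3 = 13 conv layers
conv_arch = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
ratio = 8  # width divisor for this demo only
blocks, in_ch = [], 3
for num_convs, out_ch in conv_arch:
    blocks.append(vgg_block(num_convs, in_ch, out_ch // ratio))
    in_ch = out_ch // ratio
features = nn.Sequential(*blocks)

x = torch.randn(1, 3, 224, 224)
for blk in features:
    x = blk(x)
    print(tuple(x.shape))  # H and W halve after every block: 112, 56, 28, 14, 7
```

With ratio = 1 the last shape would be (1, 512, 7, 7), matching nn.Linear(512*7*7, 4096) in the classifier.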

GoogleNet

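GoogLeNet, also called Inception Net, addresses a core question: which convolution kernel size is the most appropriate. Its answer is to use several kernel sizes in parallel. In GoogLeNet the basic convolutional block is called the Inception block.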

Architecture of the Inception block

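From Dive into Deep Learning: the Inception block consists of four parallel paths. The first three paths use convolutional layers with window sizes 1×1, 3×3 and 5×5 to extract information at different spatial scales. The middle two paths first apply a 1×1 convolution to the input to reduce the number of channels, which lowers the model's complexity. The fourth path uses a 3×3 max-pooling layer followed by a 1×1 convolutional layer to change the number of channels.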

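Difficulty 1: how do you design a parallel network structure?
Difficulty 2: how do you compute the input and output channel counts of the parallel paths?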
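Both difficulties come down to one rule: every path must keep the same height and width (pick padding so that k = 1 uses p = 0, k = 3 uses p = 1, k = 5 uses p = 2, and pool with stride 1), so the outputs can be concatenated along the channel dimension, and the output channel count is the sum of the paths' output channels. A minimal sketch, using a hypothetical TinyInception class with the first configuration of block b3:

```python
import torch
from torch import nn
from torch.nn import functional as F

class TinyInception(nn.Module):
    """Hypothetical minimal Inception-style block: four parallel paths, concatenated on channels."""
    def __init__(self, in_ch, c1, c2, c3, c4):
        super().__init__()
        self.p1 = nn.Conv2d(in_ch, c1, 1)
        self.p2 = nn.Sequential(nn.Conv2d(in_ch, c2[0], 1), nn.ReLU(), nn.Conv2d(c2[0], c2[1], 3, padding=1))
        self.p3 = nn.Sequential(nn.Conv2d(in_ch, c3[0], 1), nn.ReLU(), nn.Conv2d(c3[0], c3[1], 5, padding=2))
        self.p4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1), nn.Conv2d(in_ch, c4, 1))

    def forward(self, x):
        # every path preserves H and W, so the results line up for concatenation
        outs = [F.relu(path(x)) for path in (self.p1, self.p2, self.p3, self.p4)]
        return torch.cat(outs, dim=1)  # channels: c1 + c2[1] + c3[1] + c4

block = TinyInception(192, 64, (96, 128), (16, 32), 32)
y = block(torch.randn(1, 192, 28, 28))
print(tuple(y.shape))  # (1, 256, 28, 28): 64 + 128 + 32 + 32 = 256
```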

Network architecture diagram

[Figure: GoogLeNet architecture]

import torch
from torch import nn
from torch.nn import functional as F

class Inception(nn.Module):
    # channel1-channel4 are the output channels of the four paths
    def __init__(self, in_channels, channel1, channel2, channel3, channel4, **kwargs):
        super(Inception, self).__init__(**kwargs)
        # path 1: a single 1x1 conv layer
        self.path1 = nn.Conv2d(in_channels, channel1, 1)
        # path 2: 1x1 conv followed by 3x3 conv (padding 1 keeps H and W)
        self.path2_1 = nn.Conv2d(in_channels, channel2[0], 1)
        self.path2_2 = nn.Conv2d(channel2[0], channel2[1], 3, 1, 1)
        # path 3: 1x1 conv followed by 5x5 conv (padding 2 keeps H and W)
        self.path3_1 = nn.Conv2d(in_channels, channel3[0], 1)
        self.path3_2 = nn.Conv2d(channel3[0], channel3[1], 5, 1, 2)
        # path 4: 3x3 max pooling (stride 1, padding 1) followed by 1x1 conv
        self.path4_1 = nn.MaxPool2d(3, 1, 1)
        self.path4_2 = nn.Conv2d(in_channels, channel4, 1)

    def forward(self, x):
        p1 = F.relu(self.path1(x))
        p2 = F.relu(self.path2_2(F.relu(self.path2_1(x))))
        p3 = F.relu(self.path3_2(F.relu(self.path3_1(x))))
        p4 = F.relu(self.path4_2(self.path4_1(x)))
        # concatenate on the channel dimension: channel1 + channel2[1] + channel3[1] + channel4
        return torch.cat((p1, p2, p3, p4), dim=1)

class GoogleNet(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # block 1: a 64-channel 7*7 conv layer (single-channel input)
        self.b1 = nn.Sequential(
            nn.Conv2d(1, 64, 7, 2, 3),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2, 1)
        )
        # block 2: a 64-channel 1*1 conv, then a 3*3 conv that triples the channels (64 -> 192)
        self.b2 = nn.Sequential(
            nn.Conv2d(64, 64, 1),
            nn.ReLU(True),
            nn.Conv2d(64, 192, 3, 1, 1),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2, 1)
        )
        # block 3: two complete Inception blocks in series
        # the hard part is the per-path channel count: 64+128+32+32 = 256, 128+192+96+64 = 480
        self.b3 = nn.Sequential(
            Inception(192, 64, (96, 128), (16, 32), 32),
            Inception(256, 128, (128, 192), (32, 96), 64),
            nn.MaxPool2d(3, 2, 1)
        )
        # block 4: five Inception blocks in series (output channels 512, 512, 512, 528, 832)
        self.b4 = nn.Sequential(
            Inception(480, 192, (96, 208), (16, 48), 64),
            Inception(512, 160, (112, 224), (24, 64), 64),
            Inception(512, 128, (128, 256), (24, 64), 64),
            Inception(512, 112, (144, 288), (32, 64), 64),
            Inception(528, 256, (160, 320), (32, 128), 128),
            nn.MaxPool2d(3, 2, 1)
        )
        # block 5: two Inception blocks (832 and 1024 output channels), then global average pooling
        self.b5 = nn.Sequential(
            Inception(832, 256, (160, 320), (32, 128), 128),
            Inception(832, 384, (192, 384), (48, 128), 128),
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten()
        )
        self.classifier = nn.Linear(1024, 10)

    def forward(self, x):
        x = self.b1(x)
        x = self.b2(x)
        x = self.b3(x)
        x = self.b4(x)
        x = self.b5(x)
        return self.classifier(x)
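The channel bookkeeping across the whole network can be checked without building any layers. This plain-Python sketch lists the Inception settings of blocks b3 to b5 (for b3 it uses (128, 192), (32, 96) and channel4 values of 32 and 64, which are what the 256 → 480 channel counts require), sums each block's output channels, and asserts that every block's in_channels equals the previous block's output. Max pooling between stages does not change the channel count.

```python
# (in_channels, channel1, (reduce2, channel2), (reduce3, channel3), channel4)
configs = [
    (192, 64, (96, 128), (16, 32), 32),      # b3
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),     # b4
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),  # b5
    (832, 384, (192, 384), (48, 128), 128),
]

def inception_out_channels(c1, c2, c3, c4):
    # only the last layer of each path sets its width; the 1x1 "reduce" convs are internal
    return c1 + c2[1] + c3[1] + c4

outs = []
for i, (in_ch, c1, c2, c3, c4) in enumerate(configs):
    out_ch = inception_out_channels(c1, c2, c3, c4)
    outs.append(out_ch)
    if i + 1 < len(configs):
        assert configs[i + 1][0] == out_ch, f"block {i}: {out_ch} != {configs[i + 1][0]}"
    print(f"{in_ch:4d} -> {out_ch:4d}")
```

The last block outputs 1024 channels, which after global average pooling and flattening is why the classifier is nn.Linear(1024, 10).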

ResNet

Winner of the 2015 ImageNet competition; thanks to residual blocks, it successfully trained a residual network 152 layers deep.

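The design was inspired by the degradation problem: as a network gets deeper, accuracy rises until it saturates, and adding still more depth then makes accuracy drop (even on the training set, so this is not overfitting).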

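Suppose a shallower network has already reached saturated accuracy. Appending a few identity-mapping layers after it would not increase the error, so a deeper model should not perform worse than the shallow one. This idea of identity mapping is the inspiration for ResNet.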
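A residual block computes x + F(x), so representing the identity only requires F(x) = 0, which is easy to reach (for example by driving the last layer's weights toward zero), whereas a plain stack of layers would have to learn H(x) = x exactly. The sketch below zeroes the last convolution of a toy residual branch (sizes are arbitrary) and confirms the block then outputs its input unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Toy residual branch F(x): two shape-preserving 3x3 convs
branch = nn.Sequential(
    nn.Conv2d(4, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 4, 3, padding=1)
)
# Zero the last conv's weights and bias, so F(x) = 0 for every input
nn.init.zeros_(branch[2].weight)
nn.init.zeros_(branch[2].bias)

x = torch.randn(2, 4, 8, 8)
y = x + branch(x)  # residual connection: x + F(x)
print(torch.equal(y, x))  # True: the block is exactly an identity mapping
```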

Network architecture diagram

[Figure: ResNet architecture]

Implementation code

import torch
from torch import nn
from torch.nn import functional as F

'''
Reproduction of the ResNet architecture (ResNet-18 layout)
'''

# residual block definition
class ResiDual(nn.Module):
    def __init__(self, in_channels, num_channels, use_1X1conv=False, stride=1, **kwargs):
        super(ResiDual, self).__init__(**kwargs)
        # first 3*3 conv layer; its stride (1 or 2) sets the output H and W
        self.conv1 = nn.Conv2d(in_channels, num_channels, 3, stride, 1)
        # second 3*3 conv layer; padding 1 keeps H and W
        self.conv2 = nn.Conv2d(num_channels, num_channels, 3, 1, 1)

        # optional 1*1 conv on the shortcut so that x matches y in channels and H/W
        if use_1X1conv:
            self.conv3 = nn.Conv2d(in_channels, num_channels, 1, stride=stride)
        else:
            self.conv3 = None
        # batch normalization layer 1
        self.bn1 = nn.BatchNorm2d(num_channels)
        # batch normalization layer 2
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        if self.conv3 is not None:
            x = self.conv3(x)
        y = y + x
        return F.relu(y)

# ResNet architecture
class ResNet(nn.Module):
    def __init__(self, *args, **kwargs):
        super(ResNet, self).__init__(*args, **kwargs)
        # first block: like GoogLeNet's, plus batch normalization
        self.b1 = nn.Sequential(
            nn.Conv2d(1, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2, 1)
        )
        # build one stage of residual blocks; every stage except the first
        # halves H and W and changes the channel count in its first block
        def ResBlock(in_channels, num_channels, num_residuals, first_block=False):
            block = []
            for i in range(num_residuals):
                if i == 0 and not first_block:
                    block.append(ResiDual(in_channels, num_channels, use_1X1conv=True, stride=2))
                else:
                    block.append(ResiDual(num_channels, num_channels))
            return block

        # add all residual stages to ResNet
        self.b2 = nn.Sequential(*ResBlock(64, 64, 2, first_block=True))
        self.b3 = nn.Sequential(*ResBlock(64, 128, 2))
        self.b4 = nn.Sequential(*ResBlock(128, 256, 2))
        self.b5 = nn.Sequential(*ResBlock(256, 512, 2))

        # global average pooling + fully connected output layer (10 classes)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(512, 10)
        )

    # forward pass
    def forward(self, x):
        x = self.b1(x)
        x = self.b2(x)
        x = self.b3(x)
        x = self.b4(x)
        x = self.b5(x)
        return self.head(x)
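The tricky part of a residual block is keeping x and F(x) the same shape so they can be added. The sketch below restates a basic residual block under the hypothetical name Residual (same layout as the ResiDual block above, repeated so the snippet runs on its own) and checks both cases: a plain identity shortcut, and a 1×1 conv shortcut with stride 2 that halves H and W while changing the channel count.

```python
import torch
from torch import nn
from torch.nn import functional as F

class Residual(nn.Module):
    """Basic residual block: two 3x3 convs plus an optional 1x1 conv on the shortcut."""
    def __init__(self, in_ch, out_ch, use_1x1conv=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.conv3 = nn.Conv2d(in_ch, out_ch, 1, stride=stride) if use_1x1conv else None
        self.bn1, self.bn2 = nn.BatchNorm2d(out_ch), nn.BatchNorm2d(out_ch)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        if self.conv3 is not None:
            x = self.conv3(x)  # reshape the shortcut to match y
        return F.relu(y + x)

x = torch.randn(4, 3, 6, 6)
same = Residual(3, 3)                              # identity shortcut, shape unchanged
down = Residual(3, 6, use_1x1conv=True, stride=2)  # 1x1 shortcut: 3 -> 6 channels, 6x6 -> 3x3
print(tuple(same(x).shape))  # (4, 3, 6, 6)
print(tuple(down(x).shape))  # (4, 6, 3, 3)
```

Without the 1×1 convolution in the second case, y + x would fail with a shape mismatch, which is why the first block of every stage after b2 uses use_1X1conv=True.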

Summary

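1. Master the basic concepts of convolutional neural networks thoroughly.

2. Become fluent in the process and logic of writing code to build neural network models with PyTorch.

3. Understand how stride and padding are calculated, especially the padding and stride values in GoogLeNet and ResNet.

4. Reproducing code from an architecture diagram is where understanding of the network structure meets the PyTorch toolbox.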

Artificial Intelligence and Deep Learning

#Deep Learning


http://example.com/2024/11/15/经典卷积神经网络复现/

Author

Munger Yang

Published on

November 15, 2024
