PyTorch Notes: An Introduction to the torch.nn Module
Based on a translation of [What is torch.nn really?](https://pytorch.org/tutorials/beginner/nn_tutorial.html). These notes briefly introduce the main modules of the PyTorch framework and can serve as an introductory PyTorch reference. They assume some familiarity with neural-network basics (e.g., having implemented gradient descent for a machine-learning model).
torch.nn
```python
import torch.nn as nn
```
Before we start
PyTorch works with `torch.tensor`, so the raw data needs to be converted first:
```python
import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)

x_train.shape
x_train.min()
x_train.max()
```
`map(function, iterable, ...)` applies `function` to each element of `iterable` and returns an iterator over the results.
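A tiny illustration (the arrays here are made up):

```python
import numpy as np
import torch

a, b = np.zeros((2, 3)), np.ones((2, 3))
ta, tb = map(torch.tensor, (a, b))   # each NumPy array becomes a torch.Tensor
print(ta.dtype, tb.shape)            # torch.float64 torch.Size([2, 3])
```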
nn.functional
```python
import torch.nn.functional as F
```
This module contains all the functions of the `torch.nn` library (the rest of the library consists of classes), including a large number of loss and activation functions.
```python
import torch.nn.functional as F

loss_func = F.cross_entropy
loss = loss_func(model(x), y)
loss.backward()
```
Here `loss.backward()` computes and accumulates the gradients of the model's parameters, including the weights and bias.
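For context, this is what `backward()` enables: before `torch.optim` is introduced, parameters can be updated by hand using the gradients stored in `.grad`. A minimal runnable sketch (the batch data and learning rate are made up for illustration):

```python
import torch
import torch.nn.functional as F

lr = 0.5  # assumed learning rate
weights = torch.randn(784, 10, requires_grad=True)
bias = torch.zeros(10, requires_grad=True)

xb = torch.randn(64, 784)              # a made-up batch of inputs
yb = torch.randint(0, 10, (64,))       # made-up class labels
loss = F.cross_entropy(xb @ weights + bias, yb)
loss.backward()                        # gradients accumulate into .grad

with torch.no_grad():                  # the update itself must not be tracked
    weights -= weights.grad * lr
    bias -= bias.grad * lr
    weights.grad.zero_()               # reset before the next backward()
    bias.grad.zero_()
```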
What is the difference between nn and nn.functional in PyTorch?
- `nn.functional.xxx` is a functional interface, while `nn.Xxx` is a class wrapper around `nn.functional.xxx`, and every `nn.Xxx` inherits from the common ancestor `nn.Module`. Besides providing the functionality of `nn.functional.xxx`, `nn.Xxx` therefore also carries the attributes and methods of `nn.Module`, e.g. `train()`, `eval()`, `load_state_dict()`, `state_dict()`.
- The two are invoked differently.

`nn.Xxx` is instantiated first, and the instance is then called on the input data:
```python
inputs = torch.rand(64, 3, 28, 28)
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
out = conv(inputs)
```
`nn.functional.xxx` is called directly, passing the input data together with `weight`, `bias`, and any other parameters:
```python
weight = torch.rand(64, 3, 3, 3)
bias = torch.rand(64)
out = nn.functional.conv2d(inputs, weight, bias, padding=1)
```
- Whether they can be combined with `nn.Sequential`.

`nn.Xxx` inherits from `nn.Module`, so it composes naturally with `nn.Sequential`:
```python
fm_layer = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(num_features=64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout(0.2)
)
```
`nn.functional.xxx`, by contrast, cannot be used inside `nn.Sequential`.
- Whether you must define and manage the `weight` and `bias` parameters yourself.

With `nn.Xxx` there is no need to define and manage the weights yourself:
```python
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=0)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, padding=0)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        self.linear1 = nn.Linear(4 * 4 * 32, 10)

    def forward(self, x):
        # x: (batch, 1, 28, 28)
        out = self.maxpool1(self.relu1(self.cnn1(x)))    # -> (batch, 16, 12, 12)
        out = self.maxpool2(self.relu2(self.cnn2(out)))  # -> (batch, 32, 4, 4)
        out = self.linear1(out.view(x.size(0), -1))      # flatten -> (batch, 10)
        return out
```
`nn.functional.xxx` requires you to define the weights yourself and to pass them in manually on every call, which hurts code reuse:
```python
class CNN(nn.Module):
    """The same CNN, with manually managed parameters."""
    def __init__(self):
        super(CNN, self).__init__()
        self.cnn1_weight = nn.Parameter(torch.rand(16, 1, 5, 5))
        self.bias1_weight = nn.Parameter(torch.rand(16))
        self.cnn2_weight = nn.Parameter(torch.rand(32, 16, 5, 5))
        self.bias2_weight = nn.Parameter(torch.rand(32))
        # F.linear expects a weight of shape (out_features, in_features)
        self.linear1_weight = nn.Parameter(torch.rand(10, 4 * 4 * 32))
        self.bias3_weight = nn.Parameter(torch.rand(10))

    def forward(self, x):
        out = F.conv2d(x, self.cnn1_weight, self.bias1_weight)
        out = F.relu(out)
        out = F.max_pool2d(out, kernel_size=2)
        out = F.conv2d(out, self.cnn2_weight, self.bias2_weight)
        out = F.relu(out)
        out = F.max_pool2d(out, kernel_size=2)
        out = F.linear(out.view(x.size(0), -1), self.linear1_weight, self.bias3_weight)
        return out
```
The two definitions above produce functionally equivalent CNNs.
The official PyTorch recommendation:

- Operations with learnable parameters (e.g. conv2d, linear, batch_norm) should use `nn.Xxx`.
- Operations without learnable parameters (e.g. maxpool, loss functions, activation functions) may use either `nn.functional.xxx` or `nn.Xxx`, according to personal preference.
- Finally, for dropout, `nn.Xxx` is strongly recommended: dropout should normally run only during training and be disabled during evaluation. With dropout defined via `nn.Dropout`, calling `model.eval()` switches off every dropout layer in the model; with `nn.functional.dropout`, `model.eval()` has no effect, and the flag must be passed by hand as `F.dropout(x, training=self.training)` (see the sketch below).
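A minimal sketch of the difference (the module names here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModuleDropout(nn.Module):
    def __init__(self):
        super(ModuleDropout, self).__init__()
        self.drop = nn.Dropout(0.5)

    def forward(self, x):
        return self.drop(x)  # switched off automatically by model.eval()

class FunctionalDropout(nn.Module):
    def forward(self, x):
        # without training=self.training, dropout would stay active in eval mode
        return F.dropout(x, p=0.5, training=self.training)

x = torch.ones(8)
print(ModuleDropout().eval()(x))      # == x: dropout disabled
print(FunctionalDropout().eval()(x))  # == x only because the flag is forwarded
```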
nn.Module & nn.Parameter
Subclassing `nn.Module` lets us build a class that holds the weights and bias and implements the forward step. `nn.Module` provides many useful attributes and methods (e.g. `.parameters()` and `.zero_grad()`).
Rather than creating and initializing parameters by hand, use layers such as `nn.Linear`, which manage their own weight and bias.
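As a concrete example, a minimal model in the spirit of the tutorial's `Mnist_Logistic` (784 inputs and 10 outputs follow the MNIST setup there):

```python
import torch.nn as nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super(Mnist_Logistic, self).__init__()
        self.lin = nn.Linear(784, 10)  # weight and bias are created for us

    def forward(self, xb):
        return self.lin(xb)
```

`model.parameters()` then yields the linear layer's weight and bias, and `model.zero_grad()` clears their gradients.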
torch.optim
`torch.optim` provides a variety of optimization algorithms. Instead of updating each parameter by hand, use the optimizer's `step` method to take an optimization step:

```python
opt.step()
opt.zero_grad()
```

`opt.zero_grad()` resets all gradients to 0; call it before computing the gradients for the next batch.
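The `get_model()` used in the loops below can be sketched as follows, reusing the `Mnist_Logistic` above (the learning rate is an assumed value):

```python
from torch import optim

lr = 0.5  # assumed learning rate

def get_model():
    model = Mnist_Logistic()
    # the optimizer keeps references to the parameters it will update
    return model, optim.SGD(model.parameters(), lr=lr)
```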
DataLoader
`TensorDataset` wraps tensors in a `Dataset`:
```python
from torch.utils.data import TensorDataset

train_ds = TensorDataset(x_train, y_train)
```
`DataLoader` manages batching and makes iteration easy:
```python
from torch.utils.data import DataLoader

train_dl = DataLoader(train_ds, batch_size=32)
```
The training loop:
```python
model, opt = get_model()

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))
```
Add Validation
During training, compute and print the validation loss for each epoch:
```python
model, opt = get_model()

for epoch in range(epochs):
    # training
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

    # after training, before validation:
    # model.eval() makes nn.BatchNorm2d and nn.Dropout behave correctly (disabled)
    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))
```
To simplify the code and improve readability, factor it into `fit()` and `get_data()` functions:
```python
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )


def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)
```
```python
import numpy as np


def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        # iterate over the training batches
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        # size-weighted average of the per-batch losses
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)
```
The main code then reduces to:
```python
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
```
nn.Sequential
This mirrors the Sequential model in Keras:
```python
model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
```
PyTorch does not provide a view layer, so we build one ourselves (the `Lambda` used in the `Sequential` above):
```python
class Lambda(nn.Module):
    def __init__(self, func):
        super(Lambda, self).__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)


def preprocess(x):
    return x.view(-1, 1, 28, 28)
```
Using GPU
Loading a model trained on a GPU differs from loading one trained on a CPU, so the device needs to be set explicitly. First, check whether a GPU is available:
```python
print(torch.cuda.is_available())
```
Then select the device to use:

```python
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
```
Move each data batch to the GPU, using `.to(torch.device("cuda"))` or `.cuda()`. Note that `Tensor.to()` returns a new tensor rather than modifying its argument, so the result must be assigned back:

```python
xb, yb = xb.to(dev), yb.to(dev)  # or: xb.cuda(), yb.cuda()
```
Finally, move the model to the GPU (for an `nn.Module`, `.to()` moves the parameters in place):

```python
model.to(dev)  # or: model.cuda()
```
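Putting the pieces together, a sketch of the full loop with explicit device moves (`dev`, `get_model()`, `loss_func`, `epochs`, and `train_dl` are as defined earlier):

```python
model, opt = get_model()
model.to(dev)                            # move parameters once, up front

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        xb, yb = xb.to(dev), yb.to(dev)  # move each batch alongside the model
        loss = loss_func(model(xb), yb)

        loss.backward()
        opt.step()
        opt.zero_grad()
```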