Hand-Rolling a Neural Network: Fashion MNIST Training
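Preface
Hmph, this one is just for fun 😎😎😎. I previously wrote "Hand-Rolling a Neural Network: BP Backpropagation". This post is its hands-on follow-up: using the neural network framework hand-rolled there, it trains a model on Fashion MNIST. Practice is the real test, after all, and the checks in the previous post only verified whether the framework's automatic differentiation was correct…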
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
Loading the dataset
Load Fashion MNIST with TensorFlow
fashion_mnist = tf.keras.datasets.fashion_mnist
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Preprocessing: scale pixel values to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0
# One-hot encode the training labels
train_labels_ = tf.one_hot(train_labels, 10).numpy()
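For readers who want to see what the one-hot step produces without TensorFlow, here is a minimal NumPy sketch. The `one_hot` helper is a hypothetical name introduced for illustration and is not part of the original code.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array of one-hot rows."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([9, 0, 3])   # e.g. Ankle boot, T-shirt/top, Dress
encoded = one_hot(labels, 10)
print(encoded.shape)  # (3, 10)
```

Each row contains a single 1 at the index of its class, which is the target shape the MSE loss later compares against the 10 sigmoid outputs.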
Shuffle the training set and set the batch size
train_set = tf.data.Dataset.from_tensor_slices((train_images, train_labels_)).shuffle(10000).batch(64)
train_set_x = []
train_set_y = []
# Materialize the batches as lists of NumPy arrays for the hand-rolled framework
for x, y in train_set:
    train_set_x.append(x.numpy())
    train_set_y.append(y.numpy())
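The same shuffle-then-batch step can be sketched in plain NumPy. Note that `shuffle(10000)` in `tf.data` shuffles through a 10,000-element buffer, so on the 60,000 training images it is not a full uniform shuffle; the sketch below permutes the whole array instead. `make_batches` and its `seed` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def make_batches(x, y, batch_size, seed=0):
    """Shuffle x and y together, then split them into lists of batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))          # one permutation for both arrays
    batches_x, batches_y = [], []
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        batches_x.append(x[idx])
        batches_y.append(y[idx])
    return batches_x, batches_y

# Toy stand-in for the image and label arrays
x = np.arange(10 * 4, dtype=np.float64).reshape(10, 2, 2)
y = np.arange(10)
batch_x, batch_y = make_batches(x, y, batch_size=4)
print([b.shape[0] for b in batch_x])  # [4, 4, 2]
```

As in the original loop, the batches are built once, so every epoch sees them in the same order.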
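Defining the activation and loss functions
These were covered in "Hand-Rolling a Neural Network: BP Backpropagation", so I won't go over them again here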
class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)
    def diff(self, x):
        # x is the layer output (already >= 0): positive entries become 1, zeros stay 0
        x_temp = x.copy()
        x_temp[x_temp > 0] = 1
        return x_temp
class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))
    def diff(self, x):
        # x is the sigmoid output s, so the derivative is s * (1 - s)
        return x * (1 - x)
class MSE:
    def __call__(self, true, pred):
        return np.mean(np.power(pred - true, 2), keepdims=True)
    def diff(self, true, pred):
        # Gradient with respect to pred, with the constant factor dropped
        return pred - true
relu = ReLU()
sigmoid = Sigmoid()
mse = MSE()
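One detail worth spelling out: the `diff` methods are called with the layer's output, not its pre-activation input. A quick finite-difference check makes this concrete for the sigmoid. This is only a sketch; `sigmoid_fn` and `sigmoid_diff_from_output` are stand-in names that mirror the `Sigmoid` class above.

```python
import numpy as np

def sigmoid_fn(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_diff_from_output(s):
    # Same formula as Sigmoid.diff: expects s = sigmoid(z), not z itself
    return s * (1 - s)

z = np.linspace(-3, 3, 7)
eps = 1e-6

# Central finite difference of sigmoid with respect to z
numeric = (sigmoid_fn(z + eps) - sigmoid_fn(z - eps)) / (2 * eps)
# Analytic derivative, computed from the activated output
analytic = sigmoid_diff_from_output(sigmoid_fn(z))

max_error = np.max(np.abs(numeric - analytic))
print(max_error < 1e-8)
```

Passing the raw input `z` to the derivative instead of `sigmoid_fn(z)` would give a different, incorrect result, which is why `Linear.update` feeds it the cached output.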
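Building the framework
For this part, I restructured the framework so that building neural network models later on is easier (it is getting more and more TensorFlow-flavored). The principle is still the same as in "Hand-Rolling a Neural Network: BP Backpropagation", which also contains the detailed walkthrough of the framework below
In this post, a Flatten layer is added to make it easier for the model to handle the data. It really just flattens the data 🤗
The Flatten layer converts each image from a two-dimensional array (28 x 28) into a one-dimensional array (28 x 28 = 784). Think of it as unstacking the rows of pixels in the image and lining them up. The layer has no parameters to learn; it only reformats the data, which is why its update method is just...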
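To make the reshape concrete, here is a tiny sketch on a dummy batch (the batch of zeros is a placeholder, not real data):

```python
import numpy as np

batch = np.zeros((64, 28, 28))                    # one batch of 64 grayscale images
flat = np.reshape(batch, (batch.shape[0], -1))    # keep the batch axis, merge the rest
print(flat.shape)  # (64, 784)
```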
class Model:
    def __init__(self):
        self.layers = None
        self.loss_fn = None
        # Attribute names that are not trainable layers
        self.not_layers = ['layers', 'not_layers', 'loss_fn', 'flatten']
    def compile(self, loss_fn):
        self.loss_fn = loss_fn
        # Collect the layers in definition order from the instance attributes,
        # skip the bookkeeping names, and reverse them for backpropagation
        names = [name for name in vars(self) if name not in self.not_layers]
        self.layers = [getattr(self, name) for name in reversed(names)]
    def fit(self, x, y, epochs, step=100):
        for epoch in range(epochs):
            for x_, y_ in zip(x, y):
                pred = self(x_)
                self.backward(y_, pred)
            # Log the loss of the last batch every `step` epochs
            if epoch % step == 0:
                print(f'epoch {epoch + 1}, loss={self.loss_fn(y_, pred)}')
        # Log once more after the final epoch
        print(f'epoch {epoch + 1}, loss={self.loss_fn(y_, pred)}')
    def backward(self, true, pred):
        grad = self.loss_fn.diff(true, pred)
        for layer in self.layers:
            grad = layer.update(grad)
class Flatten:
    def __call__(self, x):
        batch_size = x.shape[0]
        return np.reshape(x, (batch_size, -1))
    def update(self, grad):
        # No parameters to learn, so there is nothing to update
        pass
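class Linear:
    def __init__(self, inputs, outputs, activation):
        # Uniform random weights and biases, each normalized to sum to 1
        self.weight = np.random.rand(inputs, outputs)
        self.weight = self.weight / self.weight.sum()
        self.bias = np.random.rand(outputs)
        self.bias = self.bias / self.bias.sum()
        self.activation = activation
        self.x_temp = None  # cached input of the last forward pass
        self.t_temp = None  # cached output of the last forward pass
    def __call__(self, x):
        self.x_temp = x
        self.t_temp = self.activation(x @ self.weight + self.bias)
        return self.t_temp
    def update(self, grad):
        # Chain rule through the activation, using the cached output
        activation_diff_grad = self.activation.diff(self.t_temp) * grad
        # Gradient passed to the previous layer (computed before the weights change)
        new_grad = activation_diff_grad @ self.weight.T
        # lr is the global learning rate, set before training starts
        self.weight -= lr * self.x_temp.T @ activation_diff_grad
        self.bias -= lr * activation_diff_grad.mean(axis=0)
        return new_grad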
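As a sanity check on the update rule in `Linear.update`, the weight gradient `x.T @ (activation_diff * grad)` can be compared against a numerical gradient. Because `MSE.diff` returns `pred - true` without a constant factor, the matching objective is half the summed squared error rather than the mean that `MSE.__call__` reports. The sketch below assumes a single sigmoid layer with small random data; all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 3))       # batch of 4 samples, 3 features
w = rng.random((3, 2))       # weights of a 3 -> 2 layer
b = rng.random(2)
true = rng.random((4, 2))

def forward(weight):
    return 1 / (1 + np.exp(-(x @ weight + b)))

def half_sse(weight):
    # Objective whose gradient with respect to pred is exactly (pred - true)
    return 0.5 * np.sum((forward(weight) - true) ** 2)

# Analytic gradient, written the same way as in Linear.update
pred = forward(w)
activation_diff_grad = pred * (1 - pred) * (pred - true)
analytic = x.T @ activation_diff_grad

# Numerical gradient by central differences, one weight at a time
numeric = np.zeros_like(w)
eps = 1e-6
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        step = np.zeros_like(w)
        step[i, j] = eps
        numeric[i, j] = (half_sse(w + step) - half_sse(w - step)) / (2 * eps)

max_error = np.max(np.abs(analytic - numeric))
print(max_error < 1e-7)
```

The weight update therefore sums the gradient over the batch, while the bias update averages it over the batch (`mean(axis=0)`), so the two effectively use different step sizes.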
Building the model
Not much needs saying here 🫠🫠🫠; it is much the same as defining a custom model in TensorFlow
class NetWork(Model):
    def __init__(self):
        super().__init__()
        self.flatten = Flatten()
        self.linear_1 = Linear(28 * 28, 64, activation=relu)
        self.linear_2 = Linear(64, 10, activation=sigmoid)
    def __call__(self, x):
        x = self.flatten(x)
        x = self.linear_1(x)
        x = self.linear_2(x)
        return x
network = NetWork()
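Training the model
The learning rate lr is set to 0.01
network
- compile: compiles the model
  - loss_fn: the loss function to use
- fit: trains the model
  - x: the training set
  - y: the training labels
  - epochs: number of training epochs
  - step: logging interval, in epochs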
# Set the learning rate
lr = 0.01
# Compile the model
network.compile(loss_fn=mse)
# Train the model
network.fit(x=train_set_x, y=train_set_y, epochs=30, step=5)
==============================
Output:
epoch 1, loss=[[0.06708152]]
epoch 6, loss=[[0.04562008]]
epoch 11, loss=[[0.03327031]]
epoch 16, loss=[[0.03108833]]
epoch 21, loss=[[0.03055817]]
epoch 26, loss=[[0.03024619]]
epoch 30, loss=[[0.02989175]]
Evaluation metric: accuracy
Compute the model's accuracy. A quick plug, hehe 😅: see the related post "Confusion Matrix: Computing Evaluation Metrics"
pred = network(test_images).argmax(axis=-1)
true = test_labels
total = len(pred)
correct = np.count_nonzero(np.equal(pred, true))
accuracy = correct / total
print(f'Accuracy: {accuracy}')
==============================
Output:
Accuracy: 0.7734
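A single accuracy number hides which classes get confused with each other. A confusion matrix breaks it down per class; below is a minimal NumPy sketch on toy labels (not the real predictions). The `confusion_matrix` helper is a hypothetical name written for this example.

```python
import numpy as np

def confusion_matrix(true, pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (true, pred), 1)
    return matrix

true = np.array([0, 0, 1, 2, 2, 2])
pred = np.array([0, 1, 1, 2, 2, 0])

cm = confusion_matrix(true, pred, 3)
accuracy = np.trace(cm) / cm.sum()               # correct predictions lie on the diagonal
per_class_recall = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class recovered
print(cm)
```

With the real data, `true` would be `test_labels`, `pred` the argmax of the network output, and `num_classes` 10.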
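The accuracy is around 77%, which is so-so. And… that's another filler post done 💧💧💧
That said, there is still plenty in this hand-rolled framework that could be improved
- Better weight and bias initialization
- Multi-class cross-entropy as the loss function
- A hand-rolled optimizer
- …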
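To give a rough idea of what the first three items could look like, here is a sketch under stated assumptions. None of this is part of the framework above: `he_init`, `SoftmaxCrossEntropy` and `SGD` are hypothetical names, He initialization is just one common choice, and the loss assumes the output layer emits raw logits (an identity activation in place of the sigmoid).

```python
import numpy as np

def he_init(inputs, outputs, rng):
    """He-style initialization: zero-mean Gaussian with variance 2 / inputs."""
    return rng.normal(0.0, np.sqrt(2.0 / inputs), size=(inputs, outputs))

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

class SoftmaxCrossEntropy:
    """Multi-class cross-entropy on logits, with the same interface as MSE."""
    def __call__(self, true, logits):
        probs = softmax(logits)
        return -np.mean(np.sum(true * np.log(probs + 1e-12), axis=-1))
    def diff(self, true, logits):
        # Gradient of the per-sample loss with respect to the logits
        return softmax(logits) - true

class SGD:
    """Plain gradient descent step, so layers need not read a global lr."""
    def __init__(self, lr):
        self.lr = lr
    def step(self, param, grad):
        param -= self.lr * grad   # in-place update of the NumPy array

rng = np.random.default_rng(0)
weight = he_init(784, 64, rng)

ce = SoftmaxCrossEntropy()
true = np.eye(10)[[3, 7]]         # two one-hot targets
logits = np.zeros((2, 10))        # uniform prediction over 10 classes
loss_value = ce(true, logits)     # equals log(10) for a uniform prediction
grad = ce.diff(true, logits)

param = np.ones(3)
SGD(lr=0.1).step(param, np.ones(3))
```

Plugging these in would mean having `Linear.update` call an optimizer's `step` instead of using the global `lr`, and making sure the last layer's activation derivative is 1 so the logit gradient passes through unchanged.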