
Preface

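Hmph, just messing around again 😎😎😎. I previously wrote "Hand-Rolling a Neural Network: BP Backpropagation". This post is the hands-on sequel to it: using the neural network framework hand-built there, we'll train on Fashion MNIST. After all, practice is the real test, and the previous post only verified whether the framework's automatic differentiation was correct…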

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

Loading the dataset

Load Fashion MNIST with TensorFlow

fashion_mnist = tf.keras.datasets.fashion_mnist
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
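# Preprocessing: scale pixel values from 0-255 to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0
# One-hot encode the training labels (test labels stay as integer class ids)
train_labels_ = tf.one_hot(train_labels, 10).numpy()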
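If you'd rather not depend on TensorFlow just for the one-hot step, here is an equivalent NumPy sketch (my substitution, not what the original code uses). It indexes rows of an identity matrix, and uses float32 to match the default dtype of `tf.one_hot`:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row k of the identity matrix is the one-hot vector for class k,
    # so fancy-indexing with the label array builds the whole matrix at once.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([9, 0, 3])
encoded = one_hot(labels)
print(encoded.shape)         # (3, 10)
print(encoded[0].argmax())   # 9
```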

Shuffle the training set and set the batch size

train_set = tf.data.Dataset.from_tensor_slices((train_images, train_labels_)).shuffle(10000).batch(64)
train_set_x = []
train_set_y = []
for x,y in train_set:
    train_set_x.append(x.numpy())
    train_set_y.append(y.numpy())
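For comparison, the same shuffle-then-batch step can be sketched in plain NumPy. The helper name `make_batches` is hypothetical, and it does a full random permutation whereas `shuffle(10000)` in tf.data shuffles through a 10,000-element buffer. As in the loop above, the batches are materialized once, so their composition stays the same in every epoch:

```python
import numpy as np

def make_batches(images, labels, batch_size=64, seed=0):
    # Full random permutation of the sample indices (tf.data uses a shuffle buffer instead)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    images, labels = images[order], labels[order]
    # Consecutive slices; the last batch may be smaller, like .batch(64)
    xs = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]
    ys = [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]
    return xs, ys

# Dummy data: 1000 "images" with the Fashion MNIST image shape
x = np.zeros((1000, 28, 28), dtype=np.float32)
y = np.zeros((1000, 10), dtype=np.float32)
xs, ys = make_batches(x, y)
print(len(xs), xs[0].shape, xs[-1].shape)  # 16 (64, 28, 28) (40, 28, 28)
```

For the real 60,000-image training set, the same slicing gives 938 batches, the last one holding 32 images.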

Defining the activation and loss functions

These were already covered in "Hand-Rolling a Neural Network: BP Backpropagation", so I won't repeat myself here.

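class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)

    def diff(self, x):
        # x is the ReLU *output*: slope is 1 where it is positive, 0 elsewhere
        x_temp = x.copy()
        x_temp[x_temp > 0] = 1
        return x_temp

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def diff(self, x):
        # x is the sigmoid *output* s, and ds/dz = s * (1 - s)
        return x * (1 - x)

class MSE:
    def __call__(self, true, pred):
        return np.mean(np.power(pred - true, 2), keepdims=True)

    def diff(self, true, pred):
        # Gradient with the constant 2 / N factor dropped (effectively absorbed by lr)
        return pred - true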

relu = ReLU()
sigmoid = Sigmoid()
mse = MSE()
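One detail worth making explicit: both `diff` methods expect the activation's *output*, not its input, because `Linear.update` in the framework section calls `self.activation.diff(self.t_temp)`. That works because σ'(z) = σ(z)(1 − σ(z)), and ReLU's output is positive exactly where z > 0. A quick central finite-difference check (the check itself is my addition) confirms this:

```python
import numpy as np

class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)

    def diff(self, x):
        x_temp = x.copy()   # x is the ReLU output
        x_temp[x_temp > 0] = 1
        return x_temp

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def diff(self, x):
        return x * (1 - x)  # x is the sigmoid output

def numeric_grad(f, z, eps=1e-6):
    # Central finite difference, element-wise (f acts element-wise here)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.array([-2.0, -0.5, 0.3, 1.7])  # avoid z = 0, where ReLU has a kink
for act in (ReLU(), Sigmoid()):
    analytic = act.diff(act(z))  # fed the *output*, as Linear.update does
    print(type(act).__name__, np.allclose(analytic, numeric_grad(act, z)))
# ReLU True
# Sigmoid True
```

Passing the pre-activation `z` to `Sigmoid.diff` instead would give a wrong gradient, so the output-based convention has to be kept consistent between the activations and `Linear.update`.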

Building the framework

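For this part I've reworked the framework's structure so that building neural network models later on is easier (it's getting more and more TensorFlow-flavored). The underlying principle is still the same as in "Hand-Rolling a Neural Network: BP Backpropagation", and a detailed walkthrough of the framework below can be found in that earlier post.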
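For reference, here is my own transcription of the per-layer rule that `Linear.update` below implements, with f the layer's activation, η the global learning rate `lr`, B the batch size, and g the gradient arriving from the layer above (for the output layer, `MSE.diff` supplies g = pred − true). Note that ∂L/∂x is computed with W before the update step:

$$
\begin{aligned}
z &= xW + b, \qquad t = f(z) \\
\delta &= f'(z) \odot g, \qquad g = \frac{\partial L}{\partial t} \\
\frac{\partial L}{\partial x} &= \delta\, W^{\top} \\
W &\leftarrow W - \eta\, x^{\top}\delta \\
b &\leftarrow b - \eta\, \frac{1}{B}\sum_{i=1}^{B} \delta_i
\end{aligned}
$$

Since x^⊤δ sums over the batch while the bias term averages, the weights effectively take a step B times larger than the bias does.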

In this post, a Flatten layer has been added to make it easier for the model to work with the data. It really just flattens the data 🤗

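The Flatten layer converts each image from a 2D array (28 x 28) into a 1D array (28 x 28 = 784). Think of it as unstacking the rows of pixels in the image and lining them up end to end. The layer has no parameters to learn; it only reformats the data, which is why its update method just...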
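A small sketch of what the reshape inside `Flatten.__call__` does, using a dummy batch: each 28 x 28 image becomes a 784-vector, and because NumPy reshapes in row-major (C) order, pixel (row i, column j) lands at index 28·i + j:

```python
import numpy as np

batch = np.arange(2 * 28 * 28).reshape(2, 28, 28)  # dummy batch of 2 "images"
flat = np.reshape(batch, (batch.shape[0], -1))      # same call as Flatten.__call__

print(flat.shape)  # (2, 784)
i, j = 3, 5
print(flat[1, 28 * i + j] == batch[1, i, j])  # True
```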

class Model:
    def __init__(self):
        self.layers = None
        self.loss_fn = None
        self.not_layers = ['layers', 'not_layers', 'loss_fn', 'flatten']

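    def compile(self, loss_fn):
        self.loss_fn = loss_fn
        # Collect layers by reflection: in CPython, __dir__() lists instance attributes
        # first (in assignment order), then class attributes starting at '__module__'
        dir_temp = self.__dir__()
        end = dir_temp.index('__module__')
        # Drop non-layer attributes and reverse, so backward() starts at the output layer
        dir_temp = list(reversed([layer for layer in dir_temp[:end] if layer not in self.not_layers]))
        self.layers = list(map(lambda x: getattr(self, x), dir_temp))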

    def fit(self, x, y, epochs, step=100):
        for epoch in range(epochs):
            for x_, y_ in zip(x, y):
                pred = self(x_)
                self.backward(y_, pred)
            # Log every `step` epochs (1-based); the loss shown is that of the epoch's last batch
            if epoch % step == 0:
                print(f'epoch {epoch + 1}, loss={self.loss_fn(y_, pred)}')
        print(f'epoch {epoch + 1}, loss={self.loss_fn(y_, pred)}')

    def backward(self, true, pred):
        grad = self.loss_fn.diff(true, pred)
        for layer in self.layers:
            grad = layer.update(grad)

class Flatten:
    def __call__(self, x):
        batch_size = x.shape[0]
        return np.reshape(x, (batch_size, -1))

    def update(self, grad):
        # Nothing to learn; 'flatten' is in Model.not_layers, so backward() never calls this
        pass

class Linear:
    def __init__(self, inputs, outputs, activation):
        self.weight = np.random.rand(inputs, outputs)
        self.weight = self.weight / self.weight.sum()
        self.bias = np.random.rand(outputs)
        self.bias = self.bias / self.bias.sum()
        self.activation = activation
        self.x_temp = None
        self.t_temp = None

    def __call__(self, x):
        self.x_temp = x
        self.t_temp = self.activation(x @ self.weight + self.bias)

        return self.t_temp

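    def update(self, grad):
        # The activation's diff takes its output t_temp (see ReLU/Sigmoid above)
        activation_diff_grad = self.activation.diff(self.t_temp) * grad
        # Gradient for the previous layer, computed with the weights before this step
        new_grad = activation_diff_grad @ self.weight.T
        # Plain SGD step; `lr` is a global that must be defined before training.
        # The weight gradient is summed over the batch, the bias gradient is averaged.
        self.weight -= lr * self.x_temp.T @ activation_diff_grad
        self.bias -= lr * activation_diff_grad.mean(axis=0)
        return new_grad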
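To check that `Linear.update` really backpropagates correctly, here is a sketch of a gradient check on a tiny layer. It copies the `Linear` logic above into a standalone script, assumes the loss is 0.5 · sum of squared errors (whose gradient w.r.t. the output is exactly `pred - true`, i.e. what `MSE.diff` returns), and compares against finite differences. The weight gradient is recovered by undoing the SGD step; `numeric_grad` and `half_sse` are hypothetical helper names:

```python
import numpy as np

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def diff(self, x):
        return x * (1 - x)  # x is the sigmoid output

class Linear:
    # Same forward/backward logic as the Linear class above
    def __init__(self, inputs, outputs, activation):
        self.weight = np.random.rand(inputs, outputs)
        self.weight = self.weight / self.weight.sum()
        self.bias = np.random.rand(outputs)
        self.bias = self.bias / self.bias.sum()
        self.activation = activation

    def __call__(self, x):
        self.x_temp = x
        self.t_temp = self.activation(x @ self.weight + self.bias)
        return self.t_temp

    def update(self, grad):
        delta = self.activation.diff(self.t_temp) * grad
        new_grad = delta @ self.weight.T
        self.weight -= lr * self.x_temp.T @ delta
        self.bias -= lr * delta.mean(axis=0)
        return new_grad

def numeric_grad(f, a, eps=1e-6):
    # Central finite differences, one element at a time
    g = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        d = np.zeros_like(a)
        d[idx] = eps
        g[idx] = (f(a + d) - f(a - d)) / (2 * eps)
    return g

np.random.seed(0)
lr = 1e-3
layer = Linear(4, 3, activation=Sigmoid())
x = np.random.randn(5, 4)
y = np.random.rand(5, 3)
w0, b0 = layer.weight.copy(), layer.bias.copy()  # snapshot before the SGD step

def half_sse(w, inp):
    # 0.5 * sum of squared errors: its gradient w.r.t. the output is pred - true
    t = 1 / (1 + np.exp(-(inp @ w + b0)))
    return 0.5 * np.sum((t - y) ** 2)

pred = layer(x)
grad_x = layer.update(pred - y)    # pred - y is exactly MSE.diff(y, pred)
grad_w = (w0 - layer.weight) / lr  # undo the step to recover x.T @ delta

print(np.allclose(grad_x, numeric_grad(lambda a: half_sse(w0, a), x)))  # True
print(np.allclose(grad_w, numeric_grad(lambda a: half_sse(a, x), w0)))  # True
```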

Building the model

Not much to say here, right… 🫠🫠🫠 It's pretty much the same as building a custom model in TensorFlow.

class NetWork(Model):
    def __init__(self):
        super().__init__()
        self.flatten = Flatten()
        self.linear_1 = Linear(28 * 28, 64, activation=relu)
        self.linear_2 = Linear(64, 10, activation=sigmoid)

    def __call__(self, x):
        x = self.flatten(x)
        x = self.linear_1(x)
        x = self.linear_2(x)

        return x

network = NetWork()
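To make the architecture concrete, here is a quick sketch that counts NetWork's trainable parameters (weights plus biases of `linear_1` and `linear_2`; Flatten has none) and traces tensor shapes for a batch of 64, using dummy random weights and omitting the biases in the trace:

```python
import numpy as np

# Layer sizes of NetWork: 784 -> 64 -> 10
sizes = [28 * 28, 64, 10]
params = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
print(params)  # 50890

# Shape trace with a dummy batch and random weights (ReLU, then sigmoid)
rng = np.random.default_rng(0)
x = rng.random((64, 28, 28)).reshape(64, -1)         # Flatten:  (64, 784)
h = np.maximum(0, x @ rng.random((784, 64)))          # linear_1: (64, 64)
out = 1 / (1 + np.exp(-(h @ rng.random((64, 10)))))  # linear_2: (64, 10)
print(x.shape, h.shape, out.shape)  # (64, 784) (64, 64) (64, 10)
```

That is 784 · 64 + 64 = 50,240 parameters in `linear_1` and 64 · 10 + 10 = 650 in `linear_2`.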

Training the model

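The learning rate lr is set to 0.01. It is a global variable read by Linear.update, so it has to be defined before fit is called.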

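  • network
    • compile
      • Compiles the model
      • loss_fn: the loss function to use
    • fit
      • Trains the model
      • x: the training set (a list of input batches)
      • y: the training labels (a list of one-hot label batches)
      • epochs: number of training epochs
      • step: logging interval, in epochs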
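# Set the learning rate (a global read by Linear.update)
lr = 0.01
# Compile the model
network.compile(loss_fn=mse)
# Train the model (epochs=30 matches the log below)
network.fit(x=train_set_x, y=train_set_y, epochs=30, step=5)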
==============================
Output:
epoch 1, loss=[[0.06708152]]
epoch 6, loss=[[0.04562008]]
epoch 11, loss=[[0.03327031]]
epoch 16, loss=[[0.03108833]]
epoch 21, loss=[[0.03055817]]
epoch 26, loss=[[0.03024619]]
epoch 30, loss=[[0.02989175]]
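The epoch numbers in this log come from fit's condition `epoch % step == 0` (printed 1-based) plus the unconditional print after the loop. A tiny sketch reproducing that schedule (the function name is hypothetical) shows that the log above corresponds to `epochs=30, step=5`; with `epochs=20` the last line would read `epoch 20`. Each logged loss is the MSE of that epoch's last mini-batch only, not an average over the epoch:

```python
def logged_epochs(epochs, step):
    # Mirrors Model.fit: log when epoch % step == 0 (printed 1-based), then once after the loop
    logged = [epoch + 1 for epoch in range(epochs) if epoch % step == 0]
    logged.append(epochs)  # the final print uses the last loop value, epoch + 1 == epochs
    return logged

print(logged_epochs(30, 5))  # [1, 6, 11, 16, 21, 26, 30]
print(logged_epochs(20, 5))  # [1, 6, 11, 16, 20]
```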

Evaluation metric: accuracy

Next, compute the model's accuracy. Shameless plug time, hehe 😅: see the related article "Confusion Matrix: Computing Evaluation Metrics".

pred = network(test_images).argmax(axis=-1)
true = test_labels
total = len(pred)
correct = np.count_nonzero(np.equal(pred, true))
accuracy = correct / total

print(f'Accuracy: {accuracy}')
==============================
Output:
Accuracy: 0.7734
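Accuracy is a single number; breaking it down with a confusion matrix (the topic of the plugged article) shows which classes the model mixes up. A minimal sketch using `np.add.at`, with small dummy arrays standing in for `test_labels` and `network(test_images).argmax(axis=-1)`:

```python
import numpy as np

def confusion_matrix(true, pred, num_classes=10):
    # cm[t, p] counts samples whose true class is t and predicted class is p
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (true, pred), 1)
    return cm

# Dummy labels; with the real data, use num_classes=10
true = np.array([0, 0, 1, 2, 2, 2])
pred = np.array([0, 1, 1, 2, 0, 2])
cm = confusion_matrix(true, pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()        # same value as correct / total
per_class = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class predicted correctly
print(cm)
print(accuracy)  # 0.6666666666666666
print(per_class)
```

The diagonal holds the correct predictions, so its sum divided by the total reproduces the accuracy computed above, while the off-diagonal cells point at the specific class pairs worth investigating.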

Accuracy comes out at around 77%. Meh, so-so... and th-that's another filler post in the bag 💧💧💧

That said, this hand-rolled framework still has plenty of room for improvement:

  • Better weight and bias initialization
  • Use categorical cross-entropy as the loss function
  • Hand-roll an optimizer
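As a hedged starting point for the first two items, here is a sketch that fits the same interfaces as the classes above. `he_init` (He/Kaiming normal initialization, a common choice for ReLU layers), `Identity`, and `SoftmaxCrossEntropy` are hypothetical names not in the original framework. The idea is to give the last layer an identity activation so the loss receives raw logits; the gradient reaching that layer is then softmax(logits) − true:

```python
import numpy as np

def he_init(inputs, outputs, rng=None):
    # He (Kaiming) normal initialization: N(0, 2 / fan_in)
    if rng is None:
        rng = np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / inputs), size=(inputs, outputs))

class Identity:
    # Hypothetical "no activation" for the output layer, so the loss sees raw logits
    def __call__(self, x):
        return x

    def diff(self, x):
        return np.ones_like(x)

class SoftmaxCrossEntropy:
    # Hypothetical loss with the same (true, pred) interface as MSE; pred holds logits
    @staticmethod
    def softmax(logits):
        shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
        exp = np.exp(shifted)
        return exp / exp.sum(axis=-1, keepdims=True)

    def __call__(self, true, pred):
        probs = self.softmax(pred)
        return -np.mean(np.sum(true * np.log(probs + 1e-12), axis=-1))

    def diff(self, true, pred):
        # Per-sample gradient w.r.t. the logits (like MSE.diff, no 1/batch factor)
        return self.softmax(pred) - true

# Finite-difference check of diff on a single sample whose true class is 3
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 10))
true = np.eye(10)[[3]]
ce = SoftmaxCrossEntropy()
eps = 1e-6
bumps = np.eye(10)
numeric = np.array([
    (ce(true, logits + eps * bumps[k]) - ce(true, logits - eps * bumps[k])) / (2 * eps)
    for k in range(10)
])
print(np.allclose(numeric, ce.diff(true, logits)[0]))  # True
```

In the framework this would mean, for example, `Linear(64, 10, activation=Identity())` as the last layer, `network.compile(loss_fn=SoftmaxCrossEntropy())`, and `self.weight = he_init(inputs, outputs)` with zero biases in `Linear.__init__`. How much this actually improves on the 77% would have to be checked by retraining.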