Hand-Rolled Neural Network: Fashion MNIST Training


此心安處是吾鄉_ · Published 2023-12-26 21:13:47

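Preface

Hmph, just having some fun 😎😎😎. I previously wrote "Hand-Rolled Neural Network: BP Backpropagation". This post is the hands-on follow-up: it takes the neural network framework built by hand in that post and trains it on Fashion MNIST. Real understanding comes from practice, after all, and the check in the previous post only verified whether the framework's automatic differentiation was correct…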

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

Loading the dataset

Load Fashion MNIST with TensorFlow

fashion_mnist = tf.keras.datasets.fashion_mnist
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Preprocessing (scale pixel values from [0, 255] to [0, 1])
train_images = train_images / 255.0
test_images = test_images / 255.0
# One-hot encode the training labels (10 classes)
train_labels_ = tf.one_hot(train_labels, 10).numpy()
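
The one-hot step can also be done in plain NumPy, which may help readers see what the encoding looks like. This is only a minimal sketch: `one_hot` is a hypothetical helper that is not part of the original post, and the labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: map integer labels of shape (N,) to an (N, num_classes) array."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

demo_labels = np.array([9, 0, 3])   # made-up labels, not taken from the dataset
demo_one_hot = one_hot(demo_labels, 10)
print(demo_one_hot.shape)  # (3, 10)
print(demo_one_hot[0])     # a single 1 at index 9
```

Each row contains exactly one 1, at the index of the class, which is the shape the MSE loss later compares against the 10 network outputs.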

Shuffling the training set and setting the batch size

train_set = tf.data.Dataset.from_tensor_slices((train_images, train_labels_)).shuffle(10000).batch(64)
train_set_x = []
train_set_y = []
for x,y in train_set:
    train_set_x.append(x.numpy())
    train_set_y.append(y.numpy())
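
For comparison, here is a rough NumPy-only sketch of the same idea. `shuffle_and_batch` is a hypothetical helper, the arrays are zero-filled stand-ins rather than real data, and unlike `shuffle(10000)`, which samples from a 10000-element buffer, it permutes the whole array at once.

```python
import numpy as np

def shuffle_and_batch(x, y, batch_size, seed=0):
    """Hypothetical helper: shuffle x and y together, then cut them into batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))   # one shared permutation keeps (x, y) pairs aligned
    x, y = x[order], y[order]
    xs = [x[i:i + batch_size] for i in range(0, len(x), batch_size)]
    ys = [y[i:i + batch_size] for i in range(0, len(y), batch_size)]
    return xs, ys

# Small stand-in arrays (1000 samples) so the sketch stays cheap to run
fake_images = np.zeros((1000, 28, 28), dtype=np.float32)
fake_labels = np.zeros((1000, 10), dtype=np.float32)
batches_x, batches_y = shuffle_and_batch(fake_images, fake_labels, batch_size=64)
print(len(batches_x))        # 16
print(batches_x[-1].shape)   # (40, 28, 28), the last batch is smaller
```

With the real 60000 training images and a batch size of 64, the same arithmetic gives 938 batches, the last one holding 32 samples. Note that in the post the batches are materialised into Python lists once, so every epoch sees them in the same order.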

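Defining the activation and loss functions

These were already covered in "Hand-Rolled Neural Network: BP Backpropagation", so I won't say much here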

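class ReLU:
    def __call__(self, x):
        return np.maximum(0, x)

    def diff(self, x):
        # x is the ReLU output (already >= 0): the derivative is 1 where it is positive, else 0
        x_temp = x.copy()
        x_temp[x_temp > 0] = 1
        return x_temp

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def diff(self, x):
        # x is the sigmoid output s, so the derivative is s * (1 - s)
        return x * (1 - x)

class MSE:
    def __call__(self, true, pred):
        return np.mean(np.power(pred - true, 2), keepdims=True)

    def diff(self, true, pred):
        # Gradient w.r.t. pred with the constant factor 2 / N dropped;
        # that factor is effectively folded into the learning rate
        return pred - true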

relu = ReLU()
sigmoid = Sigmoid()
mse = MSE()
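
One detail that is easy to miss: `diff` expects the activation's output, not its input. A quick finite-difference check makes this visible. This is a sketch that redefines `Sigmoid` locally so it runs on its own; the test points `z` are arbitrary.

```python
import numpy as np

class Sigmoid:
    def __call__(self, x):
        return 1 / (1 + np.exp(-x))

    def diff(self, out):
        # Expects the sigmoid *output*, as in the post
        return out * (1 - out)

sigmoid = Sigmoid()
z = np.linspace(-3, 3, 7)   # arbitrary pre-activation values
eps = 1e-6
# Central-difference estimate of d sigmoid / dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid.diff(sigmoid(z))   # pass the output: matches
wrong = sigmoid.diff(z)               # pass the pre-activation: does not match
print(np.allclose(numeric, analytic, atol=1e-6))  # True
print(np.allclose(numeric, wrong, atol=1e-6))     # False
```

This is why the framework caches each layer's output during the forward pass and hands that cached value to `diff` during the backward pass.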

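Building the framework

For this part I restructured the framework so that building neural network models later on is easier (it is getting more and more TensorFlow-flavoured). The principle is still the same as in "Hand-Rolled Neural Network: BP Backpropagation", and the detailed explanation of the framework below can be found in that post as well

In this post a Flatten layer is added so the model can handle the data more easily. It simply flattens the data 🤗

The Flatten layer converts each image from a 2-D array (28 x 28) into a 1-D array (28 x 28 = 784). Think of it as unstacking the rows of pixels in the image and lining them up. The layer has no parameters to learn; it only reformats the data, which is also why its update method just...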
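
To make the reshaping concrete, here is a small sketch on a fake batch of two 28 x 28 arrays (the values are just a running counter, not image data). The inverse reshape at the end is an illustration of how a gradient could be mapped back to image shape; it is not something the post's framework does.

```python
import numpy as np

# Two fake 28 x 28 "images" filled with a running counter
batch = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28)
flat = np.reshape(batch, (batch.shape[0], -1))
print(flat.shape)  # (2, 784)

# Pixel rows are laid end to end: row 1 of the image starts at flat index 28
print(flat[0, 28] == batch[0, 1, 0])  # True

# Reshaping back recovers the original layout exactly
restored = flat.reshape(batch.shape)
print(np.array_equal(restored, batch))  # True
```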

class Model:
    def __init__(self):
        self.layers = None
        self.loss_fn = None
        # Attribute names that compile() must not treat as trainable layers
        self.not_layers = ['layers', 'not_layers', 'loss_fn', 'flatten']

    def compile(self, loss_fn):
        self.loss_fn = loss_fn
        # Instance attributes keep their definition order; reverse them so that
        # backward() walks from the output layer back towards the input
        names = [name for name in vars(self) if name not in self.not_layers]
        self.layers = [getattr(self, name) for name in reversed(names)]

    def fit(self, x, y, epochs, step=100):
        for epoch in range(epochs):
            for x_, y_ in zip(x, y):
                pred = self(x_)
                self.backward(y_, pred)
            if epoch % step == 0:
                # The logged loss is that of the last batch of the epoch only
                print(f'epoch {epoch + 1}, loss={self.loss_fn(y_, pred)}')
        print(f'epoch {epoch + 1}, loss={self.loss_fn(y_, pred)}')

    def backward(self, true, pred):
        # Start from the loss gradient and let each layer update itself in turn
        grad = self.loss_fn.diff(true, pred)
        for layer in self.layers:
            grad = layer.update(grad)

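class Flatten:
    def __call__(self, x):
        batch_size = x.shape[0]
        return np.reshape(x, (batch_size, -1))

    def update(self, grad):
        # No parameters to update; 'flatten' is also listed in Model.not_layers,
        # so backward() never calls this method
        pass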

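class Linear:
    def __init__(self, inputs, outputs, activation):
        # Uniform random values in [0, 1), rescaled so that all weights (and all biases) sum to 1
        self.weight = np.random.rand(inputs, outputs)
        self.weight = self.weight / self.weight.sum()
        self.bias = np.random.rand(outputs)
        self.bias = self.bias / self.bias.sum()
        self.activation = activation
        self.x_temp = None  # layer input cached by the forward pass
        self.t_temp = None  # layer output (after activation) cached by the forward pass

    def __call__(self, x):
        self.x_temp = x
        self.t_temp = self.activation(x @ self.weight + self.bias)

        return self.t_temp

    def update(self, grad):
        # diff() receives the cached activation output, not the pre-activation value
        activation_diff_grad = self.activation.diff(self.t_temp) * grad
        # Gradient for the previous layer, computed before the weights change
        new_grad = activation_diff_grad @ self.weight.T
        # lr is a module-level global and must be defined before fit() is called
        self.weight -= lr * self.x_temp.T @ activation_diff_grad
        self.bias -= lr * activation_diff_grad.mean(axis=0)
        return new_grad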
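
As a sanity check on the update rule, the weight gradient used in `Linear.update` can be compared against a numerical gradient. The sketch below is standalone and uses made-up data; it assumes a sigmoid layer and the loss 0.5 * sum((pred - true)^2), whose gradient with respect to `pred` is exactly `pred - true`, the value that `MSE.diff` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))      # batch of 4 samples with 3 features (made up)
true = rng.random(size=(4, 2))   # made-up targets
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)

def forward(W_):
    return 1 / (1 + np.exp(-(x @ W_ + b)))   # sigmoid(x @ W + b)

def half_sse(W_):
    # 0.5 * sum of squared errors; d(half_sse)/d(pred) = pred - true
    return 0.5 * np.sum((forward(W_) - true) ** 2)

# Analytic gradient, following the same steps as Linear.update
pred = forward(W)
grad = pred - true                                    # what MSE.diff returns
activation_diff_grad = pred * (1 - pred) * grad       # Sigmoid.diff(output) * grad
analytic_dW = x.T @ activation_diff_grad              # term multiplied by lr in update()
grad_to_previous_layer = activation_diff_grad @ W.T   # what update() returns

# Numerical gradient by central differences, one weight at a time
numeric_dW = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        numeric_dW[i, j] = (half_sse(W + step) - half_sse(W - step)) / (2 * eps)

print(np.allclose(analytic_dW, numeric_dW, atol=1e-6))  # True
```

One thing the check also highlights: the weight update sums the gradient over the batch (`x.T @ ...`), while the bias update averages it (`mean(axis=0)`), so the bias effectively moves with a step that is smaller by a factor of the batch size.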

Building the model

Not much needs saying here… 🫠🫠🫠 It is much the same as building a custom model in TensorFlow

class NetWork(Model):
    def __init__(self):
        super().__init__()
        self.flatten = Flatten()
        self.linear_1 = Linear(28 * 28, 64, activation=relu)
        self.linear_2 = Linear(64, 10, activation=sigmoid)

    def __call__(self, x):
        x = self.flatten(x)
        x = self.linear_1(x)
        x = self.linear_2(x)

        return x

network = NetWork()
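
For a sense of scale, the number of trainable parameters in this network can be counted by hand. A small sketch, with the layer sizes copied from `NetWork` above:

```python
layer_sizes = [(28 * 28, 64), (64, 10)]   # (inputs, outputs) of linear_1 and linear_2

total_params = 0
for inputs, outputs in layer_sizes:
    params = inputs * outputs + outputs   # weight matrix plus bias vector
    print(f'Linear({inputs}, {outputs}): {params} parameters')
    total_params += params
print(f'total: {total_params}')  # total: 50890
```

The Flatten layer contributes nothing to this count, since it has no parameters.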

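Training the model

The learning rate lr is set to 0.01

network
- compile
  - compiles the model
  - loss_fn: the loss function to use
- fit
  - trains the model
  - x: the training set (a list of batches)
  - y: the training labels (a list of batches)
  - epochs: number of training epochs
  - step: logging interval, in epochs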

# Set the learning rate
lr = 0.01
# Compile the model
network.compile(loss_fn=mse)
# Train the model (30 epochs, matching the log below)
network.fit(x=train_set_x, y=train_set_y, epochs=30, step=5)
==============================
Output:
epoch 1, loss=[[0.06708152]]
epoch 6, loss=[[0.04562008]]
epoch 11, loss=[[0.03327031]]
epoch 16, loss=[[0.03108833]]
epoch 21, loss=[[0.03055817]]
epoch 26, loss=[[0.03024619]]
epoch 30, loss=[[0.02989175]]
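
Keep in mind that each logged value is the loss of the last batch of that epoch only. If a smoother number is wanted, the loss could be averaged over all batches instead. The sketch below shows one way to do that; `mean_loss_over_batches` is a hypothetical helper, and the model, loss and data are tiny stand-ins so that it runs on its own.

```python
import numpy as np

def mean_loss_over_batches(model, loss_fn, xs, ys):
    """Hypothetical helper: sample-weighted average of loss_fn over every batch."""
    total, count = 0.0, 0
    for x_, y_ in zip(xs, ys):
        pred = model(x_)
        # loss_fn returns a 1 x 1 array (keepdims=True), so squeeze it to a scalar
        total += float(np.squeeze(loss_fn(y_, pred))) * len(x_)
        count += len(x_)
    return total / count

def mse_fn(true, pred):
    return np.mean(np.power(pred - true, 2), keepdims=True)

def identity_model(x):
    return x   # stand-in for a trained network

demo_xs = [np.zeros((4, 2)), np.zeros((2, 2))]
demo_ys = [np.ones((4, 2)), np.full((2, 2), 2.0)]
demo_mean_loss = mean_loss_over_batches(identity_model, mse_fn, demo_xs, demo_ys)
print(demo_mean_loss)  # 2.0
```

Weighting by batch size matters here because the last batch of the training set is smaller than the others.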

Evaluation metric: accuracy

Compute the model's accuracy. A quick plug, hehe 😅: see the related post "Confusion Matrix: Computing Evaluation Metrics"

pred = network(test_images).argmax(axis=-1)
true = test_labels
total = len(pred)
correct = np.count_nonzero(np.equal(pred, true))
accuracy = correct / total

print(f'Accuracy: {accuracy}')
==============================
Output:
Accuracy: 0.7734
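
Overall accuracy hides which classes get confused with each other. A confusion matrix shows that breakdown. The sketch below uses made-up labels for three classes (not the model's actual predictions), and `confusion_matrix` is a hypothetical helper written for this example.

```python
import numpy as np

def confusion_matrix(true, pred, num_classes):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(matrix, (true, pred), 1)   # count every (true, pred) pair
    return matrix

# Made-up labels for three classes, not results from the post's model
demo_true = np.array([0, 0, 1, 1, 2, 2])
demo_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(demo_true, demo_pred, 3)
demo_accuracy = np.trace(cm) / cm.sum()          # correct predictions sit on the diagonal
per_class_recall = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class recovered
print(cm)
print(demo_accuracy)
print(per_class_recall)  # [0.5 1.  0.5]
```

Applied to the real data, `true` would be `test_labels` and `pred` the argmax predictions, with `num_classes=10`.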

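Accuracy is around 77%, which is so-so. And that makes... one more filler post 💧💧💧

In practice, though, this hand-rolled framework still has plenty of room for improvement

Better weight and bias initialization
Using categorical cross-entropy as the loss function
Hand-rolling an optimizer
…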
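
To give a rough idea of what the first three items could look like, here is a sketch in the same style as the classes above. None of it comes from the original post: `he_init`, `SoftmaxCrossEntropy` and `SGD` are hypothetical names, and the loss assumes the last layer outputs raw logits (no sigmoid), which would need an identity activation in this framework.

```python
import numpy as np

def he_init(inputs, outputs, rng):
    """He-style initialization: zero-mean normal with std sqrt(2 / inputs)."""
    return rng.normal(0.0, np.sqrt(2.0 / inputs), size=(inputs, outputs))

class SoftmaxCrossEntropy:
    """Hypothetical loss in the style of the MSE class; takes raw logits."""
    def softmax(self, logits):
        shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
        exp = np.exp(shifted)
        return exp / exp.sum(axis=-1, keepdims=True)

    def __call__(self, true, logits):
        probs = self.softmax(logits)
        return -np.mean(np.sum(true * np.log(probs + 1e-12), axis=-1), keepdims=True)

    def diff(self, true, logits):
        # Per-sample gradient w.r.t. the logits (not divided by batch size, like MSE.diff)
        return self.softmax(logits) - true

class SGD:
    """Hypothetical optimizer: plain gradient descent with a stored learning rate."""
    def __init__(self, lr):
        self.lr = lr

    def step(self, param, grad):
        param -= self.lr * grad   # in-place update of the parameter array

rng = np.random.default_rng(0)
weight = he_init(28 * 28, 64, rng)
ce = SoftmaxCrossEntropy()
logits = np.array([[2.0, 0.5, -1.0]])   # made-up logits for 3 classes
target = np.array([[1.0, 0.0, 0.0]])
print(weight.shape)              # (784, 64)
print(ce(target, logits))        # small positive loss
print(ce.diff(target, logits))   # entries sum to zero

param = np.array([1.0, 2.0])
SGD(lr=0.1).step(param, np.array([1.0, 1.0]))
print(param)                     # [0.9 1.9]
```

An optimizer object like this would also remove the need for the global `lr` that `Linear.update` currently relies on.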
