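Preface

This chapter walks through the design of a complete deep learning program, as a way of getting more familiar with the deep learning framework.

Hands-on: MNIST Handwritten Digit Recognition

First, let's look at the labels of the MNIST dataset, i.e. y_train_label.

We start by printing the first ten labels of the dataset: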

import numpy as np

x_train = np.load("../MNIST/mnist/mnist/x_train.npy")
y_train_label = np.load("../MNIST/mnist/mnist/y_train_label.npy")

print(y_train_label[:10])

Ten values are printed. Each one is the label of the corresponding image: for example, an image showing the digit 3 has the label 3.

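Note, however, that the MNIST labels are stored by default as integers from 0 to 9. Treated as plain numbers, these values suggest an ordering between classes that does not exist, which does not help the loss function reach a good result, so the one_hot encoding is commonly used instead. (PyTorch's CrossEntropyLoss also accepts integer class indices directly, which is what the training code later in this chapter relies on.)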

The idea of one_hot is roughly this: we have the ten digits 0 to 9, written as [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. The digit 2 is then represented as [0, 0, 1, 0, 0, 0, 0, 0, 0, 0], and the digit 9 as [0, 0, 0, 0, 0, 0, 0, 0, 0, 1].
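To make the encoding rule concrete, here is a minimal NumPy sketch. The helper name one_hot and its num_classes parameter are chosen for illustration and are not part of the original article; the trick is that row i of the identity matrix is exactly the one-hot vector for class i.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a [len(labels), num_classes] matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing the identity matrix with the labels encodes them all at once.
    return np.eye(num_classes, dtype=np.int64)[labels]

encoded = one_hot([2, 9])
print(encoded)
```

The two printed rows are the vectors for 2 and 9 given in the text above.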

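Of course, this kind of feature representation also has its drawbacks, such as sparse matrices and the loss of semantic information. That is outside the scope of this article, so we won't go into it here.

We can write some code to see how one_hot is used and what it produces: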

import numpy as np
import torch

x_train = np.load("../MNIST/mnist/mnist/x_train.npy")
y_train_label = np.load("../MNIST/mnist/mnist/y_train_label.npy")

# print(y_train_label[:10])
x = torch.tensor(y_train_label[:5], dtype=torch.int64)
y = torch.nn.functional.one_hot(x, 10)
print(x)
print("--------------")
print(y)

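The output matches the example above.

In other words, once one-hot encoded, the labels of the MNIST dataset form a matrix (tensor) of shape [60000, 10]: the 60000 rows correspond to the 60000 images in the dataset, and each row is a vector of 10 values.

Tensor: a multi-dimensional data container that can have any number of dimensions. For example, a three-dimensional tensor (a third-order tensor) can be pictured as a cube in which every element has three indices. A color image is one such tensor: two dimensions are the height and width of the image, and the third is the color channel.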
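A short sketch of tensors with different numbers of dimensions may help here. The variable names and the 28 x 28 sizes are only illustrative (28 x 28 matches the MNIST image size used later in this chapter), and the height, width, channel layout follows the description above.

```python
import torch

scalar = torch.tensor(3.0)            # 0 dimensions: a single number
vector = torch.zeros(10)              # 1 dimension: e.g. one one-hot label
gray_image = torch.zeros(28, 28)      # 2 dimensions: height x width
color_image = torch.zeros(28, 28, 3)  # 3 dimensions: height x width x channel
batch = torch.zeros(5, 28, 28)        # 3 dimensions: 5 grayscale images

for name, t in [("scalar", scalar), ("vector", vector),
                ("gray_image", gray_image), ("color_image", color_image),
                ("batch", batch)]:
    print(name, t.ndim, tuple(t.shape))

# Each element of a 3-D tensor is addressed by three indices.
pixel = color_image[0, 0, 2]
```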

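Preparing the Model

As mentioned in the previous article, the most important point in designing a model is understanding the data structures of its inputs and outputs.

To implement the idea of classifying input images into digits, we need a suitable model. Following the analysis of the images above, the most direct approach is to run the computation over every attribute (pixel) of the image, which is what an MLP (multilayer perceptron) does.

Placeholder: a detailed explanation of the multilayer perceptron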
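As a rough illustration of what a multilayer perceptron computes, the sketch below runs one forward pass by hand in NumPy: flatten each image, apply a linear layer, a ReLU, and a second linear layer that yields 10 scores. The hidden size of 16, the random weights, and the function names are assumptions made for this example only; the actual model in this chapter is built with torch.nn and uses different layer sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU keeps positive values and replaces negative ones with 0
    return np.maximum(x, 0.0)

def mlp_forward(images, w1, b1, w2, b2):
    # Flatten [N, 28, 28] images into [N, 784] so every pixel is an input
    x = images.reshape(images.shape[0], -1)
    hidden = relu(x @ w1 + b1)   # first linear layer + non-linearity
    return hidden @ w2 + b2      # second linear layer: one score per class

hidden_size = 16  # hypothetical, chosen only to keep the example small
w1 = rng.normal(scale=0.01, size=(28 * 28, hidden_size))
b1 = np.zeros(hidden_size)
w2 = rng.normal(scale=0.01, size=(hidden_size, 10))
b2 = np.zeros(10)

images = rng.random((4, 28, 28))  # 4 random stand-in "images"
logits = mlp_forward(images, w1, b1, w2, b2)
print(logits.shape)  # (4, 10)
```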

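For the loss function we choose CrossEntropyLoss, which computes the cross-entropy loss between the input (the model's raw scores) and the target. It is suited to training classification problems with several classes, here the ten digits.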

Placeholder: CrossEntropyLoss

PyTorch Implementation
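To show what this loss computes, the following sketch reproduces it by hand on made-up numbers: take the log-softmax of the scores, pick the log-probability of the correct class for each sample, negate, and average. The logits and targets below are invented for illustration, and the comparison assumes the default reduction="mean".

```python
import torch
import torch.nn.functional as F

# Two samples, three classes (made-up scores)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # integer class indices, not one-hot vectors

# Manual computation: negative log-probability of the correct class, averaged
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

# Built-in loss applied to the same raw scores
builtin_loss = torch.nn.CrossEntropyLoss()(logits, targets)

print(manual_loss.item(), builtin_loss.item())
```

Both values agree, which also shows that the loss expects raw scores (logits) rather than probabilities, so the model does not need a softmax layer at its output.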

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch
import numpy as np
from tqdm import tqdm

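batch_size = 320  # number of samples per batch
epochs = 1024  # number of training epochs (unused: the training loop below hard-codes 20)

# Fall back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Define the multilayer perceptron model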
class NeuralNetwork(torch.nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = torch.nn.Flatten()
        self.linear_relu_stack = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 312),
            torch.nn.ReLU(),
            torch.nn.Linear(312, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 10),
        )
    def forward(self, input):
        x = self.flatten(input)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
model = model.to(device)  # move the model to the compute device
# model = torch.compile(model)  # optional: speeds up computation
loss_fu = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # set up the optimizer

# Load the data
x_train = np.load("../MNIST/mnist/mnist/x_train.npy")
y_train_label = np.load("../MNIST/mnist/mnist/y_train_label.npy")
train_num = len(x_train)//batch_size

# Start training
for epoch in range(20):
    train_loss = 0
    for i in range(train_num):
        start = i * batch_size
        end = (i + 1) * batch_size
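        # Explicit dtypes: float32 inputs for the Linear layers, int64 class indices for the loss
        train_batch = torch.tensor(x_train[start:end], dtype=torch.float32).to(device)
        label_batch = torch.tensor(y_train_label[start:end], dtype=torch.long).to(device)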

        pred = model(train_batch)
        loss = loss_fu(pred, label_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    train_loss /= train_num
    # Note: this accuracy is measured on the last batch of the epoch only
    accuracy = (pred.argmax(1) == label_batch).type(torch.float32).sum().item() / batch_size
    print("train_loss: ", round(train_loss, 2), " accuracy: ", round(accuracy, 2))

As the number of epochs grows, the training loss keeps decreasing and the accuracy keeps improving.
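The accuracy printed by the training loop is taken from the last batch of each epoch, and with 60000 images and a batch size of 320 the loop covers 60000 // 320 = 187 full batches (59840 images). A sketch of measuring accuracy over every sample, including the final partial batch, could look like the following. The function evaluate_accuracy is a hypothetical helper, not part of the original code, and it assumes the inputs and labels are already tensors on the same device as the model.

```python
import torch

def evaluate_accuracy(model, inputs, labels, batch_size=320):
    """Fraction of samples whose highest-scoring class equals the label."""
    model.eval()
    correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        # Stepping by batch_size also visits the final, smaller batch
        for start in range(0, len(inputs), batch_size):
            batch_x = inputs[start:start + batch_size]
            batch_y = labels[start:start + batch_size]
            pred = model(batch_x)
            correct += (pred.argmax(1) == batch_y).sum().item()
    return correct / len(inputs)

# Toy check: an identity "model" whose inputs are already perfect scores
toy_logits = torch.eye(10).repeat(7, 1)   # 70 rows, row i scores class i % 10
toy_labels = torch.arange(10).repeat(7)   # matching labels 0..9, seven times
acc = evaluate_accuracy(torch.nn.Identity(), toy_logits, toy_labels, batch_size=32)
print(acc)  # 1.0
```

In the training script this would be called after an epoch with the full training tensors, remembering to switch back with model.train() before continuing to train.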

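Visualizing the Model with Netron

Netron is an open-source visualization tool for deep learning and machine learning models. It displays deep neural networks, machine learning models, and their structure graphically.

It can be installed by downloading it from GitHub: lutzroeder/netron (Visualizer for neural network, deep learning and machine learning models)

A web version is also available: Netron

Install the netron library with pip: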

pip install netron
pip install onnx

I personally use the web version: first save the model as a .pth file, then open that file in the web version:

import torch

class NeuralNetwork(torch.nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = torch.nn.Flatten()
        self.linear_relu_stack = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 312),
            torch.nn.ReLU(),
            torch.nn.Linear(312, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 10),
        )
    def forward(self, input):
        x = self.flatten(input)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
torch.save(model, './model.pth')
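The call above pickles the whole module object, so loading the file later requires the NeuralNetwork class to be importable. A commonly used alternative is to save only the parameters through state_dict and rebuild the model in code. The sketch below shows that round trip; build_model is a hypothetical helper that mirrors the layer sizes used in this chapter, the file is written to a temporary directory, and how well Netron renders either kind of .pth file is not checked here.

```python
import os
import tempfile
import torch

def build_model():
    # Hypothetical helper: same layer sizes as the NeuralNetwork class
    return torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 312),
        torch.nn.ReLU(),
        torch.nn.Linear(312, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    )

model = build_model()
path = os.path.join(tempfile.mkdtemp(), "model_state.pth")

# Save only the parameters (weights and biases), not the Python object
torch.save(model.state_dict(), path)

# Rebuild the architecture in code, then load the saved parameters into it
restored = build_model()
restored.load_state_dict(torch.load(path))
restored.eval()

# Both models should now produce identical outputs for the same input
sample = torch.rand(2, 28, 28)
same = torch.allclose(model(sample), restored(sample))
print(same)
```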

Chapter 3 of Learning PyTorch 2.0 from Scratch
