Handwritten Chinese Character Recognition with Deep Learning


Author: 怎么这么热好热呀 · Published 2020-09-04 15:52:20


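This article implements CNN-based handwritten Chinese character recognition.

1. Purpose

This article uses TensorFlow to build a convolutional neural network (CNN) that recognizes handwritten Chinese characters.

2. Data source

The data is HWDB1.1 from the CASIA-HWDB website. The dataset comes from the National Laboratory of Pattern Recognition.

3. Data preprocessing

First, download the data and extract it to a folder of your choice. Then visualize the data to see what the samples actually look like.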

import os
import struct

import numpy as np

# train_data_dir is assumed to be defined elsewhere as the folder holding the .gnt files.
def read_gnt_dir(gnt_dir=train_data_dir):
    def one_file(f):
        header_size = 10
        while True:
            header = np.fromfile(f, dtype='uint8', count=header_size)
            if header.size < header_size:
                break
            # Cast to Python ints so the shifts cannot overflow uint8.
            h = [int(b) for b in header]
            # Bytes 0-3: sample size (little-endian)
            sample_size = h[0] + (h[1] << 8) + (h[2] << 16) + (h[3] << 24)
            # Bytes 4-5: GB tag code of the character
            tagcode = h[5] + (h[4] << 8)
            # Bytes 6-7: width, bytes 8-9: height (little-endian)
            width = h[6] + (h[7] << 8)
            height = h[8] + (h[9] << 8)
            if header_size + width * height != sample_size:
                break
            image = np.fromfile(f, dtype='uint8', count=width * height)
            if image.size != width * height:
                # Truncated sample: report the character and stop reading this file.
                print(struct.pack('>H', tagcode).decode('gb2312', errors='replace'))
                break
            yield image.reshape((height, width)), tagcode

    for file_name in os.listdir(gnt_dir):
        if file_name.endswith('.gnt'):
            file_path = os.path.join(gnt_dir, file_name)
            with open(file_path, 'rb') as f:
                for image, tagcode in one_file(f):
                    yield image, tagcode
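
To make the record layout concrete, here is a minimal sketch that parses one GNT sample from an in-memory buffer using only the standard library. The layout (a 4-byte little-endian sample size, a 2-byte tag code, a 2-byte width, a 2-byte height, then width*height grayscale bytes) is inferred from the reader above. The helper name `parse_gnt_sample` and the synthetic 3x2 sample are assumptions made for illustration.

```python
import io
import struct


def parse_gnt_sample(stream):
    """Read one sample from a binary stream; return None at end of data."""
    header = stream.read(10)
    if len(header) < 10:
        return None
    # '<' = little-endian without padding, 'I' = sample size,
    # '2s' = raw tag code bytes, 'HH' = width and height.
    sample_size, tag_bytes, width, height = struct.unpack('<I2sHH', header)
    if sample_size != 10 + width * height:
        raise ValueError('inconsistent sample size in header')
    pixels = stream.read(width * height)
    if len(pixels) != width * height:
        raise ValueError('truncated pixel data')
    # The two tag bytes are the GB2312 encoding of the character.
    char = tag_bytes.decode('gb2312')
    rows = [list(pixels[r * width:(r + 1) * width]) for r in range(height)]
    return char, rows


# Build a synthetic sample, 3 pixels wide and 2 pixels high, for the character '啊'.
tag = '啊'.encode('gb2312')
body = bytes([255, 0, 255, 0, 255, 0])
record = struct.pack('<I2sHH', 10 + len(body), tag, 3, 2) + body

char, rows = parse_gnt_sample(io.BytesIO(record))
print(char, rows)
```

Reading the tag as two raw bytes avoids the shift arithmetic entirely, since those bytes decode directly to the character.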

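After the run completes, the train directory contains 3755 folders in total. Opening one of them at random:
[Image: sample images from one folder]
Every image in the folder is written differently, and it is also easy to see that the images have different resolutions.
Next comes the data processing: each image is padded, resized to 64x64 and normalized, and the labels are converted to one-hot encoding.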

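import numpy as np
from PIL import Image

def setdata_image(i):
    # Pad the shorter side with white (255) so the image is roughly square.
    psize = abs(i.shape[0] - i.shape[1]) // 2
    if i.shape[0] < i.shape[1]:
        pdim = ((psize, psize), (0, 0))
    else:
        pdim = ((0, 0), (psize, psize))
    i = np.pad(i, pdim, mode='constant', constant_values=255)
    # scipy.misc.imresize was removed in SciPy 1.3; Pillow is used here as a stand-in.
    i = np.array(Image.fromarray(i).resize((64 - 4 * 2, 64 - 4 * 2), Image.BILINEAR))
    # Add a 4-pixel white border to reach 64x64.
    i = np.pad(i, ((4, 4), (4, 4)), mode='constant', constant_values=255)
    assert i.shape == (64, 64)
    # Cast to float before centring; uint8 arithmetic would wrap around.
    i = i.flatten().astype(np.float32)
    i = (i - 128) / 128
    return i

# char_set is assumed to be defined elsewhere as the list of all characters in the training set.
def convert_to_one_hot(char):
    vector = np.zeros(len(char_set))
    vector[char_set.index(char)] = 1
    return vector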
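
As a quick check of the two ideas above, the sketch below one-hot encodes a character against a tiny, hypothetical `char_set` and scales uint8 pixels to floats in [-1, 1). The three-character `char_set` and the function names `one_hot` and `normalize` are assumptions for illustration; the real label set would have 3755 entries. The cast to float comes before the subtraction because subtracting 128 from a uint8 array can wrap around for pixel values below 128.

```python
import numpy as np

# Hypothetical label set; the real one would hold all 3755 characters.
char_set = ['一', '二', '三']


def one_hot(char, labels):
    """Return a vector with a single 1 at the index of `char`."""
    vector = np.zeros(len(labels), dtype=np.float32)
    vector[labels.index(char)] = 1.0
    return vector


def normalize(pixels):
    """Map uint8 values 0..255 to floats in [-1, 1)."""
    return (pixels.astype(np.float32) - 128.0) / 128.0


label = one_hot('二', char_set)
scaled = normalize(np.array([0, 128, 255], dtype=np.uint8))
print(label, scaled)
```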

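With the data processed, the next step is to build the model.

4. Building the model with TensorFlow

The model was built with reference to the literature. It uses three convolutional layers and three max-pooling layers. Its graph in TensorBoard:
[Image: TensorBoard graph of the model]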

import tensorflow as tf

# X, keep_prob and label_size are assumed to be defined elsewhere
# (input placeholder, dropout keep-probability placeholder, number of classes).
# The calls below are TensorFlow 1.x APIs.
def handwriting_cnn():
    x = tf.reshape(X, shape=[-1, 64, 64, 1])

    # Block 1: 3x3 convolution, 1 -> 32 channels, then 2x2 max pooling (64x64 -> 32x32)
    weight_c1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
    bias_c1 = tf.Variable(tf.zeros([32]))
    conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, weight_c1, strides=[1, 1, 1, 1], padding='SAME'), bias_c1))
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Block 2: 32 -> 64 channels (32x32 -> 16x16)
    weight_c2 = tf.Variable(tf.random_normal([3, 3, 32, 64], stddev=0.01))
    bias_c2 = tf.Variable(tf.zeros([64]))
    conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, weight_c2, strides=[1, 1, 1, 1], padding='SAME'), bias_c2))
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Block 3: 64 -> 128 channels (16x16 -> 8x8), followed by dropout
    weight_c3 = tf.Variable(tf.random_normal([3, 3, 64, 128], stddev=0.01))
    bias_c3 = tf.Variable(tf.zeros([128]))
    conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, weight_c3, strides=[1, 1, 1, 1], padding='SAME'), bias_c3))
    conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv3 = tf.nn.dropout(conv3, keep_prob)

    # Fully connected layer on the flattened 8x8x128 output of block 3
    weight_d = tf.Variable(tf.random_normal([8 * 8 * 128, 1024], stddev=0.01))
    bias_d = tf.Variable(tf.zeros([1024]))
    dense = tf.reshape(conv3, [-1, weight_d.get_shape().as_list()[0]])
    dense = tf.nn.relu(tf.add(tf.matmul(dense, weight_d), bias_d))
    dense = tf.nn.dropout(dense, keep_prob)

    # Output layer: one logit per character class
    weight_out = tf.Variable(tf.random_normal([1024, label_size], stddev=0.01))
    bias_out = tf.Variable(tf.zeros([label_size]))
    output = tf.add(tf.matmul(dense, weight_out), bias_out)

    return output
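
The size of the flattened tensor fed to the dense layer follows from the pooling arithmetic: with `padding='SAME'`, a stride-1 convolution keeps the spatial size, and each 2x2 pooling with stride 2 halves it (rounding up). The sketch below traces a 64x64 input through the three stages. The helper name `trace_shapes` is an assumption for illustration.

```python
import math


def trace_shapes(size, channels_per_stage, pool_stride=2):
    """Return (height, width, channels, flattened) after each conv+pool stage."""
    shapes = []
    for channels in channels_per_stage:
        # SAME-padded pooling outputs ceil(size / stride) along each axis.
        size = math.ceil(size / pool_stride)
        shapes.append((size, size, channels, size * size * channels))
    return shapes


stages = trace_shapes(64, [32, 64, 128])
for index, shape in enumerate(stages, start=1):
    print('stage', index, shape)
```

A layer flattened after the second stage has 16 * 16 * 64 = 16384 inputs, and one flattened after the third has 8 * 8 * 128 = 8192.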

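5. Results

[Image: training results]
The results are reasonably good: the highest accuracy reached 98.889%.
Finally, the model was tested on a single image, and the character was recognized correctly.

[Image: recognition result for a single image]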
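
An accuracy figure like the one above is typically the fraction of samples whose highest-scoring class matches the label. The following is a minimal NumPy sketch of that computation, not the article's evaluation code. The function name `accuracy` and the toy logits are assumptions.

```python
import numpy as np


def accuracy(logits, one_hot_labels):
    """Fraction of rows where the arg-max of the logits matches the label."""
    predicted = np.argmax(logits, axis=1)
    expected = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == expected))


# Toy batch: 4 samples, 3 classes; the last prediction is wrong.
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.0, 0.2, 3.0],
                   [1.0, 0.9, 0.8]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
score = accuracy(logits, labels)
print(score)
```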

References:
[1] D. Cireșan and J. Schmidhuber. Multi-column deep neural networks for offline handwritten Chinese character classification. arXiv preprint arXiv:1309.0261, 2013.
[2] D. C. Cireșan, U. Meier, and J. Schmidhuber. Transfer learning for Latin and Chinese characters with deep neural networks. In Neural Networks (IJCNN), The 2012 International Joint Conference on, pages 1–6. IEEE, 2012.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.