Dataset: http://pannous.net/spoken_numbers.tar. We build an LSTM recurrent neural network and train it with TFLearn, a third-party library.

The code in this section is based on https://github.com/pannous/tensorflow-speech-recognition/blob/master/speech2text-tflearn.py

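Define the input data and preprocess it. The speech is converted into matrix form using Mel-frequency cepstral coefficient (MFCC) feature vectors, a feature widely used in automatic speech recognition and speaker recognition. To produce the MFCCs that represent the speech, the signal is split into frames, each frame's spectrum is passed through a mel filterbank, the logarithm of the filterbank energies is taken, and a discrete cosine transform is applied.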
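The MFCC steps described above can be sketched with NumPy alone. This is a minimal illustration, not the implementation that `speech_data` uses: the frame length, hop size, FFT size, number of mel filters, and the function names `mel_filterbank` and `mfcc` are all assumptions chosen for the example, and the values it produces will differ from those of a dedicated audio library.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def mfcc(signal, sample_rate, n_mfcc=20, n_filters=40,
         frame_len=400, hop=160, n_fft=512):
    """Return an (n_mfcc, n_frames) matrix of cepstral coefficients."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:            # pad very short signals to one frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    # 1. Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, then their logarithm.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 4. Discrete cosine transform (type II, unnormalized), keeping n_mfcc terms.
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), np.arange(n_filters) + 0.5) / n_filters)
    return (log_energies @ basis.T).T

# One second of a 440 Hz tone sampled at 16 kHz, used as a stand-in for speech.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t), 16000)
print(features.shape)
```

Each column of the result describes one frame, which is the matrix form the network consumes after padding to a fixed length.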

Define the network model: an LSTM.
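To make the LSTM layer less of a black box, here is a rough NumPy sketch of a single LSTM time step applied over a sequence. It is illustrative only: the weight layout, the gate order, and the name `lstm_step` are assumptions for this example and do not mirror TFLearn's internals. TFLearn's recurrent layers expect input shaped [batch, timesteps, features], so with the `[None, width, height]` shape declared in this post's script the layer would step over 20 positions of 80 values each; the sketch uses those sizes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,).
    The four blocks are, in this sketch's order: input gate, forget gate,
    output gate, candidate cell values.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:hidden])                 # input gate
    f = sigmoid(z[hidden:2 * hidden])        # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate
    g = np.tanh(z[3 * hidden:])              # candidate cell values
    c = f * c_prev + i * g                   # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c

rng = np.random.default_rng(0)
steps, input_size, hidden_size = 20, 80, 128   # sizes taken from the script
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
sequence = rng.normal(size=(steps, input_size))  # random stand-in for one utterance
for x in sequence:
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)
```

The final hidden state `h` is what a fully connected softmax layer would then map to the 10 digit classes.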

Train the model and save it.

Predict with the model: feed in any speech file and predict the digit.
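Predicting on a new file needs two small pieces of glue around the model: bringing the file's MFCC matrix to the fixed input shape, and turning the softmax output into a digit. The sketch below shows both under stated assumptions: `pad_features` and `decode_digit` are hypothetical helper names, the MFCC matrix is assumed to have shape (20, n_frames), and random numbers stand in for real features and for the model's output so the example runs without a trained model.

```python
import numpy as np

WIDTH = 20    # MFCC coefficients per frame, as in the training script
HEIGHT = 80   # maximum number of frames per utterance

def pad_features(mfcc, height=HEIGHT):
    """Zero-pad (or truncate) a (WIDTH, n_frames) matrix to (WIDTH, height)."""
    mfcc = np.asarray(mfcc, dtype=np.float32)[:, :height]
    missing = height - mfcc.shape[1]
    return np.pad(mfcc, ((0, 0), (0, missing)), mode="constant", constant_values=0)

def decode_digit(probabilities):
    """Return the index of the most probable class, i.e. the predicted digit."""
    return int(np.argmax(probabilities))

# Stand-in for the MFCCs of one recording with 35 frames.
features = pad_features(np.random.rand(WIDTH, 35))
batch = features[np.newaxis, ...]   # add the batch dimension: (1, 20, 80)

# With a trained model this would be: probs = model.predict(batch)[0]
fake_probs = np.array([0.01] * 7 + [0.9, 0.02, 0.01])
print(batch.shape, decode_digit(fake_probs))
```

Zero padding on the right matches the fixed `height` the network was built with; recordings longer than 80 frames are simply truncated in this sketch.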

Speech recognition can be used in smart input methods, fast meeting transcription, voice control systems, and smart homes.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import
import os
import tflearn
import speech_data  # helper module from the repository linked above
learning_rate = 0.0001
training_iters = 300000  # steps
batch_size = 64
width = 20  # number of MFCC features
height = 80  # (max) length of utterance
classes = 10  # digits
batch = word_batch = speech_data.mfcc_batch_generator(batch_size)  # yields batches of MFCC features and labels
X, Y = next(batch)
trainX, trainY = X, Y
testX, testY = X, Y  # overfit for now: validate on the training batch
# Data preprocessing
# Sequence padding
# trainX = pad_sequences(trainX, maxlen=100, value=0.)
# testX = pad_sequences(testX, maxlen=100, value=0.)
# # Converting labels to binary vectors
# trainY = to_categorical(trainY, nb_classes=2)
# testY = to_categorical(testY, nb_classes=2)
# Network building
# LSTM model
net = tflearn.input_data([None, width, height])
# net = tflearn.embedding(net, input_dim=10000, output_dim=128)
net = tflearn.lstm(net, 128, dropout=0.8)
net = tflearn.fully_connected(net, classes, activation='softmax')
net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='categorical_crossentropy')
# Training
model = tflearn.DNN(net, tensorboard_verbose=0)
# Resume from an earlier run only if a checkpoint exists.
# Assumption: the checkpoint is in TensorFlow's V2 format, which writes a ".index" file.
if os.path.exists("tflearn.lstm.model.index"):
    model.load("tflearn.lstm.model")
rounds = 10  # assumed number of fit/save rounds; adjust as needed
for _ in range(rounds):
    model.fit(trainX, trainY, n_epoch=100, validation_set=(testX, testY), show_metric=True,
              batch_size=batch_size)
    model.save("tflearn.lstm.model")  # save after every round so progress is kept
_y = model.predict(X)
print(_y)  # predicted class probabilities
print(Y)  # true one-hot labels

 
