
Deep Learning with TF, Part 10: Recurrent Neural Networks (RNN) and Their Variants LSTM and GRU in Practice


Contents

I. Representing time-series data
II. The Embedding layer
III. Single-layer and multi-layer RNN
  1. Single-layer RNN
  2. Multi-layer RNN
IV. RNN case study: sentiment classification
  1. Single-layer RNN
  2. Multi-layer RNN
  3. SimpleRNN (high-level wrapper)
V. LSTM case study: sentiment classification
  1. LSTMCell
  2. LSTM (high-level wrapper)
VI. GRU case study: sentiment classification
  1. GRUCell
  2. GRU (high-level wrapper)

I. Representing time-series data

    A recurrent neural network (RNN) processes data that has an inherent temporal order.

    If the data are structured sequence data, they can be represented by a tensor of shape [b, s, n],

where b is the number of sequences, s is the sequence length, and n is the length of the feature vector produced at each time stamp.

Intuitively, this can be seen as b waveforms (signal types), each containing 100 (s) time stamps, with each time stamp represented by a tensor of length 1 (n), as sketched below.
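A minimal sketch of this [b, s, n] layout (the sine waves and sizes here are made up purely for illustration):

import tensorflow as tf

# A toy batch of b = 2 waveforms, each sampled at s = 100 time stamps,
# with a feature vector of length n = 1 per time stamp -> shape [2, 100, 1]
t = tf.linspace(0.0, 6.28, 100)                 # 100 time stamps
waves = tf.stack([tf.sin(t), tf.sin(2.0 * t)])  # [2, 100]
x = tf.expand_dims(waves, axis=-1)              # [2, 100, 1] = [b, s, n]
print(x.shape)                                  # (2, 100, 1)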

    Image data can also be represented by a tensor of shape [b, s, n],

where b is the number of images, s is the sequence length of each image, and n is the length of the vector at each time stamp.

Intuitively, this can be seen as b images, each scanned row by row 28 times, with each scan represented by a vector of length n (28 features extracted per scan).

    For text data:

    For a sentence containing n words, a simple way to represent each word is One-hot encoding. Take English as an example: if we only consider the 10,000 most common words, each word can be represented as a sparse One-hot vector of length 10,000, with a single position set to 1 and all other positions set to 0. For Chinese sentences, if we only consider the 5,000 most common characters, the same method represents each character as a One-hot vector of length 5,000. Likewise, if we only consider n place-name words, each place name can be encoded as a One-hot vector of length n.
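The snippet below is only an illustration of how such vectors are built (the vocabulary size and word indices are made up):

import tensorflow as tf

# Illustration only: assume a vocabulary of the 10,000 most common words.
# Each word index becomes a 10,000-dim vector with a single 1.
vocab_size = 10000
word_ids = tf.constant([2, 3, 999])              # arbitrary word indices
one_hot = tf.one_hot(word_ids, depth=vocab_size)
print(one_hot.shape)                             # (3, 10000), extremely sparse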

The process of encoding text as numbers is called Word Embedding. One-hot encoding is a simple and intuitive way to implement word embedding, and the encoding requires no learning or training.

However, One-hot vectors are high-dimensional and extremely sparse (most positions are 0, which is computationally inefficient), and they are not well suited to training neural networks. From a semantic point of view, One-hot encoding has an even more serious problem: it ignores the semantic relatedness that words naturally carry.

For example, consider the words "like", "dislike", "Rome", and "Paris". "like" and "dislike" are strongly related semantically, since both express a degree of liking; "Rome" and "Paris" are likewise strongly related, since both are European place names. If such a group of words is One-hot encoded, the resulting vectors carry no correlation at all and fail to reflect the semantic relatedness of the original text, so One-hot encoding has obvious shortcomings.

    In natural language processing, an entire research direction is devoted to learning word representation vectors (Word Vectors) such that semantic relatedness is well captured by the vectors themselves. A common measure of the relatedness between two word vectors a and b is cosine similarity:

similarity(a, b) = cos(θ) = (a · b) / (‖a‖ ‖b‖)

where θ is the angle between the two word vectors; cos(θ) reflects their semantic relatedness well, as in the sketch below.
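A minimal sketch of computing cosine similarity in TensorFlow (the two vectors here are invented purely for illustration):

import tensorflow as tf

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); values near 1 mean strong similarity
    a = tf.nn.l2_normalize(a, axis=-1)
    b = tf.nn.l2_normalize(b, axis=-1)
    return tf.reduce_sum(a * b, axis=-1)

v_like = tf.constant([0.8, 0.1, 0.3])      # made-up vector for "like"
v_dislike = tf.constant([0.7, 0.2, 0.4])   # made-up vector for "dislike"
print(float(cosine_similarity(v_like, v_dislike)))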

II. The Embedding layer

    In a neural network, word representation vectors can be obtained directly through training; the layer that produces them is called the Embedding layer. The Embedding layer encodes a word into a word vector v: it takes a numerically encoded word index i (e.g. 2 for "I", 3 for "me"), with the total vocabulary size denoted N_vocab, and outputs a vector v of length n.

The Embedding layer is very simple to implement: build a lookup table of shape [N_vocab, n]; for any word index i, simply look up the vector at the corresponding row and return it.
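Conceptually, this lookup is nothing more than a tf.gather on a table of shape [N_vocab, n]; the sketch below uses made-up sizes:

import tensorflow as tf

# Assume N_vocab = 10 words and word vectors of length n = 4
table = tf.random.normal([10, 4])       # the trainable lookup table
word_ids = tf.constant([2, 3, 3])       # word indices i
vectors = tf.gather(table, word_ids)    # return the rows at those indices
print(vectors.shape)                    # (3, 4)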

The Embedding layer is trainable. It can be placed in front of the rest of the network to convert words into vectors; the resulting representations flow through the network to complete the downstream task, the loss is computed, and gradient descent trains everything end to end.

    In TensorFlow, layers.Embedding(N_vocab, n) defines a Word Embedding layer, where N_vocab is the vocabulary size and n is the length of each word vector.

import tensorflow as tf
from tensorflow.keras import layers

x = tf.range(10)          # digit encodings for 10 words
x = tf.random.shuffle(x)
# An Embedding layer for 10 words, each represented by a vector of length 4
net = layers.Embedding(10, 4)
out = net(x)
print(out)

tf.Tensor(
[[-0.0001258  -0.01502239  0.00644372  0.04399958]
 [-0.03629031 -0.04979119 -0.00445051 -0.0088243 ]
 [-0.03081716  0.0424983  -0.03043612 -0.01220802]
 [-0.02545066 -0.04368721 -0.02251965  0.00655595]
 [-0.04625113 -0.04106627  0.0468717   0.0404476 ]
 [-0.00539555 -0.0425167  -0.0274111  -0.01424157]
 [-0.03010789  0.0140976   0.01486215 -0.0171892 ]
 [ 0.03787538  0.02254117  0.01853167  0.04533416]
 [-0.03413143 -0.02415942  0.03709478  0.01728374]
 [-0.02677795  0.00826843 -0.0051159  -0.01122908]], shape=(10, 4), dtype=float32)

Inspect the Embedding layer's internal lookup table:

print(net.embeddings)

<tf.Variable 'embedding/embeddings:0' shape=(10, 4) dtype=float32, numpy=
array([[-0.02671334, -0.01306969, -0.01389484,  0.02536395],
       [ 0.03719736,  0.00645654, -0.04235708,  0.04766853],
       [ 0.01650348, -0.03993852,  0.01852169, -0.01021679],
       [ 0.015307  , -0.02686958, -0.0118206 , -0.04958133],
       [ 0.02233492,  0.00747244,  0.04506476,  0.01315404],
       [ 0.00199506, -0.00295981,  0.01042227, -0.00751244],
       [ 0.03401004,  0.00053816,  0.04955151, -0.03941982],
       [ 0.00327535, -0.02441757, -0.01637713, -0.04333794],
       [ 0.00115421,  0.03005128, -0.03063954, -0.04861031],
       [-0.04716169, -0.02324554, -0.02143958, -0.02631059]], dtype=float32)>

The trainable attribute of the net.embeddings tensor is True, so it can be optimized by gradient descent:

print(net.embeddings.trainable)
# True

    The Embedding layer's lookup table is randomly initialized and trained from scratch. In practice, we can instead use a pretrained word-embedding model to obtain word representations. Word vectors taken from a pretrained model effectively transfer the knowledge of an entire semantic space and usually yield better performance. Widely used pretrained models include Word2Vec and GloVe; they have been trained on massive corpora, and the learned word-vector tables can be exported directly and transferred to other tasks.

    So how do we use these pretrained word-vector models to improve an NLP task? It is very simple: instead of initializing the Embedding layer randomly, initialize its lookup table with the pretrained parameters.

# Load the word-vector table from a pretrained model
embed_glove = load_embed('glove.6B.50d.txt')
# Initialize the Embedding layer directly with the pretrained word vectors
net.set_weights([embed_glove])

An Embedding layer initialized from a pretrained word-vector model can be excluded from training by setting net.trainable = False, in which case the pretrained word vectors are applied to the task as-is. If we want representations that differ from the pretrained ones, we can include the Embedding layer in backpropagation and fine-tune the word vectors with gradient descent.
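A minimal sketch of this freeze-or-fine-tune choice (the array below is a stand-in for the table a loader such as the load_embed placeholder above would return, and the GloVe 6B.50d sizes are assumptions):

import numpy as np
from tensorflow.keras import layers

# Stand-in for a pretrained table; real glove.6B.50d is roughly [400000, 50]
embed_glove = np.random.rand(400000, 50).astype('float32')

net = layers.Embedding(400000, 50)
net.build((None,))              # create the weights before overwriting them
net.set_weights([embed_glove])  # copy the pretrained vectors into the lookup table
net.trainable = False           # set to True instead to fine-tune the word vectors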

III. Single-layer and multi-layer RNN

1. Single-layer RNN

    In TensorFlow 2, a SimpleRNN layer can be expressed either as keras.layers.SimpleRNN or keras.layers.SimpleRNNCell. SimpleRNN is the high-level wrapper class and can be used without understanding the internals of an RNN; SimpleRNNCell is closer to the low level and requires you to update the output and state yourself. We start with keras.SimpleRNNCell.

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])
x_t0 = x[:, 0, :]
# Single-layer RNN
cell = layers.SimpleRNNCell(64)
out, h_t1 = cell(x_t0, [tf.zeros([4, 64])])
print(out.shape, h_t1[0].shape)
# (4, 64) (4, 64)
print(id(out), id(h_t1[0]))
# 2254557618416 2254557618416
print(cell.trainable_variables)

# W_xh
[<tf.Variable 'simple_rnn_cell/kernel:0' shape=(100, 64) dtype=float32, numpy=
array([[ 0.1363651 , -0.0572269 ,  0.09445526, ..., -0.1507045 ,  0.01126893,  0.17339201],
       [ 0.02856255,  0.15180282, -0.12121101, ...,  0.0886908 , -0.08678336,  0.04887807],
       [ 0.18207397, -0.17674726,  0.07250978, ..., -0.01544967, -0.08867019, -0.11385597],
       ...,
       [-0.09959508, -0.11128702, -0.05792001, ..., -0.12493751, -0.05981992, -0.15802671],
       [-0.03608645,  0.05136161,  0.11901782, ..., -0.05900815, -0.18590759, -0.06727251],
       [-0.00414591, -0.06272078, -0.11666548, ...,  0.05222301,  0.11370294,  0.1058446 ]], dtype=float32)>,
# W_hh
<tf.Variable 'simple_rnn_cell/recurrent_kernel:0' shape=(64, 64) dtype=float32, numpy=
array([[-1.44636631e-01, -2.41715163e-01, -1.37400955e-01, ...,  3.06779407e-02, -9.32328403e-03,  2.50178762e-03],
       [ 8.62338617e-02,  8.57626200e-02, -1.92453608e-01, ...,  1.54805947e-02,  7.17535755e-03,  3.00482392e-01],
       [ 1.59917727e-01,  2.43820846e-01, -2.19721437e-01, ..., -1.06618486e-01, -1.92755729e-01,  8.41443986e-03],
       ...,
       [-3.65341753e-02, -7.74926841e-02, -4.81767319e-02, ...,  1.70781165e-01,  8.75974596e-02, -1.19888894e-01],
       [ 2.20960930e-01,  1.79074965e-02,  3.92574295e-02, ..., -1.50801331e-01,  1.14356488e-01, -8.91658291e-02],
       [-2.05543414e-01,  1.13110356e-01,  1.79518014e-04, ..., -4.87414822e-02, -9.71178561e-02, -8.35660249e-02]], dtype=float32)>,
# b
<tf.Variable 'simple_rnn_cell/bias:0' shape=(64,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>]

2. Multi-layer RNN

    A second layer is added on top of the single-layer RNN: the second layer takes the first layer's output as its input and maintains its own state, which it updates at every step.

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])
xt0 = x[:, 0, :]
# Build a two-layer RNN
cell = tf.keras.layers.SimpleRNNCell(64)
cell2 = tf.keras.layers.SimpleRNNCell(64)
state0 = [tf.zeros([4, 64])]
state1 = [tf.zeros([4, 64])]
# One update step
out0, state0 = cell(xt0, state0)
out2, state1 = cell2(out0, state1)
out2.shape, state1[0].shape
# (TensorShape([4, 64]), TensorShape([4, 64]))
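The snippet above only advances one time stamp (t = 0). A minimal sketch of unrolling both cells across the whole sequence, using the same shapes, could look like this:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])          # [b, s, n]
cell = layers.SimpleRNNCell(64)
cell2 = layers.SimpleRNNCell(64)
state0 = [tf.zeros([4, 64])]
state1 = [tf.zeros([4, 64])]

# Unroll both cells over the 80 time stamps: layer 1 consumes the input word,
# layer 2 consumes layer 1's output, and each layer carries its own state.
for xt in tf.unstack(x, axis=1):            # xt: [b, n] at one time stamp
    out0, state0 = cell(xt, state0)
    out1, state1 = cell2(out0, state1)
print(out1.shape)                           # (4, 64), the output at the last time stamp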

IV. RNN case study: sentiment classification

1. Single-layer RNN

#!usr/bin/env python
# -*- coding:utf-8 -*-
"""
@author: admin
@file: 单层rnn.py
@time: /02/28
@desc:
"""
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, Sequential, losses

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

# Load the dataset
# imdb is a movie-review dataset.
# Rare words are collapsed into a single "unknown" token; total_words is the number of frequent words kept.
total_words = 10000
max_review_len = 80   # sentence length
batchsz = 64
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# Pad/truncate every sentence to the same length so a single linear layer can process them.
# x_train: [b, 80]   x_test: [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# Build the datasets
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# drop_remainder=True drops the final batch if it is smaller than batchsz
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

# Build the network
class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # [b, 64]
        self.state0 = [tf.zeros([batchsz, units])]
        # Transform the text into an embedding representation
        # [b, 80] => [b, 80, 100]: each word is represented by a 100-dim vector
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # Unroll the sentence along the time axis
        # [b, 80, 100] => [b, 64], h_dim: 64
        # RNN: cell1, cell2, cell3
        # SimpleRNN (the wrapper) unrolls over time internally;
        # with the cell class, the training and test logic differ, so we unroll by hand.
        self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.5)
        # Output layer, [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)  # one output node

    # Forward pass
    def call(self, inputs, training=None):
        """
        train mode: net(x), net(x, training=True)
        test mode:  net(x, training=False)
        :param inputs: [b, 80]
        :param training: controls whether dropout is active
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # h_0
        state0 = self.state0
        # RNN cell compute
        # [b, 80, 100] => [b, 64]
        # word: [b, 100]
        for word in tf.unstack(x, axis=1):  # unstack x along axis 1 (the time axis)
            # h_t = x_t * W_xh + h_{t-1} * W_hh: input word plus the previous state
            # out and state1[0] are the same tensor here; the (out, state) pair just matches the RNN interface
            # dropout is applied only when training is True
            out, state1 = self.rnn_cell0(word, state0, training)
            # carry the state over to the next step
            state0 = state1
        # out accumulates the semantic information of all words
        # out: [b, 64] => [b, 1]
        x = self.outlayer(out)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    # Train the model
    model.compile(optimizer=optimizers.Adam(lr=1e-3),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)  # works around a tf.function issue with this custom model
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
390/390 - 16s - loss: 0.5055 - accuracy: 0.6413 - val_loss: 0.4225 - val_accuracy: 0.8064
Epoch 2/4
390/390 - 12s - loss: 0.3496 - accuracy: 0.8438 - val_loss: 0.3876 - val_accuracy: 0.8257
Epoch 3/4
390/390 - 12s - loss: 0.2663 - accuracy: 0.8882 - val_loss: 0.4236 - val_accuracy: 0.8152
Epoch 4/4
390/390 - 12s - loss: 0.1986 - accuracy: 0.9142 - val_loss: 0.5636 - val_accuracy: 0.8110

2. Multi-layer RNN

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, Sequential, losses

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

# Load the dataset
# imdb is a movie-review dataset.
# Rare words are collapsed into a single "unknown" token; total_words is the number of frequent words kept.
total_words = 10000
max_review_len = 80   # sentence length
batchsz = 64
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# Pad/truncate every sentence to the same length so a single linear layer can process them.
# x_train: [b, 80]   x_test: [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# Build the datasets
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# drop_remainder=True drops the final batch if it is smaller than batchsz
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

# Build the network
class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # [b, 64]
        self.state0 = [tf.zeros([batchsz, units])]
        self.state1 = [tf.zeros([batchsz, units])]
        # Transform the text into an embedding representation
        # [b, 80] => [b, 80, 100]: each word is represented by a 100-dim vector
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # Unroll the sentence along the time axis
        # [b, 80, 100] => [b, 64], h_dim: 64
        # RNN: cell1, cell2, cell3
        # SimpleRNN (the wrapper) unrolls over time internally;
        # with the cell class, the training and test logic differ, so we unroll by hand.
        self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.2)
        self.rnn_cell1 = layers.SimpleRNNCell(units, dropout=0.2)
        # Output layer, [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)  # one output node

    # Forward pass
    def call(self, inputs, training=None):
        """
        train mode: net(x), net(x, training=True)
        test mode:  net(x, training=False)
        :param inputs: [b, 80]
        :param training: controls whether dropout is active
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # h_0
        state0 = self.state0
        state1 = self.state1
        # RNN cell compute
        # [b, 80, 100] => [b, 64]
        # word: [b, 100]
        for word in tf.unstack(x, axis=1):  # unstack x along axis 1 (the time axis)
            # h_t = x_t * W_xh + h_{t-1} * W_hh: input word plus the previous state
            # the (out, state) pair matches the RNN interface
            # dropout is applied only when training is True
            out, state0 = self.rnn_cell0(word, state0, training)
            out, state1 = self.rnn_cell1(out, state1, training)
        # out accumulates the semantic information of all words
        # out: [b, 64] => [b, 1]
        x = self.outlayer(out)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    # Train the model
    model.compile(optimizer=optimizers.Adam(lr=1e-3),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)  # works around a tf.function issue with this custom model
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
390/390 - 23s - loss: 0.5876 - accuracy: 0.5624 - val_loss: 0.4227 - val_accuracy: 0.8103
Epoch 2/4
390/390 - 16s - loss: 0.3986 - accuracy: 0.8165 - val_loss: 0.4758 - val_accuracy: 0.7749
Epoch 3/4
390/390 - 16s - loss: 0.3322 - accuracy: 0.8525 - val_loss: 0.4964 - val_accuracy: 0.7616
Epoch 4/4
390/390 - 16s - loss: 0.2635 - accuracy: 0.8833 - val_loss: 0.4862 - val_accuracy: 0.8133

3. SimpleRNN (high-level wrapper)

SimpleRNN is the high-level wrapper class: you only need to stack the layers like building blocks and then pass the input through.

# Build
self.rnn = keras.Sequential([
    layers.SimpleRNN(units, dropout=0.5, return_sequences=True, unroll=True),
    layers.SimpleRNN(units, dropout=0.5, unroll=True)
])
# Forward pass during training
x = self.rnn(x)

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

batchsz = 128
# the most frequent words
total_words = 10000
max_review_len = 80
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80]
# x_test:  [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # transform text to embedding representation
        # [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # [b, 80, 100], h_dim: 64
        self.rnn = keras.Sequential([
            # return_sequences=True returns the state at every time stamp
            layers.SimpleRNN(units, dropout=0.5, return_sequences=True, unroll=True),
            layers.SimpleRNN(units, dropout=0.5, unroll=True)
        ])
        # fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode
        net(x, training=False): test mode
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn cell compute
        # x: [b, 80, 100] => [b, 64]
        x = self.rnn(x)
        # out: [b, 64] => [b, 1]
        x = self.outlayer(x)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
195/195 - 17s - loss: 0.7029 - accuracy: 0.4999 - val_loss: 0.6962 - val_accuracy: 0.5105
Epoch 2/4
195/195 - 8s - loss: 0.5888 - accuracy: 0.5955 - val_loss: 0.4409 - val_accuracy: 0.7954
Epoch 3/4
195/195 - 8s - loss: 0.3796 - accuracy: 0.8276 - val_loss: 0.4286 - val_accuracy: 0.8190
Epoch 4/4
195/195 - 8s - loss: 0.2964 - accuracy: 0.8694 - val_loss: 0.4535 - val_accuracy: 0.8238

V. LSTM case study: sentiment classification

    TensorFlow provides two ways to express this layer: LSTMCell and LSTM.

1. LSTMCell

# SimpleRNNCell version:
self.state0 = [tf.zeros([batchsz, units])]
self.state1 = [tf.zeros([batchsz, units])]
self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.5)
self.rnn_cell1 = layers.SimpleRNNCell(units, dropout=0.5)

# ==> LSTMCell version:
self.state0 = [tf.zeros([batchsz, units]), tf.zeros([batchsz, units])]
self.state1 = [tf.zeros([batchsz, units]), tf.zeros([batchsz, units])]
self.rnn_cell0 = layers.LSTMCell(units, dropout=0.5)
self.rnn_cell1 = layers.LSTMCell(units, dropout=0.5)
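Before the full example, here is a minimal standalone sketch of a single LSTMCell step; unlike SimpleRNNCell, its state is a list of two tensors, [h, c] (hidden state and memory/cell state):

import tensorflow as tf
from tensorflow.keras import layers

x_t = tf.random.normal([4, 100])                  # one time stamp, [b, n]
cell = layers.LSTMCell(64)
state = [tf.zeros([4, 64]), tf.zeros([4, 64])]    # [h_0, c_0]

out, state = cell(x_t, state)                     # state is now [h_1, c_1]
print(out.shape, state[0].shape, state[1].shape)  # (4, 64) (4, 64) (4, 64)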

Complete code:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, Sequential, losses

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

# Load the dataset
# imdb is a movie-review dataset.
# Rare words are collapsed into a single "unknown" token; total_words is the number of frequent words kept.
total_words = 10000
max_review_len = 80   # sentence length
batchsz = 64
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# Pad/truncate every sentence to the same length so a single linear layer can process them.
# x_train: [b, 80]   x_test: [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# Build the datasets
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# drop_remainder=True drops the final batch if it is smaller than batchsz
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

# Build the network
class MyLSTM(keras.Model):
    def __init__(self, units):
        super(MyLSTM, self).__init__()
        # [b, 64]: each LSTM state is a list of two tensors, [h, c]
        self.state0 = [tf.zeros([batchsz, units]), tf.zeros([batchsz, units])]
        self.state1 = [tf.zeros([batchsz, units]), tf.zeros([batchsz, units])]
        # Transform the text into an embedding representation
        # [b, 80] => [b, 80, 100]: each word is represented by a 100-dim vector
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # Unroll the sentence along the time axis
        # [b, 80, 100] => [b, 64], h_dim: 64
        self.LSTM_cell0 = layers.LSTMCell(units, dropout=0.5)
        self.LSTM_cell1 = layers.LSTMCell(units, dropout=0.5)
        # Output layer, [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)  # one output node

    # Forward pass
    def call(self, inputs, training=None):
        """
        train mode: net(x), net(x, training=True)
        test mode:  net(x, training=False)
        :param inputs: [b, 80]
        :param training: controls whether dropout is active
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # h_0
        state0 = self.state0
        state1 = self.state1
        # LSTM cell compute
        # [b, 80, 100] => [b, 64]
        # word: [b, 100]
        for word in tf.unstack(x, axis=1):  # unstack x along axis 1 (the time axis)
            # input word plus the previous state; the (out, state) pair matches the RNN interface
            # dropout is applied only when training is True
            out, state0 = self.LSTM_cell0(word, state0, training)
            out, state1 = self.LSTM_cell1(out, state1, training)
        # out accumulates the semantic information of all words
        # out: [b, 64] => [b, 1]
        x = self.outlayer(out)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyLSTM(units)
    # Train the model
    model.compile(optimizer=optimizers.Adam(lr=1e-3),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)  # works around a tf.function issue with this custom model
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
390/390 - 41s - loss: 0.4589 - accuracy: 0.6843 - val_loss: 0.3770 - val_accuracy: 0.8416
Epoch 2/4
390/390 - 28s - loss: 0.3094 - accuracy: 0.8651 - val_loss: 0.3693 - val_accuracy: 0.8383
Epoch 3/4
390/390 - 28s - loss: 0.2503 - accuracy: 0.8949 - val_loss: 0.3938 - val_accuracy: 0.8355
Epoch 4/4
390/390 - 28s - loss: 0.2074 - accuracy: 0.9142 - val_loss: 0.4534 - val_accuracy: 0.8300

2. LSTM (high-level wrapper)

self.rnn = keras.Sequential([
    layers.LSTM(units, dropout=0.5, return_sequences=True, unroll=True),
    layers.LSTM(units, dropout=0.5, unroll=True)
])

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

batchsz = 128
# the most frequent words
total_words = 10000
max_review_len = 80
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80]
# x_test:  [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyLSTM(keras.Model):
    def __init__(self, units):
        super(MyLSTM, self).__init__()
        # transform text to embedding representation
        # [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # [b, 80, 100], h_dim: 64
        self.rnn = keras.Sequential([
            # return_sequences=True returns the state at every time stamp
            layers.LSTM(units, dropout=0.5, return_sequences=True, unroll=True),
            layers.LSTM(units, dropout=0.5, unroll=True)
        ])
        # fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode
        net(x, training=False): test mode
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn cell compute
        # x: [b, 80, 100] => [b, 64]
        x = self.rnn(x)
        # out: [b, 64] => [b, 1]
        x = self.outlayer(x)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    # Train
    model = MyLSTM(units)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    model.save_weights('weights.ckpt')
    # Test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
195/195 - 34s - loss: 0.4812 - accuracy: 0.6618 - val_loss: 0.3760 - val_accuracy: 0.8379
Epoch 2/4
195/195 - 15s - loss: 0.3149 - accuracy: 0.8584 - val_loss: 0.3624 - val_accuracy: 0.8406
Epoch 3/4
195/195 - 15s - loss: 0.2636 - accuracy: 0.8878 - val_loss: 0.4164 - val_accuracy: 0.8282
Epoch 4/4
195/195 - 15s - loss: 0.2231 - accuracy: 0.9073 - val_loss: 0.4179 - val_accuracy: 0.8329

VI. GRU case study: sentiment classification

    TensorFlow likewise provides two ways to express a GRU layer: GRUCell and GRU.

1. GRUCell

# SimpleRNNCell version:
self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.5)
self.rnn_cell1 = layers.SimpleRNNCell(units, dropout=0.5)

# ==> GRUCell version:
self.rnn_cell0 = layers.GRUCell(units, dropout=0.5)
self.rnn_cell1 = layers.GRUCell(units, dropout=0.5)
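A minimal standalone sketch of one GRUCell step; unlike LSTMCell, it keeps a single hidden-state tensor:

import tensorflow as tf
from tensorflow.keras import layers

x_t = tf.random.normal([4, 100])          # one time stamp, [b, n]
cell = layers.GRUCell(64)
state = [tf.zeros([4, 64])]               # a single hidden state, unlike LSTMCell

out, state = cell(x_t, state)
print(out.shape, state[0].shape)          # (4, 64) (4, 64)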

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, Sequential, losses

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

# Load the dataset
# imdb is a movie-review dataset.
# Rare words are collapsed into a single "unknown" token; total_words is the number of frequent words kept.
total_words = 10000
max_review_len = 80   # sentence length
batchsz = 64
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# Pad/truncate every sentence to the same length so a single linear layer can process them.
# x_train: [b, 80]   x_test: [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
# Build the datasets
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# drop_remainder=True drops the final batch if it is smaller than batchsz
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

# Build the network
class MyGRU(keras.Model):
    def __init__(self, units):
        super(MyGRU, self).__init__()
        # [b, 64]
        self.state0 = [tf.zeros([batchsz, units])]
        self.state1 = [tf.zeros([batchsz, units])]
        # Transform the text into an embedding representation
        # [b, 80] => [b, 80, 100]: each word is represented by a 100-dim vector
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # Unroll the sentence along the time axis
        # [b, 80, 100] => [b, 64], h_dim: 64
        self.GRU_cell0 = layers.GRUCell(units, dropout=0.5)
        self.GRU_cell1 = layers.GRUCell(units, dropout=0.5)
        # Output layer, [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)  # one output node

    # Forward pass
    def call(self, inputs, training=None):
        """
        train mode: net(x), net(x, training=True)
        test mode:  net(x, training=False)
        :param inputs: [b, 80]
        :param training: controls whether dropout is active
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # h_0
        state0 = self.state0
        state1 = self.state1
        # GRU cell compute
        # [b, 80, 100] => [b, 64]
        # word: [b, 100]
        for word in tf.unstack(x, axis=1):  # unstack x along axis 1 (the time axis)
            # input word plus the previous state; the (out, state) pair matches the RNN interface
            # dropout is applied only when training is True
            out, state0 = self.GRU_cell0(word, state0, training)
            out, state1 = self.GRU_cell1(out, state1, training)
        # out accumulates the semantic information of all words
        # out: [b, 64] => [b, 1]
        x = self.outlayer(out)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    model = MyGRU(units)
    # Train the model
    model.compile(optimizer=optimizers.Adam(lr=1e-3),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)  # works around a tf.function issue with this custom model
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    # test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
390/390 - 56s - loss: 0.4742 - accuracy: 0.6710 - val_loss: 0.3776 - val_accuracy: 0.8407
Epoch 2/4
390/390 - 33s - loss: 0.3125 - accuracy: 0.8619 - val_loss: 0.3677 - val_accuracy: 0.8409
Epoch 3/4
390/390 - 47s - loss: 0.2507 - accuracy: 0.8965 - val_loss: 0.4000 - val_accuracy: 0.8333
Epoch 4/4
390/390 - 40s - loss: 0.2062 - accuracy: 0.9172 - val_loss: 0.4253 - val_accuracy: 0.8351

2. GRU (high-level wrapper)

self.rnn = keras.Sequential([
    layers.GRU(units, dropout=0.5, return_sequences=True, unroll=True),
    layers.GRU(units, dropout=0.5, unroll=True)
])

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import random
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random seeds for reproducibility
def seed_everying(SEED):
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['PYTHONHASHSEED'] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

seed_everying(42)

assert tf.__version__.startswith('2.')

batchsz = 128
# the most frequent words
total_words = 10000
max_review_len = 80
embedding_len = 100
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80]
# x_test:  [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)
print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test shape:', x_test.shape)

class MyGRU(keras.Model):
    def __init__(self, units):
        super(MyGRU, self).__init__()
        # transform text to embedding representation
        # [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # [b, 80, 100], h_dim: 64
        self.rnn = keras.Sequential([
            # return_sequences=True returns the state at every time stamp
            layers.GRU(units, dropout=0.5, return_sequences=True, unroll=True),
            layers.GRU(units, dropout=0.5, unroll=True)
        ])
        # fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True): train mode
        net(x, training=False): test mode
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn cell compute
        # x: [b, 80, 100] => [b, 64]
        x = self.rnn(x)
        # out: [b, 64] => [b, 1]
        x = self.outlayer(x)
        # p(y is pos|x)
        prob = tf.sigmoid(x)
        return prob

def main():
    units = 64
    epochs = 4
    # Train
    model = MyGRU(units)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'],
                  experimental_run_tf_function=False)
    model.fit(db_train, epochs=epochs, validation_data=db_test, verbose=2)
    # Test
    model.evaluate(db_test)

if __name__ == '__main__':
    main()

x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
x_test shape: (25000, 80)
Epoch 1/4
195/195 - 52s - loss: 0.5052 - accuracy: 0.6381 - val_loss: 0.3745 - val_accuracy: 0.8345
Epoch 2/4
195/195 - 18s - loss: 0.3206 - accuracy: 0.8562 - val_loss: 0.3591 - val_accuracy: 0.8424
Epoch 3/4
195/195 - 18s - loss: 0.2638 - accuracy: 0.8901 - val_loss: 0.3850 - val_accuracy: 0.8391
Epoch 4/4
195/195 - 18s - loss: 0.2180 - accuracy: 0.9120 - val_loss: 0.4037 - val_accuracy: 0.8354

Reference:

RNN lectures by 龙良曲 (Long Liangqu)
