mindspore.nn.LSTM - MindSpore master documentation


class mindspore.nn.LSTM(*args, **kwargs)

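Long Short-Term Memory (LSTM) network. It computes the output sequence and the final state from an input sequence and a given initial state.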

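An LSTM model has two pipelines connecting consecutive cells: the cell state pipeline and the hidden state pipeline. Denote two consecutive time steps as \(t-1\) and \(t\). Given the input \(x_t\) at time \(t\), and the hidden state \(h_{t-1}\) and cell state \(c_{t-1}\) at time \(t-1\), the cell state and hidden state at time \(t\) are computed with a gating mechanism. The input gate \(i_t\) controls how much of the candidate value is written to the cell state. The forget gate \(f_t\) decides whether the information carried in \(c_{t-1}\) passes through fully or partially. The output gate \(o_t\) decides which information is output. The candidate cell state \(\tilde{c}_t\) is computed from the current input and the previous hidden state. Finally, the current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed from the forget, input, and output gates. The complete formulas are as follows.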

\[\begin{array}{l}
i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{t-1} + b_{ih}) \\
f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{t-1} + b_{fh}) \\
\tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{t-1} + b_{ch}) \\
o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{t-1} + b_{oh}) \\
c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \\
h_t = o_t * \tanh(c_t)
\end{array}\]

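Here \(\sigma\) is the sigmoid activation function and \(*\) is the element-wise product. \(W\) and \(b\) are the learnable weights and biases that map inputs to outputs in the formulas. For example, \(W_{ix}, b_{ix}\) are the weight and bias used to transform the input \(x\) into \(i\).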
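The gate equations can be traced by hand with a small NumPy sketch of one time step. This is an illustration of the formulas only, not MindSpore's implementation: the function names `lstm_cell_step` and `init_params`, the `params` dictionary, and its key names are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations.

    params is a hypothetical dict holding, for each gate g in "ifco",
    W_gx (hidden, input), W_gh (hidden, hidden), b_gx and b_gh (hidden,).
    """
    def pre(g):
        # Pre-activation: W_gx x_t + b_gx + W_gh h_{t-1} + b_gh
        return (params[f"W_{g}x"] @ x_t + params[f"b_{g}x"]
                + params[f"W_{g}h"] @ h_prev + params[f"b_{g}h"])

    i_t = sigmoid(pre("i"))        # input gate
    f_t = sigmoid(pre("f"))        # forget gate
    c_tilde = np.tanh(pre("c"))    # candidate cell state
    o_t = sigmoid(pre("o"))        # output gate
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    # Small random weights and zero biases, chosen only for the demo.
    rng = np.random.default_rng(seed)
    params = {}
    for g in "ifco":
        params[f"W_{g}x"] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        params[f"W_{g}h"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        params[f"b_{g}x"] = np.zeros(hidden_size)
        params[f"b_{g}h"] = np.zeros(hidden_size)
    return params

params = init_params(input_size=10, hidden_size=16)
h_t, c_t = lstm_cell_step(np.ones(10), np.zeros(16), np.zeros(16), params)
print(h_t.shape, c_t.shape)  # (16,) (16,)
```

Because \(h_t = o_t * \tanh(c_t)\), every entry of `h_t` lies strictly between -1 and 1, which is a quick sanity check on any implementation.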

For details, see the papers LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

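LSTM hides the loop over the time steps of the sequence inside the recurrent network. Given the input sequence and the initial state, it returns the matrix formed by concatenating the hidden states of every time step, together with the hidden state of the last time step. The hidden state of the last time step is commonly used as the encoded feature of the input sentence and fed to the next layer. The formula is: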

\[h_{0:n},(h_{n}, c_{n}) = LSTM(x_{0:n},(h_{0},c_{0}))\]
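The loop that LSTM hides can be written out explicitly. The sketch below unrolls a single-layer, unidirectional LSTM in NumPy. It is a simplified illustration: the function name `lstm_forward`, the packing of the four gates into one weight matrix, and the merging of the two bias terms into a single `b` are assumptions of this sketch, not MindSpore's parameter layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, h0, c0, W_x, W_h, b):
    """Unroll a single-layer, unidirectional LSTM over time.

    x: (seq_len, batch_size, input_size); h0, c0: (batch_size, hidden_size).
    W_x: (input_size, 4 * hidden_size), W_h: (hidden_size, 4 * hidden_size),
    b: (4 * hidden_size,), the sum of the input-side and hidden-side biases.
    Gates are stacked in the order i, f, candidate, o (an assumption here).
    """
    h, c = h0, c0
    outputs = []
    for x_t in x:  # iterate over time steps
        z = x_t @ W_x + h @ W_h + b
        i, f, g, o = np.split(z, 4, axis=-1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)             # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)          # collect the hidden state of every step
    return np.stack(outputs), (h, c)

rng = np.random.default_rng(0)
seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 16
x = rng.normal(size=(seq_len, batch_size, input_size))
h0 = np.zeros((batch_size, hidden_size))
c0 = np.zeros((batch_size, hidden_size))
W_x = rng.normal(scale=0.1, size=(input_size, 4 * hidden_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

output, (h_n, c_n) = lstm_forward(x, h0, c0, W_x, W_h, b)
print(output.shape, h_n.shape, c_n.shape)  # (5, 3, 16) (3, 16) (3, 16)
```

In this single-layer case the last slice of `output` equals `h_n`, which is why the final hidden state can stand in for the whole sequence as an encoded feature.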

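Parameters:

input_size (int) - Size of the input.

hidden_size (int) - Size of the hidden state.

num_layers (int) - Number of network layers. Default: 1.

has_bias (bool) - Whether the cell has the bias terms b_{ih} and b_{fh}. Default: True.

batch_first (bool) - Whether the first dimension of the input x is batch_size. Default: False.

dropout (float, int) - Dropout probability applied to the input of every layer except the first. Default: 0. The value must be in the range [0.0, 1.0).

bidirectional (bool) - Whether the LSTM is bidirectional. Default: False.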

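Inputs:

x (Tensor) - Tensor of shape \((seq\_len, batch\_size, input\_size)\) or \((batch\_size, seq\_len, input\_size)\).

hx (tuple) - A tuple of two Tensors (h_0, c_0), with data type mindspore.float32 or mindspore.float16 and shape \((num\_directions * num\_layers, batch\_size, hidden\_size)\). The data type of hx must be the same as that of x.

seq_length (Tensor) - The sequence length of each entry in the input batch, a Tensor of shape \((batch\_size)\). Default: None. It specifies the true sequence lengths so that padded elements are not used to compute the hidden state and do not affect the final output. Passing this input is recommended.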
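To make the role of seq_length concrete, the following NumPy sketch pads a batch of variable-length sequences and records their true lengths. The helper name `pad_batch`, the padding value, and the int32 dtype for the lengths are assumptions of this sketch and are not taken from the MindSpore documentation.

```python
import numpy as np

def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences and record their true lengths.

    sequences: list of arrays, each of shape (len_i, input_size).
    Returns x of shape (batch_size, max_len, input_size) and
    seq_length of shape (batch_size,).
    """
    lengths = np.array([len(s) for s in sequences], dtype=np.int32)
    max_len = int(lengths.max())
    input_size = sequences[0].shape[1]
    x = np.full((len(sequences), max_len, input_size), pad_value, dtype=np.float32)
    for row, seq in enumerate(sequences):
        x[row, :len(seq)] = seq    # real data first, padding after it
    return x, lengths

# Three sequences of lengths 5, 2 and 4 with input_size 10.
sequences = [np.ones((5, 10)), np.ones((2, 10)), np.ones((4, 10))]
x, seq_length = pad_batch(sequences)
print(x.shape, seq_length)
```

The padded array uses the batch-first layout, so it would pair with batch_first=True. After wrapping both arrays in Tensor, they would presumably be passed as the first and third inputs, in the order the inputs are listed here (x, hx, seq_length).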

Outputs:

Tuple, a tuple containing (output, (h_n, c_n)).

output (Tensor) - Tensor of shape \((seq\_len, batch\_size, num\_directions * hidden\_size)\), or \((batch\_size, seq\_len, num\_directions * hidden\_size)\) when batch_first is True.

hx_n (tuple) - A tuple of two Tensors (h_n, c_n), both of shape \((num\_directions * num\_layers, batch\_size, hidden\_size)\).
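The shape rules for the inputs and outputs can be collected into a small helper for checking tensors before calling the layer. `lstm_shapes` is a hypothetical function written for this illustration, not part of MindSpore, and its batch-first output layout is inferred from the usage example on this page.

```python
def lstm_shapes(seq_len, batch_size, hidden_size, num_layers=1,
                bidirectional=False, batch_first=False):
    """Return (output_shape, state_shape) according to the documented rules."""
    num_directions = 2 if bidirectional else 1
    # h_0, c_0, h_n and c_n all share this shape.
    state_shape = (num_directions * num_layers, batch_size, hidden_size)
    feature = num_directions * hidden_size
    if batch_first:
        output_shape = (batch_size, seq_len, feature)
    else:
        output_shape = (seq_len, batch_size, feature)
    return output_shape, state_shape

# Same configuration as the usage example: 2 layers, batch_first=True.
print(lstm_shapes(5, 3, 16, num_layers=2, batch_first=True))
# ((3, 5, 16), (2, 3, 16))
```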

Raises:

TypeError - If input_size, hidden_size or num_layers is not an int.

TypeError - If has_bias, batch_first or bidirectional is not a bool.

TypeError - If dropout is neither a float nor an int.

ValueError - If dropout is not in the range [0.0, 1.0).

Supported Platforms:

Ascend GPU CPU

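Examples:

>>> import numpy as np
>>> from mindspore import Tensor, nn
>>> # input_size=10, hidden_size=16, num_layers=2
>>> net = nn.LSTM(10, 16, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> # batch_first=True, so x is (batch_size=3, seq_len=5, input_size=10)
>>> x = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> # h0 and c0 are (num_directions * num_layers, batch_size, hidden_size)
>>> h0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 16]).astype(np.float32))
>>> output, (hn, cn) = net(x, (h0, c0))
>>> print(output.shape)
(3, 5, 16)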