Phased Long Short-Term Memory (PLSTM) is an improvement on the well-known Long Short-Term Memory unit. Its main advantage is its ability to handle data that do not arrive in a regular sequence, as well as sequences with very long timesteps.
Phased LSTM differs from LSTM in that it has an additional gate called the time gate. The phased version is very efficient and performs better than a basic LSTM, even when given less data.
Table of contents
A Brief Overview of the Model
PLSTM was introduced in the paper Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. The aim was to create an RNN model well suited to irregularly sampled data that do not follow a regular sequence.
PLSTM solves this problem using a special gate called the time gate. The computation of PLSTM is otherwise quite similar to that of LSTM: the update gate, output gate, and forget gate are computed as usual by applying a sigmoid activation to a linear combination of the cell's input, the previous cell output, and the cell's weights and biases.
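The gate computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the weight matrices, dimensions, and the `lstm_gate` helper are hypothetical names chosen for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gate(x_t, h_prev, W, U, b):
    """One LSTM gate: a sigmoid applied to a linear combination of the
    current input x_t, the previous hidden state h_prev, and a bias."""
    return sigmoid(W @ x_t + U @ h_prev + b)

# Hypothetical dimensions: 3 input features, 2 hidden units.
rng = np.random.default_rng(0)
x_t, h_prev = rng.standard_normal(3), rng.standard_normal(2)
W, U, b = rng.standard_normal((2, 3)), rng.standard_normal((2, 2)), np.zeros(2)

i_t = lstm_gate(x_t, h_prev, W, U, b)  # e.g. the update (input) gate
print(i_t)  # every entry lies strictly between 0 and 1
```

The forget and output gates are computed the same way, each with its own weights.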
The time gate in PLSTM, in essence, controls the updating of the cell state and the hidden state, which have the same meaning as in the LSTM. These states can only be updated while the time gate is open, and the gate's open phase can be aligned with the times at which events occur. The cell state and hidden state are therefore effectively updated only when an event occurs. With this mechanism, PLSTMs are able to handle data with irregular sequences.
The time gate cycles through three phases: an opening phase in which it rises from 0 to 1, a closing phase in which it falls from 1 to 0, and a closed phase. Its oscillation is controlled by three learnable parameters: τ, which controls the real-time period of the oscillation; r_on, the ratio of the duration of the open phase to the full period; and s, the phase shift of the gate's oscillations.
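The three phases above can be written down directly. The sketch below follows the piecewise definition from the Phased LSTM paper, including the small leak α that lets gradients flow during the closed phase; the function name and the sample values are illustrative.

```python
def time_gate(t, tau, s, r_on, alpha=0.001):
    """Openness k_t of the Phased LSTM time gate at time t.

    tau  : real-time oscillation period (learnable)
    s    : phase shift (learnable)
    r_on : fraction of the period the gate is open (learnable)
    alpha: small leak so gradients can flow while the gate is closed
    """
    phi = ((t - s) % tau) / tau      # phase of the cycle, in [0, 1)
    if phi < 0.5 * r_on:             # opening phase: rises 0 -> 1
        return 2.0 * phi / r_on
    elif phi < r_on:                 # closing phase: falls 1 -> 0
        return 2.0 - 2.0 * phi / r_on
    else:                            # closed phase: tiny leak only
        return alpha * phi

# With tau=10, s=0, r_on=0.5, the gate is fully open at t=2.5
print(time_gate(t=2.5, tau=10.0, s=0.0, r_on=0.5))  # -> 1.0
```

In the paper, τ, s, and r_on are learned per neuron, so different neurons attend to the input at different timescales and offsets.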
LSTM vs. PLSTM Update Equation
In LSTM, the update gate and output gate control the updating of the cell state and the hidden state. The cell state and hidden state are therefore updated at every timestep, so long as those gates permit it; there is no gate tied to the occurrence of a novel event.
This is well illustrated by the standard LSTM equations: LSTMs have no mechanism for reacting to triggered or novel events that may occur at runtime.
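For reference, the standard LSTM update equations are as follows, with σ the sigmoid and ⊙ elementwise multiplication:

```latex
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\ % update (input) gate
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\ % forget gate
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\ % output gate
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
```

Note that the last two equations apply unconditionally at every timestep: nothing in them depends on when, in real time, the input arrived.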
In contrast, PLSTM has an additional gate, the time gate, which allows the cell state and hidden state to be updated only when an event occurs. This preserves information for much longer and lets the network react to novel data, such as a newly triggered sensor or the unusual firing of a neuron in the network.
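Concretely, the Phased LSTM paper inserts the time gate k_t as follows: the proposed updates (written here with tildes) are the ordinary LSTM updates, and k_t interpolates between applying them and keeping the old state:

```latex
\tilde{c}_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
c_t &= k_t \odot \tilde{c}_t + (1 - k_t) \odot c_{t-1} \\
\tilde{h}_t &= o_t \odot \tanh(\tilde{c}_t) \\
h_t &= k_t \odot \tilde{h}_t + (1 - k_t) \odot h_{t-1}
```

When k_t = 0 (gate closed), the cell and hidden states pass through unchanged; when k_t = 1 (gate open), the update is exactly the LSTM update.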
Why use PLSTM?
Traditional RNNs are very useful because they possess a memory that carries information across many timesteps, making them efficient for sequential data. However, data do not always arrive in a regular sequence, and this is where PLSTMs come in.
PLSTM units are needed when dealing with, for example, autonomous vehicles and robots. An autonomous vehicle has several sensors producing signals at the same time, often at different rates, and we would like our neural network to be aware of all of them.
The model makes integrating these kinds of data, from several hundred sensors, into a neural network much easier than other RNN models do.
The model also converges faster than other RNN models on ordinary RNN problems, and converges in situations where other models fail to converge. This comes from the time gate, which preserves information and carries it to deeper layers of the network.
The time gate mechanism allows a PLSTM's memory to decay only during the open periods, when the memory is actually updated. An LSTM's memory, by contrast, decays exponentially at every timestep as the memory updates, which can prevent convergence during backpropagation on problems involving sequences with long timesteps.
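A toy calculation makes the difference in decay concrete. This is a hypothetical illustration, not a trained model: assume a cell holding the value 1.0, a constant forget gate of 0.9, and no new input over 100 timesteps.

```python
forget, steps = 0.9, 100

# LSTM: the forget gate multiplies the cell state at every timestep.
lstm_memory = forget ** steps

# Phased LSTM: with r_on = 5%, the cell state is only touched (and so
# only decays) on the ~5% of timesteps when the time gate is open.
open_steps = int(0.05 * steps)
plstm_memory = forget ** open_steps

print(f"LSTM memory after {steps} steps:  {lstm_memory:.6f}")   # ~0.000027
print(f"PLSTM memory after {steps} steps: {plstm_memory:.6f}")  # ~0.590490
```

After 100 steps the LSTM has essentially forgotten the value, while the PLSTM still retains most of it, which is why gradients survive backpropagation over long timesteps.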
The model is also computationally inexpensive: setting r_on to about 5% leaves the neurons off for roughly 95% of the time. It is also believed that the time gate performs some form of regularization, since it acts as a kind of dropout in the network.
In summary, PLSTMs are a good alternative to LSTMs, as they better preserve information with very little decay. They are also well suited to problems involving long timesteps and asynchronous data.
Nonetheless, a PLSTM model takes longer to train than an LSTM or GRU model because of its additional computation. This may make LSTM preferable for simple problems that do not need the extra functionality of PLSTM.