See resources on the Deep Learning episode.
RNN Review
- Vanilla RNN: when the current word plus a left-to-right running context is sufficient
  - POS tagging, NER, stock prices, weather
- Bidirectional RNN (BiLSTM): when context from the right (later tokens) helps too
- Encoder/decoder (seq2seq): when the model should read the whole input first, then produce output in a different form
  - Classification, sentiment analysis, machine translation
- All of these now typically take word embeddings as input, not one-hot vectors (see the sketch after this list)
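
A minimal PyTorch sketch of the three shapes above, fed through an embedding layer. All sizes (`vocab_size`, `embed_dim`, `hidden_dim`, sequence lengths) are hypothetical, chosen just to show the tensor flow:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 128, 256   # hypothetical sizes
embed = nn.Embedding(vocab_size, embed_dim)            # word embeddings, not one-hot

tokens = torch.randint(0, vocab_size, (1, 12))         # (batch=1, seq_len=12)
x = embed(tokens)                                      # (1, 12, 128)

# Vanilla RNN: one output per step, left-to-right context only (POS, NER).
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
out, h = rnn(x)                                        # out: (1, 12, 256)

# Bidirectional LSTM: each step also sees context from the right.
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
out, _ = bilstm(x)                                     # out: (1, 12, 512), both directions

# Encoder/decoder (seq2seq): read the whole input first, then generate anew
# (translation). Classification/sentiment would keep only the final encoder state.
encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
_, (h_n, c_n) = encoder(x)                             # summary of the whole input
y = embed(torch.randint(0, vocab_size, (1, 8)))        # target-side tokens (teacher forcing)
dec_out, _ = decoder(y, (h_n, c_n))                    # decode conditioned on the summary
```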
Training: backpropagation through time (BPTT)
- Vanishing/exploding gradients over long sequences
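
Why BPTT has this problem: the backward pass multiplies the gradient by roughly the same per-step factor (recurrent weight times the activation's derivative) at every time step, so over T steps it scales like factor**T. A toy scalar illustration, numbers made up:

```python
import numpy as np

# Repeated multiplication by a near-constant factor: factor**T.
T = 50
for factor in (0.9, 1.1):
    print(f"factor={factor}: gradient scale after {T} steps = {factor**T:.3e}")
# 0.9**50 ~ 5e-3 (vanishing); 1.1**50 ~ 1e2 (exploding)

# With tanh the per-step factor is w * (1 - h_t**2); saturation pushes it
# below 1, which is the usual vanishing case even when w > 1:
w, h, grad = 1.5, 0.1, 1.0
for _ in range(T):
    h = np.tanh(w * h)
    grad *= w * (1 - h**2)          # chain rule: d h_t / d h_{t-1}
print(f"tanh chain after {T} steps: {grad:.3e}")  # tiny -> vanished
```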
LSTMs
- ReLU vs. sigmoid vs. tanh (nonlinearities covered in a future episode)
- Forget gate layer: decides what to discard from the cell state
- Input gate layer: decides which values to update
- Tanh layer: creates new candidate values
- Output gate layer: decides which parts of the (filtered) cell state to emit as the hidden state
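
The four pieces above as one LSTM step from scratch. A NumPy sketch, not the canonical implementation: the single stacked weight matrix `W`, the bias layout, and the sizes are all assumptions made for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev; x] to the 4 gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)                 # forget gate: what to drop from the cell state
    i = sigmoid(i)                 # input gate: which values to update
    g = np.tanh(g)                 # tanh layer: new candidate values
    o = sigmoid(o)                 # output gate: what to expose as the hidden state
    c = f * c_prev + i * g         # cell state: keep some old, write some new
    h = o * np.tanh(c)             # hidden state: filtered view of the cell
    return h, c

# Hypothetical sizes for a smoke test.
n_in, n_hid = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # run 5 time steps
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                # (16,) (16,)
```

The additive cell-state update (`c = f * c_prev + i * g`) is the point: gradients can flow through that sum without being squashed at every step, which is how LSTMs sidestep the vanishing-gradient problem above.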