MLG 022 Deep NLP 1
Jul 28, 2017

Recurrent Neural Networks (RNNs) and Word2Vec.

Resources
Speech and Language Processing
Natural Language Processing - Stanford University
CS224N: Natural Language Processing with Deep Learning | Winter 2019
Show Notes

See the resources on the Deep Learning episode.

Deep NLP pros

  • Language complexity & nuances
    • Feature engineering / learning
    • Learns feature interactions: salary = degree * field, not degree + field
    • Multiple layers: pixels => lines => objects
    • Multiple layers of language: characters => words => phrases => meaning
  • One model to rule them all; end-to-end (E2E) models

Sequence vs non-sequence

  • DNN = ANN = MLP = feed-forward network: fixed-size input, no notion of order (see the sketch after this list)
  • RNNs for sequence data (time series, text)
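
A minimal sketch of the non-sequence case (plain NumPy; the layer sizes and weights are made up for illustration): a feed-forward pass consumes one fixed-size input vector all at once, with no notion of order.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: input -> hidden (tanh) -> output scores."""
    h = np.tanh(x @ W1 + b1)      # hidden features learned from the raw input
    return h @ W2 + b2            # output scores (e.g., class logits)

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))                  # a single fixed-size input vector
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
print(feed_forward(x, W1, b1, W2, b2))     # 3 output scores
```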

RNNs

  • Looped hidden layer: the same weights are reused at every step, learning nuances from features combined across the sequence (see the sketch after this list)
  • Carries info through time: language model
  • Translation, sentiment, classification, POS, NER, ...
  • Seq2seq, encode/decode
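
For contrast, a minimal sketch of a vanilla RNN cell (plain NumPy; sizes and weights again made up): the same weights are applied at every time step, and the hidden state carries information forward through the sequence.

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, bh, h0=None):
    """Run a vanilla RNN cell over a sequence; return the hidden state per step."""
    h = np.zeros(Whh.shape[0]) if h0 is None else h0
    states = []
    for x in xs:                               # one step per token / time step
        h = np.tanh(x @ Wxh + h @ Whh + bh)    # new state mixes input with previous state
        states.append(h)
    return states                              # states[-1] summarizes the whole sequence

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 4))                   # sequence of 5 steps, 4 features each
Wxh, Whh, bh = rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), np.zeros(8)
print(rnn_forward(xs, Wxh, Whh, bh)[-1])       # final hidden state (e.g., for classification)
```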

Word2Vec

  • One-hot encodings (sparse) don't capture word similarity (and sparse vectors waste compute)
  • Word embeddings
    • Euclidean distance for synonyms / similar words; cosine similarity for "projections" / analogies: king - man + woman ≈ queen (see the toy embedding sketch after this list)
    • t-SNE (t-distributed stochastic neighbor embedding)
  • Vector Space Models (VSMs). Learn from context, predictive vs count-based
  • Predictive methods (neural probabilistic language models) - Learn model parameters which predict contexts
    • Word2vec
    • CBOW / Skip-gram: CBOW predicts the center word from its context; skip-gram predicts context words from the center. CBOW suits smaller datasets, skip-gram larger ones (see the pair-generation sketch after this list)
    • DNN with a softmax hypothesis function; NCE loss (noise-contrastive estimation) avoids computing the full softmax over the vocabulary
  • Count-based methods / Distributional Semantics - (compute the statistics of how often some word co-occurs with its neighbor words in a large text corpus, and then map these count-statistics down to a small, dense vector for each word)
    • GloVe
    • Linear algebra (PCA, LSA, SVD); see the co-occurrence + SVD sketch after this list
    • Pros (?): faster, more accurate, incremental fitting. Cons (?): data hungry, more RAM.
  • DNNs (or RNNs) for tagging tasks like POS and NER
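
A toy sketch of the embedding arithmetic mentioned above (plain NumPy; the 2-d vectors are hand-made for illustration, not learned embeddings): cosine similarity compares directions, and the king - man + woman ≈ queen analogy falls out of the vector offsets.

```python
import numpy as np

# Toy 2-d "embeddings": dimension 0 ~ royalty, dimension 1 ~ gender.
# Real word2vec/GloVe vectors are learned, dense, and typically 100-300 dimensional.
vecs = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Analogy by vector offsets: king - man + woman lands near queen.
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max(vecs, key=lambda w: cosine(vecs[w], target))
print(best)  # -> queen
```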
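
A sketch of how training examples differ between the two word2vec flavors (plain Python; the sentence and window size are made up): skip-gram yields (center, context) pairs, CBOW yields (context words, center).

```python
def training_pairs(tokens, window=2):
    """Generate word2vec-style training examples from one tokenized sentence."""
    skipgram, cbow = [], []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        skipgram += [(center, c) for c in context]   # predict each context word from the center
        cbow.append((context, center))               # predict the center word from all context words
    return skipgram, cbow

sg, cb = training_pairs("the quick brown fox jumps".split())
print(sg[:4])   # ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')
print(cb[0])    # (['quick', 'brown'], 'the')
```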
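
And a sketch of the count-based route (plain NumPy; the tiny corpus and embedding size are made up): count word-word co-occurrences, then use truncated SVD, LSA-style, to map the counts down to small dense vectors.

```python
import numpy as np

corpus = ["the cat sat", "the dog sat", "the cat ran"]
vocab = sorted({w for line in corpus for w in line.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-word co-occurrence counts within each sentence.
counts = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for a in words:
        for b in words:
            if a != b:
                counts[idx[a], idx[b]] += 1

# Truncated SVD: keep the top-k singular directions as dense embeddings.
k = 2
U, S, Vt = np.linalg.svd(counts)
embeddings = U[:, :k] * S[:k]          # one k-dimensional vector per word
print(dict(zip(vocab, embeddings.round(2))))
```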