MLG 027 Hyperparameters 1
Jan 27, 2018

Hyperparameters part 1: network architecture

Show Notes
  • Hypers future & meta-learning
    • We're always removing hyperparameters over time; deep learning removed feature engineering
  • Model selection (see the sketch after this list)
    • Unsupervised? K-means Clustering => DL
    • Linear? Linear regression, logistic regression
    • Simple? Naive Bayes, Decision Tree (Random Forest, Gradient Boosting)
    • Little data? Boosting
    • Lots of data, complex situation? Deep learning
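
A minimal sketch of the model-selection heuristics above, assuming scikit-learn estimators as stand-ins. The `pick_model` helper and the 10,000-row cutoff are illustrative assumptions, not rules from the episode:

```python
# Hypothetical helper mapping the decision list above onto scikit-learn
# estimators. The threshold and function name are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

def pick_model(labeled: bool, linearly_separable: bool, n_samples: int):
    if not labeled:
        return KMeans(n_clusters=8)          # unsupervised => clustering
    if linearly_separable:
        return LogisticRegression()          # linear => (logistic) regression
    if n_samples < 10_000:
        return GradientBoostingClassifier()  # little data => boosting
    return RandomForestClassifier()          # lots of data / complexity =>
                                             # ensembles, or a deep net

print(pick_model(labeled=True, linearly_separable=False, n_samples=5_000))
```
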
  • Network
    • Layer arch
      • Vision? CNN
      • Time? LSTM
      • Other? MLP
      • Trading example: the LSTM => CNN decision (price history as a sequence vs. as a chart)
    • Layer size design (funnel, etc.); see the network sketch after this list
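
A hedged sketch of the layer-architecture choices above, assuming Keras (tensorflow.keras). The input shapes and the 128 → 64 → 32 funnel widths are made-up examples, not values from the episode:

```python
# Illustrative only: input shapes and layer widths are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_net(kind: str) -> keras.Model:
    if kind == "vision":                                   # images => CNN
        return keras.Sequential([
            layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(10, activation="softmax"),        # 10 classes
        ])
    if kind == "time":                                     # sequences => LSTM
        return keras.Sequential([
            layers.LSTM(64, input_shape=(30, 8)),          # 30 steps, 8 features
            layers.Dense(1),                               # regression output
        ])
    # anything else => MLP, with layer sizes shaped as a funnel: 128 -> 64 -> 32
    return keras.Sequential([
        layers.Dense(128, activation="relu", input_shape=(20,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),             # binary output
    ])

build_net("time").summary()
```
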
  • Activations / nonlinearity (see the NumPy sketch at the end of these notes): https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
    • Output
      • Sigmoid = squashes to (0, 1); predicts a probability, usually at the output layer for binary classification
      • Softmax = multi-class classification; a probability distribution over the classes
      • Nothing (linear) = regression
    • ReLU family (Leaky ReLU, ELU, SELU, ...) = avoids vanishing gradients (the gradient is a constant 1 for positive inputs), cheap to compute, usually the better default for hidden layers
    • Tanh = classification between two classes; zero-centered output in (-1, 1), useful when a mean of 0 matters
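
The activations above, sketched in plain NumPy so their behavior is easy to inspect. The leaky-ReLU slope of 0.01 is a common default assumed here:

```python
import numpy as np

def sigmoid(x):                 # (0, 1): probability of the positive class
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):                 # multi-class: entries sum to 1
    e = np.exp(x - np.max(x))   # shift by max for numerical stability
    return e / e.sum()

def relu(x):                    # gradient is a constant 1 for x > 0,
    return np.maximum(0.0, x)   # so it doesn't vanish through deep stacks

def leaky_relu(x, alpha=0.01):  # small negative slope keeps "dead" units alive
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-3.0, 3.0, 7)
print(sigmoid(x))
print(softmax(x))
print(relu(x))
print(np.tanh(x))               # tanh is built in: (-1, 1), zero-centered
```
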