MLA 008 Exploratory Data Analysis
Oct 26, 2018

Exploratory data analysis (EDA) and charting: DataFrame info/describe, imputation strategies, and useful charts such as histograms and correlation matrices.


Resources
StatQuest - Machine Learning
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 3rd Edition


Show Notes

Nulls: impute with mean or median

  • df.info(): dtypes, non-null counts
  • df.describe(): count, mean, std, min/max, quartiles
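
A minimal sketch of the two calls above, followed by mean and median imputation. The toy DataFrame and its column names (`age`, `income`) are made up for illustration; they are not from the episode.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data: each column has one missing value,
# and 'income' also contains one extreme value.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 41.0, 29.0],
    "income": [30_000.0, 52_000.0, 48_000.0, np.nan, 1_000_000.0],
})

# df.info() prints dtypes and non-null counts; capture the text in a buffer
# so it can be inspected as a string as well as printed.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# df.describe() returns count, mean, std, min, quartiles, and max per numeric column.
summary = df.describe()
print(summary)

# Nulls per column.
null_counts = df.isna().sum()

# Two simple imputation strategies: fill each column with its mean or its median.
mean_filled = df.fillna(df.mean())
median_filled = df.fillna(df.median())
print(mean_filled)
print(median_filled)
```

In this toy data the extreme income drags the mean far above the typical value, so the mean-filled row looks unrealistic, while the median fill stays close to the bulk of the data. That is the usual argument for preferring the median when a column is skewed.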

Line and scatter plots
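
A rough sketch of both chart types with matplotlib, assuming a line chart for a value over an ordered axis and a scatter chart for the relationship between two variables. The data and column names (`day`, `temperature`, `ice_cream_sales`) are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: a temperature series that drifts over 30 days,
# and sales that depend roughly linearly on temperature plus noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"day": np.arange(30)})
df["temperature"] = 20 + np.cumsum(rng.normal(0, 1, 30))
df["ice_cream_sales"] = 50 + 3 * df["temperature"] + rng.normal(0, 5, 30)

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: how one variable changes along an ordered axis such as time.
ax_line.plot(df["day"], df["temperature"])
ax_line.set_xlabel("day")
ax_line.set_ylabel("temperature")

# Scatter chart: how two variables relate to each other.
ax_scatter.scatter(df["temperature"], df["ice_cream_sales"])
ax_scatter.set_xlabel("temperature")
ax_scatter.set_ylabel("ice_cream_sales")

fig.tight_layout()
# In an interactive session, call plt.show() here to display the figure.
```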

Outliers: histogram, box plots

  • Remove outliers, or keep them and scale with RobustScaler?
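
One way to look at both options numerically, sketched with pandas and NumPy only. The data is made up. The 1.5 × IQR rule is the convention box plots commonly use for whiskers, and is an assumption here rather than something the notes specify. The last step centers on the median and divides by the IQR, which to my understanding is what scikit-learn's `RobustScaler` does with its default settings.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one obvious outlier.
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 95.0], name="value")

# Histogram counts: the outlier ends up alone in the last bin.
counts, bin_edges = np.histogram(values, bins=5)
print(counts)

# Box-plot style fences: anything beyond 1.5 * IQR from the quartiles is flagged.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# Option 1: remove the flagged rows.
trimmed = values[~is_outlier]

# Option 2: keep every row but scale robustly (subtract the median, divide by the IQR),
# so the outlier does not distort the center and spread used for scaling.
robust_scaled = (values - values.median()) / iqr

print(trimmed.tolist())
print(robust_scaled.round(2).tolist())
```

Removing rows throws information away, so it makes most sense when the outlier is a data error. Robust scaling keeps the row and only limits its influence on the scaling statistics.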

Correlation matrices
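
A small sketch of a correlation matrix using `DataFrame.corr()`, which computes pairwise Pearson correlation by default. The synthetic columns (`x`, `strongly_related`, `unrelated`) are hypothetical and exist only to show one high and one near-zero correlation.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: one column built from x plus a little noise,
# and one column generated independently of x.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
df = pd.DataFrame({
    "x": x,
    "strongly_related": 2 * x + rng.normal(scale=0.1, size=n),
    "unrelated": rng.normal(size=n),
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```

The matrix is symmetric with ones on the diagonal, so only one triangle needs reading. Plotting `corr` as a heatmap makes strongly correlated pairs easier to spot when there are many columns.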