Errata: at 22:21, "cat & car different by one word" should be "different by one letter"
Syntax vs Semantics
Parts
- Corpus
- Lexicon
- Morphology
- Lemmas & Stems (both reduce morphological variation; lemmatization is the more sophisticated of the two)
- Tokens
- Stop words
- Edit-distance
- Word sense disambiguation
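A few of the parts above (tokens, stop words, edit-distance) are easy to see in code. The following is only a minimal sketch in plain Python, not how any particular NLP library does it: the regex tokenizer, the tiny `STOP_WORDS` set, and the function names are all assumptions made for illustration. Edit-distance here is Levenshtein distance, which counts single-letter insertions, deletions, and substitutions.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (naive regex approach)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


tokens = tokenize("The cat sat on the mat")
print(tokens)                       # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(remove_stop_words(tokens))    # ['cat', 'sat', 'mat']
print(edit_distance("cat", "car"))  # 1
```

"cat" and "car" come out at distance 1 because a single letter substitution (t to r) turns one into the other, which is the point of the erratum at the top of these notes.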
Syntax / Tasks
- Information extraction (POS tagging, NER, relationship extraction)
- Parsing
Goals
- Spell check
- Classification
- Tagging (topic modeling / keyword extraction)
- Sentiment analysis
- Search / relevance, document similarity
- Natural language understanding
  - Question answering
  - Textual entailment
  - Machine Translation (AI-complete)
  - NLU vs NLP
- Natural language generation
  - Image captioning
  - Chatbots
  - Automatic summarization
- Won't cover
  - Optical character recognition (OCR)
  - Speech (TTS, STT, segmentation, diarization)