This is a very simple and naive introduction summarizing what I have learned about natural language processing through self-study.
What is Natural Language Processing?
Natural Language Processing (NLP) is an important subfield of Artificial Intelligence that enables computers to understand and process human languages; it tries to get computers closer to a human-level understanding of language.
Some research topics in NLP
- Information Retrieval/Extraction/Filtering
- Machine Translation
- Document/Topic Classification/Summarization
- Question Answering
- Text Mining
- Sentiment Analysis
- Speech Recognition
- Machine Writing/Content Generation
Statistical Language Models
A statistical language model computes the probability of a sentence or sequence of words.
N-Gram
The n-gram model is a popular statistical language model; it estimates the probability of each word from only the previous n-1 words.
After building a model, we usually evaluate it with cross-entropy and perplexity. Lower perplexities correspond to higher likelihoods, so lower scores are better on this metric.
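As a quick illustration (this toy helper and its inputs are made up for this note, not taken from any toolkit), perplexity can be computed from per-token log probabilities like this:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-(1/N) * sum of per-token natural-log probabilities)."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Example: a 4-token sentence where the model assigns each token probability 0.1
print(perplexity([math.log(0.1)] * 4))  # 10.0, i.e. 1/0.1, as expected
```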
A major concern in language modeling is avoiding the situation p(w) = 0, which can arise from a single unseen n-gram. The solution is to use smoothing; some common smoothing methods include (see the sketch after this list):
- Add-One (Laplace) smoothing
- Good-Turing smoothing
- Kneser-Ney smoothing
- Witten-Bell smoothing
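Below is a minimal sketch of a bigram model with Add-One (Laplace) smoothing; the toy corpus and the helper name `p_add_one` are made up for illustration:

```python
from collections import Counter

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # toy corpus

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))
V = len(unigrams)  # vocabulary size

def p_add_one(prev, word):
    """Add-One (Laplace) smoothed bigram probability P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_add_one("the", "cat"))  # seen bigram:   (1 + 1) / (2 + 4) ~ 0.33
print(p_add_one("cat", "dog"))  # unseen bigram: (0 + 1) / (1 + 4) = 0.2, no longer zero
```

The point of the smoothed estimate is the second print: an n-gram never seen in training still gets a small non-zero probability, so p(w) = 0 is avoided.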
Bag of Words
A sentence/document is represented by the counts of the distinct terms that occur within it. Additional information, such as word order, POS tags, semantics, and syntax, is discarded.
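A minimal bag-of-words sketch using Python's Counter (the example sentence is made up):

```python
from collections import Counter

doc = "the cat sat on the mat"
bow = Counter(doc.lower().split())
print(bow)  # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```

Only term counts survive; the original word order cannot be recovered from `bow`.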
Probabilistic Graphical Models
These are important mathematical theories/algorithms used in NLP tasks; a toy HMM example follows the list below.
- Bayesian Network
- Markov Network
- Conditional Random Fields
- Hidden Markov Models
- Expectation Maximization
- Maximum Entropy
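As a toy illustration of one item in this list, here is Viterbi decoding for a tiny two-state HMM tagger; the states and probabilities are set by hand purely for this sketch:

```python
# Hand-set toy HMM: two tags, two words, probabilities chosen only for illustration.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1}, "VERB": {"dogs": 0.1, "bark": 0.6}}

def viterbi(words):
    # V[t][s] = (probability of the best path ending in state s at time t, backpointer)
    V = [{s: (start_p[s] * emit_p[s].get(words[0], 1e-6), None) for s in states}]
    for t in range(1, len(words)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            prob = V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s].get(words[t], 1e-6)
            V[t][s] = (prob, prev)
    # Trace back the most probable tag sequence.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(words) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))  # ['NOUN', 'VERB']
```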
Topic Model
- Latent Dirichlet Allocation (LDA): Based on probabilistic graphical models
- LSA: Latent Semantic Analysis. Uses Singular Value Decomposition (SVD) on the Document-Term Matrix. Based on Linear Algebra (see the sketch after this list)
- NMF: Non-Negative Matrix Factorization – Based on Linear Algebra
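A minimal scikit-learn sketch of LDA and LSA (assumes scikit-learn is installed; the corpus and parameters are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD

docs = ["cats and dogs are pets",
        "dogs chase cats",
        "stocks and bonds are investments"]

# Document-term matrix (bag-of-words counts)
X = CountVectorizer().fit_transform(docs)

# LDA: probabilistic topic model
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)       # per-document topic distributions

# LSA: truncated SVD of the document-term matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_concepts = lsa.fit_transform(X)     # per-document coordinates in "concept" space

print(doc_topics.shape, doc_concepts.shape)  # (3, 2) (3, 2)
```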
Some popular tasks in NLP
These are tasks that may not solve any particular NLP problem by themselves, but are performed as prerequisites that simplify many different NLP problems. They are much like the reading comprehension exercises we did in school.
Parts of Speech Tagging
Identify proper nouns, common nouns, verbs, adjectives, prepositions, etc.
Named Entity Recognition
Identify names of people, locations, etc.
Tokenization
Morphosyntactic Attributes
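A minimal spaCy sketch covering tokenization, POS tagging, NER, and morphosyntactic attributes (assumes spaCy v3 and the en_core_web_sm model are installed; the sentence is made up):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Berlin next year.")

for token in doc:
    # token.text: the token; token.pos_: coarse POS tag; token.morph: morphosyntactic attributes
    print(token.text, token.pos_, token.morph)

for ent in doc.ents:
    # Named entities, e.g. "Apple" -> ORG, "Berlin" -> GPE
    print(ent.text, ent.label_)
```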
Deep Learning in NLP
Word2Vec
Previously, there were other popular ways of representing words as vectors, such as TF-IDF.
But those vectors are sparse and long, which is not computationally efficient. Word2Vec instead produces a dense vector representation of each word (commonly 100-500 dimensions) and models the meaning of a word as an embedding.
But how do we get the dense vectors? Singular Value Decomposition (as in Latent Semantic Analysis) can be used, but a more successful way is a neural-network-inspired learning strategy (see the gensim sketch below):
- CBOW: Predict the center/target word based on the context words
- Skip-gram: Predict the context words based on the center/target word
Other vector-based models include fastText, Doc2Vec, GloVe, etc.
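A minimal gensim sketch of training Word2Vec (assumes gensim 4.x; the toy corpus is made up for illustration):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 selects the skip-gram objective; sg=0 would select CBOW
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)

print(model.wv["cat"].shape)         # (100,) dense embedding
print(model.wv.most_similar("cat"))  # nearest neighbours in embedding space
```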
RNN
CNN
Stay tuned…
I highly recommend https://people.cs.umass.edu/~miyyer/cs585/ as a 101 course for NLP.
More advanced courses:
https://github.com/lovesoft5/ml/tree/master/NLP-%E5%93%A5%E4%BC%A6%E6%AF%94%E4%BA%9A%E5%A4%A7%E5%AD%A6