In this blog post, I look for good real-world examples of building NLP applications. A good starting point is to see how other companies build their platforms.
Uber
NLP platform to process customer support tickets
In practice, we may not need very fancy algorithms; simple ones often just work, and they are at least a good starting point for building an engineering product.
Uber used NLP to process customer support tickets with a classification model (logistic regression). Data processing is always the key task before machine learning: how to encode a ticket and transform its text and categories into numerical vectors. Word2vec can apparently be used here.
In the future, more complex and higher-performance algorithms and deep learning frameworks, such as WordCNN, can be adopted.
In terms of the foundation platform, Uber uses Spark and Hive for big data processing and scaled prediction.
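To make the "encode the ticket, then classify it" idea concrete, here is a minimal sketch in plain Python. It is not Uber's implementation: the tickets and labels are made up, the encoding is a simple bag-of-words count vector rather than word2vec, and the logistic regression is trained with a hand-written gradient loop instead of Spark.

```python
import math
import re
from collections import Counter

# Toy tickets and labels, made up for illustration only.
TICKETS = [
    ("I was charged twice for my trip", "billing"),
    ("refund the extra charge on my card", "billing"),
    ("wrong fare charged please refund", "billing"),
    ("I left my phone in the car", "lost_item"),
    ("forgot my bag in the back seat", "lost_item"),
    ("my wallet is lost in the car", "lost_item"),
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts):
    # Map every token seen in training to a column index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    # Bag-of-words counts; tokens outside the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

vocab = build_vocab(text for text, _ in TICKETS)
X = [vectorize(text, vocab) for text, _ in TICKETS]
y = [1.0 if label == "billing" else 0.0 for _, label in TICKETS]
w, b = train_logreg(X, y)
```

Calling `predict_proba("please refund my charge", vocab, w, b)` returns the estimated probability that a new ticket is a billing issue. In a real system the vectorizer and the model would be swapped for library implementations, but the two-step shape stays the same.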
https://eng.uber.com/nlp-deep-learning-uber-maps/
Uber one click chat
This is a smart reply system that suggests replies to incoming messages.
This system shows that a typical machine learning platform usually has two components:
Offline training:
NLP and ML pipelines are used to build the intent detection models. This is where NLP models such as Doc2Vec are applied.
Online serving:
A message is encoded as a fixed-length vector representation via the pre-trained Doc2vec model, after which the vector and the intent detection classifier are used to predict the message’s possible intent.
The system then retrieves the most relevant replies based on the detected intent and surfaces them to the driver-partner receiving the message.
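The offline/online split described above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: a normalized bag-of-words vector stands in for the Doc2Vec embedding, a nearest-centroid rule stands in for the intent classifier, and the training messages, intent names, and canned replies are all invented.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

# ---- Offline training (hypothetical data) ----
TRAINING = [
    ("where are you", "ask_location"),
    ("where are you right now", "ask_location"),
    ("which corner are you at", "ask_location"),
    ("how long until you arrive", "ask_eta"),
    ("how many minutes until you get here", "ask_eta"),
    ("when will you arrive", "ask_eta"),
]

def fit_vocab(examples):
    vocab = {}
    for text, _ in examples:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(message, vocab):
    # Stand-in for Doc2Vec inference: fixed-length, L2-normalized counts.
    vec = [0.0] * len(vocab)
    for tok in tokenize(message):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def train_centroids(examples, vocab):
    # Stand-in for the intent classifier: one mean vector per intent.
    sums = defaultdict(lambda: [0.0] * len(vocab))
    counts = defaultdict(int)
    for text, intent in examples:
        for i, v in enumerate(encode(text, vocab)):
            sums[intent][i] += v
        counts[intent] += 1
    return {k: [v / counts[k] for v in total] for k, total in sums.items()}

vocab = fit_vocab(TRAINING)
centroids = train_centroids(TRAINING, vocab)

# ---- Online serving ----
REPLIES = {  # hypothetical canned replies per intent
    "ask_location": ["I'm at the pickup spot.", "I'm across the street."],
    "ask_eta": ["I'll be there in a few minutes.", "Almost there."],
}

def detect_intent(message, vocab, centroids):
    vec = encode(message, vocab)
    score = lambda intent: sum(a * b for a, b in zip(vec, centroids[intent]))
    return max(centroids, key=score)

def suggest_replies(message, vocab, centroids):
    return REPLIES[detect_intent(message, vocab, centroids)]
```

The point of the split is that `vocab` and `centroids` are produced once, offline, while `suggest_replies` only does a cheap encode-and-lookup per message at serving time.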
https://eng.uber.com/one-click-chat/
Airbnb
Airbnb built an online risk mitigation system and, in describing it, mentioned some requirements of a machine learning platform:
- Fast
- Robust
- Scalable
Although Airbnb used an open source framework (OpenScoring) for it, the typical pipeline is the same most of the time, like what we saw in the Uber platforms above. Feel free to check out some characteristics of OpenScoring.
https://medium.com/airbnb-engineering/architecting-a-machine-learning-system-for-risk-941abbba5a60
https://medium.com/airbnb-engineering/scaling-spark-streaming-for-logging-event-ingestion-4a03141d135d
Zendesk
Zendesk summarizes customer support tickets into topics.
A big takeaway is how they support 50k models on a daily basis with AWS Batch.
AWS Batch supports auto scaling and job management, and it also provides GPU support.
Job management is a hot topic and in high demand for building large-scale platforms. Many products have emerged in this area, such as Airflow.
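At its core, job management means running jobs in an order that respects their dependencies. The sketch below shows that idea with Python's standard `graphlib` module. It is neither Airflow nor AWS Batch code, and the job names and the per-account training fan-out are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: job name -> set of jobs that must finish first.
DEPENDENCIES = {
    "extract_tickets": set(),
    "build_features": {"extract_tickets"},
    "train_account_a": {"build_features"},
    "train_account_b": {"build_features"},
    "publish_topics": {"train_account_a", "train_account_b"},
}

def run_pipeline(dependencies, jobs):
    """Run each job once, only after all of its dependencies have run."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        jobs[name]()
        completed.append(name)
    return completed

# Placeholder job bodies that only record that they ran.
log = []
JOBS = {name: (lambda n=name: log.append(f"ran {n}")) for name in DEPENDENCIES}

order = run_pipeline(DEPENDENCIES, JOBS)
```

The two training jobs do not depend on each other, so a real scheduler could run them in parallel. Dedicated tools add that parallelism, along with retries and monitoring, on top of this ordering.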
Cortex