NLP in big companies

In this blog post, I try to collect some good examples of NLP applications built in practice. A good starting point is to look at how other companies build their platforms.

Uber

NLP platform to process customer support tickets

In practice, we may not need very fancy algorithms. Simple ones often just work, and at the very least they are a good starting point for building an engineering product.
Uber used NLP to process customer support tickets with a simple classification model (logistic regression). As usual, data processing is the key task before any machine learning: tickets have to be encoded by turning text and categorical fields into numerical vectors, and word2vec can be used for the text part.
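
To make this pipeline concrete, here is a minimal plain-Python sketch: each ticket is encoded by averaging word vectors, and a binary logistic regression is then fitted by gradient descent. The three-dimensional embedding table, the tickets and the map-issue/not-map-issue labels are toy, hypothetical data standing in for real word2vec vectors and Uber's ticket categories. A production system would use trained embeddings, many classes and a library such as scikit-learn.

```python
import math

# Hypothetical 3-dimensional "word2vec-style" vectors; a real system would
# learn these with word2vec on a large corpus of ticket text.
EMBEDDINGS = {
    "map": [0.9, 0.1, 0.0],
    "wrong": [0.7, 0.2, 0.1],
    "address": [0.8, 0.0, 0.2],
    "route": [0.9, 0.2, 0.1],
    "refund": [0.0, 0.9, 0.1],
    "charged": [0.1, 0.8, 0.2],
    "payment": [0.0, 0.9, 0.0],
    "twice": [0.1, 0.7, 0.3],
}
DIM = 3


def encode(ticket):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [EMBEDDINGS[w] for w in ticket.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(xs, ys, lr=0.5, epochs=500):
    """Binary logistic regression fitted with plain batch gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, ticket):
    """Probability that the ticket belongs to the positive (map issue) class."""
    w, b = model
    x = encode(ticket)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical labelled tickets: 1 = map issue, 0 = not a map issue.
tickets = [
    ("wrong address on map", 1),
    ("route was wrong", 1),
    ("charged twice refund", 0),
    ("payment refund", 0),
]
model = train_logreg([encode(t) for t, _ in tickets], [y for _, y in tickets])
```

Averaging discards word order, which is one reason models that look at local word sequences, such as WordCNN, can do better on harder categories.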

Later on, more complex, higher-performing deep learning models such as WordCNN can be adopted.

For the underlying platform, Uber uses Spark and Hive for big data processing and prediction at scale.

https://eng.uber.com/nlp-deep-learning-uber-maps/

Uber one-click chat

This is a smart reply system that suggests replies to incoming user messages, so they can be answered with one click.

This system shows that a typical machine learning platform has two components:

  • Offline training:
    NLP and ML pipelines are used to train an intent detection model. This is where NLP models such as Doc2Vec are applied.

  • Online serving:
    An incoming message is encoded as a fixed-length vector by the pre-trained Doc2Vec model, and the intent detection classifier then uses this vector to predict the message's likely intent.

The system then retrieves the most relevant replies for the detected intent and surfaces them to the driver-partner receiving the message.
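
The two components can be sketched end to end in plain Python. To keep it dependency-free, a normalised bag-of-words vector over a fixed training vocabulary stands in for Doc2Vec (both map a message of any length to a fixed-length vector), and a nearest-centroid classifier stands in for Uber's intent detection model. The intents, training messages and reply table are hypothetical.

```python
import math


def build_vocab(texts):
    """Offline: fix the vector dimensions from the training vocabulary."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}


def encode(message, vocab):
    """Stand-in for Doc2Vec inference: map a message of any length to a
    fixed-length, L2-normalised bag-of-words vector (unknown words ignored)."""
    vec = [0.0] * len(vocab)
    for w in message.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train(examples):
    """Offline training: build the encoder vocabulary and one centroid per intent."""
    vocab = build_vocab([text for text, _ in examples])
    sums, counts = {}, {}
    for text, intent in examples:
        acc = sums.setdefault(intent, [0.0] * len(vocab))
        for i, v in enumerate(encode(text, vocab)):
            acc[i] += v
        counts[intent] = counts.get(intent, 0) + 1
    centroids = {k: [v / counts[k] for v in acc] for k, acc in sums.items()}
    return vocab, centroids


def predict_intent(model, message):
    """Online serving: encode the message, pick the most similar intent centroid."""
    vocab, centroids = model
    vec = encode(message, vocab)
    return max(centroids, key=lambda k: sum(a * b for a, b in zip(vec, centroids[k])))


# Hypothetical intent -> canned replies table used for retrieval.
REPLIES = {
    "where_are_you": ["I'm on my way.", "I'm at the pickup spot."],
    "running_late": ["No problem, I'll wait.", "Okay, see you soon."],
}


def suggest_replies(model, message):
    """Retrieve the replies for the detected intent."""
    return REPLIES[predict_intent(model, message)]


model = train([
    ("where are you", "where_are_you"),
    ("where are you now", "where_are_you"),
    ("i am running late", "running_late"),
    ("running a few minutes late", "running_late"),
])
```

Because the vocabulary and centroids are computed offline, online serving only needs one encoding, a few dot products and a table lookup, which keeps per-message latency low.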

https://eng.uber.com/one-click-chat/

Airbnb

Airbnb built an online risk mitigation system, and in describing it they listed some requirements for a machine learning platform:

  • Fast
  • Robust
  • Scalable

Although Airbnb used an open source framework (OpenScoring) to serve its models, the overall pipeline is much the same as the ones we saw at Uber above. OpenScoring's characteristics are worth checking out as well.

https://medium.com/airbnb-engineering/architecting-a-machine-learning-system-for-risk-941abbba5a60
https://medium.com/airbnb-engineering/scaling-spark-streaming-for-logging-event-ingestion-4a03141d135d
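
One way to address the Fast and Robust requirements in the Airbnb list is to give each model call a latency budget and return a fallback score when the budget is exceeded or the model fails. The following is a generic, minimal sketch of that pattern using only the Python standard library; it is not Airbnb's or OpenScoring's implementation, and `fast_model`, `slow_model`, the feature name and the timeout values are hypothetical.

```python
import concurrent.futures
import time

# Shared worker pool so that scoring calls can be bounded by a timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def score_with_fallback(score_fn, features, timeout_s=0.05, fallback=0.0):
    """Call score_fn(features); if it takes longer than timeout_s or raises,
    return the fallback score instead of failing the whole request."""
    future = _POOL.submit(score_fn, features)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Both the timeout error and any exception raised by the model land here.
        return fallback


def fast_model(features):
    """Hypothetical model: risk grows with the number of failed logins."""
    return 0.1 * features["num_failed_logins"]


def slow_model(features):
    """Hypothetical model that simulates an overloaded model server."""
    time.sleep(0.2)
    return 0.9
```

Whether the fallback should let a request through or block it is a product decision for a risk system. Note also that a timed-out call keeps running in its worker thread, since `Future.cancel()` cannot stop a task that has already started.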

Zendesk

Zendesk summarizes customer support tickets into topics.

A big takeaway is how they support 50k models on a daily basis with AWS Batch, which provides autoscaling, job management and GPU support.
Job management in particular is a hot topic and a real need when building large-scale platforms, and many products have emerged in this space, such as Airflow.
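
How the daily builds are split up is not covered here. One common way to fan such work out on AWS Batch is an array job: a single submission (for example boto3's `submit_job` with `arrayProperties={"size": N}`) starts N child jobs, and each child reads its 0-based position from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable. The sketch below shows only the worker side of that idea, assuming one model per customer account; whether Zendesk used array jobs is an assumption, and `build_model`, the account IDs and `NUM_SHARDS` are hypothetical.

```python
import os

NUM_SHARDS = 4  # hypothetical; must match the array size used at submission


def current_array_index():
    """0-based index of this child job; defaults to 0 for local runs."""
    return int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))


def shard(model_ids, num_shards, index):
    """Round-robin split: child `index` builds every num_shards-th model."""
    return model_ids[index::num_shards]


def build_model(account_id):
    """Hypothetical placeholder for per-account training; returns an artifact name."""
    return f"model-{account_id}.bin"


if __name__ == "__main__":
    all_ids = [f"account-{n}" for n in range(10)]  # hypothetical account list
    for account_id in shard(all_ids, NUM_SHARDS, current_array_index()):
        print("built", build_model(account_id))
```

The round-robin split guarantees every model is built by exactly one child, and scaling out is a matter of raising the array size together with `NUM_SHARDS`.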

https://medium.com/zendesk-engineering/zendesk-ml-model-building-pipeline-on-aws-batch-monitoring-and-load-testing-8a7decbb5ad9

Twitter

Cortex

LinkedIn