Learning notes from Databricks talks
Optimizing File Loading And Partition Discovery: Data loading is the first step of a Spark application, when dataset ...
Starting a new journey
June 28th marked the end of my journey at SurveyMonkey, a great company where I worked for more than 3 years. It is bittersweet to say goodbye. ...
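A Poem for Daphne
Chapter 1: Sprouting. You ask me where our story begins. The moment I walked out of the exam hall, I thought that would be the story's ending. And those few words on WeChat, were they just my usual calm? Perhaps everyone envies love at first sight, but more romantic than love at first sight is falling for someone over a single conversation. Chapter 2: Setting Out. Even if I had a pair of wings, I would break them, because what is within easy reach may turn out to be only a cold face in the end. And looking out over eight hundred li of road ...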
Fun topics in distributed systems
During the first days of learning distributed system design, we hear a lot of buzzwords and technologies, and we are busy learning them one after another. ...
Hidden Companies (Toronto)
There are a lot of job websites we use to look for a job, such as LinkedIn, Glassdoor, Indeed, and Monster. But there are still a ton of jobs outside those popular s ...
NLP in big companies
In this blog post, I try to find some good examples of building NLP applications in practice. A good starting point is to find out how some other ...
Natural Language Processing 101
This is a very simple and naive introduction that summarizes what I know about natural language processing, based on my self-learning. What is Natural Langu ...
Searching with bloom filter
Problem statement: Our platform sends 4 million emails per day, and many of them contain a lot of user-generated content, which carries a potential risk of ...
Compare streaming frameworks
The first streaming framework I got to know was Apache Spark. My team owns a small Spark cluster with 1 leader and 4 followers (it is said that mas ...
Notes on data science self learning
The tons of resources online can easily distract you. A good approach is to have your own learning path and stay focused. I got this idea from two people: ...