[20160910] Weekly Report of CS297

Proposed Project Name

Categorize Text with Naive Bayes and word2vec Word Embeddings

Literature Review




Yelp reviews –> predict the business category from the review text

pros: the dataset is easy to get

cons: the business category is often obvious, so the prediction task may be too easy

Twitter threads –> predict the category from the thread content

pros: category prediction is more meaningful here

cons: a labeled dataset is hard to get

Implementation Plan

  1. Extract the Yelp dataset
    1. Visualize the dataset
    2. Explore the number of categories
  2. Use a Naive Bayes classifier from an existing Python package to classify businesses
  3. Implement a Naive Bayes classifier in Python
  4. Implement a word2vec-enhanced classifier
  5. Prototype classification by category with the classic Naive Bayes classifier
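
As a starting point for step 3, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It assumes whitespace tokenization and add-one (Laplace) smoothing. Neither choice is fixed by the plan above, and the `NaiveBayes` class, the `tokenize` helper, and the toy reviews are hypothetical placeholders rather than real Yelp data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumption; real text needs more)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)    # word counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews, not taken from the Yelp dataset.
train_texts = [
    "great pizza and pasta",
    "the burger and fries were tasty",
    "friendly dentist and clean office",
    "the doctor was patient and helpful",
]
train_labels = ["Restaurants", "Restaurants", "Health", "Health"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("tasty pasta")
print(prediction)
```

Working in log space avoids numeric underflow on long reviews, and the same `fit`/`predict` shape should make it easy to compare this version against the packaged classifier from step 2.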

