[20160910] Weekly Report of CS297

Proposed Project Name

Categorize Text with Naive Bayes and word2vec word embedding

Literature Review

screenshot.png

Dataset

Yelp reviews –> predict a business's category from its review text

pros: the dataset is easy to obtain

cons: the business category seems obvious from the review text

Twitter threads –> predict a thread's category from its content

pros: category prediction is more meaningful here

cons: a labeled dataset is hard to obtain
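
For the Yelp option, building (review text, category) training pairs might look like the sketch below. It assumes JSON-lines records with `business_id`, `categories`, and `text` fields; those field names, the sample records, and the helper names `primary_categories` and `labeled_reviews` are assumptions for illustration, not something confirmed against the actual dataset files. Taking the first listed category as the label is also a simplification.

```python
import json
from collections import Counter

# Hypothetical sample records, one JSON object per line (assumed layout).
BUSINESS_LINES = [
    '{"business_id": "b1", "categories": "Restaurants, Pizza"}',
    '{"business_id": "b2", "categories": "Auto Repair, Automotive"}',
    '{"business_id": "b3", "categories": null}',
]
REVIEW_LINES = [
    '{"business_id": "b1", "text": "Great pizza and friendly staff."}',
    '{"business_id": "b2", "text": "They fixed my brakes quickly."}',
    '{"business_id": "b1", "text": "The crust was soggy."}',
    '{"business_id": "b3", "text": "No category for this one."}',
]


def primary_categories(business_lines):
    """Map each business_id to its first listed category."""
    labels = {}
    for line in business_lines:
        record = json.loads(line)
        raw = record.get("categories")
        if not raw:
            continue  # skip businesses with no category information
        # Accept either a list or a comma-separated string (assumption).
        names = raw if isinstance(raw, list) else raw.split(",")
        names = [name.strip() for name in names if name.strip()]
        if names:
            labels[record["business_id"]] = names[0]
    return labels


def labeled_reviews(review_lines, labels):
    """Pair each review text with the category of the reviewed business."""
    pairs = []
    for line in review_lines:
        record = json.loads(line)
        label = labels.get(record["business_id"])
        if label is not None:
            pairs.append((record["text"], label))
    return pairs


labels = primary_categories(BUSINESS_LINES)
pairs = labeled_reviews(REVIEW_LINES, labels)
# Counting reviews per category is a first step toward exploring the labels.
category_counts = Counter(label for _, label in pairs)
```

With real data, the two lists would be replaced by lines read from the dataset files, and `category_counts` gives a quick view of how balanced the categories are.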

Implementation Plan

  1. Extract the Yelp dataset
    1. Visualize the dataset
    2. Explore the number of categories
  2. Use a Naive Bayes classifier from an existing Python package to classify businesses
  3. Implement a Naive Bayes classifier in Python
  4. Implement a word2vec-enhanced classifier
  5. Prototype classification by category with the classic Naive Bayes classifier
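
As a rough illustration of the hand-written Naive Bayes step, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, using only the standard library. The class name `NaiveBayesClassifier`, the `tokenize` helper, and the toy training sentences are all hypothetical; the planned implementation may differ in tokenization, smoothing, and features.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with additive smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.total_words = Counter()             # total tokens per class
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.total_words[label] += 1
                self.vocabulary.add(token)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(token | class) for each class."""
        n_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / n_docs)
            denominator = self.total_words[label] + self.alpha * vocab_size
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denominator)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
train_texts = [
    "great pizza and pasta",
    "delicious pizza crust",
    "fixed my car brakes",
    "oil change for my car",
]
train_labels = ["Restaurants", "Restaurants", "Automotive", "Automotive"]

classifier = NaiveBayesClassifier().fit(train_texts, train_labels)
food_prediction = classifier.predict("The pizza was delicious")
car_prediction = classifier.predict("My car needs new brakes")
```

Scores are summed in log space so that long reviews do not underflow to zero, and tokens never seen in training are skipped rather than penalizing every class equally.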