[160624] Weekly Report of CS297

Summary

Reviewed one paper on text classification. The paper compares classification accuracy when Word2Vec and Doc2Vec representations are used as features against a bag-of-words baseline. The results show that the Word2Vec and Doc2Vec models offer higher accuracy.

Process pipeline

Document –> Text processing –> Word2Vec/Doc2Vec feature generation –> (IDF/TF-IDF weight adjustment) –> Train classifier –> Evaluate performance
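To make the feature-generation and weighting steps concrete, here is a minimal sketch of an IDF-weighted average of word vectors. It is not the paper's implementation: the 2-dimensional `WORD_VECTORS` table, the helper names, and the smoothed IDF formula `log(N / df) + 1` are all assumptions made for illustration. In practice the vectors would come from a trained Word2Vec model.

```python
import math
from collections import Counter


def tokenize(text):
    # Text processing step: lowercase and split on whitespace.
    return text.lower().split()


def idf_weights(tokenized_docs):
    # Document frequency counts each word once per document.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so common words keep a positive weight.
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def doc_vector(tokens, word_vectors, idf, dim):
    # Weighted average of the word vectors; words without a vector are skipped.
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue
        w = idf.get(tok, 1.0)
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Hypothetical toy embeddings standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}

docs = ["good movie", "bad movie", "good good film"]
tokenized = [tokenize(d) for d in docs]
idf = idf_weights(tokenized)
features = [doc_vector(t, WORD_VECTORS, idf, dim=2) for t in tokenized]
```

The resulting `features` list holds one fixed-length vector per document, which is the form a classifier such as logistic regression expects as input. In the second document the rarer word "bad" gets a larger IDF weight than "movie", so it pulls the document vector further toward its own direction.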

Pros:

  • Introduces the neural-network-based Word2Vec and Doc2Vec models
  • Couples word vectors with weighting strategies such as IDF and TF-IDF

Cons:

  • The training and test sets were drawn from the same dataset, so the reported accuracy is not persuasive
  • Only logistic regression is used; more classifiers could be tried
  • Only bag of words is used as the baseline; more models could be tried, e.g. LDA
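If the concern in the first point is that the model was scored on documents it had already seen, one common remedy is to hold out part of the data before training. The sketch below shows that idea under stated assumptions: `holdout_split` and `accuracy` are hypothetical helpers written for this note, and the labelled samples are placeholders, not data from the paper.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    # Shuffle indices with a fixed seed so the split is reproducible.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(samples) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


def accuracy(predictions, labels):
    # Fraction of predictions that match the true labels.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Placeholder (document, label) pairs for illustration only.
data = [(f"doc{i}", i % 2) for i in range(10)]
train, test = holdout_split(data, test_fraction=0.3, seed=42)
```

The classifier would be fitted on `train` only, and `accuracy` would be computed on predictions for `test`, so the score reflects documents the model has not seen.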

Reference:

Jiang, Suqi, et al. “Integrating rich document representations for text classification.” 2016 IEEE Systems and Information Engineering Design Symposium (SIEDS). IEEE, 2016.
