[20161009] CS297 Weekly Report

Incorporate Word2Vec Similarity into Vectors

  • Corpus used for Word2Vec: Google News, 3M words
  • Weight_i_new = Weight_i_old + SUM{W0_old * Similarity(i,0) + W1_old * Similarity(i,1) + … + Wn_old * Similarity(i,n)}
  • ConditionalProbability_i = (Wi_new + 1) / (SUM(W0 + W1 + … + Wn) + Vocab_size)
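
The two formulas above can be sketched in NumPy as follows. This is a minimal illustration, not the code used for the report: the function names and the toy data are hypothetical, the zero diagonal of the toy similarity matrix is an assumption, and the denominator is assumed to sum the updated weights (the formula does not say whether W0…Wn are the old or the new values).

```python
import numpy as np

def update_weights(w_old, similarity):
    """W_new[i] = W_old[i] + sum_j W_old[j] * Similarity(i, j)."""
    return w_old + similarity @ w_old

def conditional_probabilities(w_new):
    """Add-one smoothing: (W_i + 1) / (sum(W) + vocab_size).

    Assumption: the sum runs over the updated weights.
    """
    vocab_size = w_new.shape[0]
    return (w_new + 1.0) / (w_new.sum() + vocab_size)

# Toy data (hypothetical): three features, symmetric similarity, zero diagonal.
w_old = np.array([2.0, 0.0, 1.0])
similarity = np.array([
    [0.0, 0.5, 0.1],
    [0.5, 0.0, 0.2],
    [0.1, 0.2, 0.0],
])

w_new = update_weights(w_old, similarity)
probs = conditional_probabilities(w_new)
print(w_new)
print(probs)
```

Under the assumption that the denominator uses the updated weights, the smoothed probabilities sum to 1 over the vocabulary.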

Issue

  1. Speed: multiplying the similarity matrix by the feature vectors is slow (2E6 x 2E6 for only 4 categories)
    1. numpy matrix multiplication
      • Matrix_new_weight = Matrix_old_weight  * Matrix_Similarity
    2. Cython compiles into C code
  2. The calculated similarity sum pulls the vector too far from the original weights
    1. use a factor to decrease its weight -> W1 = W1 + factor * Wnew
  3. Some feature similarities are negative -> filter them out
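
As a rough illustration of issues 1 and 3, the sketch below filters out negative similarities and compares an explicit Python loop with a single NumPy matrix product. All names, shapes, and the random data are hypothetical; treating the weights as a (categories x vocabulary) matrix and setting negative entries to zero are assumptions about how the filtering and multiplication might be done.

```python
import numpy as np

def filter_negative(similarity):
    """Replace negative similarities with 0 (assumed meaning of 'filter out')."""
    return np.maximum(similarity, 0.0)

def similarity_sum_loop(weights, similarity):
    """Slow reference: explicit loops over categories and features."""
    n_categories, vocab_size = weights.shape
    out = np.zeros_like(weights)
    for c in range(n_categories):
        for i in range(vocab_size):
            for j in range(vocab_size):
                out[c, i] += weights[c, j] * similarity[j, i]
    return out

def similarity_sum_matmul(weights, similarity):
    """Matrix_new_weight = Matrix_old_weight * Matrix_Similarity."""
    return weights @ similarity

rng = np.random.default_rng(0)
vocab_size, n_categories = 5, 4  # tiny stand-ins for 2E6 features, 4 categories
raw = rng.uniform(-1.0, 1.0, size=(vocab_size, vocab_size))
similarity = filter_negative((raw + raw.T) / 2.0)  # symmetric, non-negative
weights = rng.integers(0, 10, size=(n_categories, vocab_size)).astype(float)

loop_result = similarity_sum_loop(weights, similarity)
matmul_result = similarity_sum_matmul(weights, similarity)
print(np.allclose(loop_result, matmul_result))
```

The single matrix product gives the same numbers as the loops while leaving the inner work to NumPy. At the real vocabulary size a dense 2E6 x 2E6 matrix would not fit in memory, so a sparse representation would likely be needed; that is outside this sketch.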

screenshot.png

Analysis on Similarity Matrix:

38% Positive; 54% Zero; 8% Negative

 

Max: 2057.21140249

Min: 0.0
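
A breakdown like the one above could be computed along these lines. This is only a sketch with a hypothetical helper name and a toy matrix; it does not reproduce the reported figures.

```python
import numpy as np

def sign_breakdown(matrix):
    """Fractions of positive, zero, and negative entries, plus max and min."""
    total = matrix.size
    return {
        "positive": np.count_nonzero(matrix > 0) / total,
        "zero": np.count_nonzero(matrix == 0) / total,
        "negative": np.count_nonzero(matrix < 0) / total,
        "max": float(matrix.max()),
        "min": float(matrix.min()),
    }

# Toy similarity matrix (hypothetical values).
toy = np.array([
    [0.0, 0.3, -0.1],
    [0.3, 0.0, 0.0],
    [-0.1, 0.0, 0.0],
])

stats = sign_breakdown(toy)
for name, value in stats.items():
    print(name, value)
```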

Use a scale factor to tune the vectors:

vector = W + scale_factor * Wsimilarity
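
A small sketch of how the scale factor changes the result, assuming W and Wsimilarity are NumPy arrays of the same shape. The function names, toy values, and the relative-shift measure are illustrative assumptions, not part of the report.

```python
import numpy as np

def scaled_vector(w, w_similarity, scale_factor):
    """vector = W + scale_factor * Wsimilarity."""
    return w + scale_factor * w_similarity

def relative_shift(w, vector):
    """How far the tuned vector moved from W, relative to the norm of W."""
    return np.linalg.norm(vector - w) / np.linalg.norm(w)

# Toy values (hypothetical).
w = np.array([2.0, 0.0, 1.0])
w_similarity = np.array([0.1, 1.2, 0.2])

for scale_factor in (0.0, 0.1, 0.5, 1.0):
    vector = scaled_vector(w, w_similarity, scale_factor)
    print(scale_factor, vector, relative_shift(w, vector))
```

A scale factor of 0 leaves the original weights unchanged, and larger values move the vector further toward the similarity term.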

Result

screenshot.png
