Bag of Words
Bag of words is a simple, widely used baseline method for measuring the similarity between two texts.
Given two texts, the occurrences of each word are counted in both. When the same words occur multiple times in both texts, bag of words tells us that the texts are similar to each other. Word order is ignored; only the counts matter.
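The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names bag_of_words and shared_counts are made up for this example, and the tokenization (lowercasing and splitting on whitespace) is an assumption, since the text does not say how words are extracted.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs (assumed tokenization: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def shared_counts(text_a, text_b):
    """For every word present in both texts, keep the smaller of its two counts."""
    return bag_of_words(text_a) & bag_of_words(text_b)


text_a = "the cat sat on the mat"
text_b = "the dog sat on the log"

bag_a = bag_of_words(text_a)
overlap = shared_counts(text_a, text_b)

print(bag_a)    # word counts for the first text
print(overlap)  # words the two texts have in common, with shared counts
```

The more words two texts share, and the more often those words occur in both, the larger the overlap becomes.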
Given X documents containing N distinct words in total, the bag-of-words algorithm maps each document to a vector in an N-dimensional space, with one dimension per word and the word's count as the coordinate.
In this space, every document is just a point. We can now measure the similarity between individual points with ordinary linear algebra, for example the Euclidean distance: the smaller the distance, the more similar the documents.
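As a rough sketch of this idea, the following Python code builds one count vector per document and compares them by Euclidean distance. The function names vectorize and euclidean_distance, the sample documents, and the whitespace tokenization are assumptions made for illustration only.

```python
import math


def vectorize(documents):
    """Map each document to a count vector over the shared vocabulary."""
    # Vocabulary: the N distinct words across all documents, in sorted order.
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    vectors = []
    for doc in documents:
        words = doc.lower().split()
        vectors.append([words.count(term) for term in vocabulary])
    return vocabulary, vectors


def euclidean_distance(u, v):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell sharply today",
]
vocabulary, vectors = vectorize(docs)

d_01 = euclidean_distance(vectors[0], vectors[1])  # 2.0
d_02 = euclidean_distance(vectors[0], vectors[2])  # larger: no shared words
print(len(vocabulary), d_01, d_02)
```

Here the first two documents share most of their words and end up close together, while the third shares none and lies further away. Note that raw Euclidean distance also grows with document length, so in practice the vectors are often normalized first.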