Thinking about software, life, the universe and everything.

Computing Document Similarity with nltk

by hkelkar

We will explore techniques for measuring how similar documents are to one another. Specifically, we will look at the intuition behind tf-idf and cosine similarity. With that as a foundation, we will see how to compute these metrics with the Natural Language Toolkit (NLTK).
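
To make the two ideas concrete before bringing in a library, here is a minimal sketch in plain Python. It does not use NLTK, so that it stays dependency-free. The whitespace tokenizer, the helper names (`tokenize`, `tf_idf_vectors`, `cosine_similarity`), the sample sentences, and the particular weighting (relative term frequency times log(N / document frequency)) are all assumptions chosen for illustration. Other tf-idf variants exist and will give different numbers.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase + whitespace split (an assumption for this sketch);
    # a real pipeline would likely use a proper tokenizer.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in this document, idf = log(N / df).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, made up for this example.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])  # share several terms
sim_02 = cosine_similarity(vecs[0], vecs[2])  # share no terms
print(sim_01, sim_02)
```

The first two sentences share several words, so their similarity is positive, while the first and third share none, so their similarity is zero. Note that with this weighting a term that appears in every document gets an idf of log(1) = 0 and contributes nothing, which is the intuition behind tf-idf: words common to all documents say little about what makes one document distinctive.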
