Computing Document Similarity with nltk

We will explore techniques for measuring how similar two documents are. Specifically, we will look at the intuition behind tf-idf and cosine similarity. With that as a foundation, we will see how to compute these metrics with the Natural Language Toolkit (NLTK).
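To make the intuition concrete, here is a minimal sketch in plain Python. It deliberately uses only the standard library rather than NLTK, so the moving parts stay visible. The function names, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the sample sentences are all assumptions made for illustration, not the method from the full post.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # tf = relative frequency in this document,
            # idf = log(N / df), which is 0 for a term found in every document.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no informative terms matches nothing
    return dot / (norm_a * norm_b)


# Made-up sample documents, purely for demonstration.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell sharply on monday",
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # two cat sentences share weighted terms
print(cosine_similarity(vecs[0], vecs[2]))  # only "on" is shared, and its idf is 0
```

The word "on" appears in every sample document, so its weight drops to zero and it contributes nothing to the similarity. That is the core idea behind tf-idf: common words count for little, rare shared words count for a lot.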

Focus

For a software craftsman, focus is essential to building quality software. Interrupts are the enemy of focus. Not all interrupts are avoidable: meetings, interviews or even a colleague with a question come with the job. But some things can be controlled, like your inbox and distractions such as news, Facebook, Twitter or any other readily available content. The question is how? A friend suggested a system which has worked great for me. It’s called the Pomodoro.

A Pomodoro is simply a way to break work down into chunks of 25 minutes, each followed by a break. I use a nifty web application called moosti that helps me keep time. Ultimately it is a mental hack that lets me give myself permission to be present and to focus on the task at hand. With the timer counting down, I tend to close all other tabs.

The added benefit of this system is that it allows me to measure how productive my day was. I just need to count the number of successful Pomodoro chunks in the day. I would love to know how you focus on your tasks.

Learn to Type Before You Learn to Code

Once you put yourself on the path of radical self-improvement, you start looking at the fundamentals. For me it was the realization that I couldn’t touch type. Having grown up in the era of instant messaging, I learned typing the wrong way. While I was a fast typist, my finger placement was completely random and did not rely on the home row that the keyboard provides. Bad techniques are hard to eliminate once ingrained. This is a generic enough skill that you should learn it even if you aren’t a programmer, given that you will type something.

So this winter I started working through drills on gtypist. Gtypist is a shell-based utility that teaches you to type the proper way. Ratatype is a web-based typing tutor that works well too. This was time well invested, as I can now touch-type and spend almost no time looking at the keyboard, which boosts my productivity.

The nice thing is that once you start learning your tools, you make use of the cues that the tools provide. In the case of a standard QWERTY keyboard, I discovered the home row and the raised bars on the F and J keys. Those two bars alone give your fingers the entire map of the keyboard. This also highlights the need to slow down, be mindful and make use of the full potential that your tools offer. If I were to start all over again, I would learn to type first.

If you liked this, do read the book Being Craftsman: http://itunes.apple.com/us/book/id1139490631