Learn to Type Before You Learn to Code.

Once you put yourself on the path of radical self-improvement, you start looking at the fundamentals. For me it was the realization that I couldn't touch type. Having grown up in the era of instant messaging, I learned typing the wrong way: while I was a fast typist, my finger placement was completely random, not anchored to the home row that the keyboard provides. Bad techniques are hard to eliminate once ingrained. Typing is a generic enough skill that you should learn it properly even if you aren't a programmer, given that you will inevitably type something.

So this winter I started working through drills on gtypist, a shell-based utility that teaches you to type the proper way. Ratatype is a web-based typing tutor that works well too. This was time well invested, as now I can indeed touch-type and spend almost no time looking at the keyboard, thus boosting productivity.

The nice thing is that once you start learning your tools, you make use of the cues they provide. In the case of a standard QWERTY keyboard, I discovered the home row and the raised bars on the F and J keys. Those two bars alone give your fingers the entire map of the keyboard. This also highlights the need to slow down, be mindful, and make use of the full potential that your tools offer. If I were to start all over again, I would learn to type first.

If you liked this, do read the book Being Craftsman: http://itunes.apple.com/us/book/id1139490631

The Best Books I Read in 2013

Here is a roundup of the best books that I read in 2013. I was inspired to write this post by the one written by Bill Gates (http://www.thegatesnotes.com/Personal/Best-Books-2013). I learned something from each of these books.

1. How to Fail at Almost Everything and Still Win Big: Kind of the Story of My Life by Scott Adams.
This is an autobiography of Scott Adams of Dilbert fame, where he outlines how he built his cartooning career out of his failed corporate career. The real value in this book is learning how to think long term and craft systems that increase the probability of success.

2. Choose Yourself! – James Altucher
James Altucher is an American hedge fund manager and entrepreneur. Choose Yourself! is an interesting take on the current world and how to succeed in it. It is also a case study of how he turned his life around; very impressive.

3. Antifragile – Nassim Taleb
In Antifragile, celebrated statistician and essayist Nassim Nicholas Taleb explores how uncertainty is to be embraced by looking at phenomena that gain from disorder.

4. Mastery – Robert Greene
In Mastery, Robert Greene explores the lives and careers of masters like V. S. Ramachandran, Paul Graham, and Michael Faraday. It gives good insight into the way masters of a craft think and their journeys towards mastery.

I would love to know about the best books that you read this year.

How I Built a Weather Decision Engine, or the Story of wearthejacket.com

As I was stepping out of my apartment last week, it struck me that in California I don't really care about the exact temperature outside; all I want to know is whether it is cold enough to wear a jacket or warm enough not to. So I decided to build a weather-based decision engine that does just that: figure out where I am, check the temperature, and give me a decision. I also wanted it to be blazing fast and scalable.
The end result was http://wearthejacket.com/. [Update: I shut this service down on October 9th, 2014, after almost a year of 100% uptime. It was a fantastic learning experience.]

The first step was to think how I could achieve this. After some thought I came up with this sketch.



I was aware of existing geolocation APIs that translate an IP address to a location. That led me to http://freegeoip.net/, a great API for a project like this, allowing 10,000 API calls per hour before throttling. This was more than sufficient for my needs.

The other component was getting the weather information. After googling for a bit, I came across forecast.io, a robust API with a free cap of 1,000 calls per day and nominal pricing after that.
The hosting was on an AWS (Amazon Web Services) small Ubuntu Linux instance.
I decided to use Tornado over Apache, mainly due to its low memory consumption while idle and because I was going to write this in Python. The decision to use Redis was a simple one, as I would definitely need to cache some values as end-user requests came in.

This was the first pass I wrote:
1. The user's request arrives at the web server.
2. The web server queries the geolocation API and obtains the end user's coordinates and location.
3. Based on the coordinates, we query a weather API.
4. Finally, we make a decision based on the prevailing temperature. The decision algorithm is very simple: a pre-set threshold below which wearing a jacket is advised. Refining it is a to-do for future enhancement.
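As a minimal Python sketch of the four steps above (the function names, threshold, and stub values are illustrative, not the production code; the two external API calls are stubbed out so the decision logic can be seen in isolation):

```python
JACKET_THRESHOLD_F = 60.0  # hypothetical pre-set threshold, in degrees Fahrenheit

def geolocate(ip):
    # Step 2: in the real service this queries the geolocation API;
    # stubbed here with a fixed location.
    return {"city": "Palo Alto", "latitude": 37.44, "longitude": -122.14}

def current_temperature(latitude, longitude):
    # Step 3: in the real service this queries the weather API;
    # stubbed here with a fixed reading.
    return 55.0

def decide(ip):
    # Steps 1 and 4: take a request for an IP and return a decision.
    location = geolocate(ip)
    temp = current_temperature(location["latitude"], location["longitude"])
    return "Wear the jacket!" if temp < JACKET_THRESHOLD_F else "No jacket needed."

print(decide("203.0.113.7"))  # prints: Wear the jacket!
```

Swapping the stubs for real HTTP calls gives the first pass described above.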

Note that I had not written the caching mechanism yet; there lies the fun part. At this point the decision was taking well over 1 second. Certainly not acceptable for a public-facing web application.

Enter Redis. Redis is an in-memory key-value data store, which makes lookups blazing fast. The first thing I needed to do was cache the location information pulled from the geolocation API. The location information the application needed was the latitude, longitude, and actual city name, which made it a good candidate for a Redis hash.

So the first mapping was:

HMSET <ip> location <city> latitude <lat> longitude <lon>

Since this data does not change rapidly, even with dynamic IPs, we can keep these mappings forever and, over time, as user queries come in, build our own database of IPs to locations.

For the weather information, I assumed that weather conditions will be similar over 10 minutes at a given location [debatable, but it will fulfill most needs].

This is a simple Redis key-value pair, location:apparentTemperature, the only caveat being that we want it to expire every 10 minutes (configurable).

This is easily done in Redis via the SETEX command, with the invocation:

SETEX <key> <expiration_time_in_seconds> <value>
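To illustrate the SETEX expiry semantics, here is a small pure-Python stand-in for the expiring cache (the class and key names are illustrative; with the redis-py client the real call would simply be r.setex(key, seconds, value)):

```python
import time

class TTLCache:
    """Illustrative stand-in for Redis SETEX semantics:
    a value stored with setex() expires ttl_seconds later."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = TTLCache()
cache.setex("PaloAlto:apparentTemperature", 600, 58.3)  # cache for 10 minutes
print(cache.get("PaloAlto:apparentTemperature"))  # prints: 58.3
```

Within the 10-minute window, repeated requests for the same location never touch the weather API at all.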

Once the caching mechanism was in place, benchmarking showed dramatic improvements: sub-30 ms response times after the first API call was made. The first API call to the application, though, was still remarkably slow. So I started looking at the individual API calls to the external services.

Therein lay the answer: the forecast.io API was returning an entire day's worth of data.
The fix was to append the forecast.io API call with


which had the effect of only giving back the current prevailing conditions.

Once this was done came the part where I wrote tests. It was not true test-driven development, but I wasn't launching without baseline tests. This part took me the longest but greatly increased my confidence in the code for launch. As of now it has run for over a day, serving requests across the globe. Always write tests, preferably before even the first line of code.

Benchmarking after that indicated a theoretical capacity of 1.5 million requests/day. Not bad for a tiny server, and the best part is that it can be horizontally scaled [though I doubt I will do that, considering it takes $$$ to keep servers running].
The components are modular, so you can use them individually. I would love to know your thoughts on how this project can be enhanced, and/or the design decisions you would make.

One more thing: building this has been a great learning experience, and to enable others to learn from it and critique it, the source code is released under the GPLv3 license.

Rolling up data with Awk

One of the basic things one does when dealing with numeric data sets is to add them up for some given attribute. Here is a subset of sample baseball statistics via http://seanlahman.com/baseball-archive/statistics/
The file used for the purpose of this post [Managers] is a list of wins and losses by baseball team managers from the late 1800s to 2012. Let's roll up the wins and losses per manager to calculate the total wins and losses for each team manager. Do download the data file to see the raw data. (Note that .key files are just CSV, so named to get around a WordPress restriction, and can therefore be opened with a text editor/OpenOffice/Excel.)

How can this be done?
Early 21st century method:
Use Excel to calculate totals manually (sigh), or write a macro if you are more adept.

2014 method:
Write a Python program: use the csv library to read the file, then keep a dictionary of the form {category1: {attrib1: val1, attrib2: val2, …, attribn: valn}, category2: {attrib1: val1, attrib2: val2, …, attribn: valn}}.
Then, as you pass over each row, update the sums for each attribute, checking whether the category you are referencing exists; if not, create an entry in the dictionary, and repeat until the end of the file.
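The dictionary-based approach just described might look like this in Python (the toy rows and column names are illustrative stand-ins for the Managers file):

```python
import csv
import io

# Toy stand-in for Managers.key: one row per manager-season.
data = io.StringIO(
    "playerID,W,L\n"
    "mackco01,83,98\n"
    "mackco01,54,100\n"
    "mcgrajo01,91,63\n"
)

totals = {}  # {manager: {"W": total_wins, "L": total_losses}}
for row in csv.DictReader(data):
    # Create the entry on first sight of a manager, then accumulate.
    entry = totals.setdefault(row["playerID"], {"W": 0, "L": 0})
    entry["W"] += int(row["W"])
    entry["L"] += int(row["L"])

print(totals["mackco01"])  # prints: {'W': 137, 'L': 198}
```

Perfectly serviceable, but notice how much of the program is bookkeeping around the file.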

Let's see another way to do this, right out of the 1970s:
Say hello to Awk. Awk is an interpreted language designed specifically for extracting and manipulating text, and it natively interprets tabular data. How cool is that? The nice part is that awk ships with any standard Linux/Unix distribution. For those still in the Windows world, installing Cygwin will get you awk.

The anatomy of an awk program is simple: pattern { action } filename, with optional BEGIN and END patterns that refer to actions performed before and after the file is read.

To roll up the wins and losses per team manager from our data file, we use a concept called an associative array. Wait, associative what? An associative array is a data structure that can be indexed by anything (typically a string). While this may not seem any different from a Python dictionary, the magic lies in the fact that the action is applied across the file without any need to iterate over it explicitly. Let's see the actual code that will do this. Save the following script as sum_wins_and_losses.awk and chmod 755 it so that it can execute.

#!/bin/awk -f
BEGIN { FS = ","; OFS = "," }
{
   manager = $1; wins = $7; losses = $8   # adjust the column indices to match your file
   total_wins[manager] += wins
   total_losses[manager] += losses
}
END {
   print "manager,total_wins,total_losses"
   for (i in total_wins)
      if (i != "")
         print i, total_wins[i], total_losses[i]
}

In the BEGIN block we define the field separator (FS) and the output field separator (OFS) as a comma. The OFS determines how the data will be separated in the program's output.
By default awk interprets space-separated files. Once the FS is established,
you can refer to any column by its index, i.e., the first column of the data table can be referred to as $1, the second as $2, and so on. It is good practice to assign these to variables: that lets you make changes at a central point when a column's position changes. A typical use case would be adapting the program for a file with additional columns, where the current columns appear at different positions.

The third block is where the magic begins: we index the arrays that we defined by the field we want to roll our data up by, in this case the manager.

total_wins[manager] += wins
total_losses[manager] += losses

All this snippet does is ensure that if the manager is “foo”, the array bucket of total_wins indexed by “foo” accumulates the total wins achieved by foo. This is because the operation += wins is applied across the entire file, adding any wins achieved by foo to the same index. The same happens for every unique manager, and we are left with rolled-up wins and losses per manager for the entire dataset.

Now for the finale: in the END block, all we are doing is iterating over the indexed associative arrays and printing out the rolled-up data, which goes to the console.

The actual program can be executed by invoking the following snippet which redirects the output to a file.

awk -f sum_wins_and_losses.awk Managers.key >rolled_up_file.key

Open rolled_up_file.key to see the total wins and losses by manager. Next time you are faced with manipulating tabular data, think awk!





"If all you have is a hammer, everything looks like a nail." - Abraham Maslow

Building quality software relies on a myriad of tools. One essential pillar of craftsmanship, software or otherwise, is mastery of the tools you work with. Some tools that I think are important for a software craftsman:

The Mind:
The most important and essential tool that you have is your mind.
All software originates as a thought so having a clear thought process is necessary.
Unfortunately, we have become accustomed to the constant pull of Google, Facebook, and Twitter. Being mindful of what you are trying to achieve is important. There is a quote from the TV show Sherlock that has stuck with me: “People fill their heads with all kinds of rubbish. And that makes it hard to get at the stuff that matters. Do you see?” Treat your mind as a garden and only let in thoughts that should be nurtured; getting rid of weeds is essential.

The Operating System:
I use Xubuntu Linux, which to me offers simplicity and power. Using Linux forces me to learn more about the underlying machine itself.
We must remember that, for all the clouds now available, a cloud is ultimately a group of computers connected to a network. Quoting Larry Ellison: “Google does not run on water vapor.” The choice of operating system will have an effect on how much of the underlying system you understand. Macs and Windows abstract away a lot of the underlying machine, which, if you are an application developer, may not be a bad thing in terms of productivity gains; however, gaining knowledge of the underlying processes will be harder.

The Programming Language:
There is a plethora of programming languages available, and to persist with the thought that all languages are created equal is a fallacy.
Using the right language for the job at hand will go a long way towards building successful products. Learning a language different from the one you use daily will create new ways of thinking. For instance, try writing to a file first in Java, then in Python. While I am not getting into the argument of which is better, I do want to emphasize that languages have their own strengths. To deal with data in flat files, you are missing out if you do not use the trinity of awk, sed, and grep (sed and grep being command-line tools). They provide with simplicity what you would otherwise write hundreds of lines of a high-level language for.
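As a quick taste of the Python half of that file-writing exercise (a sketch; the file name is arbitrary):

```python
import os
import tempfile

# Writing a file and reading it back is a couple of lines in Python;
# the equivalent Java version involves noticeably more ceremony.
path = os.path.join(tempfile.gettempdir(), "craftsman_demo.txt")
with open(path, "w") as f:
    f.write("know your tools\n")
with open(path) as f:
    contents = f.read()
print(contents, end="")  # prints: know your tools
```

Try the same in Java and compare the line counts yourself.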

The Text Editor:
From Notepad to Vim to Sublime Text to even an IDE, the text editor is where you translate your thoughts into code: structured information that makes computers do what you want them to do. Mastery of the text editor will determine how long it takes you to write code, assuming that you can type.

Version Control:
Any software that is not one-time use should be maintained in a version control system. Git is my favorite; however, depending on your work environment, you could end up using SVN or Perforce. Git has many subtleties that become apparent with practice. Know your Git and never worry about losing source code.

The Debugger:
While best avoided, since you do want your mind to be the compiler, debuggers aid understanding of both program flow and data flow in complex projects. Mastering the debugger will go a long way in solving bugs in programs you are unfamiliar with, and sometimes in code that is familiar.

The Database:
Databases are the foundation of the applications that we build.
We have SQL and not-only-SQL. For structured data, SQL-based databases are still golden; since real data is not always structured, the move to NoSQL databases was forced.
Redis is the Swiss Army knife of key-value data. Learn about Couchbase or MongoDB for exposure to document-based databases.

This list of tools is in no way comprehensive. Building mastery of tools is a long and arduous process; however, tiny steps taken today add up over a few years. More often than not, the problem you are trying to solve will drive the tools you use. So try to solve as many different problems as you can, and slowly but surely you will see an expanding toolkit.

The path I have laid out for myself is to always be learning and to teach what I learn. I would love to know your experiences in mastering the tools you work with, and ultimately in mastering your craft.

Two Questions for the Software Craftsman

To make a career move as a new software craftsman, there are two questions you definitely need to ask:

1. Is it your time to learn or your time to earn?
2. What are you willing to get good at?

I was lucky enough to encounter both questions, which originated from different sources, but then I do embrace randomness. It is still my time to learn. What is interesting is that once you are in the workforce, earning and learning are not mutually exclusive, so go ahead and make the most of it. This is an opportunity to find out what you really want to do. Once you decide that, don't look back, and be willing to invest the time it takes to become a master of your craft.

Embrace Randomness

I think what makes life interesting is not predictability but randomness. I will give an example. On Super Bowl day I met a couple of friends at Crepevine in Palo Alto for brunch. On the way back I took a wrong exit and ended up on the Stanford campus. The roundabout there is almost a mile inside. Upon taking the U-turn, I noticed a museum midway that I had not seen on my earlier visits. I pulled in, and it turned out to be a museum established by none other than the Stanford family itself. I parked my car and went inside to see one of the best collections of artifacts and art that I have ever seen. (Added bonus: entry and parking were free.) So a wrong exit made my day richer. Chance has a greater role in life than we like to admit.

Relating randomness to the software world may seem strange, but bear with me. The idea I seek to present is that random explorations may lead to serendipitous occurrences. Exploring things you have not encountered before goes a long way towards developing skills or adding new tools to your toolkit. For instance, if you are an imperative-style programmer, learning a functional language like Erlang will expose you to a different style of thinking. If you work with MySQL, then learning about a NoSQL store like Redis will expose you to how unstructured data can be handled. It is of course difficult to predict how these random walks may be of immediate benefit, but I think the real value lies in relating the disparate knowledge to a common thread. I think randomness is also what makes Silicon Valley an interesting place to be. If you are a technology professional in the Bay Area, chance encounters can lead to new opportunities, or even just expose you to another dimension of the technology industry that you didn't know existed.

One of the best ways to introduce randomness to your surfing habits is StumbleUpon; you can think of it as a gateway to the Internet that recommends webpages based on your interests. I came across PreyProject via StumbleUpon, where I went on to make my first ever open-source contributions, which eventually led me to my current position. The other source of randomness I rely upon is Hacker News. In real life, make use of Meetup [not so random, I admit] or your favorite programming language's user group [believe me, even the most obscure language will have one here]. In real life the randomness begins once you show up, which is not limited to showing up physically but also means taking up projects you care about. You like the girl you just met? Ask her out. You want to write that killer app? Start working on it; only then can randomness help you.

The charter I have laid down for myself is to embrace more randomness by showing up, viz. building things I care about, thus encountering problems I haven't seen before, and doing the same in life in general. While I cannot vouch for all outcomes, I certainly hope it will be an interesting journey.

Good Algorithms beat Supercomputers

It is good to remember why well-designed algorithms matter. I decided to experiment with the well-known Fibonacci numbers. I have this sticky note plastered on my monitor at work:

“Good Algorithms beat Supercomputers.” Let's take a glimpse into why that is the case.
The Fibonacci recurrence is defined as:
F(N) = F(N-1) + F(N-2), for N > 1

and F(N) = N for N = 0 or 1

This leads to the series of numbers 0, 1, 1, 2, 3, 5, 8, …
Translating this to code, we have the naive algorithm:

public static long naiveFibo(int n) {
    if (n == 1 || n == 0) {
        return n;
    }
    return naiveFibo(n - 1) + naiveFibo(n - 2);
}

However, this naive algorithm suffers from a fatal flaw: as n gets larger, we end up repeatedly computing the Fibonacci values for many m < n, which becomes computationally expensive.

e.g. Fib(5) = Fib(4) + Fib(3)

Fib(4) = Fib(3) + Fib(2)

Fib(3) = Fib(2) + Fib(1)

Fib(2)  = Fib(1) + Fib(0)

We can already see Fib(2) being computed thrice and Fib(3) twice with this input.
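That duplication is easy to measure. Here is a short Python sketch (names are mine, not from the post) that counts how often each Fib(m) is evaluated during a naive computation of Fib(5):

```python
from collections import Counter

calls = Counter()  # how many times naive_fib(m) is invoked for each m

def naive_fib(n):
    calls[n] += 1
    if n < 2:
        return n
    return naive_fib(n - 1) + naive_fib(n - 2)

naive_fib(5)
print(calls[3], calls[2])  # prints: 2 3 -- Fib(3) twice, Fib(2) three times
```

Run it with larger n and the counts explode exponentially, which is exactly the flaw the next section fixes.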

So what now?
Let's do an optimization. This particular technique is called memoization. In simple terms, we now store each Fibonacci value the first time it is computed, in an array at that index; e.g., arr[5] will hold the value of Fib(5).
Each time we want to compute the Fibonacci value for a given n, we first do a lookup in our array; if the value does not exist, we compute and store it. While this is a fairly well-known technique found in any decent CS textbook, it is good to experiment and see the true power of well-designed algorithms. Let's see what the code looks like:

public static long memoizedFibo(int n) {
    if (arr[n] == 0) {
        if (n == 0 || n == 1) {
            arr[n] = n;
        } else {
            arr[n] = memoizedFibo(n - 1) + memoizedFibo(n - 2);
        }
    }
    return arr[n];
}

The client code follows in case you want to reproduce this on your machines:

public class Fibo {
    private static long[] arr = new long[100];

    public static void main(String[] args) {
        long startTime;
        double elapsedTime;
        for (int i = 0; i < 91; i++) {
            startTime = System.nanoTime();
            memoizedFibo(i);
            // divide by 1e9 as a double to keep the fractional seconds
            elapsedTime = (System.nanoTime() - startTime) / 1000000000.0;
            System.out.println("Time taken to compute Fib " + i + " Memoized: " + elapsedTime + " seconds");
        }
        // Note: the naive version becomes impractically slow long before j reaches 91.
        for (int j = 0; j < 91; j++) {
            startTime = System.nanoTime();
            naiveFibo(j);
            elapsedTime = (System.nanoTime() - startTime) / 1000000000.0;
            System.out.println("Time taken to compute Fib " + j + " Naive: " + elapsedTime + " seconds");
        }
    }

    // Add the naiveFibo and memoizedFibo methods from above here.
}


This seemingly simple change gives a massive speedup. I did some experimental runs on my 1.7 GHz Core i5 with 4 GB RAM.


Being Craftsman

What does the word craftsman evoke? For me it is images of toolmakers building watches, shipbuilders working in tandem on massive vessels, and of course developers cranking out code.

Progressing as a craftsman is a hard task. I think there are three key essentials for success.

1. The guidance of master craftsmen/women (henceforth referred to as craftsmen for the rest of this essay).

2. The burning desire to build, tweak and continuously refine.

3. The tenacity to work hard through the not so interesting parts.

Self learning is a good thing, but in the company of master craftsmen one can make giant leaps. As I rediscovered, working with the best evokes the desire to build better. There are techniques one can pick up by observation and osmosis.

While I was writing this post, I rather serendipitously came across a documentary called “Jiro Dreams of Sushi” (available as of 30th Dec 2012 on Amazon Prime and Netflix streaming). Jiro is a master sushi chef who runs a Michelin three-star restaurant in a Tokyo subway station. (Earning three stars from Michelin means the place is so good that it is worth visiting the country just for that restaurant.) Jiro, at 85-plus years, is someone at the peak of his craft who still aims to surpass it each day.

What resonated strongly was the idea that to achieve success you need to be so good that you can't be ignored. Building further on this is the idea that you need to find satisfaction in your work rather than chasing some dream passion. Satisfying the work that is given to you is a stepping stone towards building expertise, and thus uncovering a deeper satisfaction in the work that you do.

It was also enlightening to see the tough regimen that aspiring sushi apprentices undergo, often running into many years, before they can even handle fish. Jiro's son Yoshikazu, a master sushi chef in his own right at 50, still gets critiqued by his demanding father. The other takeaway was simplicity and the ability to develop expertise in a niche area. There are no shortcuts, and one has to commit to a lifetime of learning.

Finding a master is not easy. Why should anyone spend time critiquing you? This is not a question I have an answer for, just theories. Ultimately one needs to offer something of value: finding a way to lighten your potential master's load, or solving a problem for them, is one way to form a symbiotic relationship. The other motivation, I think, comes from great craftsmen themselves, who want the craft to progress. In an environment like a software company, or even a sushi bar like Jiro's, it is imperative to train apprentices to take on future roles as craftsmen themselves; this is pure economics at play once the enterprise needs to run at scale.

There are times when apprentices get assigned dreary tasks. I think the most successful craftsmen have gone through this phase, tenaciously completing the assigned tasks and learning the domain along the way. Many problems become apparent only after doing the grunt work first.

I wish you a lifetime of learning. I would love to know how you became a better craftsman, or even a master craftsman 🙂

The Art of Debugging

It was a sunny day. I stepped into my car and started out for work. Within the first hundred feet of the car moving, I heard a muffled but regular “thup a thup a thup”. I stopped the music, and it was still there; I pulled over and got out of the car. I did a quick visual check, saw whether my bumper was hitting the tires, found nothing amiss, and started again. The sound persisted. Aargh! I pulled over again. I saw an elderly gentleman approaching; great luck, I thought. I lowered my windows and asked if he could spot any obvious flaws as I attempted to move the car slowly. Hoping for an affirmative, I was aghast when he gestured that he was extremely hard of hearing. I got out again and started looking at the tires. Then, there it was, stuck to my front tire: some kind of poly wrapping material. Each time the tire rotated, the portion that wasn't stuck was hitting the area above the tire. I pulled it out, and problem solved. I had just finished debugging my car.

In fact, debugging is as much a part of programming as it is of daily life. The best advice I got out of college was “learn to debug”(a). It is analogous to doctors figuring out what is wrong with a patient, or a mechanic finding the trouble with a car: the goals are similar, but the objects and tools employed differ.

Java being my primary language I started out with these excellent eclipse debugger videos. http://eclipsetutorial.sourceforge.net/debugger.html

More recently I've been using winpdb (http://winpdb.org/) for my Python exploits. These days I debug, a lot! I would think the best programmers are extremely good at debugging as well.

Some of the techniques I’ve come to employ are:

1. Reduce the problem space.

2. Use print statements(for localized problems).

3. Use Debuggers  (which was my original intent in asking the elderly gentleman)

4. Logs: logs capture the long-term state of the system when turned on. Of course, it's very easy to have them run into millions of lines, which leads me back to point 1.

The other use I make of debuggers is to understand the behaviour of unfamiliar programs. When reading the code is not sufficient, it is quite handy to spin up the debugger and step through the program, observing the state of the data it manipulates. I call debugging an art since no single technique is guaranteed to give a solution; how to debug problems consistently with a low time investment is certainly an art. I would love to know the techniques that you employ to debug programs, and life in general.








a. What is debugging? At its essence: given a system with x as the known and expected behaviour and y as undesirable behaviour, the task of reconciling the difference and finding the root cause is debugging.