The Apprentice

To be a master, one must first be an apprentice. In the modern world, your first job is typically where the apprenticeship begins. I would prefer to start working on a craft much earlier than that, but we are limited by the systems created around us, like school and university. With medieval guilds no longer around, we have to craft our own apprenticeship path. The first challenge is finding masters. Some places you will find them are universities, companies and open source projects. The software industry, dynamic as it is, will not afford you the luxury of one master. People move on, or you do. So be comfortable with the idea of having many masters over the course of your apprenticeship.

The big mistake I made when I was younger was to think that the golden key to opportunity is a degree from an elite college. A degree is just a tool that can open certain doors and maybe give you a head start. So don’t lose hope if you didn’t go to one. The real world only cares whether you can solve a business problem.

There are in fact two keys to opportunity: curiosity and a hunger to learn. Ask whether it is your time to earn or your time to learn. Most of the time it will be a time to learn. To learn, one must have a beginner’s mind. Use your curiosity to learn things that you like. Don’t panic if things don’t make sense; everyone begins somewhere. Learn from books and from the internet, the greatest learning tool of our time. Teach what you learn. Leverage what you have learned to get into a guild [read: a company or open source project] that aligns with your curiosity. This creates a feedback mechanism for you. Learn from everyone: your peers, your superiors and even the interns. Then let the magic begin. With the right people around you, you will learn exponentially faster and have a meaningful apprenticeship. On the path to mastery, the journey is the destination. I would love to know about the path that you have taken.

Business problems

When I am asked what it is that I do, my reply is unequivocal: I solve business problems. In fact most crafts, from watchmaking to shipbuilding, solve a business problem. I would define a business problem as one which, once solved, either increases profits or decreases expenses for your organization.

For example:
Writing software that predicts inventory consumption rates for Spacely Sprockets. [increase profits]
Writing software that helps reduce the number of delivery trips that a big retailer has to make. [reduce expenses]

Building software is a lot of fun, as the business problems solved by software have the potential to scale massively. Lurking in your organization is a problem waiting to be solved, maybe a process that can be automated or a missing hook to an API that returns valuable data. In fact, you are running a software company even if you think you aren’t.

It is easy to get caught up in the title of Software Engineer. In reality we are solving a business problem; writing software is just the tool we use to do it. So, my questions to you are:
1. What are the business problems that you are solving for your company?
2. What are the ones that exist but you haven’t solved yet?

I would love to hear back about both.


For a software craftsman, focus is essential to building quality software. Interrupts are the enemy of focus. Not all interrupts are avoidable: think of meetings, interviews or even a colleague with a question. But certain things can be controlled, like your inbox and distractions such as news, Facebook, Twitter or any other readily available content. The question is how? A friend suggested a system which has worked great for me. It’s called the Pomodoro.

A Pomodoro is simply a way to break work down into chunks of 25 minutes before you take a break. I use a nifty web application called moosti that helps me keep time. Ultimately it is a mental hack that lets me give myself permission to be present and to focus on the task at hand. With the timer counting down, I tend to close all other tabs.

The added benefit of this system is that it allows me to measure how productive my day was. I just need to count the number of successful Pomodoro chunks in the day. I would love to know how you focus on your tasks.


Learn to Type Before You Learn to Code.

Once you put yourself on the path of radical self-improvement, you start looking at the fundamentals. For me it was the realization that I couldn’t touch type. Having grown up in the era of instant messaging, I learned typing the wrong way. While I was a fast typist, my finger placement was completely random and did not rely on the home row that the keyboard provides. Bad techniques are hard to eliminate once absorbed. This is a generic enough skill that you should learn it even if you aren’t a programmer, given that you will type something.

So this winter I started working through drills on gtypist. Gtypist is a shell-based utility that teaches you to type the proper way. Ratatype is a web-based typing tutor that works well too. This was time well invested, as I can now touch type and spend almost no time looking at the keyboard, which boosts productivity.

The nice thing is that once you start learning your tools, you make use of the cues that the tools provide. On a standard QWERTY keyboard I discovered the home row and the raised bars on the F and J keys. Those two bars alone give your fingers the entire map of the keyboard. This also highlights the need to slow down, be mindful and make use of the full potential that your tools offer. If I were to start all over again, I would learn to type first.

If you liked this, do read Being Craftsman, the book.

The Best Books I Read in 2013

Here is a roundup of the best books that I read in 2013. I was inspired to write this post by the one written by Bill Gates. I learned something from each of these books.

1. How to Fail at Almost Everything and Still Win Big: Kind of the Story of My Life – Scott Adams
This is an autobiography of Scott Adams of Dilbert fame, in which he outlines how he built his cartooning career out of his failed corporate career. The real value of this book is in learning how to think long term and craft systems that increase the probability of success.

2. Choose Yourself! – James Altucher
James Altucher is an American hedge fund manager and entrepreneur. Choose Yourself! is an interesting take on the current world and how to succeed in it. It is also a case study of how he turned his life around, which is very impressive.

3. Antifragile – Nassim Nicholas Taleb
In Antifragile, celebrated statistician and essayist Nassim Nicholas Taleb explores how uncertainty is to be embraced by looking at phenomena that gain from disorder.

4. Mastery – Robert Greene
In Mastery, Robert Greene explores the lives and careers of masters like V.S. Ramachandran, Paul Graham and Michael Faraday. It gives good insight into the way masters of a craft think and into their journey towards mastery.

I would love to know about the best books that you read this year.

How I built a weather decision engine? or The story of

So as I was stepping out of my apartment last week, I thought: in California I really don’t care about the temperature outside; all I want to know is whether it is cold enough to wear a jacket or warm enough not to. I decided to build a weather-based decision engine which does just that: figure out where I am, check the temperature and give me a decision. I also wanted it to be blazing fast and scalable.
The end result was[Update: shutting this service down today, October 9th 2014, after almost a year of 100% uptime. This was a fantastic learning experience]

The first step was to think about how I could achieve this. After some thought I came up with this sketch.



I was aware of existing geolocation APIs which translate an IP address to a location. That led me to, which is a great API for a project like this, with access to 10000 API calls per hour before getting throttled. This was more than sufficient for my needs.

The other component was getting the weather information. After googling for a bit, I came across which, though a robust API, had a free cap of 1000 calls per day and a nominal payment after that.
The hosting was on an AWS (Amazon Web Services) small Ubuntu Linux instance.
I decided to use Tornado over Apache, mainly due to its low memory consumption at idle and because I was going to write this in Python. The decision to use Redis was a simple one, as I would definitely need to cache some values as end user requests came in.

This was the first pass I wrote:
1. The user comes to the web server.
2. The web server queries the geolocation API and obtains the end user’s coordinates and location.
3. Based on the coordinates, we query a weather API.
4. Finally, we take a decision based on the prevailing temperature. The decision algorithm is very simple: a pre-set threshold below which wearing a jacket is advised. Making it smarter is a todo for future enhancement.
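The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code that ran the service: the two lookup functions are offline stand-ins for the external APIs, and the 60°F threshold is a value I picked for the example, since the post does not state the real one.

```python
from typing import Callable, Tuple

Coordinates = Tuple[float, float]

# Hypothetical threshold; the actual value used by the service is not stated.
JACKET_THRESHOLD_F = 60.0


def needs_jacket(temperature_f: float, threshold_f: float = JACKET_THRESHOLD_F) -> bool:
    """Advise a jacket when the temperature is below the threshold."""
    return temperature_f < threshold_f


def decide(
    ip: str,
    locate: Callable[[str], Coordinates],
    current_temperature: Callable[[Coordinates], float],
) -> str:
    """Run the pipeline: IP -> coordinates -> temperature -> decision."""
    coordinates = locate(ip)                          # step 2: geolocation lookup
    temperature_f = current_temperature(coordinates)  # step 3: weather lookup
    return "jacket" if needs_jacket(temperature_f) else "no jacket"  # step 4


# Stand-ins for the two external APIs, so the sketch runs offline.
def fake_locate(ip: str) -> Coordinates:
    return (37.77, -122.42)


def fake_temperature(coordinates: Coordinates) -> float:
    return 52.0


print(decide("203.0.113.7", fake_locate, fake_temperature))  # jacket
```

Passing the lookups in as functions keeps the decision logic testable without touching the network, which also makes it easy to slot a cache in front of either call later.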

Note that I had not written the caching mechanism yet. Therein lies the fun part. At this point the decision was taking well over 1 second, which is certainly not acceptable for a public-facing web application.

Enter Redis. Redis is an in-memory key-value data store, which makes lookups blazing fast. The first thing I needed to do was cache the location information that we pulled from the geolocation API. The location information that the application needed was the latitude, longitude and actual city name. This made it a good candidate for a Redis hash.

So the first mapping was

HMSET IP location val latitude val longitude val

Since this data will not change rapidly, even with dynamic IPs, we can keep these mappings forever and, as user queries come in, build our own database of IPs to locations over time.
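The lookup pattern described here is often called cache-aside: check the cache, and only on a miss call the slow service and store the result. Below is a minimal Python sketch of that idea. A plain dictionary stands in for the Redis hashes so the example runs without a server, and the class name, field values and fake geolocation function are all made up for illustration.

```python
from typing import Callable, Dict

Location = Dict[str, str]


class LocationCache:
    """Cache-aside lookup of IP -> location; a dict stands in for Redis hashes."""

    def __init__(self, fetch: Callable[[str], Location]) -> None:
        self._fetch = fetch                      # the slow external geolocation call
        self._hashes: Dict[str, Location] = {}   # one "hash" per IP, never expired
        self.misses = 0

    def lookup(self, ip: str) -> Location:
        cached = self._hashes.get(ip)            # plays the role of reading the hash
        if cached is not None:
            return cached
        self.misses += 1
        location = self._fetch(ip)
        self._hashes[ip] = dict(location)        # plays the role of HMSET <ip> ...
        return location


def fake_geolocate(ip: str) -> Location:
    """Stand-in for the external API, with invented values."""
    return {"location": "San Francisco", "latitude": "37.77", "longitude": "-122.42"}


cache = LocationCache(fake_geolocate)
cache.lookup("203.0.113.7")
cache.lookup("203.0.113.7")
print(cache.misses)  # 1
```

Only the first request for a given IP pays for the external call; every later one is served from memory.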

For the weather information, I made an assumption that weather conditions will be similar over 10 minutes at a given location [debatable, but it will fulfill most needs].

This is a simple Redis key-value pair, location:apparentTemperature, the only caveat being that we want it to expire every 10 minutes (configurable).

This is done easily in Redis via the SETEX command, with the invocation

SETEX key <expiration_time_in_seconds> value
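To show what an expiring key buys you, here is a small Python sketch that mimics the set-with-expiry behaviour in memory. It is not Redis and not a Redis client: the class and its method names are my own, and the clock is injectable so the expiry can be demonstrated without actually waiting 10 minutes.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """In-memory imitation of set-with-expiry; for illustration only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def setex(self, key: str, ttl_seconds: float, value: str) -> None:
        # Store the value together with the moment it stops being valid.
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as if it was never set
            return None
        return value


# A fake clock we can move forward by hand.
now = [0.0]
weather = ExpiringCache(clock=lambda: now[0])
weather.setex("San Francisco", 600, "52.0")

now[0] = 599.0
print(weather.get("San Francisco"))  # 52.0
now[0] = 600.0
print(weather.get("San Francisco"))  # None
```

Once a key has expired, the next request falls through to the weather API and refreshes the cached value.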

Once the cache mechanism was in place, benchmarking showed dramatic improvements: sub-30 ms response times after the first API call was made. The first API call to the application, though, was still remarkably slow. So I started looking at the individual API calls to the external services.

Therein lay the answer: the API was spewing out an entire day’s worth of data.
The fix was to append the API call with


which had the effect of only giving back the current prevailing conditions.

Once this was done, it was time to write tests. This was not true test-driven development, but I wasn’t launching without baseline tests. This part took me the longest but greatly increased my confidence in the code for launch. As of now it has run for over a day, serving requests from across the globe. Always write tests, preferably before even the first line of code.

Benchmarking after that indicated a theoretical capacity of 1.5 million requests per day. Not bad for a tiny server, and the best part is that it can be horizontally scaled [though I doubt I will do that, considering it takes $$$ to keep servers running].
The components are modular so that you can use them individually. I would love to know your thoughts on how this project can be enhanced and/or the design decisions that you would make.
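To put the daily capacity figure in more familiar units, here is the conversion as a quick piece of Python arithmetic. It only restates the 1.5 million requests per day quoted above, spread evenly over a day; it says nothing about how the benchmark itself was run.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 1_500_000  # the benchmarked figure quoted above

requests_per_second = requests_per_day / SECONDS_PER_DAY
ms_between_requests = 1000 * SECONDS_PER_DAY / requests_per_day

print(f"{requests_per_second:.1f} requests per second")        # 17.4 requests per second
print(f"one request every {ms_between_requests:.1f} ms")       # one request every 57.6 ms
```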

One more thing
Building this has been a great learning experience. To enable others to learn from it and critique it, the source code is released under the GPLv3 license.

Rolling up data with Awk

One of the basic things that one does when dealing with numeric data sets is to add them up for some given attribute. Here is a subset of sample data of baseball statistics via
The file used for the purpose of this post [Managers] is a list of wins and losses by baseball team managers from the late 1800s to 2012. Let’s roll up the wins and losses to calculate the total wins and losses for each team manager. Do download the data file to see the raw data. (Note that .key files are just CSV files, so named to get around a WordPress restriction, and can therefore be opened with a text editor/OpenOffice/Excel.)

How can this be done?
Early 21st century method:
Use Excel to calculate totals manually (sigh), or write a macro if you are more adept.

2014 method:
Write a Python program. Use the csv library in Python to read the file, then keep a dictionary of the form {category1:{attrib1:val1, attrib2:val2, …, attribn:valn}, category2:{attrib1:val1, attrib2:val2, attrib3:val3, …, attribn:valn}}.
Then, as you pass over each row, update the sums for each attribute, first checking whether the category you are referencing exists and creating an entry in the dictionary if it does not. Repeat till the end of the file.
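For comparison, the Python approach just described might look like the sketch below. The sample rows and the column names (manager, wins, losses) are made up so that the example is self-contained; the real data file may well name and order its columns differently.

```python
import csv
import io
from typing import Dict, Iterable

# Made-up sample rows; the real file's column names may differ.
SAMPLE = """manager,wins,losses
foo,10,5
bar,7,8
foo,3,2
"""


def roll_up(
    rows: Iterable[Dict[str, str]],
    category: str,
    attributes: Iterable[str],
) -> Dict[str, Dict[str, int]]:
    """Sum each attribute per category, creating entries as they first appear."""
    attributes = list(attributes)
    totals: Dict[str, Dict[str, int]] = {}
    for row in rows:
        key = row[category]
        if key not in totals:  # first time we see this category
            totals[key] = {attribute: 0 for attribute in attributes}
        for attribute in attributes:
            totals[key][attribute] += int(row[attribute])
    return totals


totals = roll_up(csv.DictReader(io.StringIO(SAMPLE)), "manager", ["wins", "losses"])
print(totals)
# {'foo': {'wins': 13, 'losses': 7}, 'bar': {'wins': 7, 'losses': 8}}
```

To run it against a real file, replace the io.StringIO(SAMPLE) argument with an open file handle.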

Let’s see another way to do this, right out of the 1970s:
Say hello to Awk. Awk is an interpreted language designed specifically for extracting and manipulating text. Awk natively interprets tabular data. How cool is that? The nice part is that awk ships with any standard Linux/Unix distribution. For those still in the Windows world, installing Cygwin will get you awk.

The anatomy of an awk program is simple: pattern {action} filename, with optional BEGIN and END patterns whose actions run before and after the file is read.

To roll up the wins and losses per team manager from our data file, we use a concept called an associative array. Wait, associative what? An associative array is a data structure which can be indexed by anything (typically a string). While this may not seem any different from a Python dictionary, the magic lies in the fact that awk applies the program to every line of the file without any need to iterate over the file explicitly. Let’s see the actual code that will do this. Save the following script as sum_wins_and_losses.awk and apply a chmod 755 so that it can execute.

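#!/bin/awk -f
# Column numbers are assumptions; adjust them to match the data file.
BEGIN { FS = OFS = ","; split("", total_wins); split("", total_losses) }
{ manager = $1; wins = $7; losses = $8 }
{ total_wins[manager] += wins; total_losses[manager] += losses }
END {
   print "manager,total_wins,total_losses"
   for (i in total_wins)
      if (i != "") print i, total_wins[i], total_losses[i]
}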

In the BEGIN block we define the field separator (FS) and the output field separator (OFS) as a comma, in addition to initializing the arrays that we intend to use. The OFS determines how the data will be separated in the output of the program.
By default awk interprets space-separated files. Once the FS is established, you can refer to any column by its index, i.e. the first column of the data table can be referred to by $1, the second by $2 and so on. It is good practice to assign these to variables. That enables you to make changes easily at a central point when a column position changes. A typical use case would be adapting the program for a file with additional columns, with the current columns appearing at different positions.

The third block is where the magic begins: we index the arrays that we defined by the field that we want to roll up our data by. In this case we use manager.


All this block does is that if the manager is “foo”, the array bucket of total_wins indexed by “foo” will hold the total wins achieved by foo. This is because the operation += wins is applied across the entire file and adds any wins achieved by foo to the same index. This is done for all unique managers, and we are left with rolled-up values of wins and losses by manager for the entire dataset.

Now for the finale: in the END block all we are doing is iterating over the indexed associative array and spewing out the rolled-up data. This will go to the console.
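If the implicit loop feels like magic, it can help to see the same BEGIN / per-line / END structure spelled out by hand. The Python sketch below does that for a made-up three-column sample; the column positions and the function name are assumptions for the example, and Python is used only to make visible the work that awk does for you.

```python
from typing import Dict, Iterable, List

FS = ","   # input field separator
OFS = ","  # output field separator

# Assumed column positions for a made-up three-column sample.
MANAGER, WINS, LOSSES = 0, 1, 2


def run(lines: Iterable[str]) -> List[str]:
    # BEGIN: set up state before any input is read.
    total_wins: Dict[str, int] = {}
    total_losses: Dict[str, int] = {}

    # Main block: awk runs this once per input line without being asked.
    for line in lines:
        fields = line.rstrip("\n").split(FS)
        manager = fields[MANAGER]
        total_wins[manager] = total_wins.get(manager, 0) + int(fields[WINS])
        total_losses[manager] = total_losses.get(manager, 0) + int(fields[LOSSES])

    # END: emit the rolled-up totals after the last line.
    output = ["manager,total_wins,total_losses"]
    for manager in total_wins:
        if manager != "":
            row = [manager, str(total_wins[manager]), str(total_losses[manager])]
            output.append(OFS.join(row))
    return output


print("\n".join(run(["foo,10,5", "bar,7,8", "foo,3,2"])))
# manager,total_wins,total_losses
# foo,13,7
# bar,7,8
```

One difference worth knowing: Python dictionaries keep insertion order, while awk makes no promise about the order in which a for-in loop visits array indices.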

The actual program can be executed by invoking the following snippet which redirects the output to a file.

awk -f sum_wins_and_losses.awk Managers.key >rolled_up_file.key

Open the rolled_up_file to see total wins and losses by manager. Next time you are faced with manipulating tabular data, think awk!





If all you have is a hammer, everything looks like a nail -Abraham Maslow

Building quality software relies on a myriad of tools. One essential pillar of craftsmanship, software or otherwise, is the mastery of the tools that you work with. Some tools that I think are important for a software craftsman:

The Mind:
The most important and essential tool that you have is your mind.
All software originates as a thought, so having a clear thought process is necessary.
Unfortunately we have become accustomed to Google, Facebook and Twitter. Being mindful of what you are trying to achieve is important. There is a quote from the TV show Sherlock that has stuck with me: “People fill their heads with all kinds of rubbish. And that makes it hard to get at the stuff that matters. Do you see?” Treat your mind as a garden and only let in thoughts that should be nurtured; getting rid of weeds is essential.

The Operating System
I use Xubuntu Linux, which to me offers simplicity and power. Using Linux forces me to learn more about the underlying machine itself.
We must remember that for all the clouds that are now available, ultimately a cloud is a group of computers connected to a network. Quoting Larry Ellison: “Google does not run on water vapor”. The operating system you choose will have an effect on how much of the underlying system you understand. Macs and Windows abstract away a lot of the underlying machine, which, if you are an application developer, may not be a bad thing in terms of productivity gains. However, gaining knowledge of the underlying processes will be harder.

The Programming Language:
There is a plethora of programming languages available, and to persist with the thought that all languages are made equal is a fallacy.
Using the right language for the job at hand will go a long way in building successful products. Learning a language different from the one you use daily will create new ways of thinking. For instance, try writing to a file first in Java, then in Python. While I am not getting into the argument of which is better, I do want to emphasize that languages have their own strengths. To deal with data in flat files, you are missing out if you do not use the trinity of awk, sed and grep (sed and grep being command line tools). They provide, in a few simple lines, what would otherwise take programs of hundreds of lines in a high-level language.

The Text Editor
From Notepad to Vim to Sublime Text to even an IDE, the text editor is where you translate your thoughts into code: structured information that makes computers do what you want them to do. Mastery of the text editor will determine how long it takes you to write code, assuming that you can type.

Version Control:
Any software that is not for one-time use should be maintained in a version control system. Git is my favorite; however, depending on your work environment you could end up using SVN or Perforce. Git has many subtleties that will become apparent with practice. Know your Git and never worry about losing source code.

The Debugger
While best avoided, since you want your mind to be the compiler, debuggers aid in understanding both the program and the data flow in complex projects. Mastering the debugger will go a long way in solving bugs in programs that you are unfamiliar with, and sometimes in code that is familiar.

Databases:
Databases are the foundation of the applications that we build.
We have SQL and “not only SQL”. For structured data, SQL-based databases are still golden.
Since real data is not always structured, that has forced the move to NoSQL databases.
Redis is the Swiss Army knife for data in key-value pairs. Learn about Couchbase or MongoDB for exposure to document-based databases.

This list of tools is in no way comprehensive. Building mastery of tools is a long and arduous process. However, tiny steps taken today go a long way and add up over a few years. More often than not, the problem you are trying to solve will drive the tools that you use. So try to solve as many different problems as you can; slowly but surely you will see an expanding toolkit.

The path I have laid out for myself is to always be learning and to teach what I learn. I would love to know your experiences in mastering the tools that you work with and ultimately in mastering your craft.

Two Questions for the Software Craftsman

To make a career move as a new software craftsman, there are two questions you definitely need to ask:

1. Is it your time to learn or your time to earn?
2. What are you willing to get good at?

I was lucky enough to ask both questions, which originated from different sources, but then I do embrace randomness. It is still my time to learn. What is interesting is that once you are in the workforce, earning and learning are not mutually exclusive. So go ahead and make the most of it. This is an opportunity for you to find out what you really want to do. Once you decide that, don’t look back, and be willing to invest the time it takes to become a master of your craft.