Business problems

When I am asked what it is that I do, my reply is unequivocal: I solve business problems. In fact, most crafts, from watchmaking to shipbuilding, solve a business problem. I would define a business problem as one which, once solved, either increases profits or decreases expenses for your organization.

For example:
Writing software that predicts inventory consumption rates for Spacely Sprockets. [increases profits]
Writing software that helps a big retailer reduce the number of delivery trips it has to make. [reduces expenses]

Building software is a lot of fun, because the business problems that software solves have the potential to be massively scalable. Lurking in your organization is a problem waiting to be solved: maybe a process that can be automated, or a missing hook to an API that returns valuable data. In fact, you are running a software company even if you don’t think you are.

It is easy to get caught up in the title of Software Engineer. In reality we are solving business problems; writing software is just the tool we use to do it. So my questions to you are:
1. What are the business problems that you are solving for your company?
2. What are the ones that exist but you haven’t solved yet?

Would love to hear back about both.

Focus

For a software craftsman, focus is essential to building quality software, and interrupts are the enemy of focus. Not all interrupts are avoidable: meetings, interviews or even a colleague with a question. But certain things can be controlled, like your inbox and distractions such as news, Facebook, Twitter or any other readily available content. The question is how? A friend suggested a system that has worked great for me. It’s called the Pomodoro.

A Pomodoro is simply a way to break work down into 25-minute chunks, each followed by a break. I use a nifty web application called moosti to keep time. Ultimately it is a mental hack that gives me permission to be present and focus on the task at hand. With the timer counting down, I tend to close all other tabs.

The added benefit of this system is that it lets me measure how productive my day was: I just count the number of successful Pomodoro chunks in the day. I would love to know how you focus on your tasks.

 

Learn to Type Before You Learn to Code.

Once you put yourself on the path of radical self-improvement, you start looking at the fundamentals. For me it was the realization that I couldn’t touch type. Having grown up in the era of instant messaging, I learned to type the wrong way. While I was a fast typist, my finger placement was completely random and did not rely on the home row that the keyboard provides. Bad techniques are hard to eliminate once ingrained. This is a generic enough skill that you should learn it even if you aren’t a programmer, given that you will type something.

So this winter I started working through drills in gtypist, a terminal-based utility that teaches you to type the proper way. Ratatype is a web-based typing tutor that works well too. This was time well invested: I can now touch-type and spend almost no time looking at the keyboard, which boosts my productivity.

The nice thing is that once you start learning your tools, you make use of the cues they provide. In the case of a standard QWERTY keyboard, I discovered the home row and the raised bars on the F and J keys. Those two bars alone give your fingers a map of the entire keyboard. This also highlights the need to slow down, be mindful and make use of the full potential that your tools offer. If I were to start all over again, I would learn to type first.

If you liked this, do read the book Being Craftsman: http://itunes.apple.com/us/book/id1139490631

The Best Books I Read in 2013

Here is a roundup of the best books I read in 2013. I was inspired to write this post by Bill Gates’s list: http://www.thegatesnotes.com/Personal/Best-Books-2013. I learned something from each of these books.

1. How to Fail at Almost Everything and Still Win Big: Kind of the Story of My Life by Scott Adams.
http://www.amazon.com/How-Fail-Almost-Everything-Still-ebook/dp/B00COOFBA4/ref=dp_kinw_strp_1
This is an autobiography of Scott Adams, of Dilbert fame, in which he outlines how he built his cartooning career out of his failed corporate career. The real value in this book is learning how to think long term and craft systems that increase the probability of success.

2. Choose Yourself! – James Altucher
http://www.amazon.com/Choose-Yourself-James-Altucher/dp/1490313370/ref=sr_1_1?ie=UTF8&qid=1388292028&sr=8-1&keywords=james+altucher
James Altucher is an American hedge fund manager and entrepreneur. Choose Yourself is an interesting take on the current world and how to succeed in it. It is also a case study of how he turned his life around; very impressive.

3. Antifragile – Nassim Taleb
http://www.amazon.com/Antifragile-Things-That-Gain-Disorder/dp/1400067820/ref=sr_1_1?ie=UTF8&qid=1388292062&sr=8-1&keywords=antifragile
In Antifragile, celebrated statistician and essayist Nassim Nicholas Taleb explores how uncertainty is to be embraced by looking at phenomena that gain from disorder.

4. Mastery – Robert Greene
http://www.amazon.com/Mastery-Robert-Greene/dp/014312417X/ref=sr_1_1?ie=UTF8&qid=1388292083&sr=8-1&keywords=mastery
In Mastery, Robert Greene explores the lives and careers of masters such as V.S. Ramachandran, Paul Graham and Michael Faraday. It gives good insight into the way masters of a craft think and into their journey towards mastery.

I would love to know about the best books that you read this year.

How I built a weather decision engine, or the story of wearthejacket.com

So as I was stepping out of my apartment last week, I thought: in California I really don’t care about the temperature outside; all I want to know is whether it is cold enough to wear a jacket or warm enough not to. I decided to build a weather-based decision engine that does just that: figure out where I am, check the temperature and give me a decision. I also wanted it to be blazing fast and scalable.
The end result was http://wearthejacket.com/ [Update: shutting this service down today, October 9th 2014, after almost a year of 100% uptime. This was a fantastic learning experience.]

The first step was to think about how I could achieve this. After some thought I came up with this sketch.

[Figure: architecture sketch]

I was aware of existing geolocation APIs that translate an IP address to a location. That led me to
http://freegeoip.net/, a great API for a project like this, allowing 10,000 API calls per hour before throttling. This was more than sufficient for my needs.

The other component was getting the weather information. After googling for a bit, I came across forecast.io, which, though a robust API, had a free cap of 1,000 calls per day and a nominal payment after that.
The hosting was on a small AWS (Amazon Web Services) Ubuntu Linux instance.
I decided to use Tornado over Apache, mainly because of its low memory consumption when idle and because I was going to write this in Python. The decision to use Redis was a simple one, as I would definitely need to cache some values as end-user requests came in.

This was the first pass I wrote:
1. The user comes to the web server.
2. The web server queries the geolocation API and obtains the end user’s coordinates and location.
3. Based on the coordinates, we query a weather API.
4. Finally, we take a decision based on the prevailing temperature. The decision algorithm is very simple: a pre-set threshold below which wearing a jacket is advised. Improving it is a todo for a future enhancement.
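The four steps above can be sketched in Python. This is a minimal sketch, not the code from the wearthejacket repository: `lookup_location` and `lookup_apparent_temperature` are hypothetical stand-ins for the freegeoip and forecast.io calls, and the 60 °F threshold is an assumed value, since the post does not state the real one.

```python
from typing import Callable, Tuple

# Assumed threshold in degrees Fahrenheit; the post does not state the real value.
JACKET_THRESHOLD_F = 60.0


def decide(apparent_temperature_f: float, threshold_f: float = JACKET_THRESHOLD_F) -> str:
    """Step 4: advise a jacket when it feels colder than the threshold."""
    return "Wear the jacket" if apparent_temperature_f < threshold_f else "No jacket needed"


def handle_request(
    ip: str,
    lookup_location: Callable[[str], Tuple[float, float, str]],
    lookup_apparent_temperature: Callable[[float, float], float],
) -> str:
    """Steps 1-4 of the first (uncached) pass for a single request."""
    # Step 2: IP address -> (latitude, longitude, city) via a geolocation API.
    latitude, longitude, city = lookup_location(ip)
    # Step 3: coordinates -> current apparent temperature via a weather API.
    temperature = lookup_apparent_temperature(latitude, longitude)
    # Step 4: a simple threshold decision.
    return f"{city}: {decide(temperature)}"


if __name__ == "__main__":
    # Hypothetical stubs standing in for the two external API calls.
    result = handle_request(
        "203.0.113.7",
        lambda ip: (37.44, -122.16, "Palo Alto"),
        lambda lat, lon: 55.0,
    )
    print(result)  # prints: Palo Alto: Wear the jacket
```

Passing the lookups in as functions keeps the decision logic testable without network access, and every request in this version still pays for two external calls, which is exactly the latency problem the caching below addresses.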

Note that I had not written the caching mechanism yet. Therein lies the fun part. At this point each decision was taking well over 1 second, certainly not acceptable for a public-facing web application.

Enter Redis. Redis is an in-memory key-value data store, which makes lookups blazing fast. The first thing I needed to do was cache the location information pulled from the geolocation API. The application needed the latitude, longitude and actual city name, which made this a good candidate for a Redis hash.

So the first mapping was:

HMSET <ip> location <city> latitude <lat> longitude <lon>

Since this data will not change rapidly, even with dynamic IPs, we can keep these mappings forever and, as user queries come in, build up our own database of IPs to locations.
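A cache-aside lookup for the IP-to-location hash might look like the sketch below. It assumes a client exposing redis-py style `hset(name, mapping=...)` and `hgetall(name)` methods; to keep the example runnable without a Redis server, `InMemoryHashStore` is a hypothetical stand-in, and `geolocate` is a hypothetical placeholder for the freegeoip call. With a recent redis-py you would pass `redis.Redis(decode_responses=True)` as the store instead.

```python
from typing import Callable, Dict, Tuple


class InMemoryHashStore:
    """Hypothetical stand-in for a Redis client's hash commands (HSET/HGETALL)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, mapping: Dict[str, str]) -> int:
        fields = self._hashes.setdefault(name, {})
        added = sum(1 for key in mapping if key not in fields)
        fields.update(mapping)
        return added  # like Redis, report how many new fields were created

    def hgetall(self, name: str) -> Dict[str, str]:
        return dict(self._hashes.get(name, {}))  # empty dict when the key is missing


def location_for_ip(
    store, ip: str, geolocate: Callable[[str], Tuple[float, float, str]]
) -> Dict[str, str]:
    """Return {'location', 'latitude', 'longitude'} for an IP, calling the API only on a miss."""
    cached = store.hgetall(ip)
    if cached:
        return cached  # cache hit: no external API call
    latitude, longitude, city = geolocate(ip)
    record = {"location": city, "latitude": str(latitude), "longitude": str(longitude)}
    store.hset(ip, mapping=record)  # no expiry: IP-to-location mappings are kept indefinitely
    return record


if __name__ == "__main__":
    calls = []

    def fake_geolocate(ip):
        calls.append(ip)
        return 37.44, -122.16, "Palo Alto"

    store = InMemoryHashStore()
    location_for_ip(store, "203.0.113.7", fake_geolocate)
    location_for_ip(store, "203.0.113.7", fake_geolocate)
    print(len(calls))  # prints 1: the second lookup is served from the cache
```

Values come back as strings because Redis hashes store strings, so the caller converts latitude and longitude back to numbers before querying the weather API.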

For the weather information, I assumed that weather conditions at a given location stay similar over 10 minutes [debatable, but it will fulfill most needs].

This is a simple Redis key-value pair, with the location as the key and the apparentTemperature as the value, the only caveat being that we want it to expire every 10 minutes (configurable).

This is done easily in Redis via the SETEX command, with the invocation:

SETEX key <expiration_time_in_seconds> value
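The 10-minute expiry can be sketched in the same cache-aside style. `TTLStore` below is a hypothetical in-memory stand-in that mimics `SETEX key seconds value` and `GET key` (redis-py exposes these as `setex(name, time, value)` and `get(name)`), with an injectable clock so the expiry is easy to check; `fetch_temperature` is a hypothetical placeholder for the forecast.io call.

```python
import time
from typing import Callable, Dict, Optional, Tuple

WEATHER_TTL_SECONDS = 10 * 60  # the post's 10-minute assumption (configurable)


class TTLStore:
    """Hypothetical stand-in for Redis SETEX/GET semantics."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def setex(self, name: str, time_seconds: int, value: str) -> bool:
        self._data[name] = (value, self._clock() + time_seconds)
        return True

    def get(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[name]  # expired: behave as if the key is gone
            return None
        return value


def apparent_temperature(
    store, location: str, fetch_temperature: Callable[[str], float],
    ttl: int = WEATHER_TTL_SECONDS,
) -> float:
    """Serve the cached temperature for a location, refetching once the key expires."""
    cached = store.get(location)
    if cached is not None:
        return float(cached)
    temperature = fetch_temperature(location)
    store.setex(location, ttl, str(temperature))
    return temperature


if __name__ == "__main__":
    now = [0.0]
    store = TTLStore(clock=lambda: now[0])
    fetches = []

    def fake_fetch(location):
        fetches.append(location)
        return 55.0

    apparent_temperature(store, "Palo Alto", fake_fetch)
    now[0] = 599.0
    apparent_temperature(store, "Palo Alto", fake_fetch)  # still cached
    now[0] = 600.0
    apparent_temperature(store, "Palo Alto", fake_fetch)  # expired, fetched again
    print(len(fetches))  # prints 2
```

Letting the store expire the key, rather than storing a timestamp and comparing it in application code, keeps the request handler free of any freshness logic; with real Redis the server drops the key on its own.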

Once the cache mechanism was in place, benchmarking showed dramatic improvements: sub-30 ms response times after the first API call was made. The first call to the application, though, was still remarkably slow, so I started looking at the individual calls to the external services.

Therein lay the answer: the forecast.io API was spewing out an entire day’s worth of data.
The fix was to append the following to the forecast.io API call:

?exclude=minutely,hourly,daily,alerts,flags

which had the effect of returning only the current prevailing conditions.
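Building the trimmed request and pulling out the one value the decision needs might look like this. It is a sketch under assumptions: the `https://api.forecast.io/forecast/<key>/<lat>,<lon>` URL shape and the `currently.apparentTemperature` field reflect my understanding of the forecast.io API, `API_KEY` is a placeholder, and the JSON payload is made up for illustration rather than a real response. The block only builds a URL and parses a string; it makes no network call.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint shape for the forecast.io API; API_KEY is a placeholder.
FORECAST_BASE = "https://api.forecast.io/forecast"
API_KEY = "YOUR_API_KEY"
# Drop every block except the current conditions, as described in the post.
EXCLUDE = "minutely,hourly,daily,alerts,flags"


def forecast_url(latitude: float, longitude: float, api_key: str = API_KEY) -> str:
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode({"exclude": EXCLUDE}, safe=",")
    return f"{FORECAST_BASE}/{api_key}/{latitude},{longitude}?{query}"


def current_apparent_temperature(body: str) -> float:
    """Extract currently.apparentTemperature from a forecast JSON body."""
    return float(json.loads(body)["currently"]["apparentTemperature"])


if __name__ == "__main__":
    print(forecast_url(37.44, -122.16))
    # Made-up payload shaped like a trimmed response, for illustration only.
    sample = '{"latitude": 37.44, "longitude": -122.16, "currently": {"apparentTemperature": 58.3}}'
    print(current_apparent_temperature(sample))
```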

Once this was done, it was time to write tests. It wasn’t true test-driven development, but I wasn’t launching without baseline tests. This part took me the longest but greatly increased my confidence in the code for launch. As of now it has run for over a day, serving requests across the globe. Always write tests, preferably before even the first line of code.

Benchmarking after that indicated a theoretical capacity of 1.5 million requests/day. Not bad for a tiny server, and the best part is that it can be scaled horizontally [though I doubt I will, considering it takes $$$ to keep servers running].
The components are modular, so you can use them individually. I would love to know your thoughts on how this project can be enhanced, and on design decisions that you would make.
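To put the benchmark figure in perspective, converting it to a per-second rate is simple arithmetic (a day has 86,400 seconds), assuming requests arrive evenly over the day:

```latex
\frac{1.5 \times 10^{6}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 17.4\ \text{requests/s}
```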

One more thing
Building this has been a great learning experience, and to enable others to learn from and critique it,
the source code is released under the GPLv3 license:
https://github.com/hvd/wearthejacket_oss

Rolling up data with Awk

One of the basic things one does with numeric data sets is to add them up by some given attribute. Here is a subset of sample baseball statistics from http://seanlahman.com/baseball-archive/statistics/
The file used for this post [Managers] is a list of wins and losses by baseball team managers from the late 1800s to 2012. Let’s roll up the wins and losses to calculate the totals for each team manager. Do download the data file to see the raw data. (Note that .key files are just CSV files renamed to get around a WordPress restriction, so they can be opened with a text editor, OpenOffice or Excel.)

How can this be done?
Early 21st century method:
Use Excel to calculate totals manually (sigh), or write a macro if you are more adept.

2014 method:
Write a Python program. Use the csv library in Python to read the file, then keep a dictionary of the form {category1: {attrib1: val1, attrib2: val2, …, attribn: valn}, category2: {attrib1: val1, attrib2: val2, …, attribn: valn}}.
As you pass over each row, update the sums for each attribute, checking whether the category you are referencing exists; if not, create an entry for it in the dictionary. Repeat until the end of the file.
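The 2014 method described above might look like the following sketch. It uses the same column positions as the awk script in this post (manager in the first column, wins in the seventh, losses in the eighth), assumes the file has a header row, and runs on a tiny made-up sample instead of the real Managers file; `roll_up` is a hypothetical name.

```python
import csv
import io
from typing import Dict, Iterable


def roll_up(
    lines: Iterable[str], key_col: int = 0, wins_col: int = 6, losses_col: int = 7
) -> Dict[str, Dict[str, int]]:
    """Sum wins and losses per manager: {manager: {'wins': w, 'losses': l}}."""
    totals: Dict[str, Dict[str, int]] = {}
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        manager = row[key_col]
        if manager not in totals:  # create the category on first sight
            totals[manager] = {"wins": 0, "losses": 0}
        totals[manager]["wins"] += int(row[wins_col])
        totals[manager]["losses"] += int(row[losses_col])
    return totals


if __name__ == "__main__":
    # Made-up rows in the same column layout, for illustration only.
    sample = io.StringIO(
        "manager,year,team,league,inseason,games,wins,losses,rank,player_manager\n"
        "mgrA01,1900,AAA,NL,1,10,6,4,2,N\n"
        "mgrB01,1900,BBB,NL,1,10,3,7,5,N\n"
        "mgrA01,1901,AAA,NL,1,10,7,3,1,N\n"
    )
    print("manager,total_wins,total_losses")
    for manager, t in roll_up(sample).items():
        print(manager, t["wins"], t["losses"], sep=",")
```

The `if manager not in totals` check is the bookkeeping the method above calls for; `collections.defaultdict` would remove it, and the awk version below avoids it altogether.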

Let’s see another way to do this, right out of the 1970s:
Say hello to awk. Awk is an interpreted language designed specifically for extracting and manipulating text, and it natively understands tabular data. How cool is that? The nice part is that awk ships with any standard Linux/Unix distribution. For those still in the Windows world, installing Cygwin will get you awk.

The anatomy of an awk program is simple: pattern {action}, applied to a file, with optional BEGIN and END patterns whose actions run before the file is read and after it has been read.

To roll up the wins and losses per team manager from our data file, we use an associative array. Wait, associative what? An associative array is a data structure that can be indexed by anything (typically a string). While this may not seem any different from a Python dictionary, the magic lies in the fact that the action is applied to every line of the file without any need to iterate over the file explicitly. Let’s see the actual code. Save the following script as sum_wins_and_losses.awk. If you also want to run it directly as ./sum_wins_and_losses.awk, apply a chmod 755 so that it can execute.

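#!/usr/bin/awk -f
# Roll up wins and losses per manager from a comma-separated file.
BEGIN {
    FS = ",";               # input fields are comma-separated
    OFS = ",";              # output fields are comma-separated too
    total_wins[""] = 0;     # initialize the arrays we will index by manager
    total_losses[""] = 0;
}
NR > 1 {                    # skip the header row
    manager = $1;
    wins = $7;
    losses = $8;
}
NR > 1 {
    total_wins[manager] += wins;      # running total per manager
    total_losses[manager] += losses;
}
END {
    print "manager,total_wins,total_losses";
    for (i in total_wins) {
        if (i != "") {      # skip the placeholder entry created in BEGIN
            print i, total_wins[i], total_losses[i];
        }
    }
}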

In the BEGIN block we define the field separator (FS) and the output field separator (OFS) as a comma, in addition to initializing the arrays we intend to use. The OFS determines how fields are separated in the program’s output.
By default awk splits fields on whitespace. Once FS is established, you can refer to any column by its index, i.e. the first column of the data table is $1, the second $2 and so on. It is good practice to assign these to variables. That lets you make changes at one central point when a column’s position changes, a typical case being adapting the program to a file with additional columns, where the current columns appear at different positions.

The third block is where the magic begins: we index the arrays by the field we want to roll our data up by, in this case manager.

total_wins[manager]+=wins;

All this snippet does is the following: if the manager is “foo”, the bucket of total_wins indexed by “foo” holds the total wins achieved by foo. That is because the += wins operation is applied to every line of the file, adding any wins achieved by foo to the same index. This happens for every unique manager, leaving us with rolled-up wins and losses by manager for the entire dataset.

Now for the finale: in the END block, all we do is iterate over the associative array and print the rolled-up data to the console.

The program can be executed with the following command, which redirects the output to a file:

awk -f sum_wins_and_losses.awk Managers.key >rolled_up_file.key

Open rolled_up_file.key to see total wins and losses by manager. Next time you are faced with manipulating tabular data, think awk!

References:

1. http://www.grymoire.com/Unix/Awk.html

2. http://en.wikipedia.org/wiki/AWK

Tools

“If all you have is a hammer, everything looks like a nail.” – Abraham Maslow

Building quality software relies on a myriad of tools. One essential pillar of craftsmanship, software or otherwise, is mastery of the tools you work with. Some tools that I think are important for a software craftsman:

The Mind:
The most important and essential tool that you have is your mind.
All software originates as a thought, so a clear thought process is necessary.
Unfortunately, we have become accustomed to the constant pull of Google, Facebook and Twitter, so being mindful of what you are trying to achieve is important. There is a quote from the TV show Sherlock that has stuck with me: “People fill their heads with all kinds of rubbish. And that makes it hard to get at the stuff that matters. Do you see?” Treat your mind as a garden: only let in thoughts that should be nurtured, and get rid of the weeds.

The Operating System
I use Xubuntu Linux, which to me offers simplicity and power. Using Linux forces me to learn more about the underlying machine itself.
We must remember that for all the clouds now available, ultimately a cloud is a group of computers connected to a network. Quoting Larry Ellison, “Google does not run on water vapor.” The operating system you choose will affect how much of the underlying system you understand. Macs and Windows abstract away a lot of the underlying machine, which, if you are an application developer, may not be a bad thing in terms of productivity. However, gaining knowledge of the underlying processes will be harder.

The Programming language:
There is a plethora of programming languages available, and the notion that all languages are created equal is a fallacy.
Using the right language for the job at hand will go a long way towards building successful products. Learning a language other than the one you use daily will create new ways of thinking. For instance, try writing to a file first in Java, then in Python. I am not getting into the argument of which is better; I do want to emphasize that languages have their own strengths. To deal with data in flat files, you are missing out if you do not use the trinity of awk, sed and grep (sed and grep being command-line tools). They offer a simplicity for which you would otherwise be writing programs of hundreds of lines in a high-level language.

The Text Editor
From Notepad to Vim to Sublime Text to a full IDE, the text editor is where you translate your thoughts into code: structured information that makes computers do what you want them to do. Mastery of your text editor will determine how long it takes you to write code, assuming that you can type.

Version Control:
Any software that is not for one-time use should be maintained in a version control system. Git is my favorite; however, depending on your work environment, you could end up using SVN or Perforce. Git has many subtleties that become apparent with practice; know your Git and never worry about losing source code.

The Debugger
While best avoided, since you want your mind to be the compiler, debuggers aid understanding of both program and data flow in complex projects. Mastering the debugger will go a long way towards fixing bugs in programs you are unfamiliar with, and sometimes in code that is familiar.

Databases
Databases are the foundation of the applications that we build.
We have SQL and Not only SQL (NoSQL). For structured data, SQL-based databases are still golden.
Since real data is not always structured, there has been a move to NoSQL databases.
Redis is the Swiss Army knife for data in key-value pairs. Learn about Couchbase or MongoDB for exposure to document-based databases.

This list of tools is in no way comprehensive. Building mastery of tools is a long and arduous process. However, tiny steps taken today go a long way and add up over a few years. More often than not, the problem you are trying to solve will drive the tools you use. So try to solve as many different problems as you can; slowly but surely you will see an expanding toolkit.

The path I have laid out for myself is to always be learning and to teach what I learn. I would love to know about your experiences in mastering the tools you work with, and ultimately in mastering your craft.

Two Questions for the Software Craftsman

To make a career move as a new Software Craftsman, there are two questions you definitely need to ask:

1. Is it your time to learn or time to earn?
2. What are you willing to get good at?

I was lucky enough to come across both questions, which originated from different sources, but then I do embrace randomness. It is still my time to learn. What is interesting is that once you are in the workforce, earning and learning are not mutually exclusive. So go ahead and make the most of it. This is an opportunity for you to find out what you really want to do. Once you decide that, don’t look back, and be willing to invest the time it takes to become a master of your craft.

Embrace Randomness

I think what makes life interesting is not predictability but randomness. I will give an example. On Super Bowl day I met a couple of friends at Crepevine in Palo Alto for brunch. While going back I took a wrong exit and ended up on the Stanford campus, where the roundabout is almost a mile inside. After making the U-turn, I noticed a museum midway that I had not noticed on my earlier visits. I pulled in, and it turned out to be a museum established by none other than the Stanford family itself. I parked my car and went inside to see one of the best collections of artifacts and art that I have ever seen (added bonus: entry and parking were free). So a wrong exit made my day richer. Chance plays a greater role in life than we like to admit.

Relating randomness to the software world may seem strange, but bear with me. The idea I want to present is that random explorations may lead to serendipitous occurrences. Exploring things you have not encountered before goes a long way towards developing skills or adding new tools to your toolkit. For instance, if you are an imperative-style programmer, learning a functional language like Erlang will expose you to a different style of thinking. If you work with MySQL, learning about a NoSQL database like Redis will expose you to how unstructured data can be handled. It is of course difficult to predict how these random walks will be of immediate benefit, but I think the real value lies in relating disparate knowledge to a common thread. I think randomness is also what makes Silicon Valley an interesting place to be. If you are a technology professional in the Bay Area, chance encounters can lead to new opportunities or simply expose you to a dimension of the technology industry that you didn’t know existed.

One of the best ways to introduce randomness into your surfing habits is StumbleUpon; you can think of it as a gateway to the Internet that recommends web pages based on your interests. I came across PreyProject via StumbleUpon, where I went on to make my first ever open-source contributions, which eventually led me to my current position. The other source of randomness I rely upon is Hacker News. In real life, make use of Meetup [not so random, I admit] or your favorite programming language’s user group [believe me, even the most obscure language will have one here]. I think in real life the randomness begins once you show up, which is not limited to showing up physically but also includes taking up projects that you care about. You like the girl you just met? Ask her out. You want to write that killer app? Start working on it; only then can randomness help you.

The charter I have laid down for myself is to embrace more randomness by showing up, viz. building things I care about and thus encountering problems I haven’t seen before, and by showing up in life in general. While I cannot vouch for all outcomes, I certainly hope it will be an interesting journey.