May 29, 2017 Uncategorized

Why I love to Swim

by hkelkar

I enter the water, feel the change in temperature and dunk under. Then I swim, propelling myself forward as I exert my body in this amazing medium that is water. Swimming is a way to summon my attention, attention which is constantly stolen by the internet, by phones, by people. Swimming is a way to clear clutter from the mind: a garbage collector for the mind.

Each breath is used to move forward. Each action is taken with the purpose of floating or moving. Swimming is meditation. Swimming is a reminder of life with each breath. Swimming is a reminder of death with each dive. Swimming is about living in the present and being in the moment. A good swim is energizing. That is why I love to swim.

December 31, 2015 Uncategorized

Being Craftsman – The Book

by hkelkar

Hello all! After some radio silence, I am pleased to announce the launch of Being Craftsman – A software developer’s runbook.

Enjoy and let me know your thoughts.

P.S. Here is an alternative PDF copy if you prefer that:

Being_Craftsman

October 18, 2014 Uncategorized

Why I love Tennis?

by hkelkar

Tennis is a civilized battle with the sword replaced by a racket. Tennis is about observing your opponent for that moment of hesitation. Tennis is about coming from behind to win the match when you are down a couple of sets. Tennis is about focus and silencing your inner voice. Tennis is about living in the moment. Tennis is about believing that the next serve will be an ace, that the forehand you hit will be a winner and that, when you barely get the ball across the net, you will stay alive to mount a counterattack. Tennis is about long-term strategy and short-term tactics.

Tennis is about asking questions. Can I best my opponent with a fierce forehand or a blazing serve? Or can he get the better of me in that time? Do I serve and volley or do I play from the baseline? Did that extra power on the return unsettle him? It is about adapting to your opponent and employing the right tools. There are many tools that can be applied: the serve, the forehand, the backhand, the slice, the lob, etc. It is about asking what new tools I can employ. Tennis is about statistical thinking: what are the odds of my making this shot? Okay, I admit I’m not good at this. There is a constant process of learning that goes on. Tennis is about making decisions in a split second with incomplete information.

Tennis is about showmanship and making elegant shots. I relish the crack of the ball hitting the racket on a shot and the silence of a well-executed serve. Tennis is about rooting for your opponent to make that hard return because it adds to the fun. A good game of tennis is energizing. Playing tennis is about loving the battle more than winning, and that is why I love tennis.

October 12, 2014 Uncategorized

The Apprentice

by hkelkar

To be a master, one must first be an apprentice. In the modern world, your first job is typically where the apprenticeship begins. I would prefer working on a craft much earlier than that, but we are limited by the systems created around us, like school and university. With medieval guilds no longer around, we have to craft our own apprenticeship path. The first challenge is finding masters. Some places you will find them are universities, companies and open source projects. The software industry, dynamic as it is, will not afford you the luxury of one master. People move on, or you do. So be comfortable with the idea of having many masters over the course of your apprenticeship.

The big mistake I made when I was younger was to think that the golden key to opportunity is a degree from an elite college. A degree is just a tool that can open certain doors and maybe give you a head start, so don’t lose hope if you didn’t go to one. The real world only cares whether you can solve a business problem.

There are in fact two keys to opportunity: curiosity and a hunger to learn. Ask yourself whether it is your time to earn or your time to learn. Most of the time it will be a time to learn. To learn, one must have a beginner’s mind. Use your curiosity to learn things that you like. Don’t Panic if things don’t make sense; everyone begins somewhere. Learn from books and the internet, the greatest learning tool of our time. Teach what you learn. Leverage what you have learned to get into a guild [read: a company or open source project] that aligns with your curiosity. This creates a feedback mechanism for you. Learn from everyone: your peers, your superiors and even the interns. Then let the magic begin: with the right people around you, you will learn exponentially faster and have a meaningful apprenticeship. On the path to mastery, the journey is the destination. I would love to know about the path that you have taken.

June 9, 2014 Uncategorized

Business problems

by hkelkar

When I am asked what it is that I do, my reply is unequivocally: I solve business problems. In fact, most crafts, ranging from watchmaking to shipbuilding, solve business problems. I would define a business problem as one which, when solved, either increases profits or decreases expenses for your organization.

For example:
Writing software that predicts inventory consumption rates for Spacely Sprockets [increase profits].
Writing software that helps reduce the number of delivery trips that a big retailer has to make [reduce expenses].

Building software is a lot of fun, as the business problems solved by software have the potential to be massively scalable. Lurking in your organization is a problem waiting to be solved, maybe a process that can be automated or a missing hook to an API that returns valuable data. In fact, you are running a software company even if you don’t think of it as one.

It is easy to get caught up in the title of Software Engineer. In reality we are solving business problems; writing software is just the tool we use to do it. So, my questions to you are:
1. What are the business problems that you are solving for your company?
2. What are the ones that exist but that you haven’t solved yet?

Would love to hear back about both.

Video March 30, 2014 Uncategorized

Computing Document Similarity with nltk

by hkelkar

In this video we will explore techniques to determine the degree of similarity between documents. Specifically, we will look at the intuition behind tf-idf and cosine similarity. With that as a foundation, we will see how to compute these metrics with the Natural Language Toolkit (NLTK).
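As a rough sketch of the two ideas, here is a standard-library-only Python version. It does not use NLTK, and its weighting may differ from the one in the video: tf-idf weights each term by its frequency in a document times log(N / df), where N is the number of documents and df is the number of documents containing the term, and cosine similarity divides the dot product of two weight vectors by the product of their lengths. The naive whitespace tokenizer and the sample sentences are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    # NLTK's tokenizers handle punctuation far better.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the hat",
        "stock prices fell sharply today",
    ]
    vecs = tf_idf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # positive: several shared words
    print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no words in common
```

A term that appears in every document gets an idf of log(1) = 0, so words common to the whole collection contribute nothing to the similarity score.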

March 17, 2014 Uncategorized

Focus

by hkelkar

For a software craftsman, focus is essential to building quality software. Interrupts are the enemy of focus. Not all interrupts are avoidable: think of meetings, interviews or even a colleague with a question. But certain things can be controlled, like your inbox and distractions such as news, Facebook, Twitter or any other readily available form of content. The question is: how? A friend suggested a system which has worked great for me. It’s called the Pomodoro Technique.

A Pomodoro is simply a way to break work down into chunks of 25 minutes before you take a break. I use a nifty web application called moosti that helps me keep time. Ultimately it is a mental hack that gives me permission to be present and to focus on the task at hand. With the timer counting down, I tend to close all other tabs.

The added benefit of this system is that it allows me to measure how productive my day was: I just need to count the number of successful Pomodoro chunks in the day. I would love to know how you focus on your tasks.

March 2, 2014 Uncategorized

Learn to Type Before You Learn to Code.

by hkelkar

Once you put yourself on the path of radical self-improvement, you start looking at the fundamentals. For me it was the realization that I couldn’t touch type. Having grown up in the era of instant messaging, I learned typing the wrong way. While I was a fast typist, my finger placement was completely random and did not rely on the home row that the keyboard provides. Bad techniques are hard to eliminate once imbibed. This is a generic enough skill that you should learn it even if you aren’t a programmer, given that you will be typing something.

So this winter I started working through drills in gtypist. Gtypist (GNU Typist) is a terminal-based utility that teaches you to type the proper way. Ratatype is a web-based typing tutor that works well too. This was time well invested: I can now touch-type and spend almost no time looking at the keyboard, which boosts my productivity.

The nice thing is that once you start learning your tools, you make use of the cues that the tools provide. In the case of a standard QWERTY keyboard, I discovered the home row and the raised bars on the F and J keys. Those two bars alone give your fingers the entire map of the keyboard. This also highlights the need to slow down, be mindful and make use of the full potential that your tools offer. If I were to start all over again, I would learn to type first.

If you liked this, do read Being Craftsman, the book: http://itunes.apple.com/us/book/id1139490631

October 27, 2013 Uncategorized

How I built a weather decision engine? or The story of wearthejacket.com

by hkelkar

As I was stepping out of my apartment last week, I thought: in California I don’t really care about the temperature outside; all I want to know is whether it is cold enough to wear a jacket or warm enough not to. I decided to build a weather-based decision engine which does just that: figure out where I am, check the temperature and give me a decision. I also wanted it to be blazing fast and scalable.
The end result was http://wearthejacket.com/ [Update: shutting this service down today, October 9th 2014, after almost a year of 100% uptime. This was a fantastic learning experience.]

The first step was to think about how I could achieve this. After some thought I came up with this sketch.

[Figure: architecture sketch]

I was aware of existing geolocation APIs which translate an IP address to a location. That led me to http://freegeoip.net/, a great API for a project like this, with access to 10,000 API calls per hour before getting throttled. This was more than sufficient for my needs.

The other component was getting the weather information. After googling for a bit, I came across forecast.io, a robust API with a free cap of 1,000 calls per day and a nominal payment after that.
The hosting was on a small Ubuntu Linux instance on AWS (Amazon Web Services).
I decided to use Tornado over Apache, mainly due to its low memory consumption when idle and because I was going to write this in Python. The decision to use Redis was a simple one, as I would definitely need to cache some values as end-user requests came in.

This was the first pass I wrote:
1. The user comes to the web server.
2. The web server queries the geolocation API and obtains the end user’s coordinates and location.
3. Based on the coordinates, we query a weather API.
4. Finally, we make a decision based on the prevailing temperature. The decision algorithm is very simple: a preset threshold below which wearing a jacket is advised. Improving it is a todo for future enhancement.
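The four steps above could be sketched in Python roughly as follows. The external API calls are passed in as plain functions so the sketch runs offline; `locate`, `get_temperature` and the 60 degrees Fahrenheit threshold are hypothetical placeholders, not the names or values wearthejacket.com actually used.

```python
# Hypothetical threshold: the post only says a preset value exists below
# which a jacket is advised; 60 degrees Fahrenheit is an assumed example.
JACKET_THRESHOLD_F = 60.0


def should_wear_jacket(apparent_temperature, threshold=JACKET_THRESHOLD_F):
    """Step 4: advise a jacket when it feels colder than the threshold."""
    return apparent_temperature < threshold


def decide(ip, locate, get_temperature):
    """Run steps 2-4 for a request arriving from `ip` (step 1).

    `locate(ip)` must return (latitude, longitude, city) and
    `get_temperature(latitude, longitude)` an apparent temperature; both
    are hypothetical stand-ins for the geolocation and weather API calls.
    """
    latitude, longitude, city = locate(ip)              # step 2
    temperature = get_temperature(latitude, longitude)  # step 3
    if should_wear_jacket(temperature):                 # step 4
        return f"{city}: wear the jacket"
    return f"{city}: no jacket needed"


def fake_locate(ip):
    # Canned geolocation answer so the sketch runs without network access.
    return (37.77, -122.42, "San Francisco")


def fake_weather(latitude, longitude):
    # Canned apparent temperature in degrees Fahrenheit.
    return 55.0


if __name__ == "__main__":
    print(decide("203.0.113.7", fake_locate, fake_weather))
    # -> San Francisco: wear the jacket
```

Injecting the lookups as functions also makes the decision logic easy to test without hitting either external service.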

Note that I had not written the caching mechanism yet; there lies the fun part. At this point the decision was taking well over 1 second, certainly not acceptable for a public-facing web application.

Enter Redis. Redis is an in-memory key-value data store, which makes lookups blazing fast. The first thing I needed to do was to cache the location information pulled from the geolocation API. The location information the application needed was the latitude, longitude and actual city name, which made it a good candidate for a Redis hash.

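So the first mapping was

HMSET <ip> location <city> latitude <latitude> longitude <longitude>

(On Redis 4.0 and later, HSET accepts multiple field-value pairs and HMSET is considered deprecated.)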

Since this data will not change rapidly, even with dynamic IPs, we can keep these mappings forever and, as user queries come in, build our own database of IPs to locations over time.
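To make the idea concrete, here is a minimal in-memory stand-in for the IP-to-location hash, sketched in Python with a plain dict in place of a Redis server. The `lookup` function and the `LocationCache` class are hypothetical; against a real Redis instance, the dict read and write would roughly correspond to HGETALL and HMSET (or HSET) on the IP key.

```python
class LocationCache:
    """In-memory stand-in for one Redis hash per IP address.

    Each IP maps to a dict of fields, much like
    HMSET <ip> location <city> latitude <lat> longitude <lon>.
    Entries never expire, matching the "keep these mappings forever" choice.
    """

    def __init__(self, lookup):
        # lookup(ip) -> (city, latitude, longitude): a hypothetical wrapper
        # around the geolocation API.
        self._lookup = lookup
        self._hashes = {}
        self.api_calls = 0

    def get(self, ip):
        cached = self._hashes.get(ip)
        if cached is not None:
            return cached  # cache hit: no external API call needed
        # Cache miss: ask the geolocation API once, then remember the answer.
        self.api_calls += 1
        city, latitude, longitude = self._lookup(ip)
        fields = {"location": city, "latitude": latitude, "longitude": longitude}
        self._hashes[ip] = fields
        return fields


if __name__ == "__main__":
    def fake_lookup(ip):
        return ("Austin", 30.27, -97.74)

    cache = LocationCache(fake_lookup)
    cache.get("198.51.100.1")
    cache.get("198.51.100.1")  # second request is served from the cache
    print(cache.api_calls)     # -> 1
```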

For the weather information, I made the assumption that weather conditions will stay similar over 10 minutes at a given location [debatable, but it will fulfill most needs].

This is a simple Redis key-value pair, location:apparentTemperature, the only caveat being that we want it to expire every 10 minutes (configurable).

This is done easily in Redis via the SETEX command, with the invocation

SETEX key <expiration_time_in_seconds> value
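The expiring weather cache can be imitated in a few lines of Python to show the SETEX behavior. This is a simplified sketch, not Redis itself: the `ExpiringCache` class is hypothetical, it assumes the location name is the key and the apparent temperature the value, and it takes an injectable clock so expiry can be checked without waiting ten minutes.

```python
import time

WEATHER_TTL_SECONDS = 600  # 10 minutes, matching the assumption above


class ExpiringCache:
    """Minimal in-memory imitation of Redis SETEX/GET semantics."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def setex(self, key, seconds, value):
        # Like SETEX key seconds value: store the value with a time-to-live.
        self._data[key] = (value, self._clock() + seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key is gone
            return None
        return value


if __name__ == "__main__":
    now = [0.0]  # fake clock that we move forward by hand
    cache = ExpiringCache(clock=lambda: now[0])
    cache.setex("San Francisco", WEATHER_TTL_SECONDS, 55.0)
    print(cache.get("San Francisco"))  # -> 55.0
    now[0] += WEATHER_TTL_SECONDS
    print(cache.get("San Francisco"))  # -> None (expired)
```

A cache miss (None) is the signal to call the weather API again and store the fresh reading with another setex.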

Once the caching mechanism was in place, benchmarking showed dramatic improvements: sub-30 ms response times after the first API call was made. The first call to the application, though, was still remarkably slow, so I started looking at the individual calls to the external services.

There lay the answer: the forecast.io API was spewing out an entire day’s worth of data.
The fix was to append

?exclude=minutely,hourly,daily,alerts,flags

to the forecast.io API call, which had the effect of returning only the current prevailing conditions.
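A small Python helper that builds such a request URL might look like this. The forecast.io API has since been retired, so the base URL and the /forecast/<key>/<latitude>,<longitude> path layout are assumptions for illustration; only the exclude list comes from the post.

```python
from urllib.parse import urlencode

# Hypothetical base URL: treat this host and path layout as placeholders.
BASE_URL = "https://api.forecast.io/forecast"

# Data blocks the decision engine does not need; only current conditions remain.
EXCLUDED_BLOCKS = ["minutely", "hourly", "daily", "alerts", "flags"]


def forecast_url(api_key, latitude, longitude, base_url=BASE_URL):
    """Build a request URL that asks only for the current conditions."""
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode({"exclude": ",".join(EXCLUDED_BLOCKS)}, safe=",")
    return f"{base_url}/{api_key}/{latitude},{longitude}?{query}"


if __name__ == "__main__":
    print(forecast_url("YOUR_API_KEY", 37.77, -122.42))
    # -> https://api.forecast.io/forecast/YOUR_API_KEY/37.77,-122.42?exclude=minutely,hourly,daily,alerts,flags
```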

Once this was done, it was time to write tests. This was not true test-driven development, but I wasn’t launching without baseline tests. This part took me the longest but greatly increased my confidence in the code for launch. As of now it has run for over a day, serving requests across the globe. Always write tests, preferably before even the first line of code.

Benchmarking after that indicated a theoretical capacity of 1.5 million requests/day. Not bad for a tiny server, and the best part is that it can be horizontally scaled [though I doubt I will do that, considering it takes $$$ to keep servers running].
The components are modular, so you can use them individually. I would love to know your thoughts on how this project can be enhanced and/or design decisions that you would make.
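For a sense of scale, the benchmark figure converts to a sustained per-second rate as follows:

```latex
\frac{1.5 \times 10^{6}\ \text{requests/day}}{24 \times 60 \times 60\ \text{s/day}}
  = \frac{1.5 \times 10^{6}}{86\,400}\ \text{requests/s}
  \approx 17.4\ \text{requests/s}
```

That is roughly 17 requests every second, around the clock.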

One more thing
Building this has been a great learning experience, and to enable others to learn from and critique it, the source code is released under the GPLv3 license:
https://github.com/hvd/wearthejacket_oss

October 15, 2013 Uncategorized data sets

Rolling up data with Awk

by hkelkar

One of the basic things one does when dealing with numeric data sets is to add them up for a given attribute. Here is a subset of sample baseball statistics via http://seanlahman.com/baseball-archive/statistics/
The file used for this post [Managers] is a list of wins and losses by baseball team managers from the late 1800s to 2012. Let’s roll up the wins and losses to calculate the totals for each team manager. Do download the data file to see the raw data. (Note that the .key files are just CSV files, named that way to get around a WordPress restriction, so they can be opened with a text editor, OpenOffice or Excel.)

How can this be done?
Early 21st century method:
Use Excel to calculate totals manually (sigh), or write a macro if you are more adept.

2014 method:
Write a Python program: use the csv library to read the file, then keep a dictionary of the form {category1: {attrib1: val1, attrib2: val2, ..., attribn: valn}, category2: {attrib1: val1, attrib2: val2, ..., attribn: valn}}.
Then, as you pass over each row, update the sums for each attribute, first checking whether the category you are referencing exists and creating an entry in the dictionary if it does not. Repeat until the end of the file.
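The dictionary approach just described might look like this in Python. It is a sketch under a few assumptions: the manager ID, wins and losses sit in the first, seventh and eighth columns (the same positions the awk script uses), the file has a single header row, and the sample rows are made up.

```python
import csv
import io

# Assumed column positions: awk's $1, $7 and $8 are 0-based indexes 0, 6, 7.
MANAGER_COL, WINS_COL, LOSSES_COL = 0, 6, 7


def roll_up(rows, skip_header=True):
    """Sum wins and losses per manager.

    Returns a dict of the form {manager: {"wins": int, "losses": int}}.
    """
    totals = {}
    reader = csv.reader(rows)
    if skip_header:
        next(reader, None)  # drop the header row, if the file has one
    for row in reader:
        if not row:
            continue  # ignore blank lines
        manager = row[MANAGER_COL]
        # Create the entry the first time this manager is seen.
        entry = totals.setdefault(manager, {"wins": 0, "losses": 0})
        entry["wins"] += int(row[WINS_COL])
        entry["losses"] += int(row[LOSSES_COL])
    return totals


if __name__ == "__main__":
    # Hypothetical rows shaped like the Managers file; the numbers are invented.
    sample = io.StringIO(
        "playerID,yearID,teamID,lgID,inseason,G,W,L\n"
        "mgr01,1901,AAA,AL,1,140,80,60\n"
        "mgr01,1902,AAA,AL,1,140,70,70\n"
        "mgr02,1901,BBB,NL,1,140,65,75\n"
    )
    print(roll_up(sample))
    # -> {'mgr01': {'wins': 150, 'losses': 130}, 'mgr02': {'wins': 65, 'losses': 75}}
```

To run it on the real file, open it with open("Managers.key", newline="") and pass the file object to roll_up.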

Let’s see another way to do this, right out of the 1970s:
Say hello to Awk. Awk is an interpreted language designed specifically for extracting and manipulating text, and it natively understands tabular data. How cool is that? The nice part is that awk ships with any standard Linux/Unix distribution. For those still in the Windows world, installing Cygwin will get you awk.

The anatomy of an awk program is simple: pattern {action} filename, with optional BEGIN and END patterns whose actions run before the file is read and after it has been read, respectively.

To roll up the wins and losses per team manager from our data file, we use a concept called an associative array. Wait, associative what? An associative array is a data structure which can be indexed by anything (typically a string). While this may not seem any different from a Python dictionary, the magic lies in the fact that awk applies the code to every line of the file without any need to iterate over the file explicitly. Let’s see the actual code that will do this. Save the following script as sum_wins_and_losses.awk and apply chmod 755 so that it can also be executed directly.

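#!/usr/bin/awk -f
BEGIN{
   # Input and output fields are both comma separated.
   FS=",";
   OFS=",";
   # Associative arrays that will be indexed by manager.
   total_wins[""]=0;
   total_losses[""]=0;
}
{
   # Skip the header row (assumes the file starts with one).
   if (NR == 1) next;
   # Name the columns we need: manager id, wins and losses.
   manager=$1;
   wins=$7;
   losses=$8;
}
{
   # Runs for every data row: add this row's totals to the manager's bucket.
   total_wins[manager]+=wins;
   total_losses[manager]+=losses;
}
END{
   print "manager,total_wins,total_losses";
   for (i in total_wins){
      # Skip the placeholder "" key created in BEGIN.
      if (i != "")
      {
         print i,total_wins[i],total_losses[i];
      }
   }
}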

In the BEGIN block we define the field separator (FS) and the output field separator (OFS) as a comma, in addition to initializing the arrays we intend to use. The OFS determines how fields will be separated in the program’s output.
By default awk splits fields on whitespace. Once the FS is established, you can refer to any column by its index, i.e. the first column of the data table can be referred to as $1, the second as $2 and so on. It is good practice to assign these to variables, which lets you make changes at a central point when a column’s position changes. A typical use case would be adapting the program to a file with additional columns, where the current columns appear at different positions.

The third block is where the magic begins: we index the arrays we defined by the field we want to roll up our data by, in this case manager.

total_wins[manager]+=wins;

All this snippet does is make sure that, if the manager is “foo”, the bucket of total_wins indexed by “foo” holds the total wins achieved by foo. This works because the += wins operation is applied to every line of the file, adding any wins achieved by foo to the same index. This is done for all unique managers, leaving us with rolled-up wins and losses by manager for the entire dataset.

Now for the finale: in the END block, all we do is iterate over the associative array and print the rolled-up data to the console. Note that awk does not guarantee any particular iteration order for for (i in array), so the managers may come out in any order.

The program can be executed with the following command, which redirects the output to a file:

awk -f sum_wins_and_losses.awk Managers.key > rolled_up_file.key

Open rolled_up_file.key to see the total wins and losses by manager. Next time you are faced with manipulating tabular data, think awk!

