DataHack Radio Episode #5: Building High Performance Data Science teams with Kiran R
Very rarely do you come across someone who brings all of the following skills to the table:
- Great depth and breadth of technical skills – someone who has won Kaggle competitions and is up to speed with the latest developments in data science
- Business Sense – Very often, the danger which comes with the first skill is that people start looking at everything as a technical problem and not a business problem. An MBA by education (from one of the top colleges) and someone who is keen to solve business problems
- Excellent Communication skills – A regular speaker at several data science conferences. A person who knows how to articulate their thoughts, so that they deliver the right impact
- Always willing to give back – Always up for helping people, someone who has mentored a generation of data scientists coming from India and who feels passionately about building a data science ecosystem.
What do you do when you get someone with these skills?
Ladies and gentlemen – today we have a guest like this on the show – please give a round of applause to Kiran R, Director of the Data Sciences Center of Excellence at VMware.
He has also agreed to do a talk at DataHack Summit 2018 this year – so stay tuned.
Kiran is a computer science engineer from MSRIT and a post-graduate from the Indian Institute of Management, Kozhikode (IIM-K). He brings more than 14 years of experience building high performance teams for several leading organizations. He is currently leading the Data Sciences Center of Excellence at VMware.
Kiran is one of the few Grandmasters on Kaggle who has made his mark in the overall top 10 data scientists there. He is also a winner of the KDD 2014 Data Mining Cup. He is very passionate about data science, as you will discover in his conversation with Kunal.
This is a MUST-LISTEN podcast that spans the length and breadth of the various components in the world of data science.
This article is essentially a highlight reel of this incredibly knowledge-rich podcast. Happy listening!
You can subscribe to DataHack Radio or listen to previous episodes on any of the below platforms:
Kiran R’s Journey
Kiran’s first job was at Motorola as a software engineer writing C/C++ code. After spending a year there, he decided to pursue higher studies and completed his MBA from IIM Kozhikode in 2006. He took up an offer from Dell as an Analytics Manager and worked with tools such as SQL, Access and SAS (the last to build propensity models).
His brilliant work ethic and keen innovative mind led him to win Dell’s India Innovator of the Year award in 2012, handed to him directly by Michael Dell!
He set up his own proprietorship under the name ‘Chaotic Experiments’, which he used while participating in several Kaggle competitions. He regularly finished in the top 10 in various competitions and achieved his highest rank to date, 7, during this period. He also did some freelance work in data mining at the time.
From there, Kiran had stints at Amazon and Flipkart. He has been in his current role at VMware for the last four years. His role involves managing a team of data scientists who work across various domains including marketing, pricing and strategy.
One of the more fascinating things about Kiran’s career arc has been his choice to stay in a technical role even after getting his management degree. He was determined not to lose touch with his technical skills and carved out a path for himself accordingly.
Kaggle Competitions and Experience
Kiran started off his Kaggle journey back when there weren’t too many data science “enthusiasts” around. His approach to these problems was to build a diverse set of models: a linear model was always a good start, followed by random forest and gradient boosting, and finally an ensemble that combined them.
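To make the ensembling step concrete, here is a minimal sketch in plain Python. It is not Kiran's actual pipeline: the probabilities below are made-up numbers standing in for the outputs of three already-fitted models, and the helper names `blend` and `log_loss` are hypothetical. It only illustrates one common way to combine diverse models, which is to average their predicted probabilities and compare the blend against each model on a held-out metric.

```python
from math import log


def blend(model_probs, weights=None):
    """Weighted average of each model's predicted probabilities, row by row."""
    if weights is None:
        weights = [1.0] * len(model_probs)  # default: simple average
    total_weight = sum(weights)
    n_rows = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total_weight
        for i in range(n_rows)
    ]


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)


# Illustrative, made-up validation labels and model outputs (assumptions).
y_true = [1, 0, 1, 0]
linear_probs = [0.9, 0.4, 0.4, 0.1]
forest_probs = [0.6, 0.1, 0.9, 0.6]
boosting_probs = [0.7, 0.3, 0.6, 0.2]

ensemble_probs = blend([linear_probs, forest_probs, boosting_probs])

for name, probs in [
    ("linear", linear_probs),
    ("random forest", forest_probs),
    ("gradient boosting", boosting_probs),
    ("ensemble", ensemble_probs),
]:
    print(f"{name}: log loss = {log_loss(y_true, probs):.3f}")
```

Averaging tends to help most when the individual models make different kinds of errors, which is one reason to start from a diverse set. Passing `weights` to `blend` lets a stronger model count for more.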
He made a lot of acquaintances through the platform, and these competitions always pushed him to get out of his comfort zone, learn new techniques and build frameworks. This is something he stressed, because most data scientist jobs restrict you to only those tools and techniques which the business can support. Platforms like Kaggle and Analytics Vidhya allow you to spread your wings.
Competitions and Real-World Job
“If you have a hammer and only one tool, everything looks like a nail to you.”
Participating and learning through competitions helped Kiran a lot when it came to his professional job. There is obviously a difference between the two: in the industry, for instance, you won’t get a ready-made dataset and you will need to convert the business problem into a data mining problem.
But competitions and platforms like Analytics Vidhya help you become an expert in techniques and tools, and help you find out why and how things work (or don’t work). Understanding concepts like overfitting can really help you in the industry, and competitions also expose you to problems from various domains, which is an added advantage.
Kiran also used his experience to tell us that most problems in the industry are classification based. Regression and segmentation are used as well, but pale in comparison to classification problems. Some of the most widely used techniques are logistic regression and tree-based models.
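For readers who have not seen one of these techniques up close, below is a minimal sketch of logistic regression for binary classification, trained with plain batch gradient descent using only the Python standard library. The toy data, the learning rate and the function names (`fit_logistic`, `predict`) are assumptions made for illustration; in practice you would typically reach for an established library rather than hand-rolled code.

```python
from math import exp


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_rows, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            score = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(score) - target  # gradient of log loss w.r.t. score
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_rows
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return 0/1 class labels by thresholding the predicted probability."""
    return [
        int(sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) >= threshold)
        for row in X
    ]


# Toy, made-up data: one feature per row, label 1 for the larger values.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
print(predict([[0.5], [4.0]], weights, bias))
```

The fitted weights are directly interpretable (a positive weight pushes the prediction towards class 1), which is part of why this kind of model is popular in business settings.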
Kiran’s take on how Data Science Works in the Industry
Kiran’s experience has mostly been in the B2C space (with the last 4 years in the B2B sector). He uses the following parameters to classify the data science work companies do:
- How advanced are the techniques being used?
- How easily does the company adopt these techniques and frameworks?
- How is the data science team organised?
Kiran used these pointers to tell us his take on how Amazon, Flipkart and Dell leverage data science. His insights are fascinating and will enrich your knowledge of how data science works in the real world.
He also elaborated on how his current organization, VMware, uses data science. He went on to tell us the difference between B2B and B2C, and how it’s easier to test your models and results in a B2B space than a B2C space (primarily because of the longer purchase cycle the former has).
Day-to-day Schedule as a Director, Data Sciences CoE at VMware
Kiran’s role is divided into 2 parts – 90% of it is a functional role, which is to lead the Data Sciences Center of Excellence. The other 10% is to play a site leader role for enterprise information management (BI, Analytics, and Data Science).
His typical day involves a lot of operational activities, tons of people activities, project deliveries, data science technique brainstorming, interacting with stakeholders, among other things. He has a very hands-on leadership style which helps him juggle and manage these various aspects on a daily basis.
Hiring and Recruitment Strategy
Kiran believes hiring is the most important part of team building, because if you have the right people, the job will get done. When he hires people for his team, there are a few critical areas he looks at:
- Business problem solving skills
- Analytical ability
- Programming skills
- Technology understanding of analytics
- Culture fit for the organization
I highly recommend listening to the podcast, where Kiran elaborates on each of these pointers in detail. This will give you a brilliant insight into what an industry leader looks for when hiring a data scientist.
The Next 5 Years in Data Science
One thing that has stood out for Kiran is the pace at which data science has grown in recent years. The interest in this field is extremely high right now and he sees this continuing in the future.
According to Kiran, the models we build will become more advanced and the time it takes to train them will decrease. This will of course be helped by far better computational power (GPUs could well get cheaper), and we should see deep learning being used more and more on structured data.
Another change Kiran predicts in the near future is the increasing pressure on business leaders to adopt AI and data mining. As more and more techniques are developed and become interpretable, businesses will be forced into situations where they have to leverage AI to gain any competitive edge they can get.
Most proprietary software, like SAS, will go away according to Kiran. Open source languages like R and Python have already eaten into the lead SAS had and will continue to gain popularity in the coming years. Open source is the driving force behind machine learning, and more and more researchers and organizations are realizing this.
Advice to Graduates and Freshers in Data Science
Kiran firmly believes one needs to be good at programming to make a career in machine learning. If you can’t program, you will have an almighty struggle to get into this field. If you’re new to this field, Kiran’s advice is to pick up Python and get very familiar with it. You should also eventually learn more than one language, which will help expand your skillset.
Learn the fundamentals of data science well. There are tons of excellent resources out there so there’s no excuse not to do this. Once you build your base, you can look at applying for a job in this domain. Another critical aspect, often overlooked, is debugging skills. If you can’t debug your program or code, you will end up getting frustrated with the shortcomings in your script.
Also, develop some complementary skills (like web development or application development) to go along with the data science techniques. As a data scientist, you need to understand the technical aspects of putting your model output into a deployment-friendly format (among other things).
This was another quality addition to our DataHack podcast series. Kiran R’s incredible knowledge and his willingness to share it with the community were obvious in this conversation with Kunal. You will learn a LOT of new things about the data science domain.