Kaggle Grandmaster Series – Notebooks Grandmaster and Rank #2 Dan Becker’s Data Science Journey!

Analytics Vidhya Last Updated : 11 Dec, 2020

“If I were explicitly trying to be in the top 1%, I might have given up before I got there. It’s such a hard goal that I’d have given up thinking I’d never get there.” - Dan Becker

I am pretty sure this brings back memories for many of you who dropped off from your Kaggle journey because you could not reach the top 1% and felt there was no point in practicing.

Well, the Kaggle Grandmaster series is back with yet another interview, and this time we have Dan Becker with us.

Dan is a Kaggle Notebooks Grandmaster and currently holds the 2nd rank in this category. His notebooks are not only widely referred to by DS beginners but are also part of the free courses in Kaggle Learn. He is also a Kaggle Datasets and Discussions Expert.

Dan is the founder of a company called Decision.AI that helps data scientists translate their AI models into optimal business results. Before this, he worked as a Data Scientist at Google too! Pretty amazing, right?

In this interview, we cover a range of topics, including:

Dan Becker’s Transition from Economics to Data Science
Dan’s Kaggle Journey from Scratch to becoming a Grandmaster
Dan’s Advice to the Beginners in Data Science

So, go through this interview and absorb all you can!

Dan Becker’s Education and Work

Analytics Vidhya (AV): Your educational background involves a Ph.D. in Econometrics. Could you please tell us how you transitioned from economics to DS and what challenges you faced in this journey?

Dan Becker (DB): I started the transition to DS after reading a newspaper article about a Kaggle competition with a $3 million grand prize. I made a submission using conventional econometric techniques, and I was in the bottom 10% of the leaderboard. I still remember the bad feeling in my stomach when I first saw that result. I thought I was so good at modeling, and it was hard to accept that I was at the bottom. But it inspired me to learn and improve. Each night for the next year, I would improve my submission or learn more about machine learning. I moved up a few spots at a time, and I finished in 2nd place out of 1353 teams in that competition.

By the end, I had really completed my transition to becoming a data scientist.

AV: You worked as a DS at one of the best companies in the world – Google! What kind of skill sets and knowledge does it take to land a Data Scientist role at such a big company?

DB: It varies even from one person at Google to the next depending on the exact role.

I’d been a data scientist for 7 or 8 years by the time I joined Google. It’s part of my personality to always worry about falling behind, so I’ve never stopped learning. As a result, I had a pretty broad understanding of data science topics. In terms of the job interview itself, Google loves algorithms questions. The book “Cracking the Coding Interview” is the best resource for job interviews at a lot of these big tech companies.

AV: Post Kaggle, you founded Decision.ai, a tool to help data scientists to translate their AI models into optimal business results. Could you elaborate on how an AI model translates to business models?

DB: Decision AI is a tool for analysts and data scientists to help get more business value from the machine learning models they already build. Supervised machine learning models make predictions, but there’s typically very little rigor around how those predictions are used. Let me give you one example:

A data scientist builds a model that predicts which financial transactions are fraudulent. For one example transaction, the model says it’s 5% likely to be a fraud. Now the question is what do you do about this. Some people use a simple threshold, perhaps rejecting all transactions that are at least 10% likely to be a fraud.

The way you translate a prediction into a real-world action is called the “decision function.” Now the question is “what’s the optimal decision function?” For each transaction, you might want to account for how valuable the customer is, because that informs how bad it is to inconvenience them with a denied transaction. You’d want to compare that to the cost of accepting a fraudulent transaction, which might depend on the transaction amount.

So there’s all this business context you should consider in your decision function. We can’t automate finding the exact decision function. But we provide a tool so data scientists can rigorously optimize how they make these decisions.
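To make the fraud example concrete, here is a minimal sketch that contrasts a fixed threshold with a decision function that weighs business costs. It is only an illustration and not Decision AI's actual method: the `Transaction` fields, the `customer_value` estimate, and the simple expected-cost rule are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float             # money lost if a fraudulent transaction is accepted
    fraud_probability: float  # the model's predicted probability of fraud
    customer_value: float     # hypothetical cost of wrongly denying a good customer


def threshold_decision(tx: Transaction, threshold: float = 0.10) -> str:
    """Simple rule: reject anything at least `threshold` likely to be fraud."""
    return "reject" if tx.fraud_probability >= threshold else "accept"


def expected_cost_decision(tx: Transaction) -> str:
    """Pick the action with the lower expected cost (illustrative cost model)."""
    # Accepting costs us the transaction amount when it turns out to be fraud.
    cost_accept = tx.fraud_probability * tx.amount
    # Rejecting costs us customer goodwill when the transaction was legitimate.
    cost_reject = (1 - tx.fraud_probability) * tx.customer_value
    return "reject" if cost_accept > cost_reject else "accept"


# A small purchase that looks somewhat risky, and a large one that looks safer.
small = Transaction(amount=20, fraud_probability=0.15, customer_value=50)
large = Transaction(amount=5000, fraud_probability=0.05, customer_value=50)

print(threshold_decision(small), expected_cost_decision(small))  # reject accept
print(threshold_decision(large), expected_cost_decision(large))  # accept reject
```

With these made-up numbers the two rules disagree on both transactions: the threshold blocks a cheap purchase that is not worth annoying the customer over, and lets through an expensive one whose expected loss is much larger. A real decision function would need cost estimates agreed with the business.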

This isn’t unique to fraud. We see it in supply chain management, preventive maintenance, pricing, health care, and other places.

Across a lot of use cases, people are surprised when they realize how much better they can get at making decisions. In many cases, they start by thinking this isn’t the data scientists’ job, and someone else should do it. But then they use our tool and realize how much profit they can add by being rigorous with decision optimization, even if it involves collaboration with other stakeholders.

Dan’s Kaggle Journey from Scratch to becoming a Grandmaster

AV: You’re a Kaggle Notebooks Grandmaster and currently ranked #2. First of all, hats off to you. This is beyond amazing! Here’s a question a LOT of people would love to know about: What is your framework and strategy for creating an expert-level notebook? Is there a checklist?

DB: I don’t have a checklist. A lot of my notebooks are featured in Kaggle Learn courses, and that’s partly responsible for the attention they get.

In general, I divide notebooks into two categories:

One category of notebooks is educational. Those should be about a specific technique. For example, you could do a notebook about how to use Seaborn for data visualization. In that, I wouldn’t add a bunch of pandas or scikit-learn stuff, because other materials are just distracting.

Ideally, the notebook would explain your mental model for seaborn, rather than being just a long list of examples. That way, after reading your notebook, I could figure out how to do things for myself.

The second type of notebook is curiosity-driven. These will usually get fewer votes, but I personally like them. For example, I might wonder what the trend is in wildfires over time. I find a dataset, and then make a couple of graphs to start answering that question. Usually, the first graph will bring up new questions. So I’ll create more graphs to answer those.

AV: That’s just amazing, Dan. Now, what were the challenges you faced initially when you started Kaggling, and how did you overcome them?

DB: Initially, my challenge was that I wasn’t very good. I didn’t expect to end up in the top 1%, but I enjoyed improving. That helps me keep working every day. If I were explicitly trying to be in the top 1%, I might have given up before I got there. It’s such a hard goal that I’d have given up thinking I’d never get there.

Kaggle competitions also have more top competitors now than when I started 10 years ago. I don’t think that’s a great path to professional growth for most people, and finding a community of people you can learn with seems more promising to me.

AV: You currently have more than 180 notebooks which are widely referred to by DS beginners. Did you plan to focus on Notebooks specifically, and what are your criteria to choose a topic for a notebook?

DB: I created some notebooks for the free courses at Kaggle Learn, so many of my notebooks are directly from that. Now that I’m not doing that, my notebooks are almost always driven by a curiosity about a real-world question.

AV: Since 180+ is a huge number, which 5 notebooks are your favorites that you would recommend to our community?

DB: I created a machine learning explainability course at https://www.kaggle.com/learn/machine-learning-explainability and those are easily my favorite notebooks.

AV: Since you’ve seen Kaggle grow from the start to what it is now, can you tell us a couple of milestones that you felt were a key part of your journey?

DB: Finishing in 2nd place in the Heritage Health Prize is easily my biggest personal milestone.

I also competed in the first competition that used deep learning techniques. This was before tools like Keras, PyTorch or TensorFlow existed. I used a library called PyLearn2. I also made my first couple of open-source contributions to PyLearn2 as part of doing that competition.

Dan’s Advice to the Beginners in Data Science

AV: As an industry leader in DS and ML, what advice would you give to beginners so that they can excel in the industry?

DB: I think it’s a mistake to learn a lot of theory first and then start doing projects. I see people who have spent years becoming data scientists and they still don’t know much about how things work in practice.

Instead, I’d favor learning the bare minimum you need to try a project like a Kaggle competition. Then learn more theory after you have the practical experience to understand where theory fits in.

Also, you absolutely need to learn how to use Git and to collaborate with other people.

Finally, learn to use Pandas well.

Most data scientists spend 10X more time manipulating and cleaning data than they do with fancy algorithms. Deep learning may be fun, but Pandas is more practically useful.

Most people I know who are trying to hire data scientists have lamented the shortage of data scientists who can work quickly with Pandas.

AV: Kaggle is widely used and accepted as a stepping stone to become a successful DS. What advice would you give to beginners so that they can fully leverage this platform?

DB: Some people come to Kaggle with the goal of achieving a certain rank to help them get a job. That approach is a mistake. Rankings won’t get you a job unless you win a competition or get very close to it. But 99.9% of participants won’t achieve that.

Fortunately, Kaggle is a great place to learn. I’d emphasize learning from others. Team up with people in competitions, or share your notebooks broadly to get feedback and advice from others.

Find datasets about topics you find interesting and create your own projects to share. Kaggle’s probably the best place in the world to learn by doing. If you don’t think you are ready for that, start with the courses on Kaggle Learn.

AV: People often participate in hackathons and even get good results, but most struggle to translate that into industry or business work. Based on your experience, what advice would you give them to close this gap?

DB: That’s a hard but important question. There are many parts to success in solving business problems that you don’t deal with in a hackathon or hobby project. If you can do it, getting a data science or analytics job will help by exposing you to these issues. I think that should be your #1 goal.

Aside from that, for each project, you should spend a little time understanding how decisions are made today and how you can help. If a decision is made by a person, you might start by creating some graphs they’d find useful. Then see if you can send them those graphs and start a conversation. This is less fun than making a machine learning model. But you know that no one is going to engage in a conversation with you based on your emailing them a model. So I’d just try to get closely involved with real decision-making processes. That’s still hard to do though.

AV: You’re someone whose work everyone looks forward to. Can you name five Data Science experts whose work motivates you?

DB: I expect reinforcement learning will be hugely impactful in the future (even if it isn’t yet), so I enjoy reading about Sergey Levine’s research. It’s a little research-focused, but the BAIR blog is one of my favorites.

I hugely respect Thomas Wiecki and everyone who is making Bayesian approaches more broadly usable.

I collaborated with Tim Salimans in a Kaggle competition. He’s absolutely brilliant. We haven’t stayed in touch, but I’m always excited when I see his research.

Susan Athey is combining econometrics and machine learning in a way that I absolutely love.

Andrew Gelman is very insightful about how to use data. He’d call himself a “statistician,” but I don’t think the distinction between statistics and data science is very useful.

End Notes

That was a pretty heavy and inspirational interview. We hope you were able to absorb things said in this interview and it helps you in the course of your data science journey.

This is the seventh interview in the Kaggle Grandmaster Series. We recommend you go through some of the previous interviews too.

What did you learn from this interview? Are there other data science leaders you would want us to interview? Let me know in the comments section below!

Analytics Vidhya

Analytics Vidhya Content team
