Sports Analytics – Generating Actionable Insights using Cricket Commentary

Aravindpai Pai 25 Jun, 2020 • 10 min read


  • What is sports analytics? What are the different use cases of sports analytics? We answer these questions here
  • Understand how sports analytics can impact sports like cricket, football, and tennis
  • We’ll work on a fascinating sports analytics use case – analyzing India’s performance using cricket commentary data!



The scope of professional sports has changed over the years. I remember watching every minute of the 2003 Cricket World Cup and spending every waking minute tracking statistics, like the total runs scored, highest run-scorer, highest run rate, and so on.

It was fairly rudimentary stuff but enough to keep me glued to the screen. How times have changed since then!

Sports analytics is quickly becoming mainstream. Media outlets and leading sports websites regularly curate statistics, produce deep technical insights, and add a whole new level of analysis we haven’t seen before.


We can now answer questions like the ones below with a high degree of confidence:

  • Sports analytics in cricket: What is the probability of the team winning a match while batting first and second?
  • Sports analytics in football: What is the expected outcome of a shot taken from the left side of the penalty area?
  • Sports analytics in tennis: Where should you place your serve based on the return statistics of your opponent?

And so on. Honestly, the sky is the limit when it comes to sports analytics use cases. I’m a sports lover and I’m always looking out for applications where I can apply my analytics and machine learning knowledge to improve the team strategy as well as fan experience.

I’ll introduce you to the awe-inspiring world of sports analytics in this article. We will look at the different types of sports analytics, why this field is important, and we’ll also work on a use case of sports analytics – analyzing cricket commentary to generate insights.


Table of Contents

  1. What is Sports Analytics?
  2. Importance of Sports Analytics
  3. Sports Analytics Case Study: Analyzing Cricket Commentary
    1. Team Performance
    2. Batting Performance
    3. Bowling Performance
    4. Boundary Analysis


What is Sports Analytics?

Sports Analytics is all about analyzing and extracting useful insights from sports data.

I would broadly divide Sports Analytics into 2 categories:

  1. Descriptive Sports Analytics
  2. Predictive Sports Analytics

Let us discuss each category here.


Descriptive Sports Analytics

Descriptive Sports Analytics is about summarizing sports data in the form of numbers, in other words, coming up with important statistics. This might sound like a simple concept but it’s a very powerful one.

These statistics play a crucial role in team tactics.

Let’s take cricket for example. Here, we can analyze how frequently a batsman gets out to a specific bowler. This number can shape the bowling strategy of a team.

Here is an awesome video that analyzes the dismissals of Virat Kohli against Adam Zampa:

This is the reason why Adam Zampa was brought back into the attack whenever Virat Kohli was at the crease during Australia’s tour of India in 2020. In this series, Virat Kohli lost his wicket to Zampa in two out of three matches!

Another interesting use case in cricket is to analyze the team’s probability of winning a match while batting first as well as second. This influences the captain who wins the toss and has to make a decision – bat or bowl first.



Predictive Sports Analytics

Predictive Sports Analytics is about making predictions using sports data. One such use case in cricket is to predict the number of runs a batsman scores against an opponent in a particular match. This would help the team management and captain select the best team for every match.

In a sport like football, predictive sports analytics helps to understand the chances of scoring a goal from any location on the pitch.

You can think of similar use cases for your favorite sport and let me know in the comments section below the article.


Importance of Sports Analytics

Sports Analytics is a game-changer – there’s no other way to put it. Using analytics in sports directly impacts the decision making of a team and can alter the future of the franchise or club (or country). It can easily change the outcome of the match.


There is a lot of scope for analytics in sports. In this section, I am going to discuss a few use cases of analytics in different sports, like cricket, football, and tennis.


Sports Analytics in Cricket

In cricket, we can analyze the strong and weak zones of a player. This helps both the player and the opponent understand the strengths and weaknesses of his game.

  • Opponents can develop a strategy to bowl against a player (like Adam Zampa against Virat Kohli)
  • The player can invest more time on his weakness to improve his game

Here is an awesome video that showcases the weak zone of Virat Kohli:


Sports Analytics in Football

The footballing world has been slow to adopt analytics but it’s quickly gathering pace now. We’re seeing the mainstream media using analytics numbers, such as expected goals and expected assists, to analyze players and matches.

You should definitely keep an eye on the Expected Goals (xG) metric. xG basically tells us the probability of a shot converting into a goal. This varies from player to player and with the position the shot is taken from. It’s quite a fascinating concept and you can read more about it here.

Another example of analytics in football is analyzing team formation while the match is going on. This would help the opponent to understand the team strategy and play according to it.


Sports Analytics in Tennis

In tennis, we can identify the combination of shots a player usually plays to win a point. This can be of great use to prepare a strategy against the opponent as well.


I’m sure you must have seen the statistics that come up on screen after the end of each set at a tennis Grand Slam. Features like the number of first serves returned, the placement of the serve, the bounce of the serve and where the opponent picked it up – these are all examples of sports analytics in tennis.


Sports Analytics Case Study: Analyzing Cricket Commentary

Let’s take up a real-world case study now to understand how sports analytics works. I am going to delve into my personal passion, cricket, for this case study.

I’m an avid follower of text commentary in cricket. An insightful commentator describes the events happening on the ground in good detail, right? There is a lot of online cricket commentary available on many sports websites like CricBuzz, ESPN Cricinfo, etc. This is a gold mine that can reveal many interesting and valuable insights into a team and player.
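To give a feel for how free-text commentary can be turned into structured data, here is a minimal sketch using regular expressions. The line format ("over.ball bowler to batsman, outcome, description") and the outcome labels are assumptions made for illustration; real commentary feeds differ from site to site, and this is not necessarily how the dataset in this article was built.

```python
import re

# Assumed line format: "<over>.<ball> <bowler> to <batsman>, <outcome>, <description>"
# Real commentary feeds differ between websites; adjust the pattern accordingly.
PATTERN = re.compile(
    r"^(?P<over>\d+)\.(?P<ball>\d)\s+(?P<bowler>[^,]+?) to (?P<batsman>[^,]+),\s*"
    r"(?P<outcome>[^,]+),\s*(?P<text>.*)$"
)

# Hypothetical lookup from outcome label to runs scored.
RUNS = {"no run": 0, "1 run": 1, "2 runs": 2, "3 runs": 3, "FOUR": 4, "SIX": 6, "OUT": 0}

def parse_line(line):
    """Parse one commentary line into a dict, or return None if it does not match."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["over"] = int(event["over"])
    event["ball"] = int(event["ball"])
    outcome = event["outcome"].strip()
    event["outcome"] = outcome
    event["runs"] = RUNS.get(outcome)  # None when the outcome is not in the lookup
    event["wicket"] = outcome == "OUT"
    return event

event = parse_line("19.6 Bumrah to Guptill, FOUR, full toss driven past mid-off")
print(event)
```

Lines that do not follow the assumed format return None, so they can be logged and inspected rather than silently miscounted.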


About the Dataset for Sports Analytics

I have collected the commentary for the T20 matches India played over the last 4 years. Download a sample dataset from here. It’s time to analyze the commentary and find some interesting insights. Let’s do it!



Let us first read the dataset and understand the different columns in the dataset:
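The loading code and the real column names are not shown here, so the following is only a minimal sketch. It assumes a ball-by-ball CSV with hypothetical columns (match_id, year, innings, batting_team, over, ball, runs, wicket, commentary) and reads it from an in-memory string so that it runs without the actual file.

```python
import csv
import io

# Hypothetical sample: the real file and its column names are not shown in
# the article, so every column below is an assumption made for illustration.
SAMPLE_CSV = """match_id,year,innings,batting_team,over,ball,runs,wicket,commentary
1,2018,1,India,0,1,4,0,"Southee to Rohit, FOUR, driven through the covers"
1,2018,1,India,0,2,0,1,"Southee to Rohit, OUT, caught at slip"
1,2018,2,New Zealand,0,1,6,0,"Bumrah to Guptill, SIX, pulled over square leg"
"""

def read_commentary(handle):
    """Read ball-by-ball rows and convert the numeric columns to int."""
    numeric = ("match_id", "year", "innings", "over", "ball", "runs", "wicket")
    rows = []
    for row in csv.DictReader(handle):
        for col in numeric:
            row[col] = int(row[col])
        rows.append(row)
    return rows

rows = read_commentary(io.StringIO(SAMPLE_CSV))
print("columns:", list(rows[0].keys()))
print("number of deliveries:", len(rows))
```

To use a real file, pass an opened file handle instead of the io.StringIO object. The same step could equally be done with a dataframe library.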


Team Performance

After this section, you will be able to answer the questions below:

  1. What is the team’s average score when batting first and second?
  2. How frequently does the Indian team win a match?
  3. What is the probability of India winning a match against a particular team?
  4. What target should the Indian team set to win a match?
  5. How many times has the Indian team defended a low scoring target?
  6. Which was the most successful year for Team India?

Ready? Let’s get our hands dirty now!


The total number of T20s India played in the last 4 years:

Total no. of T20s India played in last 4 years:54

No. of T20s India played each year

Inferences:

  1. India played the most T20s in 2018 and the fewest in 2017 and 2019. This is because of the ICC Champions Trophy 2017 and the ICC World Cup 2019, both of which are ODI tournaments
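Counting matches per year is a simple group-and-count. The sketch below uses a handful of made-up (match_id, year) records, not the article's data, just to show the shape of the computation.

```python
from collections import Counter

# Hypothetical match-level records (match_id, year); not the article's data.
matches = [
    (1, 2016), (2, 2016), (3, 2017), (4, 2018), (5, 2018), (6, 2018), (7, 2019),
]

# Count how many matches fall in each year.
matches_per_year = Counter(year for _, year in matches)
total_matches = len(matches)

print("Total T20s:", total_matches)
for year in sorted(matches_per_year):
    print(year, matches_per_year[year])
```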

Team Average Score (Batting First & Second):

Batting First Team Average :180.0
Batting Second Team Average:156.0
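As a rough illustration of how such averages can be derived, the sketch below averages innings totals separately for batting first and batting second. The records are invented for the example and the tuple layout is an assumption.

```python
from statistics import mean

# Hypothetical innings totals for India: (match_id, innings, total_runs).
# innings == 1 means India batted first, innings == 2 means India chased.
india_innings = [
    (1, 1, 185), (2, 2, 150), (3, 1, 165), (4, 2, 160), (5, 1, 175),
]

def average_by_innings(records, innings):
    """Average total for the given innings number, rounded to one decimal."""
    totals = [runs for _, inn, runs in records if inn == innings]
    return round(mean(totals), 1)

bat_first_avg = average_by_innings(india_innings, 1)
bat_second_avg = average_by_innings(india_innings, 2)

print("Batting first average :", bat_first_avg)
print("Batting second average:", bat_second_avg)
```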

Team Average Innings wise over the years:



  1. India’s batting-first average peaked in 2018, at close to 180 runs. This suggests that India had its best batting line-up in 2018
  2. In every year, the batting-second average is lower than the batting-first average. This suggests that the targets India sets tend to be higher than the targets it chases. Keep in mind that a chasing innings ends as soon as the target is reached, which also pulls the batting-second average down

Overall Winning % (Batting First & Second):

Over all Winning %      : 66.66
Batting First Winning % : 59.0
Batting Second Winning %: 76.0
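Winning percentages like these are just wins divided by matches, computed overall and then per subset. Here is a minimal sketch on invented results; the tuple layout (innings India batted in, whether India won) is an assumption.

```python
# Hypothetical results: (innings India batted in, whether India won).
results = [(1, True), (1, False), (2, True), (2, True), (1, True), (1, False)]

def win_pct(records):
    """Percentage of records that are wins, rounded to two decimals."""
    if not records:
        return 0.0
    wins = sum(1 for _, won in records if won)
    return round(100 * wins / len(records), 2)

overall = win_pct(results)
bat_first = win_pct([r for r in results if r[0] == 1])
bat_second = win_pct([r for r in results if r[0] == 2])

print("Overall winning %       :", overall)
print("Batting first winning % :", bat_first)
print("Batting second winning %:", bat_second)
```

Filtering the records by opponent instead of by innings would give the per-team winning percentages in the same way.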

Winning % against different teams:



  1. India wins almost every match when playing against Bangladesh
  2. India’s lowest winning percentage is against New Zealand, with India losing 60% of these matches. One possible reason is that New Zealand are regarded as very good players of spin

Batting First Winning Score:

Inferences:

  1. Probability of India winning a match after scoring:
    1. Less than 120 runs (<120) is around 0.33
    2. Between 120 and 180 runs (120-180) is around 0.4
    3. Greater than 180 runs (>180) is 0.75
  2. Glad to see that India has defended low scoring games too
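The probabilities above come from bucketing first-innings scores and computing the share of wins in each bucket. The sketch below shows one way to do that; the scores and results are made up, and treating exactly 180 as part of the 120-180 bucket is an assumption.

```python
from collections import defaultdict

# Hypothetical first-innings totals by India and whether the score was defended.
first_innings = [(110, False), (115, True), (118, False), (150, True),
                 (165, False), (190, True), (200, True), (185, False), (210, True)]

def bucket(score):
    """Assign a first-innings score to one of three ranges."""
    if score < 120:
        return "<120"
    if score <= 180:
        return "120-180"
    return ">180"

counts = defaultdict(lambda: [0, 0])  # bucket -> [wins, matches]
for score, won in first_innings:
    counts[bucket(score)][0] += int(won)
    counts[bucket(score)][1] += 1

# Empirical probability of winning for each score range.
win_prob = {b: round(w / n, 2) for b, (w, n) in counts.items()}
print(win_prob)
```

With only a few matches per bucket, these are rough empirical frequencies rather than reliable probabilities.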

Winning % over the years:

  1. Team performance improved over the years and then dropped in 2019
  2. India recorded its highest winning percentage (82%) in 2018. The team won most of the matches played during 2018
  3. India lost half of the T20s played in 2019, which gives the lowest winning percentage (50%). One possible reason is the lack of senior players, as the team opened its doors to youngsters after the ICC World Cup 2019


Batting Performance

In this section, I will focus on the batting performance of team India in terms of the strike rate. We’ll also discuss how India’s performance has evolved over a period of time.

Strike rate can be defined as the average number of runs scored per 100 balls. The higher the strike rate, the faster the batsman scores.
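The definition translates directly into a one-line formula: runs divided by balls faced, multiplied by 100. A small sketch, with a hypothetical helper name:

```python
def batting_strike_rate(runs_scored, balls_faced):
    """Runs scored per 100 balls faced."""
    if balls_faced == 0:
        return 0.0
    return round(100 * runs_scored / balls_faced, 2)

# Example: 52 runs off 30 balls.
example_sr = batting_strike_rate(52, 30)
print(example_sr)
```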

Let’s find out the phases where team India can improve its batting.

Overall batting strike rate of Indian team:

Strike rate of Indian team is 138.66

Team batting strike rate over the years:



  1. India had the highest batting strike rate in 2018. The batsmen were in great touch!

Team batting strike rate across different phases of a match:



  1. The strike rate of the Indian team reaches around 150+ in the last 5 overs, and around 125+ in the power play and middle overs
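A phase-wise strike rate needs each delivery mapped to a phase first. The sketch below assumes 1-based over numbers, with overs 1-6 as the power play, 7-15 as the middle overs, and 16-20 as the last 5 overs; the deliveries are invented.

```python
from collections import defaultdict

def phase(over_number):
    """Map a 1-based over number to a T20 phase."""
    if over_number <= 6:
        return "powerplay"
    if over_number <= 15:
        return "middle"
    return "last 5"

# Hypothetical deliveries: (over number, runs off the bat).
balls = [(1, 4), (3, 0), (6, 1), (8, 1), (12, 2), (15, 0), (17, 6), (20, 4)]

totals = defaultdict(lambda: [0, 0])  # phase -> [runs, balls]
for over_number, runs in balls:
    totals[phase(over_number)][0] += runs
    totals[phase(over_number)][1] += 1

# Strike rate per phase: runs per 100 balls.
phase_sr = {p: round(100 * r / b, 2) for p, (r, b) in totals.items()}
print(phase_sr)
```

Adding the year to the grouping key would give the phase-wise strike rate over the years.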

Team batting strike rate across different phases of a match over the years:



  1. In 2018, India recorded the highest batting strike rate across all the 3 phases (Powerplay, middle overs, and the last 5 overs)
  2. The highest batting strike rate, close to 175, was recorded in 2018 during the last 5 overs. Reminds me of Dhoni & Hardik Pandya’s powerful hitting

Bowling Performance

In this section, let’s examine the bowling performance of team India in terms of Economy rate, Bowling Strike rate, and Bowling Average, and also how the performance has evolved over time.

  • Economy rate is defined as the average number of runs conceded per over
  • Bowling strike rate can be defined as the average number of balls bowled per wicket
  • Bowling Average is the average number of runs conceded for a wicket
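The three definitions above can be sketched as a single helper. The function name and the dictionary layout are hypothetical; the formulas follow the definitions in the list, using legal deliveries to count overs (6 balls per over).

```python
def bowling_metrics(runs_conceded, legal_balls, wickets):
    """Return economy rate, bowling strike rate and bowling average."""
    # Economy rate: runs conceded per over (6 legal balls).
    economy = round(6 * runs_conceded / legal_balls, 2) if legal_balls else None
    # Bowling strike rate: balls bowled per wicket.
    strike_rate = round(legal_balls / wickets, 2) if wickets else None
    # Bowling average: runs conceded per wicket.
    average = round(runs_conceded / wickets, 2) if wickets else None
    return {"economy": economy, "strike_rate": strike_rate, "average": average}

# Example: 32 runs conceded in 4 overs (24 balls) for 2 wickets.
metrics = bowling_metrics(runs_conceded=32, legal_balls=24, wickets=2)
print(metrics)
```

Strike rate and average are undefined without a wicket, so the sketch returns None in that case instead of dividing by zero.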

It’s time to analyze the bowling performance of the Indian team.

Team India Economy rate in different phases of a match:



  1. Indian bowlers concede around 7-8 runs per over. All credit to the Indian bowlers for such a healthy number!

Team India bowling performance across different phases of a match:




  1. On average, Indian bowlers pick up about 2 wickets in the last 5 overs, as they need around 13 balls per wicket (30 balls / 13 ≈ 2.3)
  2. Indian bowlers concede around 27+ runs for a wicket in the middle overs which can be improved

Team bowling strike rate across different phases of a match over the years:



  1. Team India’s bowling performance was very poor in 2019, as the bowlers needed 33+ balls on average to take a wicket in the middle overs
  2. India had the best death bowling attack in 2016

Boundary Analysis

In this section, we will analyze the average number of balls team India takes to score a boundary, and how this has evolved over the years.

Avg no. of balls to hit 4: 9
Avg no. of balls to hit 6: 19
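The balls-per-boundary figure is the number of balls faced divided by the number of fours (or sixes) hit. A minimal sketch on invented deliveries follows; it assumes that any ball worth exactly 4 or 6 runs off the bat was a boundary, which ignores rare cases such as all-run fours.

```python
# Hypothetical runs off the bat for consecutive legal deliveries.
runs_per_ball = [0, 1, 4, 0, 6, 1, 0, 4, 2, 0, 1, 4]

def balls_per_boundary(deliveries, boundary_runs):
    """Average number of balls taken per boundary of the given value (4 or 6)."""
    hits = sum(1 for r in deliveries if r == boundary_runs)
    return round(len(deliveries) / hits, 1) if hits else None

balls_per_four = balls_per_boundary(runs_per_ball, 4)
balls_per_six = balls_per_boundary(runs_per_ball, 6)

print("Avg no. of balls to hit 4:", balls_per_four)
print("Avg no. of balls to hit 6:", balls_per_six)
```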

Avg number of balls to score boundary over the years:



  1. Team India has improved its power hitting over the past few years. In 2019, the team hit a six every 16 balls

Avg number of balls to score boundary across different phases of the match:



  1. India hits a four roughly once every over in the power play and the last 5 overs
  2. Team India hits only about 1 six in the power play (36 balls), as the batsmen need 24+ balls per six

Avg number of balls to score 4 across different phases over the years:



  1. In 2019, India took the most balls per four (around 14) during the middle overs

Avg number of balls to score 6 across different phases over the years:



  1. Indian openers, middle order, and finishers have all improved their six-hitting ability over the past years. That’s amazing!


End Notes

Descriptive Sports Analytics arguably has a more far-reaching role to play in a team’s winning strategy than Predictive Sports Analytics. In this article, you have learned the importance of Sports Analytics and how analytics can impact different sports. We also analyzed team India’s performance over the past 4 years in T20 cricket.

Kindly leave your queries/feedback in the comments section and I will get back to you. Have fun implementing these ideas for your favorite sport!



Responses From Readers


VB 28 Feb, 2020

Aravind, first off, kudos on this great article. As an avid cricket fan, it is great to learn what analytics can do for this great sport. Although cricket dates back to the 1500s, the extent of tech and analytics in it is dwarfed by baseball and football. So, this definitely is valuable. On the questions that you have posed to be answered by this analysis: do I really need commentary data? Can’t I just use the scorecard of every game? That analysis may be more straightforward than mining the commentary data.

Shubham Kulkarni 30 Mar, 2020

Hi,I am Shubham Kulkarni and just like you, I am a big cricket fan. I also analyze Cricket and make videos on Youtube. I am deeply interested in Cricket Analytics and have done a few things with stats. I studied some of the best Test and ODI players and came up with a formula to decide who are the better ones. I have used some really basic stuff and would like to use some more analytics to do the same. I have some plans in place using all this to create some amazing stuff. Can we work together and try producing some amazing stuff? Is it possible?

pRASUn 05 Aug, 2020

Great work bro ! love it . How can I get the whole 4 years dataset ?

Aditya Matele 18 Oct, 2020

Can I get the dataset of this dataset? As I am a student and I would like to work on this dataset for a mini project.

Hmda Plots in Pharma City 24 May, 2022

nice blog and explanation my favorite game cricket

Hoverboard 360 Coupon 27 Jun, 2022

Amazing stuff... totally loved it! Thanks for sharing

Probuddha 16 Jul, 2022

Can you help as to how to scrape the commentary datas??

Gabriel J. Kelly 12 Aug, 2022

sports are very important in life's schedule. personally, I really like cricket and I am the captain of my team. good post for the sports lover keep it up buddy good post!!

PaultheRummyBoy 17 Aug, 2022

This will be beneficial to me and my friends! Author. Appreciate it. Continue on!

Sweta Parikh 28 Aug, 2022

Nice article! Thanks for sharing this informative post. Keep posting!

Vishwajeet Joshi 11 Sep, 2022

Hello, I intend to use regular expressions on commentary data. However, I'm unable to scrape the commentary data from espncricinfo. Could you please help me understand how you managed to scrape commentary data? Any help would be highly appreciated.
