End-to-End Data Analysis Using Microsoft Excel
This article was published as a part of the Data Science Blogathon.
Introduction
We will be analyzing fitness information in this article completely in Excel. Excel has some inbuilt functions which we could use to make our analysis stand out.
About the Data Analysis
The data is publicly available on Kaggle. It includes fitness information collected by the contributor on Kaggle on their Redmi fit and synthetically generated data. The data columns are the following.
Day, Workout Type, Distance Covered (in km), Duration of the workout (in minutes), Total number of active calories burned (kcal), Steps count per day, Average speed (km/hour), Average Stride Rate, Maximum number of steps taken per minute in the entire workout, Average time is taken to cover a kilometer, Maximum time taken to cover a kilometer, Minimum time taken to cover a kilometer, Maximum BPM during the workout, Minimum BPM during the workout, Maximum volume of oxygen used by the body while exercising, Anaerobic, Aerobic, Intensity, Light.
After downloading the CSV file from Kaggle I opened it in Excel and converted the available sheet into a table; a shortcut to do the same is, pressing Ctrl + t.
Converting it to a table provides ease in calculations and referring to the information in different sheets of the same Excel workbook.
As we do describe() in python to get an overview of the maximum, minimum, different quartile ranges, the same can be done in Excel with the use of some simple formulas.
I have used inbuilt excel functions to create the statistical summary, the attached video will provide you with the functions I used to generate the summary.
Data Analysis
Data analysis is asking questions about the data that would help in making better our understanding of it. As part of this article, we would ask questions about the data and visualize its results in Excel.
Which Workout Burns the Most Calories Given the Time Spent and Distance Covered for Each?
We have created a pivot table keeping the workout type as a column and calories, time, and distance as values.
We have kept the workout type to be a column, all the other features in the pivot table would be grouped by it.
Cricket burns the most calories even when the time spent on it is lesser than the time spent on the treadmill or pool swimming.
We can visualize the same using a bar graph.
Go to Insert –> Select a bar graph,
After selecting the type of bar graph we get the next visualization.
When you have a non-technical audience, this graph could have a better impact than looking at numbers in a pivot table.
Calories burning do not seem to affect by the time and distance spent. We have workouts that like outdoor cycling take almost the same amount of time but the calories spent are lesser.
Let us further ask some other questions.
Month-wise Calories Spent
We see a split of 70-30, 70% of calories spent are in January whereas 30% in February.
Does Heart Rate get Affected by Aerobic or Anaerobic Exercises?
We see a pattern here the heart rate line graph follows a similar trajectory as the anaerobic line graph. The heart rate seems to be affected by how anaerobic the exercise is.
Can the Average Speed at Which the Workout is Performed Determine its Intensity?
We plot a stacked chart in Excel to understand if the magnitude of a workout can be determined by the speed of the workout.
Count of Workouts Performed Each Day
We had already discussed that some of the data is generated artificially, which looks right, it seems to be difficult to complete daily more than 20 workout exercises.
Is the Amount of Oxygen Related to any Other Features in the Table?
The amount of oxygen used in a workout seems to be reflective of the intensity of the workout, the doughnut chart shows the same visually.
The speed at which the workout is done is linearly related to the amount of oxygen that is spent during it, a graphical representation is
End Notes
About the Author
My name is Alifia and I am working as an Analyst and like to interrogate data and share my findings through technical articles. You can read other articles published by me on Analytics Vidhya here. You can reach out to me here.
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.