# Mathematics for Data Science

Janvi Kumari 14 Jun, 2024

## Introduction

Data science is a broad, interdisciplinary field that combines statistical analysis, computer science, and domain expertise to uncover insights from data. The underlying mathematics provides the essential techniques and tools for working with, and learning from, data. This article covers the math needed for data science. So, let’s start.

#### Overview

• Master statistics concepts like mean, median, mode, variance, and standard deviation.
• Understand inferential statistics for drawing conclusions beyond collected data.
• Learn about probability, random variables, and probability distributions.
• Gain insights into linear algebra, including vectors, matrices, and operations like transpose and inverse.
• Explore calculus topics such as differentiation, integration, and their applications in data science.

## Statistics

Statistics provides the first diagnosis of the data in data science. It supplies the tools and techniques for data collection, data analysis, and data interpretation.

Let us now explore types of statistics.

#### Descriptive Statistics

Descriptive statistics summarizes a data set using a few key measures. Let us explore them:

• Mean: The arithmetic average of the data points, defined as the sum of all data points divided by the number of data points.
• Median: The middle value of the sorted data set. With an even number of points, it is the average of the two middle values.
• Mode: The value that occurs most frequently in the data set. A data set can have more than one mode.
• Variance and standard deviation: Measures of dispersion that describe how far the data points spread around the mean. The standard deviation is the square root of the variance.

Example:

Consider this dataset: [2,3,4,4,5,5,7,9]

Mean = (2+3+4+4+5+5+7+9)/8 = 4.875

Median = (4+5)/2 = 4.5

Mode = 4 and 5 (each occurs twice, so the dataset is bimodal)
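As a quick check, the same measures can be computed with Python's standard `statistics` module. This is only a minimal sketch on the example dataset. The variable names are illustrative, and it reports both the population and the sample variance because the article does not say which one is meant. Note that 4 and 5 each occur twice, so `multimode` returns both values.

```python
import statistics

data = [2, 3, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)             # arithmetic average
median = statistics.median(data)         # average of the two middle values here
modes = statistics.multimode(data)       # every value tied for highest frequency

# Population formulas divide by n; sample formulas divide by n - 1.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)
sample_variance = statistics.variance(data)

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("population variance:", pop_variance)
print("population standard deviation:", round(pop_std, 4))
print("sample variance:", round(sample_variance, 4))
```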

#### Inferential Statistics

Inferential statistics draws conclusions that extend beyond the data collected in the study. The key ideas are:

• Hypothesis testing: Testing assumptions about a population parameter using sample data.
• Confidence interval: A range of values within which the population parameter is expected to lie, at a stated confidence level.
• Regression analysis: Modeling the relationship between a dependent variable and one or more independent variables.

Example:

Using a one-sample t-test to check whether the mean of a sample is significantly different from a known population mean.
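The t-test idea can be sketched by hand with the standard library. The sample below reuses the dataset from the descriptive statistics example, and the population mean of 4.0 is a made-up value chosen only for illustration. The critical value 2.365 is the two-sided 5% threshold for 7 degrees of freedom, taken from a standard t-table. In practice a statistics library would usually compute the p-value for you.

```python
import math
import statistics

sample = [2, 3, 4, 4, 5, 5, 7, 9]
population_mean = 4.0  # hypothetical "known" mean, assumed for this example

n = len(sample)
sample_mean = statistics.mean(sample)
sample_std = statistics.stdev(sample)        # sample standard deviation (n - 1)
standard_error = sample_std / math.sqrt(n)

# One-sample t statistic: distance from the hypothesized mean in standard errors.
t_statistic = (sample_mean - population_mean) / standard_error

# Two-sided 5% critical value for n - 1 = 7 degrees of freedom (from a t-table).
T_CRITICAL = 2.365
reject_null = abs(t_statistic) > T_CRITICAL

# Matching 95% confidence interval for the population mean.
ci_low = sample_mean - T_CRITICAL * standard_error
ci_high = sample_mean + T_CRITICAL * standard_error

print("t statistic:", round(t_statistic, 3))
print("reject null hypothesis at 5%:", reject_null)
print("95% confidence interval:", (round(ci_low, 3), round(ci_high, 3)))
```

Here the t statistic is smaller than the critical value, so this sample gives no strong evidence that its mean differs from the assumed population mean. Equivalently, the confidence interval contains 4.0.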

## Probability

Probability is a fundamental concept in data science because it quantifies uncertainty and randomness. It is crucial for reasoning about events and outcomes in datasets. The Central Limit Theorem explains why the normal distribution appears so often when many independent effects are summed or averaged. Probability distributions like the binomial, Poisson, and normal are essential for modeling real-world phenomena and making statistical inferences.

#### Random Variables (Discrete & Continuous)

• Discrete random variable: A random variable that can take only specific, countable values. For example, the number of students in a classroom.
• Continuous random variable: A random variable that can take any value within an interval, so its possible values cannot be counted. For example, the waiting time between two phone calls, or a person’s height.

#### Central Limit Theorem

The Central Limit Theorem (CLT) states that the sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed. The mean of that distribution equals the sum of the individual means, and its variance equals the sum of the individual variances.
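A small simulation can make the theorem concrete. The sketch below sums 30 uniform random numbers many times and compares the observed mean and variance of the sums with the values the CLT predicts. The number of terms, the number of trials, and the seed are arbitrary choices for illustration. A uniform variable on [0, 1) has mean 1/2 and variance 1/12.

```python
import math
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable

N_TERMS = 30        # how many uniform variables go into each sum
N_TRIALS = 20_000   # how many sums we simulate

sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_TRIALS)]

observed_mean = statistics.fmean(sums)
observed_variance = statistics.pvariance(sums)

# CLT prediction: means add up and variances add up.
expected_mean = N_TERMS * 0.5
expected_variance = N_TERMS / 12

# For a normal distribution, roughly 68% of values fall within one
# standard deviation of the mean.
one_sd = math.sqrt(expected_variance)
within_one_sd = sum(abs(s - expected_mean) <= one_sd for s in sums) / N_TRIALS

print("mean:", round(observed_mean, 3), "expected:", expected_mean)
print("variance:", round(observed_variance, 3), "expected:", expected_variance)
print("share within one standard deviation:", round(within_one_sd, 3))
```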

#### Probability Distributions

You should also be familiar with common probability distributions such as the binomial, Poisson, and normal distributions.
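To show what these three distributions look like in code, here is a minimal sketch of their probability functions written from the textbook formulas using only the `math` module. The function names are illustrative and do not come from any particular library.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Example: chance of exactly 3 heads in 10 fair coin flips.
print(binomial_pmf(3, 10, 0.5))
# Example: chance of no events when 2 are expected on average.
print(poisson_pmf(0, 2.0))
# Example: height of the standard normal curve at its peak.
print(normal_pdf(0.0))
```

The binomial and Poisson functions return probabilities for discrete counts, while the normal function returns a density for a continuous variable, which ties back to the two kinds of random variables described earlier.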

## Linear Algebra

Data scientists also benefit from knowing linear algebra, which helps them understand the data structures and algorithms underpinning machine learning.

• Vector: An ordered list of numbers.
• Matrix: A rectangular array of numbers arranged in rows and columns. Matrices are a large topic in their own right, so make sure you learn the most common operations: transpose, inverse, trace, determinant, and the dot product.
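The operations listed above can be sketched in plain Python for a small case. The helper names below are illustrative, and the determinant and inverse are written for 2x2 matrices only to keep the example short. For real work a numerical library is the usual choice.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    """Inverse of a 2x2 matrix; fails if the determinant is zero."""
    determinant = det2(m)
    if determinant == 0:
        raise ValueError("matrix is singular and has no inverse")
    (a, b), (c, d) = m
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]


def matmul(a, b):
    """Matrix product: each entry is a row-by-column dot product."""
    return [[dot(row, col) for col in zip(*b)] for row in a]


A = [[4, 7], [2, 6]]
print("transpose:", transpose(A))
print("trace:", trace(A))
print("determinant:", det2(A))
print("inverse:", inverse2(A))
print("A times its inverse:", matmul(A, inverse2(A)))
print("dot product:", dot([1, 2, 3], [4, 5, 6]))
```

Multiplying a matrix by its inverse should give the identity matrix, up to small floating-point rounding.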

## Calculus

Key calculus topics for data science include differential and integral calculus, maxima and minima, the mean value theorem, the product rule, the chain rule, Taylor series, derivatives, gradients of matrices, backpropagation, the gradient descent algorithm, higher-order derivatives, the multivariate Taylor series, Fourier transforms, and the area under a curve.
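To show how derivatives are used in practice, here is a minimal sketch of gradient descent on a one-variable function. The function f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions chosen for illustration. The derivative 2(x - 3) follows from the chain rule, and a central-difference estimate is included as a sanity check on it.

```python
def f(x):
    """Example function with its minimum at x = 3."""
    return (x - 3.0) ** 2


def df(x):
    """Analytic derivative of f, obtained with the chain rule."""
    return 2.0 * (x - 3.0)


def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to move toward a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


x_min = gradient_descent(df, start=0.0)

print("analytic derivative at x=1:", df(1.0))
print("numerical derivative at x=1:", round(numerical_derivative(f, 1.0), 6))
print("minimum found near x =", round(x_min, 6))
```

Each step moves x a little in the direction that lowers f, which is the same idea that backpropagation and gradient descent apply to the many parameters of a machine learning model.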

## Geometry and Graph

You need to know how to work with the angles, measurements, and proportions of regular shapes, and you should be familiar with multiple types of plots.

## Conclusion

This article gives an overview of the mathematics required to master data science. These basic concepts form the backbone of the field, and a solid understanding of them is needed in order to learn data science.

Q1. What is the role of statistics in data science?

A. Statistics provides tools for data analysis, including measures like mean, median, mode, variance, and standard deviation to understand and interpret data.

Q2. What are the types of statistics used in data science?

A. Descriptive statistics (mean, median, mode, variance, standard deviation) and inferential statistics (hypothesis testing, confidence intervals, regression analysis) are commonly used.

Q3. Why is probability important in data science?

A. Probability helps quantify uncertainty and randomness in data, essential for making predictions and decisions based on data analysis.
