Playing Data with Statistics

Suchithra Ponmudi 27 Mar, 2021


This article was published as a part of the Data Science Blogathon.

Introduction

Instead of starting with "Statistics is everything and is everywhere," let me put it in a more useful way: “Where there is Data, Statistics Flags.”

We can do a lot with data, such as analyzing it and making predictions from it. The most important step is choosing a model that suits both the business problem and the data. Budding statisticians are often confused about which model suits which data for general analysis. Here, I am sharing some key points on when to use which model. Before getting into the modeling part, let us first cover three basic statistical terms: Descriptive, Predictive, and Prescriptive.

Descriptive, Predictive, and Prescriptive Statistics:


Looking at the data through graphs or tables, without changing any values, is Descriptive statistics. In a Descriptive model, the data are converted into a diagrammatic or tabular representation, which gives the statistician or the audience an overall view of the dataset. The data stay exactly as they are; only the presentation changes. Once we have a feel for the data, we can proceed to the appropriate main model. Descriptive really is that simple.
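To make the descriptive idea concrete, here is a minimal sketch using only Python's standard library. The `weekly_sales` figures are invented for illustration and are not from any real dataset. The code summarizes the values and draws a crude text chart without changing the data itself.

```python
import statistics

# Hypothetical weekly sales figures, invented purely for illustration.
weekly_sales = [120, 135, 150, 110, 160, 145, 150]

# Descriptive statistics summarize the data; the values stay untouched.
summary = {
    "count": len(weekly_sales),
    "mean": statistics.mean(weekly_sales),
    "median": statistics.median(weekly_sales),
    "min": min(weekly_sales),
    "max": max(weekly_sales),
    "stdev": statistics.stdev(weekly_sales),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# A crude text bar chart: one '#' per 10 units sold.
for week, value in enumerate(weekly_sales, start=1):
    print(f"week {week}: " + "#" * (value // 10))
```

In practice a plotting library would replace the text chart, but the principle is the same: describe, do not alter.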

Predictive, as the name suggests, means predicting something from the data we hold. The keywords that should pop up when you see something related to prediction are “How much?” and “How many?”. Prediction is about what we expect to happen. In a business problem, we can frame the question as: how much profit will we make next week? Or how many products can we expect to sell next month? If a business problem fits these keywords, go for predictive modeling. Unlike Descriptive, Predictive needs some effort.

Prescriptive is the next phase after Predictive. Rather than stopping at “How much?” or “How many?”, we go further and ask, “How can we make something happen?” For instance, what can we do to increase profit by 30%? What can we do to sell 20% more products? What are the possible ways to meet the customer’s needs? The prescriptive type of model falls under the suggestion category. Giving suggestions is not as easy as we might think; a lot of work and analysis goes into making a worthwhile suggestion. Prescriptive is way more meaningful!

Model or Analysis:

Different types of data call for different types of statistical analysis. Some of the major models in statistics are discussed here. Each of them has many sub-divisions, but this article only covers the major models and when to use them.


Correlation:

There is a relationship to be found between almost everything in the data world. Correlation is one way to compare variables and measure how strongly one variable is related to another. Correlation analysis helps in finding the relationship or association between two or more variables. Variables are nothing but factors. Correlation analysis can be applied to both quantitative and qualitative data, although the measure differs: Pearson's coefficient suits quantitative variables, while rank-based measures such as Spearman's suit ordinal data. The correlation value lies between -1 and 1. The closer the value is to 1, the stronger the positive correlation between the variables; the closer it is to -1, the stronger the negative correlation. A value of 0 indicates no linear correlation between the variables/factors.

When one variable increases as the other increases, the correlation is positive (e.g., as sales increase, profit also increases). When one variable increases and the other tends to decrease, the correlation is negative (e.g., as absenteeism increases, production decreases). Correlation does not by itself show that one variable causes the other. This analysis can also help reduce the number of variables: we can choose the variables that show a strong relationship for further research and work only with the ones we really need.
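The two examples above can be sketched with a small Pearson correlation function written in plain Python. The function name `pearson_r` and all the numbers are my own assumptions for illustration, not real business data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: profit rises with sales (positive correlation).
sales = [10, 20, 30, 40, 50]
profit = [3, 5, 8, 9, 13]

# Hypothetical data: production falls as absenteeism rises (negative correlation).
absenteeism = [1, 2, 3, 4, 5]
production = [98, 95, 90, 88, 80]

r_pos = pearson_r(sales, profit)
r_neg = pearson_r(absenteeism, production)
print(f"sales vs profit:           r = {r_pos:.3f}")
print(f"absenteeism vs production: r = {r_neg:.3f}")
```

Both values come out close to the extremes, which is what we expect from data invented to show a strong relationship. Real data will usually be noisier.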

Regression:

Regression analysis is useful when we need to find how one variable depends on another. It can work with both qualitative and quantitative data. The quality of the fit is commonly summarized by the coefficient of determination, R-squared (R²), which usually lies between 0 and 1: 1 indicates a perfect fit and 0 indicates no fit at all. It tells us how well the variation in one variable is explained by the other. Using regression analysis, we can perform predictive modeling.


For example, how good an educational institution is depends on qualified staff, environment, technology usage, and so on. For such data, Regression analysis is an appropriate model. It tells us how much a dependent variable depends on the independent variables. In our example, the dependent variable is the quality of the institute, and the independent variables are staff, environment, technology usage, etc. So, if you want to know how one variable depends on another, Regression is one of the best options to go for.
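As a rough sketch of the idea, the code below fits a straight line with ordinary least squares and reports R-squared, using a single independent variable to keep things short. The helper names `fit_line` and `r_squared`, and the staff and rating numbers, are assumptions made up for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: staff qualification score (independent variable)
# and institution rating (dependent variable).
staff_score = [1, 2, 3, 4, 5]
rating = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(staff_score, rating)
r2 = r_squared(staff_score, rating, slope, intercept)
print(f"rating = {intercept:.2f} + {slope:.2f} * staff_score")
print(f"R-squared = {r2:.4f}")

# Predictive use: expected rating for a new, unseen staff score.
print(f"predicted rating at score 6: {intercept + slope * 6:.2f}")
```

With several independent variables (environment, technology usage, and so on) the same principle applies, but a statistics package would normally do the fitting.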

Survival Analysis:

If the data contain a time duration and the occurrence of some incident, we can go for survival analysis. Survival analysis is most useful for clinical trial data and product life expectancy (e.g., how long an expected incident will take to happen). The most common term in survival analysis is censoring, which is also one of its major challenges. Censoring means we do not have complete information on the survival time (i.e., we know the subject lasted at least a certain duration, but not exactly when the incident occurred).

Survival analysis needs a binary event variable alongside the time variable. Considering industrial data, we will have the time period for which a machine has worked. The dataset will contain the time and whether the machine failed or not as an event. Here, the event is denoted as 0 (no failure observed, i.e., censored) or 1 (failure). In this case, prefer Survival Analysis. The output of survival analysis is a survival probability, which lies between 0 and 1 and is plotted against time. Using the plotted curve, we can interpret the data.
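One common way to estimate such a survival curve is the Kaplan-Meier estimator. The article does not name a specific method, so treat this as one possible sketch rather than the only approach. The function name `kaplan_meier` and the machine data are invented for illustration.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: observed time for each machine
    events:    1 if the machine failed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each failure time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)   # machines still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all machines that share the same observed time.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored machines both leave the risk set.
        at_risk -= leaving
    return curve

# Hypothetical machine data: running time in months and failure indicator.
months = [5, 8, 8, 12, 15, 20]
failed = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(months, failed)
for t, s in curve:
    print(f"month {t:>2}: survival probability = {s:.3f}")
```

Note how the censored machines (event 0) never lower the curve directly; they only reduce the number of machines at risk for later failure times.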

The choice of analysis varies with the data and the business problem. There are many packages available for these analyses, so building a statistical model will not be too complicated. However, drawing inferences from the output is the most important part and is where the most attention should be paid. I hope this gave you some idea of when and where to use each model.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

