Instead of starting with the usual claim that statistics is everything and is everywhere, let me put it another way: “**Where there is data, statistics follows.**”

We can do a lot with data, like analyzing, predicting, and so on. The most important step is choosing a model that suits both the business problem and the data. Budding statisticians are often confused about which model suits which kind of data. Here, I am sharing some key points on when to use each model. Before getting into the modeling part, let us cover some basic statistical terms: descriptive, predictive, and prescriptive.

**Descriptive** statistics means looking into the data, through graphs or tables, without changing any values. In a descriptive model, the data is converted into a diagrammatic representation, which helps the statistician, or the audience, understand the overall shape of the dataset. Once we have that picture, we can proceed to a more appropriate model. No changes are made to the data; it is simply presented as it is, in visual form. Descriptive is the simplest of the three.

**Predictive**, as the name suggests, means predicting something from the data we hold. The keywords to watch for are “how much” and “how many”: predictive modeling is about expecting something. A business problem can be framed as: how much profit will we make next week? Or how many products can we expect to sell next month? If a business problem fits these keywords, go for predictive modeling. Unlike descriptive, predictive takes some effort.

**Prescriptive** is the next phase after predictive. Rather than stopping at “how much?” or “how many?”, we go further and ask, “how do we make something happen?”. For instance: what can we do to increase profit by 30%? What can we do to sell 20% more products? What are the possible ways to meet the customer’s needs? Prescriptive models fall into the recommendation category. Giving recommendations is not as easy as it sounds; a lot of work and analysis must go into producing a worthwhile suggestion. Prescriptive is far more demanding, and far more meaningful!

Different types of data call for different types of statistical analysis. Some of the major models in statistics are discussed here. Each has many subdivisions, but in this article we will take notes only on the major models and when to use them.

There is a relationship to be found between almost everything in the data world. Correlation is one technique for comparing variables and measuring how strongly one is related to another. Correlation analysis helps find the relationship, or association, between two or more variables (variables are simply factors). It works for both quantitative and qualitative data. The correlation value lies between -1 and 1: the closer it is to 1, the stronger the positive correlation between the variables; the closer it is to -1, the stronger the negative correlation; and 0 means no correlation between the variables at all.

When one variable increases along with the other, there is a positive correlation (e.g., as sales increase, profit also increases). When one variable increases while the other tends to decrease, there is a negative correlation (e.g., as absenteeism increases, production decreases). Correlation can also help reduce the variable count: we can keep only the variables with strong relationships for further research, so we work with just the variables we really need.
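To make this concrete, here is a minimal sketch of the Pearson correlation coefficient in pure Python. The sales and profit figures are made up purely for illustration; in practice you would use a library such as NumPy or pandas.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: profit rises with sales, so r should be close to +1.
sales = [10, 20, 30, 40, 50]
profit = [12, 24, 33, 46, 55]
r = pearson_corr(sales, profit)
```

A value of `r` near +1 here reflects the strong positive relationship; swapping in a series that falls as sales rise would drive `r` toward -1.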

Regression analysis is useful when we need to find how one variable depends on another. It also works for both qualitative and quantitative data. The goodness-of-fit value (R²) lies between 0 and 1: 1 defines a perfect fit and 0 defines no fit at all. This tells us how well one variable is explained by the other. Using regression analysis, we can perform predictive modeling.

For example, the quality of an educational institution depends on qualified staff, environment, technology usage, etc.; for such data, regression analysis is an appropriate model. It tells us how much a dependent variable depends on other independent variables. In our example, the dependent variable is the institution’s quality, and the independent variables are staff, environment, technology usage, and so on. So, if you want to know how one variable depends on another, regression is one of the best options to go for.
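The idea of fit can be sketched with a simple least-squares line and its R² score, again in pure Python. The study-hours and score data below are hypothetical, chosen only to show an R² near 1 for a near-linear relationship.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: exam score as a function of hours studied.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 65, 70]
m, c = fit_line(hours, score)
r2 = r_squared(hours, score, m, c)  # close to 1: the line fits well
```

An R² near 1, as here, means the independent variable explains most of the variation in the dependent one; an R² near 0 means the line explains almost nothing.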

If the data contains a time duration and the occurrence of some event, we can go for survival analysis. Survival analysis is most useful for clinical-trial data and product life expectancy (e.g., how much time will it take for an expected event to happen). The most common term in survival analysis is censoring, which is also one of its major challenges. Censoring means the survival time is incompletely observed (i.e., we have information on the duration, but not the complete information).

Survival analysis needs a bivariate response. Consider industrial data: we record how long a machine runs, so the dataset contains the time along with an event indicating whether the machine failed, denoted as 0 or 1. In this case, prefer survival analysis. The survival probability lies between 0 and 1 and is plotted against the time period; using the plotted curve, we can interpret the data.
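The machine example can be sketched with a minimal Kaplan-Meier-style estimator in pure Python. The durations and failure flags are hypothetical, and in practice a library such as lifelines would be used; this sketch only shows how the survival probability steps down at each observed failure while censored observations (event = 0) just leave the risk set.

```python
def kaplan_meier(times, events):
    """Survival curve as (time, probability) steps.

    times:  observed durations
    events: 1 = failure observed, 0 = censored (incomplete information)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    for t, e in data:
        if e == 1:
            # Each failure multiplies the survival probability by the
            # fraction of the risk set that did not fail at this point.
            survival *= (n_at_risk - 1) / n_at_risk
            curve.append((t, round(survival, 3)))
        n_at_risk -= 1  # censored machines also leave the risk set
    return curve

# Hypothetical machine data: run time in months, and whether it failed.
curve = kaplan_meier(times=[2, 3, 4, 5, 7], events=[1, 0, 1, 1, 0])
```

Each pair in `curve` is a point on the survival plot described above: the probability starts at 1 and decreases toward 0 as failures accumulate over time.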

Depending on the data and the business problem, the choice of analysis will vary. There are plenty of packages for running these analyses, so fitting a statistical model will not be that complicated for you. But drawing the right inference from the output is what matters most, and that is where more attention should go. I hope this gave you some idea of when and where to use each model.

*The media shown in this article are not owned by Analytics Vidhya and are used at the author’s discretion.*



