Shanthababu Pandian — December 16, 2020
Beginner Machine Learning

This article was published as a part of the Data Science Blogathon.

What is Machine Learning?

Machine Learning (ML) is a highly iterative process in which models learn from past experience by analyzing historical data. ML models then identify patterns in that data and use them to make predictions about new or future data.

 

machine learning testimonials

 

Why is Machine Learning Important?

The 5 V's (Volume, Variety, Variation, Visibility, and Value) dominate the current digital world, so most industries are building models to analyze their position and opportunities in the market. Based on the outcomes, they deliver better products and services to their customers at scale.

machine learning 5Vs

 

What are the major Machine Learning applications?

Machine learning (ML) is widely applicable across industries, both in implementing their processes and in improving them. ML is currently used in many fields, and the figure below shows the areas where it plays a vital role.

machine Learning Applications

 

Where is Machine Learning in the AI space?

Looking at the Venn diagram below, we can see where ML sits in the AI space and how it relates to the other AI components.

With so much jargon flying around, let's quickly look at what each component covers.

 

machine Learning in AI space

 

How are Data Science and ML related?

 

relation with Data Science

Machine Learning Process: The first step in the ML process is to take data from multiple sources, followed by fine-tuned processing of that data. The processed data then feeds the ML algorithms chosen for the problem statement, such as prediction, classification, and the other kinds of models available in the ML space. Let us discuss each step one by one.

machine learning process

Machine Learning – Stages: We can split the ML process into the 5 stages shown in the flow diagram below.

  1. Collection of Data
  2. Data Wrangling
  3. Model Building
  4. Model Evaluation
  5. Model Deployment

Identifying the Business Problem: Before we go through the above stages, we must be clear about the objective and purpose of the ML implementation. To find a solution to the identified problem, we must collect the data and then follow the stages below appropriately.

Identify the stages

 

Collection of Data

Data can be collected from different sources, internal and/or external, to satisfy the business requirements or problems. The data can be in any format, such as CSV, XML, or JSON. Here Big Data plays a vital role in making sure the right data is available in the expected format and structure.
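
As a rough illustration of bringing differently formatted sources into one structure, here is a minimal sketch using only the Python standard library. The sample data, the field names (`customer_id`, `age`, `city`), and the helper functions are all hypothetical and chosen only for this example.

```python
import csv
import io
import json

# Hypothetical sample data; in practice these would be files or API responses.
CSV_TEXT = "customer_id,age,city\n1,34,Chennai\n2,,Mumbai\n"
JSON_TEXT = '[{"customer_id": 3, "age": 41, "city": "Delhi"}]'


def load_csv_records(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def normalize(record):
    """Bring a record from either source into one common structure."""
    age = record.get("age")
    return {
        "customer_id": int(record["customer_id"]),
        # An empty CSV cell or a missing JSON field becomes None.
        "age": int(age) if age not in (None, "") else None,
        "city": record["city"],
    }


raw_records = load_csv_records(CSV_TEXT) + load_json_records(JSON_TEXT)
records = [normalize(r) for r in raw_records]
```

After this step every record has the same keys and types, whichever source it came from, and missing values are marked explicitly so that later stages can handle them.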

data collection

Data Wrangling and Data Processing: The main objectives and focus of this stage are as follows.

 

Data Processing (EDA):

  1. Understanding the given dataset and helping to clean it up.
  2. Gaining a better understanding of the features and the relationships between them.
  3. Extracting essential variables and removing non-essential ones.
  4. Handling missing values and human errors.
  5. Identifying outliers.
  6. Maximizing the insights gained from the dataset.
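
A few of these EDA steps can be sketched in plain Python. This is only an illustration under stated assumptions: the `ages` column is made-up data, and the interquartile range (IQR) rule with a factor of 1.5 is one common convention for flagging outliers, not the only one.

```python
import statistics

# Hypothetical column with a missing value (None) and one extreme value.
ages = [23, 25, 27, None, 29, 31, 33, 120]


def summarize(values):
    """Return basic summary statistics, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


summary = summarize(ages)
outliers = iqr_outliers(ages)
```

Here the summary reports one missing value, and the IQR rule flags 120 as an outlier. Note how far the mean is pulled above the median by that single value, which is exactly the kind of insight EDA is meant to surface.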

 

Feature engineering:

  1. Handle missing values in the variables.
  2. Convert categorical variables into numerical ones, since most algorithms need numerical features.
  3. Transform variables that are not Gaussian (normal) where needed. Linear models often behave better with roughly Gaussian inputs, although strictly the normality assumption in linear regression applies to the errors, not to the features.
  4. Handle outliers present in the data, either by truncating the data above a threshold or by transforming it with a log transformation.
  5. Scale the features so that all of them get equal importance, rather than favoring the ones with larger values.
  6. Feature engineering is an expensive and time-consuming process.
  7. Feature engineering can be a manual process, but it can also be automated.
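
The first five steps above can be sketched as follows. This is a minimal example in plain Python with hypothetical data; the specific choices (median imputation, one-hot encoding, `log1p`, and min-max scaling) are assumptions made for illustration, and other techniques are equally valid.

```python
import math
import statistics

# Hypothetical raw records: income has a missing value and a large outlier.
rows = [
    {"city": "Chennai", "income": 30000.0},
    {"city": "Mumbai", "income": None},
    {"city": "Chennai", "income": 50000.0},
    {"city": "Delhi", "income": 900000.0},
]

# 1. Impute missing values with the median of the observed values.
observed = [r["income"] for r in rows if r["income"] is not None]
median_income = statistics.median(observed)
incomes = [r["income"] if r["income"] is not None else median_income for r in rows]

# 2. One-hot encode the categorical column (one 0/1 column per category).
categories = sorted({r["city"] for r in rows})
one_hot = [[1 if r["city"] == c else 0 for c in categories] for r in rows]

# 3 and 4. Log-transform to reduce skew and dampen the outlier.
log_incomes = [math.log1p(x) for x in incomes]

# 5. Min-max scale to the range [0, 1].
lowest, highest = min(log_incomes), max(log_incomes)
scaled = [(x - lowest) / (highest - lowest) for x in log_incomes]

# Final feature vectors: one-hot city columns followed by the scaled income.
features = [encoded + [value] for encoded, value in zip(one_hot, scaled)]
```

In a real project the imputation value and the scaling range would normally be computed on the training data only and then reused on the test data, so that no information leaks from the test set.
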
machine learning steps

Training and Testing:

  1. Training data is used so that the machine learns to recognize patterns in the data. Cross-validation is used to get a more reliable measure of the accuracy and efficiency of the algorithm used to train the machine.
  2. Test data is used to see how well the machine can predict new answers based on its training.
  3. The train-test split procedure is used to estimate the performance of ML algorithms when they make predictions on data that was not used to train the model.
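
To make the cross-validation idea concrete, here is a minimal sketch of k-fold index splitting in plain Python. The function name `k_fold_indices` and its parameters are hypothetical and written only for this example; libraries such as scikit-learn provide their own implementations.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=10, k=5))
```

Each sample lands in the validation fold exactly once, so the model is trained and scored k times and the k scores can be averaged into a more stable estimate than a single split gives.
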
train and test

 

Training

  1. Training data is the data set on which you train the model.
  2. It is the data from which the model learns its experience.
  3. Training sets are used to fit and tune your models.

Testing

  1. Test data is used to check whether the model has learned well enough from the experience it got in the training data set.
  2. Test sets are "unseen" data used to evaluate your models.

Train data: It trains our machine learning algorithm.
Test data: After the model is trained, test data is used to test the model's efficiency and performance.

The purpose of the random state in the train-test split: The random state makes the splits you generate reproducible. The value you provide is used as a seed for the random number generator, so the same sequence of random numbers, and therefore the same split, is produced on every run.
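
The effect of a fixed seed can be shown with a small hand-written split function. This is a sketch using only the standard library; `split_train_test` is a hypothetical helper for this example, not scikit-learn's `train_test_split`, although that function accepts a `random_state` argument in the same spirit.

```python
import random


def split_train_test(data, test_ratio=0.2, random_state=None):
    """Shuffle a copy of data and split it into (train, test) lists."""
    shuffled = list(data)
    # The random state seeds a dedicated random number generator.
    random.Random(random_state).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train_a, test_a = split_train_test(data, random_state=42)
train_b, test_b = split_train_test(data, random_state=42)
```

Both calls use the same seed, so `train_a` equals `train_b` and `test_a` equals `test_b`. With `random_state=None` the generator is seeded from the system instead, and the split can differ from run to run.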

data split

 

Data Split into Training/Testing Set

  1. In machine learning, we usually split a dataset into training data and test data.
  2. The split is usually 20% for testing and 80% for training.
  3. The major share of the data is spent on training your model.
  4. The rest is spent on evaluating (testing) your model.
  5. But you cannot mix or reuse the same data for both training and testing.
  6. If you evaluate your model on the same data you used to train it, overfitting can go undetected: the score will look good, but it remains an open question whether the model can predict new data.
  7. Therefore, you should have separate training and test subsets of your dataset.
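
The danger of evaluating on the training data can be illustrated with a model that simply memorizes its training set. The sketch below uses a hypothetical noisy dataset and a one-nearest-neighbor predictor written in plain Python; both are assumptions chosen to make the effect visible, not a recommended model.

```python
import random


def nearest_neighbor_predict(train, x):
    """Predict the label of x by copying the label of the closest training point."""
    closest = min(train, key=lambda point: abs(point[0] - x))
    return closest[1]


def accuracy(train, evaluation):
    """Fraction of evaluation points whose label is predicted correctly."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in evaluation)
    return correct / len(evaluation)


# Hypothetical noisy dataset: label is 1 when x > 50, with roughly 20% of labels flipped.
rng = random.Random(0)
dataset = []
for x in range(100):
    label = int(x > 50)
    if rng.random() < 0.2:
        label = 1 - label
    dataset.append((x, label))

rng.shuffle(dataset)
test_set, train_set = dataset[:20], dataset[20:]  # 20% test, 80% train

train_accuracy = accuracy(train_set, train_set)  # evaluated on data the model has seen
test_accuracy = accuracy(train_set, test_set)    # evaluated on unseen data
```

Because every training point is its own nearest neighbor, `train_accuracy` is always a perfect 1.0, noise included. Only `test_accuracy`, measured on the held-out 20%, says anything about how the model handles new data, and on noisy data like this it is typically lower.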

MODEL EVALUATION: Each type of model has its own evaluation methodology. Some of the most common evaluation metrics are listed here.

  1. Evaluating the Regression Model.
    1. Sum of Squared Error (SSE)
    2. Mean Squared Error (MSE)
    3. Root Mean Squared Error (RMSE)
    4. Mean Absolute Error (MAE)
    5. Coefficient of Determination (R²)
    6. Adjusted R²
  2. Evaluating Classification Model.
    1. Confusion Matrix.
    2. Accuracy Score.
    3. AUC and ROC.
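
The metrics listed above can be computed by hand as in the following sketch. It is written in plain Python for illustration; the function names and the sample values are hypothetical, the classification helpers assume binary labels 0 and 1, and AUC is computed through its pairwise-ranking interpretation rather than by tracing the ROC curve.

```python
import math


def regression_metrics(y_true, y_pred, n_features=1):
    """Compute SSE, MSE, RMSE, MAE, R2 and adjusted R2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sse / sst
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "sse": sse, "mse": mse, "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": r2, "adjusted_r2": adjusted_r2,
    }


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    return (pairs.count((0, 0)), pairs.count((0, 1)),
            pairs.count((1, 0)), pairs.count((1, 1)))


def accuracy_score(y_true, y_pred):
    """Share of predictions that match the true labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive is scored above a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 1, 0, 0]
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
cm = confusion_matrix(labels, [1, 0, 0, 1, 1, 0])
acc = accuracy_score(labels, [1, 0, 0, 1, 1, 0])
auc = roc_auc(labels, [0.9, 0.2, 0.4, 0.8, 0.6, 0.1])
```

For the sample regression values the squared errors sum to 2, giving an MSE of 0.5 and an R² of 0.9. For the classification sample, the confusion matrix holds two true negatives, one false positive, one false negative, and two true positives.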

Deployment of an ML model simply means integrating the finalized model into a production environment and using its results to make business decisions.
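
In its simplest form, this integration means saving the finalized model as an artifact and loading it wherever predictions are needed. The sketch below assumes a hypothetical linear model whose parameters are stored as JSON; real deployments typically involve a model-serving framework, versioning, and monitoring, none of which are shown here.

```python
import json
import os
import tempfile

# Hypothetical finalized model: a linear model y = intercept + sum(w_i * x_i).
model = {"intercept": 1.5, "weights": [2.0, -0.5]}


def save_model(params, path):
    """Write the model parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)


def load_model(path):
    """Read the model parameters back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def predict(params, features):
    """Apply the linear model to one feature vector."""
    return params["intercept"] + sum(w * x for w, x in zip(params["weights"], features))


# Training side: persist the finalized model.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model, model_path)

# Production side: load the artifact once, then serve predictions.
served_model = load_model(model_path)
prediction = predict(served_model, [3.0, 4.0])
```

Keeping the saved artifact separate from the training code means the production side needs only the file and the `predict` logic, not the training data or the training pipeline.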

Deployment

I hope you now understand the end-to-end machine learning process flow, and I believe it will be useful for you. Thanks for your time.
