- Need of data in machine learning
- Introduction to Drawdata
- Importing important modules
- Draw varieties of plots
- End Notes
“Data is the new oil”
— Clive Humby
When we say data is everything, is it really? If you work in a technical field, you would almost certainly say “yes”: just as a river always carries water, technical work always involves data. The world needs more and more data to perform even fundamental tasks. Let’s start with a brief introduction to data:
Data can be defined as units of information, whether numeric values or records collected from any kind of observation. In simple words, a collection of facts, numbers, measurements, or observations can be referred to as data. Many of us confuse data with information, so let’s clear that up: data is an individual unit, while information is a collection of data that has been organized and given meaning.
As members of the data science community, we already know what machine learning is. But do you know why data is needed to create a dataset for training ML models? If not, let’s discuss it:
Need of data in machine learning
We know that machine learning uses algorithms that improve over time as they see more data, but the quality of that data matters just as much: better data generally leads to better model accuracy.
“You just need to understand the data to truly understand how machine learning works”
To build a machine learning model we need a collection of data, which we group into a dataset. A dataset is a collection of cases that all share common attributes. A model trained on it helps us understand the relationships between data points.
In any data science project life cycle, you have probably noticed steps such as EDA, feature selection, and model building, but you may not have noticed how the data itself is created, because we generally use a dataset that already exists. Often we do not want a complex dataset for a machine learning model and would rather use a simple one. Do you know how data can be generated? If not, don’t worry, we will discuss it below:
Introduction to Drawdata
Drawdata is a beautiful open-source Python library that lets you generate data simply by drawing it. In simple words, you only have to draw the data points, and the dataset is generated automatically. It is easy to use and has a user-friendly interface. It lets users create a variety of datasets with different shapes and sizes, which can then be used for machine learning models.
The library is meant to be used inside a Jupyter notebook, so let’s walk through how to use Drawdata there:
To use the Drawdata library in Jupyter, you first need to install it:
We install two libraries, drawdata and pandas. You only need to run the following commands in the command prompt:
pip install drawdata
pip install pandas
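Before moving on, it can help to confirm that both packages are visible to the Python environment your notebook uses. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration and is not part of drawdata or pandas.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages were found.
print(missing_packages(["drawdata", "pandas"]))
```

If a name shows up in the printed list, run the matching pip command again from the same environment that starts Jupyter.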
Importing important modules
import pandas as pd
from drawdata import draw_scatter
from drawdata import draw_line
from drawdata import draw_histogram
After importing the libraries, let’s see how data is drawn:
Draw varieties of plots
1. Scatter draw
scatter_plot = draw_scatter()
scatter_plot
In this GIF, you can see several options at the top, which are used to draw the scatter pattern in the drawing area.
2. Line draw
line_plot = draw_line()
line_plot
Here we draw lines in different colors to represent the different groups of data in the dataset.
3. Histogram draw
hist_draw = draw_histogram()
hist_draw
The image above shows a variety of data in the form of a histogram. In this way, we can draw fake data and visualize it as a histogram.
This is how to create different datasets using the drawdata library. But how do we save this data into a dataframe to build a machine learning model? See below:
Click one of the options above to choose the format in which you wish to store the data:
After this, you can use pandas to read the clipboard to get your drawn data into a dataframe.
df = pd.read_clipboard(sep=",")
df
So, this is the resultant dataset that we have created using the drawdata library.
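If the clipboard is not available (for example, on a remote notebook server), the same comma-separated text can be pasted into a string and parsed directly. This is only a sketch: the sample rows are made up, and the column names `x`, `y`, and `z` are assumptions for illustration, so check them against what your drawing actually exports.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the data copied from the drawing widget.
# The column names x, y, z are assumed here, not taken from the library docs.
csv_text = """x,y,z
10.5,20.1,a
11.2,19.8,a
40.3,55.0,b
42.1,53.7,b
"""

# Parse the text exactly as if it were a CSV file on disk.
df = pd.read_csv(io.StringIO(csv_text))

# Split into features and labels, the usual starting point for an ML model.
features = df[["x", "y"]]
labels = df["z"]

# Count how many points were drawn for each group.
print(df.groupby("z").size())
```

Keeping the features and labels in separate variables makes it easy to hand them to whichever model you decide to train next.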
End Notes
This article was suggested by a friend, and drawdata turned out to be an amazing library that caught my interest. Now it’s time to build your own dataset by drawing.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.