4 Use Cases Every Data Scientist Should Learn
This article was published as a part of the Data Science Blogathon
Examples of how to approach common problems with traditional machine learning algorithms.
- Credit Card Fraud Detection
- Customer Segmentation
- Customer Churn Prediction
- Sales Forecasting
If you are an experienced data scientist, you have probably seen some of these problems before. However, if you are relatively new, these use cases introduce data science concepts that you can apply across multiple industries.
Unfortunately, data science problems at companies usually do not arrive so neatly defined. Instead, the use case evolves over several iterations depending on the requirements and expectations of the project.
It is therefore useful to understand prevailing use cases that can be adapted and applied to newer ones. Sometimes you will face entirely new situations that are not written about in articles or studied at universities.
However, the beauty of data science is that it scales and transfers across diverse problems with a comparatively small amount of effort.
Let's explore four use cases that you can either apply directly in your job or adapt for later applications, including potential features for the model as well as the algorithm itself.
Use Case #1: Credit Card Fraud Detection
In this case, we would build a supervised model to classify each transaction as either fraud or not fraud. Ideally, you would have a good number of examples of what fraud does and does not look like in your data.
The next step is to acquire or engineer features that describe fraudulent and suspicious behavior, so the algorithm can effectively discriminate between the two labels.
Here are some features you could use in your Random Forest model:
- monetary amount
- transaction information
- transaction class
Here is example code to get started:

```python
# after splitting the data into train and test sets
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
```
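One caveat: fraud labels are usually heavily imbalanced, so overall accuracy alone is misleading. Here is a small sketch, using synthetic stand-in data rather than a real transactions table, of checking precision and recall for the fraud class:

```python
# Sketch on synthetic data: with ~2% fraud labels, precision and recall on
# the fraud class tell you far more than overall accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a transactions dataset (~2% positives = "fraud")
X, y = make_classification(n_samples=2000, n_features=8,
                           weights=[0.98], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

precision = precision_score(y_test, pred, zero_division=0)
recall = recall_score(y_test, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A high accuracy with near-zero recall would mean the model is simply predicting "no fraud" for everything.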
You can start with a few features and then engineer new ones, such as sums or rates (e.g., money spent per day).
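As an illustration of that kind of feature engineering, here is a small pandas sketch; the card IDs, amounts, and column names are invented for the example, not taken from a real dataset:

```python
# Hypothetical sketch: aggregating raw transactions into per-card features
# such as money spent per active day.
import pandas as pd

tx = pd.DataFrame({
    "card_id": ["A", "A", "B", "B", "B"],
    "amount":  [20.0, 30.0, 5.0, 5.0, 10.0],
    "date":    pd.to_datetime(["2021-01-01", "2021-01-02",
                               "2021-01-01", "2021-01-01", "2021-01-03"]),
})

days_active = tx.groupby("card_id")["date"].nunique()
total_spent = tx.groupby("card_id")["amount"].sum()
features = pd.DataFrame({
    "spend_per_day": total_spent / days_active,  # money spent per active day
    "tx_count": tx.groupby("card_id").size(),    # number of transactions
})
print(features)
```

Aggregates like these collapse the data to one row per card, which is the shape a classifier expects.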
Use Case #2: Customer Segmentation
Unlike the example above, this use case relies on unsupervised learning: instead of classification, we use clustering.
A common clustering algorithm is K-Means. This problem is unsupervised because you do not have labels and do not know in advance what the groups should be; instead, you want to discover groupings of customers based on their shared characteristics.
In this example, the purpose of the model is to find patterns among customers who buy specific products.
That way, you can build a targeted marketing campaign designed just for those customers.
Here are some features you could use in your K-Means model:
- products purchased
- customer location
- product or retailer location
- spending rate
- product manufacturers
Here is sample code to try:

```python
# after extracting the data and features
from sklearn.cluster import KMeans

km = KMeans(init="random", n_clusters=6)
preds = km.fit_predict(X)
```
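One assumption in the snippet above is `n_clusters=6`. If you do not know how many segments to expect, a common heuristic (not specific to this article) is the elbow method: compare K-Means inertia across candidate values of k on your data. A sketch on toy data:

```python
# Elbow-method sketch: inertia (within-cluster sum of squares) always
# decreases as k grows, so look for the point where the curve flattens.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy stand-in for customer features, generated with 4 true clusters
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias[k] = km.inertia_
```

Plotting `inertias` against k typically shows a sharp bend (the "elbow") near the natural number of clusters.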
This algorithm is often used in the e-commerce industry, in marketing, and anywhere else with consumer data and marketing management.
Use Case #3: Customer Churn Prediction
This scenario could benefit from several families of machine learning algorithms. The problem is also similar to credit card fraud detection: we want to collect features about customers along with a predefined label, namely churn or no-churn.
You can use Random Forest again, or a more complex algorithm such as XGBoost. This is therefore a classification problem that uses supervised learning.
We will be predicting customer churn for users of a website where they purchase one or more products.
Here are some features you could use in your XGBoost model:
- login count
- date features (month, day, etc.)
- product history
- product diversity
- amount of product use
- frequency of product use
- login time
- how many times the customer emailed customer service
- how many times the customer chatted with a chatbot
- whether they referred the product
These features can indicate whether someone is more of a long-term user than a short-term one. Certain features, such as referrals, are strong evidence that a customer likes the product.
Product diversity could go either way in the classification: a customer may have ordered four separate products but may or may not have used them again.
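To make the date features concrete, here is a small pandas sketch; the user IDs, dates, and column names are invented for illustration:

```python
# Hypothetical sketch: expanding a login timestamp into month/day/weekday
# columns, the kind of "date features" a churn model can use.
import pandas as pd

logins = pd.DataFrame({
    "user_id": [1, 2, 3],
    "last_login": pd.to_datetime(["2021-03-15", "2021-07-04", "2021-12-31"]),
})

logins["login_month"] = logins["last_login"].dt.month
logins["login_day"] = logins["last_login"].dt.day
logins["login_weekday"] = logins["last_login"].dt.dayofweek  # Monday=0
print(logins)
```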
Here is sample code to run once you have your inputs and features ready:

```python
# after splitting the data into train and test sets
from xgboost import XGBClassifier

xgb = XGBClassifier()
xgb.fit(X_train, y_train)
pred = xgb.predict(X_test)
```
Use Case #4: Sales Forecasting
Perhaps the most different from the preceding three use cases is sales forecasting. In this example, we can use deep learning to predict future sales of a product.
The algorithm used is called LSTM, which stands for Long Short-Term Memory.
The main feature you could use in your LSTM model:
- past sales amounts
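An LSTM expects its input as a 3-D array of (samples, timesteps, features), so the raw sales series first has to be cut into sliding windows. Here is a sketch with made-up sales figures:

```python
# Sketch: turning a univariate sales series into supervised (X, y) pairs
# by sliding a fixed-size lookback window over it.
import numpy as np

# Made-up sales figures for illustration
sales = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119], dtype=float)

def make_windows(series, lookback=3):
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])   # previous `lookback` values
        y.append(series[i + lookback])     # next value, the target
    # reshape to (samples, timesteps, features) for the LSTM
    return np.array(X).reshape(-1, lookback, 1), np.array(y)

X_train, y_train = make_windows(sales)
print(X_train.shape)  # (7, 3, 1)
```

With a lookback of 3, each training sample is three consecutive sales values, and the target is the value that follows them.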
Here is example code to run with your input data and features:

```python
# X_train is shaped (samples, timesteps, features)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

lstm = Sequential()
lstm.add(LSTM(4, batch_input_shape=(1, X_train.shape[1], X_train.shape[2])))
lstm.add(Dense(1))
lstm.compile(loss='mean_squared_error', optimizer='adam')
lstm.fit(X_train, y_train, epochs=10, batch_size=1)
preds = lstm.predict(X_test)
```
This article covered everyday use cases and the conventional algorithms used to address them with data science. Specifically, we looked at:
- Credit Card Fraud Detection — using Random Forest
- Customer Segmentation — using K-Means
- Customer Churn Prediction — using XGBoost
- Sales Forecasting — using LSTM
I hope you found this article both interesting and useful. Please feel free to comment below if you have applied machine learning algorithms to these use cases.