Build a Decision Tree in Minutes using Weka (No Coding Required!)

Aniruddha Bhandari 22 Aug, 2023

9 min read

Learn how to build a decision tree model using Weka
This tutorial is perfect for newcomers to machine learning and decision trees, and those folks who are not comfortable with coding

Introduction

“The greater the obstacle, the more glory in overcoming it.”

– Moliere

Machine learning can be intimidating for folks coming from a non-technical background. All machine learning jobs seem to require a healthy understanding of Python (or R).

So how do non-programmers gain coding experience? It’s not a cakewalk!

Here’s the good news – there are plenty of tools out there that let us perform machine learning tasks without having to code. You can easily build algorithms like decision trees from scratch in a beautiful graphical interface. Isn’t that the dream? These tools, such as Weka, help us primarily deal with two things:

Quickly build a machine learning model, like a decision tree, and understand how the algorithm is performing. This can later be modified and built upon
This is ideal for showing the client/your leadership team what you’re working with

This article will show you how to solve classification and regression problems using Decision Trees in Weka without any prior programming knowledge!

But if you are passionate about getting your hands dirty with programming and machine learning, I suggest going through the following wonderfully curated courses:

Introduction
Classification vs. Regression in Machine Learning
Understanding Decision Trees
What is Weka? Why Should You Use Weka for Machine Learning?
Exploring the Dataset in Weka
Classification using Decision Tree in Weka
Decision Tree Parameters in Weka
Visualizing your Decision Tree in Weka
Regression using Decision Tree in Weka
Frequently Asked Questions
End Notes

Classification vs. Regression in Machine Learning

Let me first quickly summarize what classification and regression are in the context of machine learning. It’s important to know these concepts before you dive into decision trees.

A classification problem is about teaching your machine learning model how to categorize a data value into one of many classes. It does this by learning the characteristics of each type of class. For example, to predict whether an image is of a cat or dog, the model learns the characteristics of the dog and cat on training data.

A regression problem is about teaching your machine learning model how to predict the value of a continuous quantity. It does this by learning how that quantity has varied with different input variables in past data. For example, a model trying to predict the future share price of a company is solving a regression problem.

You can find both these problems in abundance on our DataHack platform.

Now, let’s learn about an algorithm that solves both problems – decision trees!

Understanding Decision Trees

Decision trees come in several variants; one well-known family is Classification And Regression Trees (CART). They work by learning answers to a hierarchy of if/else questions leading to a decision. These questions form a tree-like structure, and hence the name.

For example, let’s say we want to predict whether a person will order food or not. We can visualize the following decision tree for this:

Each node in the tree represents a question derived from the features present in your dataset. Your dataset is split based on these questions until a stopping condition, such as the maximum depth of the tree, is reached. The last node does not ask a question but represents which class the value belongs to.

The topmost node in the Decision tree is called the Root node

A node at the bottom of a branch, which is not split any further, is called a Leaf node

A node divided into sub-nodes is called a Parent node. The sub-nodes are called Child nodes
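To make the if/else idea concrete, here is a minimal sketch of the food-ordering example written as plain Python. The tree from the figure is not reproduced in the text, so the three features used here (is_hungry, has_food_at_home, budget) and the threshold of 10.0 are assumptions chosen purely for illustration.

```python
# Hypothetical features and threshold; the questions in the article's figure may differ.
def will_order_food(is_hungry: bool, has_food_at_home: bool, budget: float) -> str:
    # Root node: the first question asked of every example.
    if not is_hungry:
        return "no order"  # leaf node: no further question is asked
    # Child node of the root, and itself a parent of the nodes below it.
    if has_food_at_home:
        return "no order"  # leaf node
    # Last question on this branch; both outcomes are leaf nodes.
    if budget >= 10.0:
        return "order"
    return "no order"


print(will_order_food(is_hungry=True, has_food_at_home=False, budget=25.0))
print(will_order_food(is_hungry=False, has_food_at_home=False, budget=25.0))
```

A learned decision tree has the same shape; the difference is that the algorithm picks the questions and thresholds from the training data instead of having them written by hand.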

If you want to understand decision trees in detail, I suggest going through the below resources:

What is Weka? Why Should You Use Weka for Machine Learning?

“Weka is free, open-source software with a range of built-in machine learning algorithms that you can access through a graphical user interface!”

WEKA stands for Waikato Environment for Knowledge Analysis and was developed at the University of Waikato, New Zealand.

Weka has multiple built-in functions for implementing a wide range of machine learning algorithms, from linear regression to neural networks. This allows you to run even complex algorithms on your dataset at the click of a button! Not only this, Weka also supports accessing some of the most common machine learning library algorithms of Python and R!

With Weka you can preprocess the data, classify the data, cluster the data and even visualize the data! This you can do on different formats of data files like ARFF, CSV, C4.5, and JSON. Weka even allows you to add filters to your dataset through which you can normalize your data, standardize it, interchange features between nominal and numeric values, and what not!

I could go on about the wonder that is Weka, but for the scope of this article let’s try and explore Weka practically by creating a Decision tree. Now go ahead and download Weka from their official website!

Exploring the Dataset in Weka

I will take the Breast Cancer dataset from the UCI Machine Learning Repository. I recommend you read about the problem before moving forward.

Let us first load the dataset in Weka. To do that, follow the below steps:

Open Weka GUI
Select the “Explorer” option.
Select “Open file” and choose your dataset.

Your Weka window should now look like this:

You can view all the features in your dataset on the left-hand side. Weka automatically creates plots for your features which you will notice as you navigate through your features.

You can even view all the plots together if you click on the “Visualize All” button.

Now let’s train our classification model!

Classification using Decision Tree in Weka

Implementing a decision tree in Weka is pretty straightforward. Just complete the following steps:

Click on the “Classify” tab on the top
Click the “Choose” button
From the drop-down list, select “trees” which will open all the tree algorithms
Finally, select the “REPTree” decision tree

“Reduced Error Pruning Tree (REPTree) is a fast decision tree learner that builds a decision/regression tree using information gain as the splitting criterion, and prunes it using the reduced error pruning algorithm.”

You can read about the reduced error pruning technique in this research paper.

“Decision tree splits the nodes on all available variables and then selects the split which results in the most homogeneous sub-nodes.”

Information Gain is used to calculate the homogeneity of the sample at a split.
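The idea of information gain can be sketched in a few lines of Python. This is a minimal illustration using the standard entropy-based definition, not Weka's internal code; the function names and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy labels: five "yes" and five "no" examples at the parent node.
parent = ["yes"] * 5 + ["no"] * 5
pure_split = [["yes"] * 5, ["no"] * 5]
mixed_split = [["yes", "yes", "yes", "no", "no"], ["no", "no", "no", "yes", "yes"]]

print("gain of pure split: ", information_gain(parent, pure_split))
print("gain of mixed split:", information_gain(parent, mixed_split))
```

The split that produces perfectly homogeneous sub-nodes gets the highest possible gain, while the split that leaves both sub-nodes mixed gains very little, which is why the tree would prefer the first one.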

You can select your target feature from the drop-down just above the “Start” button. If you don’t do that, WEKA automatically selects the last feature as the target for you.

The “Percentage split” specifies how much of your data you want to keep for training the classifier. The rest of the data is used during the testing phase to calculate the accuracy of the model.
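What a percentage split does can be sketched as follows. This is an illustrative stand-in, not Weka's implementation; the function name, the seed, and the value of 66 percent are assumptions for the example.

```python
import random


def percentage_split(rows, train_percent, seed=1):
    """Shuffle the rows, then keep train_percent of them for training."""
    shuffled = rows[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100.0)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 instances
train, test = percentage_split(rows, train_percent=66.0)
print(len(train), "training rows,", len(test), "test rows")
```

Every row ends up in exactly one of the two sets, so the test accuracy is measured only on data the classifier never saw during training.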

With “Cross-validation Folds” you can create multiple samples (or folds) from the training dataset. If you decide to create N folds, then the model is trained and evaluated N times. Each time, one of the folds is held back for validation while the remaining N-1 folds are used for training the model. The results of all the folds are averaged to give the cross-validation result.

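Using more cross-validation folds does not by itself make your model better. It means each run trains on a larger share of the data and the averaged score rests on more train/validation splits, which usually gives a more reliable estimate of how the model will perform on unseen data, at the cost of a longer run time.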
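The fold mechanics described above can be sketched in Python. This is a simplified illustration: it assigns rows to folds in a fixed round-robin order, whereas a real tool would typically shuffle (and often stratify) the data first. The function names and the stand-in scoring function are assumptions for the example.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, held_out_indices) once per fold."""
    # Deal row indices into folds round-robin: 0, k, 2k, ... go to fold 0, and so on.
    folds = [list(range(i, n_rows, n_folds)) for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def cross_validate(n_rows, n_folds, score_fn):
    """Average the score returned by score_fn(train, held_out) over all folds."""
    scores = [score_fn(train, held_out) for train, held_out in k_fold_indices(n_rows, n_folds)]
    return sum(scores) / len(scores)


for train, held_out in k_fold_indices(n_rows=10, n_folds=5):
    print("train on", train, "validate on", held_out)
```

In practice score_fn would train a classifier on the training rows and return its accuracy on the held-out rows; here it is left as a parameter so the fold logic can be read on its own.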

Finally, press the “Start” button for the classifier to do its magic!

Our classifier achieved an accuracy of 92.4%. Weka also prints the confusion matrix, from which several other metrics can be derived. You can study the confusion matrix and other metrics in detail here.
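As a rough illustration of how metrics come out of a two-class confusion matrix, consider the sketch below. The four counts are made-up numbers chosen for easy arithmetic; they are not the counts behind the 92.4% figure above, and the function name is an assumption for the example.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Derive common metrics from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        # Share of all instances that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Of the instances predicted positive, the share that really are positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the instances that really are positive, the share that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = metrics_from_confusion(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can hide problems when one class is much rarer than the other, which is why looking at precision and recall per class is usually worthwhile.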

Decision Tree Parameters in Weka

Decision trees have a lot of parameters. We can tune these to improve our model’s overall performance. This is where a working knowledge of decision trees really plays a crucial role.

You can access these parameters by clicking on your decision tree algorithm on top:

Let’s briefly talk about the main parameters:

maxDepth – It determines the maximum depth of your decision tree. By default, it is -1, which means the depth is not restricted. You can manually tweak this value to get the best results on your data
noPruning – Pruning means automatically cutting back branches that do not contribute much information, which keeps the decision tree simple and easy to interpret. Setting noPruning to True switches pruning off
numFolds – Determines how much of the data is used for pruning the decision tree. The data is divided into the specified number of folds; one fold is used for pruning and the rest are used for growing the tree
minNum – Minimum number of instances per leaf (2 by default). Smaller values let the tree keep splitting into finer leaf nodes, while larger values stop the splitting earlier
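One way to picture how maxDepth and minNum limit tree growth is as a check made before each split. The sketch below is a simplified assumption about how such limits can work in general, not a copy of REPTree's actual logic; the function name and the exact rules are hypothetical.

```python
def should_stop(labels, depth, max_depth=-1, min_num=2):
    """Return True if a node holding these labels should become a leaf."""
    # A node with a single class left cannot be improved by splitting.
    if len(set(labels)) <= 1:
        return True
    # A negative max_depth means the depth is unrestricted.
    if max_depth >= 0 and depth >= max_depth:
        return True
    # Too few instances to form two children with at least min_num each.
    if len(labels) < 2 * min_num:
        return True
    return False


print(should_stop(["a", "b", "a", "b"], depth=0))               # mixed node, no limits hit
print(should_stop(["a", "b", "a", "b"], depth=3, max_depth=3))  # depth limit reached
print(should_stop(["a", "b", "a"], depth=0, min_num=2))         # too few instances
```

Tightening either limit produces a smaller tree that is easier to read and less likely to overfit, at the risk of missing real structure in the data.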

You can always experiment with different values for these parameters to get the best accuracy on your dataset.

Visualizing your Decision Tree in Weka

Weka even allows you to easily visualize the decision tree built on your dataset:

Go to the “Result list” section and right-click on your trained algorithm
Choose the “Visualize tree” option

Your decision tree will look like below:

Interpreting these values can be a bit intimidating but it’s actually pretty easy once you get the hang of it.

The values on the lines joining nodes represent the splitting criteria based on the values in the parent node feature
In the leaf node:
- The value before the parentheses denotes the classification value
- The first value in the first pair of brackets is the total number of instances from the training set in that leaf. The second value is the number of those instances incorrectly classified in that leaf
- The first value in the second pair of brackets is the total number of instances from the pruning set in that leaf. The second value is the number of those instances incorrectly classified in that leaf
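Reading a leaf by the rules above can be sketched as a small parser. The leaf text used here, "benign (120/4) [58/3]", is a made-up example in the form just described (round brackets for the training set, square brackets for the pruning set); the exact formatting in your Weka output may differ, and the function name is an assumption.

```python
import re

# label (train_total/train_errors) [prune_total/prune_errors]
LEAF_PATTERN = re.compile(
    r"^(?P<label>.+?)\s*\((?P<train_total>[\d.]+)/(?P<train_errors>[\d.]+)\)"
    r"\s*\[(?P<prune_total>[\d.]+)/(?P<prune_errors>[\d.]+)\]$"
)


def parse_leaf(text):
    """Split a leaf label into its class and the four counts."""
    match = LEAF_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised leaf format: {text!r}")
    parts = match.groupdict()
    label = parts.pop("label")
    return {"label": label, **{name: float(value) for name, value in parts.items()}}


leaf = parse_leaf("benign (120/4) [58/3]")  # hypothetical leaf text
train_error_rate = leaf["train_errors"] / leaf["train_total"]
prune_error_rate = leaf["prune_errors"] / leaf["prune_total"]
print(leaf["label"], f"train error {train_error_rate:.3f}", f"pruning error {prune_error_rate:.3f}")
```

Comparing the two error rates for a leaf gives a quick sense of whether it behaves similarly on data that was not used to grow the tree.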

Regression using Decision Tree in Weka

Like I said before, Decision trees are so versatile that they can work on classification as well as on regression problems. For this, I will use the “Predict the number of upvotes” problem from Analytics Vidhya’s DataHack platform.

Here, we need to predict the number of upvotes that a question asked by a user on a question-and-answer platform will receive.

As usual, we’ll start by loading the data file. But this time, the data also contains an “ID” column for each user in the dataset. This would not be useful in the prediction. So, we will remove this column by selecting the “Remove” option underneath the column names:
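If you prefer to drop the identifier column before loading the file into Weka, the same step can be sketched with Python's csv module. The column names and the three rows below are hypothetical stand-ins, not the real hackathon data.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return the CSV text with one named column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column when writing rows.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample with made-up column names and values.
sample = "ID,Views,Answers,Upvotes\n1,100,2,5\n2,250,4,12\n3,40,0,1\n"
cleaned = drop_column(sample, "ID")
print(cleaned)
```

An identifier is unique to each row, so a tree that splits on it would only memorise the training data rather than learn anything that carries over to new questions.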

We can make predictions on the dataset as we did for the Breast Cancer problem. REPTree will automatically treat this as a regression problem because the target attribute is numeric:

The evaluation metric provided in the hackathon is the RMSE score. We can see that the model has a very poor RMSE without any feature engineering. This is where you step in – go ahead, experiment and boost the final model!
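For reference, RMSE (root mean squared error) is the square root of the average squared difference between the actual and predicted values. The sketch below computes it on three made-up value pairs; these numbers are illustrative assumptions and are not taken from the hackathon data.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long lists of numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical upvote counts and predictions for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(f"RMSE: {rmse(actual, predicted):.3f}")
```

Because the errors are squared before averaging, a few large misses raise RMSE much more than many small ones, so heavily skewed targets such as upvote counts can produce a large score.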

Frequently Asked Questions

Q1. How to use Weka for decision tree?

A. To use Weka for decision trees:

1. Open the Weka GUI and choose the “Explorer” interface.
2. Load your dataset (for example, in ARFF format).
3. Select the “Classify” tab and pick “J48” (the C4.5 algorithm).
4. Set the target attribute.
5. Click “Start” to build the tree.
6. Evaluate the results in the “Classifier output” panel.

Tweak settings like pruning, confidence factor, etc. for better results. Weka simplifies decision tree creation and analysis for data mining tasks.

Q2. What is J48 decision tree in Weka?

A. J48, implemented in Weka, is a popular decision tree algorithm based on the C4.5 algorithm. It creates decision trees by recursively partitioning data based on attribute values. J48 employs information gain or gain ratio to select the best attribute for splitting. It handles categorical and numeric attributes, supports pruning to prevent overfitting, and is widely used for classification tasks due to its simplicity and effectiveness.

End Notes

And just like that, you have created a Decision tree model without having to do any programming! This will go a long way in your quest to master the working of machine learning models.

If you want to learn and explore the programming part of machine learning, I highly suggest going through these wonderfully curated courses on the Analytics Vidhya website:
