Concept Learning: Key to Better and Faster Decisions!

Mbali Kalirane 16 Mar, 2023


Introduction

You’ve probably wondered how banks are able to decide precisely who gets a loan and who doesn’t. Despite dealing with thousands of different applicants, their systems are always equipped to make the appropriate decisions. Not only that, but they can make those decisions very quickly and accurately.

By evaluating the characteristics of loan applicants, banks can decide precisely who they accept and reject. But how are they able to do this for every customer, given that each customer is different?

Well, to classify each customer, banks rely on a general rule, and the foundation of this general rule is concept learning.

Concept learning lays the foundation for many of the decision-making applications we use today, whether we’re taking a medical test, using a credit card, or taking out a bank loan. Its importance lies in helping systems make quick and accurate decisions without being explicitly programmed for each scenario.

But how is concept learning able to do all of this?

Well, follow me in this blog, as I’ll help you understand how concept learning works. I will explain the basics of concept learning, how it works, and how it can be applied in the real world. And by the end of this article, you’ll have a more practical understanding of concept learning.

Learning Objectives:

Understand what Concept Learning is, how it works, and how it can be applied in the real world.
Learn about how concept learning relates to feature spaces
Learn about different types of hypotheses in concept learning
Understand how concept learning is applied to the Find-S Algorithm

This article was published as a part of the Data Science Blogathon.

What is Concept Learning?

Concept learning is the task of inferring a Boolean-valued function from a set of training examples. The purpose of inferring this function is to use it as a general rule for classifying unseen data.

Concept learning is based on a type of learning called inductive learning. In inductive learning, the learner learns by example. In other words, the learner discovers the rules of a particular concept by learning the examples of that concept. For example, if a student teaches himself algebra, the more he practices different types of examples and solutions, the more he will understand the general rules of algebra. The idea is the same for concept learning: a machine is taught the different examples of a concept, and by learning these examples, it will “discover” the general rule(s) that apply to that concept.

Concept learning thus involves learning a function (which is a rule) from a set of training examples.

What is the Aim of Concept Learning?

Concept learning aims to find a function, or rule, that faithfully represents the concept being learned, so that it can accurately classify unseen data. By “true representation”, we mean that the function must closely approximate the target concept. The target concept is what we’re trying to learn: a Boolean-valued function, denoted c(x), that takes exactly one of two values for each instance x (for example, Loan Approved or Not Approved). The aim is to determine, for any given object, which of these two categories it belongs to.

According to the Inductive Learning Hypothesis, if a function approximates the target concept well over a sufficiently large set of training examples, then it will also approximate the target concept well over unseen examples.

For example, suppose an algebra learner has gained an understanding of the general rules of algebra based on the examples they’ve practiced. In that case, they’ll be able to apply those rules to solve any new problems that they encounter. Similarly, in concept learning, an inferred function will be able to approximate and classify new data based on how well it has learned in the past.

How Does Concept Learning Work?

Concept learning can be viewed in two complementary ways. It works by:

Inferring a function from a set of training examples.
Searching to find the function that best fits the training examples.

How Does Concept Learning Infer a Function?

Let’s go back to the bank loan example.

Suppose that the bank wants to classify customers according to five features:

Gender
Age
Income
Dependents
Loan Amount

And depending on these features, each applicant will be classified into one of two categories: Loan Approved or Not Approved.

To do this, the bank needs to use a training set consisting of example loan applicants. This training set will be used to infer a function that will act as a decision rule for classifying the applicants.

Let’s consider the training sample shown below:

Each row in the training set represents a single applicant (known as an instance). Each applicant has five features with differing values and one of two possible outcomes, “Yes” or “No”, depending on the values of those features.

Each customer is either a negative or positive example. The applicants whose loan application is accepted are the positive examples, while the applicants whose loan application is rejected are the negative examples.

These examples are determined according to the preferences and needs of the bank. The bank may have determined that the best customers have a combination of certain features.

The positive examples are the examples of applicants that the bank has deemed acceptable for awarding a loan. These applicants have a combination of features that have been determined as desirable to the bank in terms of the applicant being able to repay the loan without much trouble.

The negative examples are the examples of applicants that the bank deems unacceptable for awarding a loan. These applicants have a combination of features that the bank sees as undesirable. These are applicants that the bank deems will have difficulty repaying the loan.
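To make the idea of positive and negative examples concrete, here is a minimal Python sketch of how such a training set could be stored and split. The applicant values are invented purely for illustration and are not real bank data, and the names FEATURES, training_set, positives, and negatives are assumptions of this sketch.

```python
# Feature order used for every instance in this sketch.
FEATURES = ("gender", "age", "income", "dependents", "loan_amount")

# Hypothetical applicants: the values are invented for illustration only.
# Each entry is (feature values, label), where True means "Yes" (loan approved).
training_set = [
    (("F", 35, 6000, 1, 10000), True),
    (("M", 42, 5200, 2, 15000), True),
    (("M", 19, 1500, 0, 20000), False),
    (("F", 17, 800, 0, 5000), False),
]

# Split the instances into positive and negative examples.
positives = [x for x, label in training_set if label]
negatives = [x for x, label in training_set if not label]

print(len(positives), "positive examples;", len(negatives), "negative examples")
```

Keeping the label next to each instance makes it easy to hand the same list to a learning algorithm later on.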

A Visual Representation of the Training Set: The Feature Space

The five features (gender, age, income, dependents, and loan amount) together make up what’s called the feature space. A feature space is a space with one axis per feature, and it’s used for categorizing our data. The dimension of a feature space depends on the number of features in the training set. For example, if there are two features, the feature space will be two-dimensional. In our case, there are five features, so our feature space is five-dimensional.

The feature space can be thought of as a visual representation of the training set. It shows how the data is classified relative to its features. However, the larger the dimension of the feature space, the more difficult it is to visualize.

So, to better visualize how concept learning works, let’s suppose that the bank is classifying customers according to only two features: age and income. Then we have a two-dimensional feature space as shown below:

From this feature space, we can see a good visual representation of the training set. It shows where each applicant lies with respect to the two features. Here, we have two classes, Loan Approved and Not Approved, and two features, age and income. We can see how the individual applicants are classified into each class according to their features.

Inferring a Decision Boundary from the Feature Space

Notice that there exists a pattern between the two classes of the feature space. Concept learning infers a function depending on the pattern in the data between the two classes. In other words, concept learning involves learning the pattern in the data and creating a function based on what has been learned. This function acts as a decision boundary that distinguishes between the two classes of data. It is what is used to approximate the target concept.
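As a rough illustration of a function acting as a decision boundary, the sketch below classifies applicants using only age and income. The thresholds (older than 18, income above 4000) are assumed here for illustration and are not learned from any data; the function name approve_loan is also hypothetical.

```python
def approve_loan(age: float, income: float) -> bool:
    """Hypothetical decision boundary: approve only if both thresholds are exceeded."""
    return age > 18 and income > 4000


# A few made-up applicants as (age, income) pairs.
applicants = [(25, 5000), (17, 6000), (40, 3000)]
for age, income in applicants:
    print(age, income, approve_loan(age, income))
```

In a real system, the boundary would be inferred from the training examples rather than written by hand.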

For example, consider the decision boundary separating the two classes:

Conditions for Effective Concept Learning

Good Feature Space

The feature space that we are using is what we can call a good feature space. This is because the classes are well separated, which makes the decision boundary easy to learn. If the classes are highly mixed or overlapping, we have a poor feature space, and the decision boundary won’t be learned well.

Generalized Decision Boundary

A poor feature space will likely result in the decision boundary either overfitting or underfitting. An ideal feature space ensures that the decision boundary can generalize.

How Does Concept Learning Search for the Best Function?

Concept learning can also be viewed as a search, where the goal is to find the function that best fits the training set.

In this case, concept learning aims to find a decision boundary that generalizes well. But many such decision boundaries are possible, so it aims to find the best one.

To find the best decision boundary, we have to search through a space of candidate decision boundaries. This space is called a hypothesis space, and a hypothesis refers to a single possible decision boundary.

Representation of Hypotheses

Each hypothesis in the hypothesis space is defined over a fixed set of features. In our two-feature example, each hypothesis depends on two features. Each feature in a hypothesis is represented by a constraint, which states which values of that feature the hypothesis accepts. There are three types of constraints: the single-value constraint, the general constraint, and the specific constraint.

The single-value constraint. This constraint represents a fixed value required for a particular feature. Only that value is acceptable, so the feature affects the final outcome. For example, age = “20”.
The general constraint, denoted by “?”. The general constraint means that any value is acceptable for a particular feature. As a result, the value of that feature has no effect on the final outcome. For example, if the applicant’s gender is irrelevant to being awarded a loan, gender = “?”.
The specific constraint, denoted by “0” (often written ∅). The specific constraint means that no value is acceptable for a particular feature, so a hypothesis containing it rejects every applicant. Note that “0” here is a symbol for “no acceptable value”, not the number zero: a requirement that the applicant has no debt would be a single-value constraint, such as debt = “none”.

A hypothesis is often represented as a vector of constraints. For example, suppose that the bank prefers applicants older than 18, with an income greater than 4000, regardless of the applicant’s gender, the loan amount, or the number of dependents. This hypothesis would be represented as:

< “?”, “>18”, “>4K”, “?”, “?” >

It is a vector of constraints, where the constraints can be interpreted as follows:

The gender of the applicant doesn’t matter, and so gender is represented using a general constraint. Having a general constraint means that applicants of any gender can be positive examples. The same applies to the loan amount and the number of dependents.
Income has to be greater than 4000, and age has to be greater than 18 for the applicant to be accepted. Income and age are therefore the features that constrain the outcome: only applicants older than 18 and with an income higher than 4000 are positive examples. Strictly speaking, “>18” and “>4K” each accept a range of values rather than one fixed value, so they are a looser form of the single-value constraint.
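The interpretation above can be sketched in Python by treating a hypothesis as a tuple of constraints and checking each one against the matching feature value. This is only an illustrative sketch: the helper names satisfies and classify are hypothetical, and representing range constraints such as “>18” as small predicate functions is an assumption of this example, not part of the original notation.

```python
ANY = "?"   # general constraint: any value is acceptable
NONE = "0"  # specific constraint: no value is acceptable


def satisfies(constraint, value) -> bool:
    """Check one feature value against one constraint."""
    if constraint == ANY:
        return True
    if constraint == NONE:
        return False
    if callable(constraint):          # range constraint, e.g. age > 18
        return bool(constraint(value))
    return constraint == value        # single-value constraint


def classify(hypothesis, instance) -> bool:
    """An instance is positive only if every constraint is satisfied."""
    return all(satisfies(c, v) for c, v in zip(hypothesis, instance))


# Feature order: gender, age, income, dependents, loan amount.
hypothesis = (ANY, lambda age: age > 18, lambda income: income > 4000, ANY, ANY)

print(classify(hypothesis, ("F", 35, 6000, 1, 10000)))  # meets both thresholds
print(classify(hypothesis, ("M", 17, 6000, 0, 5000)))   # fails the age constraint
```

Because the hypothesis is a conjunction, a single unsatisfied constraint is enough to classify the applicant as a negative example.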

Two more types of hypotheses are used in searching for the best hypothesis. These are the most general hypothesis and the most specific hypothesis.

The Most General Hypothesis is named as such because it uses the general constraint for every feature in the hypothesis. The most general hypothesis is thus denoted as follows:

< “?”, “?”, “?”, “?”, “?”>

This hypothesis implies that any value is acceptable for any feature and that each applicant is a positive example.

The Most Specific Hypothesis uses the specific constraint for every feature in the hypothesis. The most specific hypothesis is represented as follows:

< “0”, “0”, “0”, “0”, “0”>

This hypothesis implies that none of the features’ values are acceptable and that none of the applicants is a positive example.
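A short sketch can confirm how these two extreme hypotheses behave. The applicant values are made up for illustration, and the classify helper is a hypothetical name that handles only the “?”, “0”, and fixed-value constraints described above.

```python
ANY, NONE = "?", "0"


def classify(hypothesis, instance) -> bool:
    """True when every constraint accepts the matching feature value."""
    return all(
        c == ANY or (c != NONE and c == v)
        for c, v in zip(hypothesis, instance)
    )


most_general = (ANY,) * 5    # accepts every applicant
most_specific = (NONE,) * 5  # accepts no applicant

applicant = ("F", 35, 6000, 1, 10000)  # invented values
print(classify(most_general, applicant))   # True
print(classify(most_specific, applicant))  # False
```

Every other hypothesis lies somewhere between these two extremes.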

Finding the Best Hypothesis

There are many methods for searching for the best hypothesis. One such method is the Find-S algorithm, which searches for the maximally specific hypothesis: the most specific hypothesis that still covers every positive example in the training set. The idea behind this method is to start from the most specific hypothesis and compare its constraints to the feature values of each positive example in turn.

The algorithm considers the positive examples one at a time and ignores the negative examples. For the first positive example, it simply copies the example’s feature values into the hypothesis. For every later positive example, it checks, feature by feature, whether the value in the example is the same as the corresponding value in the hypothesis. If the values are the same, the hypothesis is left unchanged for that feature. If the values are different, the algorithm replaces the value in the hypothesis with the general constraint, “?”. The algorithm continues this process until it has processed the last positive example in the training set. The result is the maximally specific hypothesis.

For a more practical example, let’s look at the steps of the algorithm.

Steps of the Find-S Algorithm

Let h be the most specific hypothesis.
For each positive example in the training set and each feature in the example: if h holds the specific constraint “0” for that feature, replace it with the example’s value; if the value in h equals the example’s value, do nothing.
Otherwise, replace the hypothesis value with the general constraint, “?”.
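The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name find_s and the (instance, label) input format are assumptions, and the sketch assumes that no real feature value is the string “?” or “0”.

```python
ANY, NONE = "?", "0"


def find_s(examples):
    """Return the maximally specific hypothesis covering all positive examples.

    `examples` is a list of (instance, label) pairs, where `instance` is a
    tuple of feature values and `label` is True for a positive example.
    """
    if not examples:
        raise ValueError("need at least one example")
    n_features = len(examples[0][0])
    h = [NONE] * n_features           # start from the most specific hypothesis
    for instance, label in examples:
        if not label:
            continue                  # Find-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] == NONE:
                h[i] = value          # first positive example: adopt its value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY            # values disagree: generalize to "?"
    return tuple(h)


# Tiny made-up data set with two features.
toy = [
    (("Sunny", "Warm"), True),
    (("Rainy", "Warm"), True),
    (("Rainy", "Cold"), False),
]
print(find_s(toy))  # ('?', 'Warm')
```

Note that the hypothesis only ever becomes more general, so a constraint that has been replaced by “?” is never specialized again.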

Example of the Find-S Algorithm

Let’s consider another example.

Suppose that you play a sport and want to work out on which days you enjoy it. Each day is described by six features: sky, temperature, humidity, wind, water, and forecast, as shown in the training set:

To start the Find-S algorithm, take a positive example from the training set and use its feature values as the initial hypothesis, S1 (let’s choose row 1):

Compare the values of the features to the first positive training example in row 1:

The values of the training example and the most specific hypothesis are the same, so we do nothing.

We move on to the next positive training example (in row 2), and compare it to the most specific hypothesis:

The values for humidity are different, so we replace the humidity value in S1 with “?”. So now we have:

Row 3 is a negative example, so we ignore it. We then move on to the next positive training example (in row 4), and compare it to the most specific hypothesis, S2:

The values for water are different, so we replace the water value in S2 with “?”. So now we have:

So now we have reached the last positive example, and we have the maximally specific hypothesis: <Sunny, Warm, ?, Strong, ?, Same>
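The walkthrough can be reproduced with a short Python trace. Because the training table itself is not reproduced in the text, the rows below are assumed values chosen only to agree with the steps described above (humidity differs in row 2, row 3 is negative, water differs in row 4); the function name trace_find_s is also hypothetical.

```python
ANY = "?"

# Assumed rows, chosen to match the walkthrough; not the original table.
# Feature order: sky, temperature, humidity, wind, water, forecast.
rows = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),   # row 1
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),     # row 2
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),  # row 3
    (("Sunny", "Warm", "High", "Strong", "Cool", "Same"), True),     # row 4
]


def trace_find_s(examples):
    """Yield the hypothesis after each positive example is processed."""
    h = None
    for instance, label in examples:
        if not label:
            continue                  # negative examples are ignored
        if h is None:
            h = list(instance)        # initialize from the first positive example
        else:
            # Keep agreeing values; generalize disagreeing ones to "?".
            h = [a if a == b else ANY for a, b in zip(h, instance)]
        yield tuple(h)


steps = list(trace_find_s(rows))
for i, h in enumerate(steps, start=1):
    print(f"S{i}: {h}")
```

With these assumed rows, the last hypothesis printed is the same maximally specific hypothesis reached in the walkthrough.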

How is Concept Learning Applied to the Real World?

As seen with the bank loan example, concept learning plays an important part in automated decision-making. Concept learning answers many business questions and enables organizations to take appropriate steps in their business. It helps organizations make quick and accurate classifications with large amounts of data.

In addition to bank loans, some other applications of concept learning are:

Spam Filtering

Answers the question: Is this email spam or not spam?
It uses an email’s characteristics to determine whether the email is spam or not.

Customer Purchasing

Answers the question: Is the customer likely to buy or not buy?
A customer’s characteristics, such as past purchase behavior, are used to determine whether they will buy or not.

School Admissions

Answers the question: Is the student likely to pass or fail?
A student’s characteristics, such as academic scores, determine whether they will pass or fail.

Medical Diagnoses

Answers the question: Is the patient likely to have the disease or not?
A patient’s characteristics, such as symptoms, age, and medical history, are used to determine whether they have a certain disease or not.

Summary of the Concept Learning Task

The training examples, denoted D, are the set of positive and negative examples of the target concept.

Each training example in the training set is referred to as an instance, denoted x. The set of all possible instances is denoted X.

Each training example has features or attributes, such as (Gender, Age, Income, Dependents, and Loan Amount).

Each training example is labeled with the value of the target concept, c(x). The target concept is the function we are trying to learn. It is a Boolean-valued function, for example, Award Loan: X → {0, 1}.

The hypothesis, h, is a vector of constraints, one for each feature.

A constraint represents each feature in the hypothesis.

“?” – the general constraint – any value is acceptable.
“0” – the specific constraint – no value is acceptable.
The single value constraint – a specific fixed value

The types of hypotheses are represented as

< “?”, “?”, “?”, “?”, “?”, “?” > – most general hypothesis
< “0”, “0”, “0”, “0”, “0”, “0” > – most specific hypothesis
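For readers who prefer compact notation, the task summarized above can be written as a single statement. The symbol H for the hypothesis space is an assumption of this sketch; the other symbols follow the summary.

```latex
c : X \to \{0, 1\}, \qquad
D = \{\langle x, c(x) \rangle\}, \qquad
\text{find } h \in H \ \text{such that} \ h(x) = c(x) \ \text{for all } x \text{ in } D.
```

In words: given training examples D labeled by the target concept c, the learner searches the hypothesis space H for a hypothesis h that agrees with c on every training example.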

Conclusion

In conclusion, concept learning provides an efficient way of extracting knowledge from data. It helps machines quickly and accurately learn tasks from large amounts of data, with little human intervention required. Because of this, concept learning is an essential part of many real-world operations and processes, where accuracy, speed, and cost reduction are important. It is thus extensively used in the decision-making processes of many businesses and organizations that work with copious amounts of data.

Key Takeaways

Concept learning is the task of inferring (learning) a Boolean-valued function from a dataset to classify unseen data.
Concept learning aims to find a function (h) that approximates the target concept c(x).
Concept learning involves learning the pattern in the data and creating a function based on what has been learned.
Concept learning can also be viewed as a search for the hypothesis that best fits the training examples.
Concept learning is a fundamental part of many automated decision-making processes.

If you have any questions, feel free to contact me on LinkedIn.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.