An Introduction to Particle Swarm Optimization (PSO) Algorithm
- PSO is a stochastic optimization technique based on the movement and intelligence of swarms.
- In PSO, the concept of social interaction is used for solving a problem.
- It uses a number of particles (agents) that constitute a swarm moving around in the search space, looking for the best solution.
- Each particle in the swarm looks for its positional coordinates in the solution space, which are associated with the best solution that has been achieved so far by that particle. It is known as pbest or personal best.
- Another best value known as gbest or global best is tracked by the PSO. This is the best possible value obtained so far by any particle in the neighborhood of that particle.
- Group Optimization and Ensemble Learning
- The problem of optimization
- The mathematical formulation of an Optimization Problem
- An Intuition of PSO
- Particle Swarm Optimization Algorithm
- Analysis of the Algorithm
- Neighborhood Topologies
- Types of PSO
- Contour plot
- Difference between PSO and Genetic Algorithm
- Advantages and disadvantages of PSO
Introduction to Particle Swarm Optimization
I am sure each one of us in our lifetime has heard from our well-wisher’s, “Be with good company. It helps you to cultivate good quality.” When we speak about a ‘good company,’ we discuss the unequal distribution of good qualities among group members to achieve a better common goal. It is the reason we always say ‘Work as a Team.’ Particle Swarm Optimization(PSO) Algorithm is based on that. In 1995, Kennedy and Eberhart wrote a research paper based on the social behavior of animal groups, where they had stated that sharing information among the group increases survival advantage. Like while a bird searching for food randomly can optimize her searching if she works with the flock. The advantage of working is mutual sharing of the best information, which can help a flock to discover the best place to hunt.
Group optimization and Ensemble Learning
Many of you have heard about ‘No Free Lunch (NFL) in machine learning. It speaks that no single model works best for all possible situations. We can also say that all optimization algorithms perform equally well when averaged across all potential problems. The last statement that I have written isn’t self-explanatory with the example of flock of bird. Why do we need optimization in machine learning or deep learning? To train a model, we must define a loss function to measure the difference between our model predictions. Our objective is to minimize or optimize this loss function so that it will be closer to 0. Maybe you have heard about a term called ‘Ensemble Learning.’ If you have not, then let me explain you. ‘Ensemble’ is a French word—meaning ‘Assembly.’ It speaks about learning in a group or crowd. It is like you are trying to train a model with the help of multiple algorithms. So, what type of benefit are we going to get here? A single base learner is a weak learner. But, when we combine all these vulnerable learners, they become strong learners. They become strong learners because their predictive power, accuracy, precision are high. And the error rate is less. We call this type of combined model ‘Meta-learning’ in machine learning. It refers to learning algorithms that can learn from other learning algorithms. It decreases variance, decreases bias, and improves prediction. Now, when you achieve that, that’s your ultimate ‘Nirvana’ moment as a data analyst.
The problem of optimization
Now let’s come back to our PSO model. The concept of swarm intelligence inspired the POS. Here we are speaking about finding the optimal solution in a high-dimensional solution space. It talks about Maximizing earns or minimizing losses. So, we are looking to maximize or minimize a function to find the optimum solution. A function can have multiple local maximum and minimum. But, there can be only one global maximum as well as a minimum. If your function is very complex, then finding the global maximum can be a very daunting task. PSO tries to capture the global maximum or minimum. Even though it cannot capture the exact global maximum/minimum, it goes very close to it. It is the reason we called PSO a heuristic model.
Let me give you an example of why the finding of global maximum/minimum is problematic. Check the below function :
If we plot this function, it looks like below
We can see that we have one global maximum and one global minimum. If we consider the function based on an interval in X-axis value from -4 to 6, we will have a maximum that will not be our global maximum. It is a local maximum. So we can say that finding out the global maximum may depend upon the interval. It is something like we try to observe a portion of a continuous function. Also, one thing to note while describing a dynamic system or entity, you can not have a static function. The function that I have defined here is fixed. Data analytics is data-hungry. To train a model or to find a suitable mathematical function, you must have enormous data. It is impossible to have all the data. Meaning it’s challenging to get the exact global minimum or maximum. Well, for me, it’s a limitation of Mathematics. Fortunately, we have Statistics that advocate sampling, and from there, it can optimize some value like global maximum or minimum concerning the original function. But again, you won’t get the exact global maximum or minimum. You will get some values that will be closer to the actual global maximum or minimum.
Also, when we describe a mathematical function based on some real-life scenario, we must explain it with multiple variables or higher-dimensional vector space. The growth of bacteria in a jar may depend upon temperature, humidity, the container, the solvent, etc. For this type of function, it’s more challenging to get the exact global maximum and minimum. Check the below function. And see if we add more variables than how difficult it becomes to get global maximum and minimum.
z=f(x, y)=sin x2+siny2+sinxsiny
The mathematical formulation of an Optimization Problem :
In the optimization problem, we have a variable represented by a vector X=[x1x2x3…xn] that minimizes or maximizes cost function depending on the proposed optimization formulation of the function f(X). X is known as position vector; it represents a variable model. It is an n dimensions vector, where n represents the number of variables determined in a problem. We can call it latitude and the longitude in the problem of choosing a point to land by a flock of birds. The function f(X) is called the fitness function or objective function. The job of f(X) is to assess how good or bad a position X is; that is, how perfect a certain landing point a bird thinks after finding a suitable place. Here, the evaluation, in this case, is performed through several survival criteria.
An Intuition of Particle Swarm Optimization
The movement towards a promising area to get the global optimum.
- Each particle adjusts its traveling velocity dynamically, according to the flying experiences it has and its colleagues in the group.
- Each particle tries to keep track of :
- It’s best result for him/her, known as personal best or pbest.
- The best value of any particle is the global best or gbest.
- · Each particle modifies its position according to:
- its current position
- its current velocity
- the distance between its current position and pbest.
- The distance between its current position and gbest.
Particle Swarm Optimization Algorithm
Lets us assume a few parameters first. You will find some new parameters, which I will describe later.
f: Objective function, Vi: Velocity of the particle or agent, A: Population of agents, W: Inertia weight, C1: cognitive constant, U1, U2: random numbers, C2: social constant, Xi: Position of the particle or agent, Pb: Personal Best, gb: global Best
The actual algorithm goes as below :
1. Create a ‘population’ of agents (particles) which is uniformly distributed over X.
2. Evaluate each particle’s position considering the objective function( say the below function).
z=f(x, y)=sin x2+siny2+sinxsiny
3. If a particle’s present position is better than its previous best position, update it.
4. Find the best particle (according to the particle’s last best places).
5. Update particles’ velocities.
6. Move particles to their new positions.
7. Go to step 2 until the stopping criteria are satisfied.
Analysis of the Particle Swarm Optimization Algorithm
A neighborhood must be defined for each particle. This neighborhood determines the extent of social interaction within the swarm and influences a particular particle’s movement. Less interaction occurs when the neighborhoods in the swarm are small. For small neighborhoods, the convergence will be slower, but it may improve the quality of solutions. The convergence will be faster for more prominent neighborhoods, but the risk that sometimes convergence occurs earlier.
For Star topology, each particle is connected with other particles. It leads to faster convergence than other topologies, Easy to find out gbest. But it can be biased to the pbest.
For wheel topology, only one particle connects to the others, and all information is communicated through this particle. This focal particle compares the best performance of all particles in the swarm, and adjusts its position towards the best performance particle. Then the new position of the focal particle is informed to all the particles.
For Ring Topology, when one particle finds the best result, it will make pass it to its immediate neighbors, and these two immediate neighbors pass it to their immediate neighbors until it reaches the last particle. Here the best result found is spread very slowly.
Types of Particle Swarm Optimization
It is a graphical technique to represent 3 -Dimensional surface in 2- dimensional Plot using variable Z in the form of slices known as contours. I hope the below example can give you the intuition.
Let’s draw a graph of circle z=x2+y2 at fixed heights ‘z’ , z =1,2,3 etc.
Taken from Wikipedia
To give you intuition, let Plot the function below in the contour plot.
z=x2+y2 its actual plotting and the contour plotting will look like below:
Here we can see the function in the region of f(x,y). We can create ten particles at random locations in this region, together with a random velocity which is sampled over a normal distribution with mean 0 and standard deviation 0.1, as follows:
The actual outcome will be like :
PSO found best solution at f([0.01415657 0.65909248])=0.4346033028251361
Global optimal at f([0.0, 0.0])=0.0
For details coding part, I’ll highly recommend you to visit the link: https://machinelearningmastery.com/a-gentle-introduction-to-particle-swarm-optimization/
Also, there is a library available called as pyswarms; please check the link to know more :
Difference between PSO and Genetic Algorithm
Genetic Algorithms (GAs) and PSOs are both used as cost functions, they are both iterative, and they both have a random element. They can be used on similar kinds of problems. The difference between PSO and Genetic Algorithms (GAs) is that GAs it does not traverse the search space like birds flocking, covering the spaces in between. The operation of GAs is more like Monte Carlo, where the candidate solutions are randomized, and the best solutions are picked to compete with a new set of randomized solutions. Also, PSO algorithms require normalization of the input vectors to reach faster “convergence” (as heuristic algorithms, both don’t truly converge). GAs can work with features that are continuous or discrete.
Also, In PSO, there is no creation or deletion of individuals. Individuals merely move on a landscape where their fitness is measured over time. This is like a flock of birds or other creatures that communicate.
Advantages and disadvantages of Particle Swarm Optimization
- Insensitive to scaling of design variables.
- Easily parallelized for concurrent processing.
- Derivative free.
- Very few algorithm parameters.
- A very efficient global search algorithm.
- PSO’s optimum local searchability is weak
The most exciting part of PSO is there is a stable topology where particles are able to communicate with each other and increase the learning rate to achieve global optimum. The metaheuristic nature of this optimization algorithm gives us lots of opportunities as it optimizes a problem by iteratively trying to improve a candidate solution. Applicability of it will increase more with the ongoing research work in Ensemble Learning.
Image Source( by: Amir Cohen/Reuters) : https://www.newscientist.com/article/dn27643-wave-motion-shows-how-bird-flocks-have-to-be-just-the-right-size/