Your Guide to Getting Started with Geospatial Analysis using Folium (with multiple case studies)
- Geospatial analysis is taking on an ever more important role in the industry
- The Folium library in Python helps us analyze location and geospatial data with ease and lets us create interactive maps
I can’t imagine where we would be without maps today (both literally and figuratively!). Maps are the guiding force for us to explore the world, get to our destination every day, and have helped us explore and evolve our understanding of the world.
I personally love working with location data to map out places of interest. This strategy is widely used in the industry these days, from marketing campaigns to applications like Uber and Zomato. So the question is: how do we get started working with location data to perform geospatial analysis?
I started exploring the different Python libraries and packages available for performing geospatial analysis. I discussed Uber’s kepler.gl here in that regard. During this journey, I came across Folium, a wonderful Python library for plotting maps.
In this article, we’ll learn how we can use Folium for creating maps in Python with two wonderful projects:
- Visualizing COVID-19 Cases in India over time
- Visualizing bus traffic in New York City
Table of contents
- What is Folium?
- Installing Folium on our Machine
- Plotting Maps with Folium
- Layers and Tiles in Folium
- Plotting Markers on the Map
- Plotting Paths using Folium
- Plugins in Folium
- Folium Project #1: Visualizing COVID-19 Hotspots in India
- Folium Project #2: Visualizing Bus Traffic in New York City
- What’s Next?
What is Folium?
So, the million-dollar question first: what in the world is Folium?
Folium is a Python library for creating interactive maps. It combines the power of Leaflet.js with the simplicity of Python, which makes it an excellent tool for plotting maps. Folium is designed with simplicity, performance, and usability in mind. It works efficiently, can be extended with many plugins, and has a beautiful, easy-to-use API.
Installing Folium on our Machine
Folium comes preinstalled on Google Colab and Kaggle kernels. If it isn’t installed on your system by default, you can install it using the following command:
pip install folium
For this article, I am using Folium 0.10.1, which is the latest version of Folium at the time of writing this article. I suggest you use the same version as me so that all your code works properly. You can install this specific version by running the below command:
pip install folium==0.10.1
Right, let’s start plotting maps!
Plotting Maps with Folium
Plotting maps with Folium is easier than you think. Folium provides the folium.Map() class, which takes a location parameter (latitude and longitude) and generates a map centered on it. So, let’s plot a map of Delhi, whose latitude and longitude are 28.644800 and 77.216721 respectively:
You can see that these maps are interactive. You can zoom in and out by clicking the positive and negative buttons in the top-left corner of the map. You can also drag the map and see different regions.
Let’s try to customize this map now. First, we’ll reduce the height and width of the map, and then we’ll change the zoom level.
We can resize our map by using the branca library in Python. It is a spinoff from Folium that hosts the non-map specific features. We can use its Figure class for resizing our maps and pass the desired width and height in pixels:
Next, let’s move to our second problem and change the zoom level. By default, users can zoom in and out across the full range the tiles support, and the map always opens at zoom level 10 (Folium’s default zoom_start). Note that this is not a “10x” zoom: each zoom level doubles the map scale. Now, you might be thinking: how can this be a problem?
Well, imagine you want to plot a map of a small area. Here, you need to set an appropriate starting zoom level so that the map focuses only on that region. Also, if you don’t want users to zoom in or out too far and lose focus on the map, you can restrict the zoom range. For this, Folium gives us three parameters: zoom_start, min_zoom, and max_zoom.
So, let us again plot the map of Delhi but this time of fixed size and with a restricted zoom level:
You can see here that I cannot zoom in or out beyond a certain level. Notice also that you need to add the map as a child of the fig object for the resizing to take effect.
Layers and Tiles in Folium
A tileset is a collection of raster and vector data broken up into a uniform grid of square tiles. Each tileset represents the data on the map in a different way. Folium allows us to create maps with different tiles like Stamen Terrain, Stamen Toner, Stamen Watercolor, CartoDB Positron, and many more. By default, the tiles are set to OpenStreetMap.
Each tileset shows different features of a map and is suitable for different purposes. For example, Stamen Terrain features hill shading and natural vegetation colors. It showcases advanced labeling and linework generalization of dual-carriageway roads. And, CartoDB Dark Matter shows the CartoDB Positron map in dark mode.
Since we now know that each tileset presents information differently and serves a different purpose, we can layer them over one another to get more information from a single map. We do this by adding several tile layers to one map:
Here, you can see that I have added five tile layers to a single map, so together with the default OpenStreetMap layer it now has six tilesets. I have also added LayerControl() to the map, which places an icon in the top-right corner for switching between the layers.
We can also layer other objects over a single map. Later in this article, we’ll layer paths of three different vehicles on a single map.
Plotting Markers on the Map
Markers are the items used for marking a location on a map. For example, when you use Google Maps for navigation, your location is marked by a marker and your destination is marked by another marker. Markers are among the most important and helpful things on a map.
Folium gives a folium.Marker() class for plotting markers on a map. Just pass the latitude and longitude of the location, mention the popup and tooltip and add it to the map.
Plotting markers is a two-step process. First, you need to create a base map on which your markers will be placed, and then add your markers to it:
Here, I have created three markers on the map of Delhi. The text when you hover over a marker is known as tooltip and the content when you click on a marker is known as a popup. If you look closely here, I have customized tooltips and popups of markers 2 and 3. You can also customize them according to your needs by passing the content as HTML to the class.
Now, normally when we book a ride on Uber or order food from Zomato, we see very different and beautiful markers. How can we create something like that?
There are two things that you can customize to change the appearance of a marker. First, you can change the icon of the marker, and second, you can change the shape of the marker. Folium gives the folium.Icon() class which can be used for creating custom icons for markers. You can choose icons from a wide range of options.
Icon() takes several arguments, and the three we’ll use are color, prefix, and icon. color changes the color of the marker, prefix selects the icon provider (fa for Font Awesome and glyphicon for Glyphicons), and icon selects the icon name.
Now, let’s learn how you can change the shape of the marker. Folium has the folium.features.CustomIcon() class for creating custom markers. It takes the path (or URL) of an image and the icon size in pixels as arguments and creates an icon object. This icon object is then passed to folium.Marker() through its icon parameter to create a custom marker:
Here, you can see that I have created a marker without any icon by setting the icon parameter to none. I have also created a custom marker in the shape of the Analytics Vidhya logo. Similarly, you can create a marker from any image you want.
Plotting Paths using Folium
As I said earlier, we’ll plot the path of 3 vehicles on the map. We have three arrays for three vehicles containing latitudes and longitudes of their paths:
We’ll use these latitudes and longitudes to plot paths for the three vehicles. Here’s how we’ll do it:
- Create a base map
- Create a feature group for each vehicle
- Then create paths and add them to the feature groups
- Finally, add these feature groups and layer control to the map
Let’s do it:
Now, what is a feature group? A feature group is something where we can put things and treat them as a single layer. It can be created using the folium.FeatureGroup() class. We’ll put our paths in these feature groups.
Next, I’ll draw the vehicles’ paths and add them to the feature groups. For this, I’ll use the PolyLine class from Folium’s vector_layers module. This module wraps the Polyline, Polygon, Rectangle, Circle, and CircleMarker features of Leaflet.js, which are used for drawing different shapes on the map.
The PolyLine class draws a polyline overlay on the map, which is exactly what a path is. It takes a list of [latitude, longitude] points along with popup, tooltip, color, and weight arguments.
Weight determines how thick or thin the polyline will be. There are also other arguments related to plotting a path and you can read about them here.
Next, we’ll add these feature groups to our base map and we’ll also add layer control for controlling the layers:
We can clearly see the path of all three vehicles. Notice on the top-right corner we have the layer control option through which we can switch between different layers. We can see the information about each path by clicking on it. You can also try other functions of the vector_layers and create different overlays on maps.
Plugins in Folium
If Folium is Thanos then plugins are the infinity stones. Similar to how infinity stones took the power of Thanos to another level, plugins do the same for Folium.
Folium’s plugins wrap some of the most popular Leaflet.js external plugins. These plugins make Folium very powerful. There are so many plugins for Folium like AntPath, MeasureControl, and Draw which make mapping a piece of cake.
We’ve covered a lot of theoretical aspects so far with a bit of code thrown in. It’s time to get our hands dirty and fire up Python. Here’s a spoiler – we’ll use the TimeSliderChoropleth and HeatMapWithTime plugins for Folium in the real-world projects we’ll work on.
Folium Project #1: Visualizing COVID-19 Hotspots in India
COVID-19 is the most challenging crisis that humanity has faced since the Second World War. We have seen how it has affected everyone’s life, and almost no country has been left untouched by the outbreak. Because the right decisions were not made at the right time, many countries have already faced the wrath of this invisible enemy, and many more will in the upcoming weeks.
Although India made the right decision and locked down the whole country at an early stage, the number of cases has started to rise. Therefore, in this project, we’ll visualize the COVID-19 hotspot states in India using Folium.
Setting up the Problem Statement
Before starting any project, it is important to have a clear problem statement in mind. It makes the problem unambiguous and helps us solve it in the best possible way. So, let’s first take a look at the problem statement:
Problem Statement: “To visualize the COVID-19 affected states & UTs of India through time.”
This analysis will help us in distinguishing between states that are COVID-19 hotspots and states that are not. It will also help us in knowing how the conditions changed throughout the time in the different states of India.
Understanding the data
The COVID-19 dataset contains the state-wise count of COVID-19 patients, including the count of dead and recovered patients, from 29 January 2020 to 2 June 2020. The second dataset is a GeoJSON file. GeoJSON is a format for encoding geographic data structures, and this file contains the geography of the states and UTs of India as polygons and multi-polygons.
So, let’s start our project by importing the required libraries:
Let’s import the COVID-19 dataset and take a look at it:
As we can see here the dataset contains 9 columns, which contain information like date, name of the state or UT, count of infected, dead, and recovered patients.
Now, let’s take a look at our GeoJSON dataset. For this, I am using the geopandas library, an amazing and very powerful library for working with geographical data in Python. Geopandas builds on many libraries, such as NumPy, pandas, shapely, fiona, pyproj, descartes, geopy, and matplotlib, which makes working with geospatial data very easy.
As you can see, the GeoJSON contains the geographical data of the different states and UTs of India in the form of polygons and multi-polygons. A polygon represents a single contiguous region, such as the state of Bihar, while a multi-polygon represents a state or UT that is split into several geographically separate parts.
Let’s check if the data is correct by plotting it. Plotting a GeoJSON is very easy using geopandas, you just need to call the plot() function:
Yeah! Our data is correct. You can clearly see the boundaries of all the states and UTs in the above figure. Now, let’s move to the data pre-processing section.
In this project, our end goal is to visualize the change in COVID-19 cases in the different states and UTs of India. For this, we will use the TimeSliderChoropleth plugin of Folium. This is a great plugin for visualizing changes with respect to time. It places a slider on top of the choropleth map, which you can drag to see how a quantity changes over a period of time.
For plotting the map using this plugin, we require two things: the GeoJSON of the region and the data. But here is the catch: it requires both in a specific format. The GeoJSON must be passed as a string whose features are identified by area, and the data must be given as a dictionary indexed by the same area identifiers, in the following format:
Take a look below – I have simplified the meaning of each term in the dictionary:
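A sketch of that dictionary’s shape, following the `styledict` structure in Folium’s documentation for this plugin. The outer keys are feature ids as strings. The inner keys are timestamps as strings, and Folium’s own example uses Unix seconds. The values are style dictionaries. The ids and colors here are hypothetical.

```python
import json

# Outer key: GeoJSON feature id (as a string) of one state/UT.
# Inner key: Unix timestamp in seconds (as a string) for one slider step.
# Value: the style applied to that feature at that time.
styledict = {
    "0": {  # hypothetical id of one state
        "1590969600": {"color": "#ffcccc", "opacity": 0.8},  # 1 June 2020
        "1591056000": {"color": "#ff6666", "opacity": 0.8},  # 2 June 2020
    },
    "1": {  # hypothetical id of another state
        "1590969600": {"color": "#0000ff", "opacity": 0.8},
        "1591056000": {"color": "#0000ff", "opacity": 0.8},
    },
}

print(json.dumps(styledict["0"], indent=2))
```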
Now, you know that before plotting the map we need to bring our data into this format. So, let’s start pre-processing the data.
First, we’ll pre-process the COVID-19 dataset and then we’ll move to the GeoJSON. Let’s start by checking if our dataset contains any null values:
Great! We do not have any null values in our dataset. Next, let’s take a look at the data type of columns in this dataset:
Everything seems to be correct, but the ‘Date’ column is of type object. Let’s convert it to datetime. One more thing: the dates in the dataset are the dates on which the information was updated (not the dates on which the data was collected). Therefore, we’ll shift all the dates one day back:
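A minimal pandas sketch of both steps. The two rows and the ISO date strings are hypothetical stand-ins for the dataset.

```python
import pandas as pd

# Hypothetical rows in the shape of the COVID-19 dataset
df = pd.DataFrame({
    "Date": ["2020-06-02", "2020-06-03"],
    "Name of State / UT": ["Kerala", "Kerala"],
})

# Parse the strings into datetime64 values
df["Date"] = pd.to_datetime(df["Date"])

# The dataset records the update date, so move each date back one day
df["Date"] = df["Date"] - pd.Timedelta(days=1)

print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```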
The column containing the name of states and UTs is very important for plotting so let’s take a look at the contents of the ‘Name of State / UT’ column:
There are a few problems with the names of the states and UTs. First, both Union Territory of Ladakh and Ladakh are present in the dataset, but in reality they are the same. The same is true for Jammu and Kashmir and for Chandigarh. Second, Telangana is misspelled, and so is the name of Dadra and Nagar Haveli. Let’s correct these mistakes:
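A sketch of this cleanup using a lambda with a regular expression and a lookup table. The misspelled variants (`Telengana`, `Dadar Nagar Haveli`) are hypothetical examples of the errors described, not confirmed strings from the dataset.

```python
import re
import pandas as pd

col = "Name of State / UT"

# Hypothetical messy names of the kind described above
df = pd.DataFrame({col: [
    "Union Territory of Ladakh", "Ladakh",
    "Union Territory of Jammu and Kashmir",
    "Telengana", "Dadar Nagar Haveli",
]})

# Strip a leading "Union Territory of " so both spellings collapse to one
df[col] = df[col].apply(lambda s: re.sub(r"^Union Territory of\s+", "", s))

# Fix individual misspellings (hypothetical variants) with a lookup
fixes = {"Telengana": "Telangana", "Dadar Nagar Haveli": "Dadra and Nagar Haveli"}
df[col] = df[col].apply(lambda s: fixes.get(s, s))

print(sorted(df[col].unique()))
```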
Here, I have used lambda functions and regular expressions to correct the names of states & UTs. If you want to learn more about lambda functions and regular expressions, you can read the below articles:
- What are Lambda Functions? A Quick Guide to Lambda Functions in Python
- Beginners Tutorial for Regular Expressions in Python
As we saw earlier, we need to transform our data into a specific format. For that, we have to create an area name indexed dictionary. But I have found that TimeSliderChoropleth does not work well with area name indexes containing whitespaces. Therefore, I have prepared a dictionary for label encoding the states and UTs. I will use this dictionary and map the ‘Name of State / UT’ column to a number:
In this project, we are visualizing the change in COVID-19 cases in different states and UTs of India. Therefore, for this purpose, the number of active cases is a better measure. We can calculate the number of active cases in a state or UT on a particular day by subtracting the number of dead and recovered patients from the total number of confirmed cases:
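A sketch of the label encoding and the active-case formula. The column names `Confirmed`, `Deaths`, and `Recovered` are assumptions, not the dataset’s real headers. The encoding dictionary is built here from the sorted names, so its numbers may differ from the hand-prepared dictionary in the article.

```python
import pandas as pd

# Hypothetical rows; the count column names are assumptions
df = pd.DataFrame({
    "Name of State / UT": ["Kerala", "Delhi", "Kerala"],
    "Confirmed": [1000, 2000, 1100],
    "Deaths": [5, 40, 6],
    "Recovered": [600, 900, 700],
})

# Label-encode names as integers (avoids whitespace in the area keys)
names = sorted(df["Name of State / UT"].unique())
state_ids = {name: i for i, name in enumerate(names)}
df["state_id"] = df["Name of State / UT"].map(state_ids)

# Active cases = confirmed - deaths - recovered
df["Active"] = df["Confirmed"] - df["Deaths"] - df["Recovered"]

print(df[["state_id", "Active"]].values.tolist())
```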
Now, let’s assign a color to each day of every state and UTs based on the total number of active cases. For this, I’ll divide the range between minimum and maximum active cases into 10 equal parts and assign each part a color:
In this project, I am using the red color for marking the states with active COVID-19 cases and green color for those states and UTs which previously had active COVID-19 cases but do not have any on that particular day. Keep in mind the red color becomes darker with the increasing number of active COVID-19 cases:
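A sketch of this binning. It splits the range between the minimum and maximum active counts into 10 equal-width bins, maps each bin to a progressively darker red, and uses green for zero active cases. The hex values and sample counts are illustrative.

```python
import numpy as np
import pandas as pd

# Ten shades from light to dark red (illustrative hex values)
reds = ["#fee0d2", "#fcbba1", "#fc9272", "#fb6a4a", "#ef3b2c",
        "#cb181d", "#a50f15", "#8b0000", "#67000d", "#3f0008"]
GREEN = "#00a000"  # had cases earlier, none active on this day

df = pd.DataFrame({"Active": [0, 5, 50, 500, 1000]})

# 11 edges split [min, max] into 10 equal-width bins
edges = np.linspace(df["Active"].min(), df["Active"].max(), 11)

def active_to_color(n):
    if n == 0:
        return GREEN
    # Bin index from the sorted edges; the maximum is clipped into bin 9
    i = int(np.searchsorted(edges, n, side="right")) - 1
    return reds[min(max(i, 0), 9)]

df["color"] = df["Active"].apply(active_to_color)
print(df["color"].tolist())
```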
We require only three columns of the dataset for further analysis. Therefore, I am dropping the other columns and keeping only the Date, state_id, and color columns:
Now, we have successfully color-coded all the states and UTs. But, you have missed something here. Can you guess it?
If you have read the previous sections carefully, then you know that our dataset contains data only about the states and UTs that have the COVID-19 cases. It does not include the information about states and UTs which never had any cases of COVID-19.
We can add this information by taking, for each date, the set difference between the set of all states and UTs and the set of those with COVID-19 records on that date. We’ll represent states and UTs that have never had a COVID-19 case in blue:
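A sketch of the set difference on hypothetical ids and colors. It assumes that a state, once it appears in the dataset, keeps appearing on later dates. Under that assumption, any state missing on a date has had no cases so far.

```python
import pandas as pd

BLUE = "#0000ff"            # never had a case (illustrative hex value)
all_states = {0, 1, 2, 3}   # hypothetical ids of every state/UT

# Hypothetical colour-coded rows; state 1 first appears on 2 June
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-02"]),
    "state_id": [0, 0, 1],
    "color": ["#fb6a4a", "#cb181d", "#fee0d2"],
})

missing_rows = []
for date, day in df.groupby("Date"):
    # All states minus the states with a record on this date
    for sid in sorted(all_states - set(day["state_id"])):
        missing_rows.append({"Date": date, "state_id": sid, "color": BLUE})

df = pd.concat([df, pd.DataFrame(missing_rows)], ignore_index=True)
print(df.groupby("Date").size().tolist())  # every date now covers all states
```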
TimeSliderChoropleth needs each time as a Unix timestamp (Folium’s own example for this plugin uses seconds since the epoch) and the whole dataset as an area-name-indexed dictionary. That’s what we’ll build next:
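A sketch of both conversions on hypothetical rows. It uses Unix seconds, as in Folium’s documentation example for this plugin. The opacity of 0.8 is an illustrative choice.

```python
import pandas as pd

# Hypothetical rows after the earlier steps
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-01"]),
    "state_id": [0, 0, 1],
    "color": ["#fb6a4a", "#cb181d", "#0000ff"],
})

# Seconds since the Unix epoch, stored as strings
df["Date"] = (df["Date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
df["Date"] = df["Date"].astype(str)

# {state_id: {timestamp: {"color": ..., "opacity": ...}}}
styledict = {
    str(sid): {
        row.Date: {"color": row.color, "opacity": 0.8}
        for row in group.itertuples()
    }
    for sid, group in df.groupby("state_id")
}
print(styledict["0"])
```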
Let’s take a look at the dictionary for a state:
Finally, we have successfully pre-processed the COVID-19 virus dataset. Now, it is time to pre-process the GeoJSON. We don’t have to do much here. Similar to what we did with the previous dataset, we just need to label encode the column containing states and UTs names:
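A stdlib sketch of this step. The plugin looks up each GeoJSON feature’s `id` in the style dictionary, so each feature gets the encoded id of its state as a string. The property name `st_nm` and the toy triangles are assumptions. With geopandas, a similar result comes from setting the GeoDataFrame index to the encoded ids before calling `to_json()`.

```python
import json

state_ids = {"Delhi": 0, "Kerala": 1}  # same mapping as the COVID-19 data

def feature(name, ring):
    # "st_nm" is an assumed property name for the state/UT
    return {"type": "Feature", "properties": {"st_nm": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]}}

# Tiny hypothetical GeoJSON; real boundaries have many more points
geojson = {"type": "FeatureCollection", "features": [
    feature("Delhi", [[77.0, 28.5], [77.3, 28.5], [77.3, 28.8], [77.0, 28.5]]),
    feature("Kerala", [[76.0, 9.5], [76.5, 9.5], [76.5, 10.0], [76.0, 9.5]]),
]}

# Label-encode: the feature id must match the keys of the style dictionary
for f in geojson["features"]:
    f["id"] = str(state_ids[f["properties"]["st_nm"]])

geojson_str = json.dumps(geojson)  # the plugin takes the GeoJSON as a string
```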
Hurray! We have pre-processed both the datasets. Now, let’s move to the next section and plot these datasets and see the results.
Plotting changes in COVID-19 cases in India using TimeSliderChoropleth plugin
Your wait is finally over! The process of plotting a map using this plugin is very easy and requires only three steps:
- Import the plugin
- Create a base map
- Add the plugin to map
So, let’s import the TimeSliderChoropleth plugin. You can read more about it here.
Now, let’s create a base map of India. I have used the CartoDB positron tileset here because it is a very clean tileset and the colors of the states and UTs will stand out against it:
Let’s add our plugin to the map and see the results:
We can clearly see that, as of 2 June 2020, Lakshadweep and Daman and Diu never had any cases of COVID-19. There is also some good news from the Andaman and Nicobar Islands: they had a few cases earlier, but there are no new cases and no active cases remain.
But don’t start celebrating yet. Conditions in all other states and UTs have deteriorated over time. Especially in states like Maharashtra, Tamil Nadu and Delhi, conditions have become worse. These are the major hotspots of COVID-19 in India with the highest number of COVID-19 cases. Rajasthan, Madhya Pradesh, Uttar Pradesh, and Gujarat will join Delhi and Maharashtra in the upcoming days if the trend continues.
Great! You completed the first project of this article. Now, let’s move on to our second project where we’ll visualize bus traffic in New York City using Folium.
Folium Project #2: Visualizing Bus Traffic in New York City
In today’s world, the human population is increasing day by day and with this increasing population, the number of vehicles is also increasing. This rise in the number of vehicles on the road has created many new challenges for governments around the world.
Let’s just talk about New York here. New York City is one of the world’s leading metropolises for art, fashion, food, and theater. It is situated on one of the world’s largest natural harbors. New York City is composed of five boroughs (Brooklyn, Queens, Manhattan, the Bronx, and Staten Island), each of which is a county of the State of New York. It is also the most densely populated major city in the United States.
Due to this high population, it has now become the third most congested city in the world in terms of traffic and the second-worst in the United States. Road accidents have shot up in the past few years. Also, the travel time from one place to another has increased drastically.
Therefore, it becomes important for the government and road planning authorities to do the analysis of traffic on roads. In this project, we’ll visualize the traffic in New York due to buses using Folium.
Understanding the Problem Statement
Before moving further, let’s first understand our problem statement:
Problem Statement: “To visualize the traffic due to buses in New York City throughout the day.”
It is a very important analysis of road traffic. It will help us in identifying the areas of high and low traffic at different instances of time. It will also help us in understanding the flow of traffic of public transport in New York City. Government and road planning authorities can use the results of this visualization to make better decisions and improve the structure of roads.
Understanding the Data
For this project, I am using the New York City Bus Data taken from here. This dataset contains the latitude and longitude of various buses, recorded roughly every 10 minutes. It also includes information such as departure time, arrival time, next stop, and time to reach the destination. The data was recorded for buses in June, August, September, and October 2017.
Let’s take a look at our dataset:
This is the dataset for June 2017 only. It contains 17 columns and more than 6.7 million rows. We’ll use this dataset for our analysis but first, we need to pre-process what we have.
Pre-processing the dataset
In this project, we’ll visualize the change in road traffic conditions in New York City due to buses throughout the day. We can do this using a heatmap. Heatmaps are a great tool for visualizing the density of a quantity in an area through the change in intensity of a color. But we want to visualize the traffic over time.
Do you know how we can do this?
We can do this by using the HeatMapWithTime plugin of Folium. It is a great plugin that helps in creating heatmaps for a period of time. It creates an animation of heatmaps that we can pause, play, loop and also play at different speeds. So, let’s pre-process our data.
Here, we’ll only take the data for a single day, 1 June 2017, because we want to visualize the traffic throughout a day, not throughout a month:
The filtered dataset still has 17 columns and more than 240,000 rows. For plotting heatmaps, we only need the recording time, the latitude and longitude of each bus, and the vehicle reference number for identifying each bus. Therefore, I am dropping the other columns of the dataset:
Now, let’s check if our dataset contains any redundant rows because it can affect the results of our analysis:
We can see that our dataset contains many redundant rows. So let’s drop these duplicate rows and keep only the first one:
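A sketch of the filtering, column selection, and de-duplication steps on hypothetical rows. `RecordedAtTime` and `VehicleRef` are the column names used in the text. The two location column names and `PublishedLineName` (a stand-in for a column we drop) are assumptions about the dataset.

```python
import pandas as pd

lat, lon = "VehicleLocation.Latitude", "VehicleLocation.Longitude"  # assumed

# Hypothetical rows; rows 0 and 1 are exact duplicates
df = pd.DataFrame({
    "RecordedAtTime": ["2017-06-01 00:03:34", "2017-06-01 00:03:34",
                       "2017-06-01 08:10:00", "2017-06-02 09:00:00"],
    "VehicleRef": ["NYCT_430", "NYCT_430", "NYCT_8263", "NYCT_430"],
    lat: [40.635, 40.635, 40.759, 40.640],
    lon: [-73.960, -73.960, -73.985, -73.955],
    "PublishedLineName": ["B8", "B8", "M5", "B8"],
})
df["RecordedAtTime"] = pd.to_datetime(df["RecordedAtTime"])

# Keep only 1 June 2017
df = df[df["RecordedAtTime"].dt.date == pd.Timestamp("2017-06-01").date()]

# Keep only the columns needed for the heatmap
df = df[["RecordedAtTime", "VehicleRef", lat, lon]]

print(df.duplicated().sum())        # redundant rows
df = df.drop_duplicates(keep="first")
print(len(df))
```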
Next, let’s check if there are any null values present in our dataset:
Great! We do not have any null values. Now, let’s add another column to our dataset containing the hour because we want to generate a heatmap for every hour.
There is a problem here: not all buses have their locations recorded at the same points in time. We can tackle this by taking the last recorded position of every bus in each hour. To do this, we first find the time at which each bus’s location was last recorded in each hour, and then look up the latitude and longitude at those timestamps. So, let’s find the last recorded time for every bus in each hour:
We have the last recorded time for every bus in an hour. Let’s find the location of buses for these timestamps. We can do this by joining df2 and df together on hour, VehicleRef and RecordedAtTime columns.
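A sketch of the hour column, the per-hour last timestamp (`df2`), and the join. The rows, the bus ids, and the two location column names are hypothetical. An inner merge on all three keys keeps exactly the rows recorded at each bus’s last timestamp in each hour.

```python
import pandas as pd

lat, lon = "VehicleLocation.Latitude", "VehicleLocation.Longitude"  # assumed

df = pd.DataFrame({
    "RecordedAtTime": pd.to_datetime([
        "2017-06-01 08:05", "2017-06-01 08:45",
        "2017-06-01 08:20", "2017-06-01 09:10",
    ]),
    "VehicleRef": ["bus_a", "bus_a", "bus_b", "bus_a"],  # hypothetical ids
    lat: [40.70, 40.71, 40.75, 40.72],
    lon: [-73.99, -73.98, -73.97, -73.96],
})

# Hour of the day for each record
df["hour"] = df["RecordedAtTime"].dt.hour

# df2: the last time each bus was recorded within each hour
df2 = df.groupby(["hour", "VehicleRef"], as_index=False)["RecordedAtTime"].max()

# Join back to df to get the position at those timestamps
last_pos = df2.merge(df, on=["hour", "VehicleRef", "RecordedAtTime"], how="inner")
print(last_pos[["hour", "VehicleRef", lat, lon]])
```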
If you want to know more about joining dataframes in Pandas, you can read this article: Joins in Pandas: Master the Different Types of Joins in Python.
There is just one more thing left to do before we move onto the next section. The HeatMapWithTime plugin takes the data in a specific format. It takes the latitudes and longitudes in the form of nested arrays:
Let’s convert the position of buses into this format:
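A sketch of the conversion on hypothetical last positions. It produces one inner list of `[lat, lon]` points per hour, in hour order. Each inner list becomes one frame of the HeatMapWithTime animation, and the matching hour labels can serve as the plugin’s `index`. The location column names are assumptions.

```python
import pandas as pd

lat, lon = "VehicleLocation.Latitude", "VehicleLocation.Longitude"  # assumed

last_pos = pd.DataFrame({
    "hour": [8, 8, 9],
    lat: [40.71, 40.75, 40.72],
    lon: [-73.98, -73.97, -73.96],
})

# One list of [lat, lon] points per hour (groupby sorts the hours)
heat_data = [
    group[[lat, lon]].values.tolist()
    for _, group in last_pos.groupby("hour")
]
time_index = sorted(last_pos["hour"].unique().tolist())
print(heat_data)
```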
Hurray! We have successfully pre-processed our dataset. Now, it is time to move to the next section of the tutorial where we’ll visualize the bus traffic in New York City.
Visualizing Bus Traffic through the HeatMapWithTime Plugin
Finally, we have arrived at the much-awaited section of the project. In this, we’ll visualize the bus traffic in New York City. Similar to COVID-19 project’s TimeSliderChoropleth plugin, plotting through HeatMapWithTime is a three-step process:
- Import the plugin
- Create a base map
- Add the plugin to map
So, let’s import the plugin and create our base map:
Here, I haven’t specified a tileset, so the map uses the default OpenStreetMap tiles. Now, let’s add our plugin to this base map:
We can see that after 10:00 PM, the traffic starts to reduce in New York City but the city again becomes crowded at 8:00 AM. Also, the buses run 24/7. Manhattan is the most crowded borough of New York followed by Brooklyn. Staten Island is the borough with the least traffic in New York.
You can always save your maps by using the save() function:
Congratulations! You have completed your second project and are mastering Folium.
You are now familiar with the concepts of Folium. You can leverage its power and create beautiful maps. Start now by adding a legend to the COVID-19 TimeSliderChoropleth map and take it from there.
If you want to know more about different plugins and classes of Folium and how to use them, you can read this gallery of Jupyter Notebooks by Folium here.
If you found this article informative and helpful, then do share it with your network. You can also comment below with your queries and feedback.