Data visualization is probably the most important and typically the least talked about area of data science.
I say that because how you create data stories and visualizations has a huge impact on how your customers look at your work. Ultimately, data science is not only about how complicated and sophisticated your models are. It is about solving problems using data-based insights. And in order to implement these solutions, your stakeholders need to understand what you are proposing.
One of the challenges in creating effective visualizations is to create images which speak for themselves. This article shows one way to do so, using animated GIF (Graphics Interchange Format) images. This is particularly helpful when you want to tell time-based or flow-based stories. Using animation, you can plot comparable data over time for a specific set of parameters. In other words, it becomes easy to see how a certain parameter grows over time.
Let me show this with an example.
Example – GDP vs. Life expectancy over time
Let us say you want to show how GDP and life expectancy have changed for various continents / countries over time. What do you think is the best way to represent this relationship?
You can think of multiple options like:
- Creating a 3D plot with GDP, life expectancy and time on the 3 axes, and drawing a line for each continent / country. The problem is that the human eye is really bad at interpreting 3D visualizations shown in 2D, especially if there is too much data. So, this option would not work.
- Creating 2 plots side by side, one showing GDP over time and the other showing life expectancy over time. While these are 2 dimensional plots, we have left a lot for the user to interpret. The person needs to pick a country, follow its movement on each plot and then correlate the two. Again, I would not ask this of my stakeholders.
Now, let us look at this using an animated plot in a .gif file:
The recent development of the gganimate package has made this possible and easier. By the end of this article, you will be able to make your own .gif file and create your own customised frames to compare different parameters on a global or local scale.
Please install the following packages: plyr, dplyr, ggmap, ggplot2, gganimate and animation.
In addition to the above libraries in R, you will also need the ImageMagick software on your system. You can download and install it from the ImageMagick website.
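Before starting, it can help to check which of these pieces are already available. The snippet below is a minimal sketch using base R only. The package list is taken from the library() calls used later in this article, and the ImageMagick check simply looks for a `convert` or `magick` executable on the PATH, which is an assumption about how it was installed. Note that the `gganimate()` function and the `frame` aesthetic used in this article come from the early development version of gganimate, so a more recent release may not accept the code unchanged.

```r
# Packages loaded later in this article
needed <- c("plyr", "dplyr", "ggmap", "ggplot2", "gganimate", "animation")

# requireNamespace() returns FALSE (without stopping) when a package is missing
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}

# Sys.which() returns "" for executables that are not on the PATH
imagemagick <- Sys.which(c("convert", "magick"))
if (all(imagemagick == "")) {
  message("ImageMagick was not found on the PATH")
}
```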
Get the Data
This article is an attempt to make a .gif file from earthquake data. Plotting global seismic activity year by year is more informative than a static view of all the values on one map. The earthquake data set is available on Kaggle.
The data set contains data on global seismic activity from 1965 to 2016. Please visit the Kaggle page and scroll down to get the .csv file.
Earthquakes of magnitude 7 or above on the Richter scale, 1965-2016
The dataset has been filtered so that only events of magnitude 7 or above on the Richter scale are considered in this study.
From the .csv file we have selected only a few parameters for the sake of simplicity.
- Type is the type of seismic activity
- Depth is the distance of the earthquake's focus (hypocenter) from sea level.
- Magnitude is the reading on the Richter scale
- ID is the event ID of the seismic activity
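As a rough illustration of this selection step, the sketch below builds a tiny made-up data frame and keeps only the columns used in this article. The column names Date, Latitude, Longitude, Type, Depth, Magnitude and ID follow the list above and the code later in the article. The extra `Source` column and all the values are invented purely for the example.

```r
# Toy stand-in for the raw file; every value here is invented for illustration
raw <- data.frame(
  Date      = c("02-01-1965", "04-01-1965", "05-01-1965"),
  Latitude  = c(19.2, 1.9, -20.6),
  Longitude = c(145.6, 127.4, -173.9),
  Type      = c("Earthquake", "Earthquake", "Earthquake"),
  Depth     = c(131.6, 80.0, 20.0),
  Magnitude = c(7.0, 7.1, 7.4),
  ID        = c("EV1", "EV2", "EV3"),
  Source    = c("X", "X", "X"),  # hypothetical column that we do not need
  stringsAsFactors = FALSE
)

# Keep only the parameters used in this article
keep <- c("Date", "Latitude", "Longitude", "Type", "Depth", "Magnitude", "ID")
eq_small <- raw[, keep]

# Inspect the structure of the reduced data frame
str(eq_small)
```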
We are all set to start coding in R. I have used the RStudio environment, but you are free to use any environment you prefer.
    ## Read the dataset and load the necessary packages
    library(plyr)   # loaded before dplyr so that dplyr's functions are not masked
    library(dplyr)
    library(ggmap)
    library(ggplot2)
    library(gganimate)

    EQ <- read.csv("eq.csv", stringsAsFactors = FALSE)
    names(EQ)

    ## Select only the events with magnitude greater than or equal to 7
    EQ <- EQ %>% filter(Magnitude >= 7)
Split the Date into day, month and year
This is done in order to get the frame, which is important for the plot. In other words, the core of the approach is to treat frame (as in, the time point within an animation) as another dimension, just like x, y, size, color, and so on. Thus, a variable in your data can be mapped to frame just as others are mapped to x or y.
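To make the idea of a frame concrete, the sketch below uses base R only and a small invented data frame. It lists which rows would be drawn in each frame when Year is the frame variable, first without and then with accumulation. This only illustrates the concept; it is not how gganimate works internally.

```r
# Invented events; only Year matters for this illustration
events <- data.frame(
  ID   = c("a", "b", "c", "d"),
  Year = c(1965, 1965, 1966, 1968),
  stringsAsFactors = FALSE
)

years <- sort(unique(events$Year))

# Non-cumulative: each frame shows only the events of that year
frames <- lapply(years, function(y) events$ID[events$Year == y])
names(frames) <- years

# Cumulative: each frame shows every event up to and including that year
cumulative_frames <- lapply(years, function(y) events$ID[events$Year <= y])
names(cumulative_frames) <- years

print(frames)
print(cumulative_frames)
```

In the cumulative version, earlier earthquakes stay on the map as the years advance, which is the behaviour this article relies on.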
    ## Convert the dates to character in order to split the column into "dd", "mm" and "yy" columns
    EQ$Date <- as.character(EQ$Date)

    ## Split each date on "-" and store the parts in a list
    date_parts <- strsplit(EQ$Date, "-")

    ## Convert the list into a data frame
    library(plyr)
    EQ_Date1 <- ldply(date_parts)
    colnames(EQ_Date1) <- c("Day", "Month", "Year")

    ## Column-bind with the main data frame
    EQ <- cbind(EQ, EQ_Date1)
    names(EQ)

    ## Change the Year to numeric
    EQ$Year <- as.numeric(EQ$Year)

    ## Get the world map for the plot and load the necessary package
    library(ggmap)
    world <- map_data("world")

    ## Remove the Antarctica region from the world map
    world <- world[world$region != "Antarctica", ]
    map <- ggplot() +
      geom_map(data = world, map = world,
               aes(x = long, y = lat, map_id = region),
               color = "#333300", fill = "#663300")

    ## Plot the points on the world map; frame and cumulative are read by gganimate
    p <- map +
      geom_point(data = EQ,
                 aes(x = Longitude, y = Latitude, frame = Year, cumulative = TRUE),
                 alpha = 0.3, size = 2.5, color = "#336600") +
      labs(title = "Earthquakes of magnitude 7 or above on the Richter scale") +
      theme_void()

    ## Render the .gif file using the gganimate function
    gganimate(p)
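As an alternative to splitting the string by hand, the year can also be extracted with base R's date functions. This is a sketch under the assumption that the dates are stored as day-month-year with a four-digit year (for example "02-01-1965"), matching the column order used above. Adjust the format string if your file differs.

```r
# Example dates in the assumed dd-mm-yyyy layout (values invented)
dates <- c("02-01-1965", "15-08-1990", "31-12-2016")

# Parse the strings into Date objects, then format out the year
parsed <- as.Date(dates, format = "%d-%m-%Y")
years  <- as.numeric(format(parsed, "%Y"))

print(years)
```

A side benefit of this route is that dates which do not match the format come back as NA, so malformed rows are easy to spot.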
Speed up the .gif using the animation package
As we can see, the plot covers many years, from 1965 to 2016. To speed up the visualization, we can use the animation package and set a shorter interval (in seconds) between frames with ani.options().
    library(animation)
    ani.options(interval = 0.15)
    gganimate(p)
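For readers on a current gganimate release, where the frame aesthetic and the gganimate() function used above are no longer available, the sketch below shows one possible equivalent. It assumes gganimate 1.0 or later with a GIF renderer installed, and it reuses the `EQ` and `world` objects built earlier. `build_animation` is a hypothetical helper name, and the `cumulative` argument of `transition_manual()` may not exist in the very first 1.0 releases. Treat it as a starting point rather than a drop-in replacement.

```r
library(ggplot2)
library(gganimate)

# Hypothetical helper: builds the animated plot from an earthquake data frame
# `eq` (columns Longitude, Latitude, Year) and the `world` map data
build_animation <- function(eq, world) {
  ggplot() +
    # World outline used as the background of every frame
    geom_polygon(data = world, aes(x = long, y = lat, group = group),
                 colour = "#333300", fill = "#663300") +
    geom_point(data = eq, aes(x = Longitude, y = Latitude),
               alpha = 0.3, size = 2.5, colour = "#336600") +
    labs(title = "Earthquakes of magnitude 7 or above, up to {current_frame}") +
    theme_void() +
    # One frame per year, no tweening; cumulative = TRUE keeps earlier years visible
    transition_manual(Year, cumulative = TRUE)
}

# Usage, assuming EQ and world exist as built earlier in the article:
# anim <- animate(build_animation(EQ, world), fps = 5)  # 0.2 seconds per frame
# anim_save("earthquakes.gif", anim)
```

Here the speed is set through the `fps` argument of `animate()` instead of `ani.options()`. A value of 5 frames per second gives 0.2 seconds per frame, close to the 0.15 second interval used above.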
This article was an introductory tutorial to the world of animated maps. Readers can try this and apply the same approach in other projects. Some examples are:
- The same technique can be used to compare heat maps of weather data across a nation.
- Floods or other natural disasters in a particular location over a period of time.
- It can be used to show the growth of a city's metro system using Delaunay triangulation. Please see the interesting article posted by Page Piccinini on R-bloggers, Metro Systems over Time, or access it directly from her official site.
Hope you found the article useful. If you have any questions, please feel free to ask in comments below.
Aritra Chatterjee is a professional in the field of Data Science and Operation Management having experience of more than 5 years. He aspires to develop skill in the field of Automation, Data Science and Machine Learning.
This post was received as part of our blogging competition – The Mightiest Pen.