Building additional features & variables through open data sources

Tavish Srivastava 13 Apr, 2015 • 5 min read

Power of Analytics

Recently, while travelling, I met a few people who perceived analytics as a passive industry with limited growth. On the contrary, I am always amazed by the enormous amount of accessible data available at our fingertips, big thanks to the search engines! Exploiting Twitter feeds for sentiment analysis is no longer a tough row to hoe.

Let’s understand this unseen power of analytics with an example.

Suppose you run an international chain of retail stores, say Bresco. You run all sorts of loyalty programs to collect customer data, and you have also tied up with commercial banks. As a result, you get all the necessary data about your customers, ranging from bank account details and card details to demographic information and food preferences. This data can help you create a ‘virtual image’ of each customer. Using that image to predict what type of food customers will buy next, or what their future purchases will be, can do wonders for your store.

In this article, we’ll look at freely available sources of information and discuss how they can be used in the context of analytics.

 

Social Media


Social networks reveal two critical pieces of information that we could not have known otherwise.

First, unrealized customer preferences. Using the behavioural information in customers’ social media activity, we can predict what a customer prefers. This can also enrich the information we already hold about the customer. For instance, if a customer transacts a lot at restaurants, we might say that the customer is a foodie who likes to visit different restaurants. But this might just be a requirement of his job rather than his preference. If the same inference also comes out of his social network, we can be more certain of what the customer really likes and what he does not.

Customer preference can be carved out of the customer’s network (if many of his contacts are known foodies, this person might be a foodie as well), the photos he is tagged in and the places he checks in to (if he is tagged at multiple restaurants, he might be a foodie), his comments, hashtags etc. Social media can bring out such information, which can help us make our products more customer-centric.
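The two signals in the paragraph above (a customer’s network and his check-ins) can be turned into numeric features. The following is a minimal sketch under stated assumptions: the customers, contacts, foodie labels and venue categories are all hypothetical toy data, standing in for whatever a social media API would actually return.

```python
# Hypothetical toy data: each customer's contacts, the contacts already
# labelled as foodies, and the venue category of each check-in.
friends = {
    "alice": ["bob", "carol", "dave"],
    "dave": ["alice"],
}
known_foodies = {"bob", "carol"}
checkins = {
    "alice": ["restaurant", "restaurant", "gym", "restaurant"],
    "dave": ["office", "gym"],
}

FOOD_VENUES = {"restaurant", "cafe"}  # assumed venue categories


def foodie_features(customer):
    """Share of a customer's contacts who are foodies, and share of food check-ins."""
    contacts = friends.get(customer, [])
    visits = checkins.get(customer, [])
    friend_share = (
        sum(c in known_foodies for c in contacts) / len(contacts) if contacts else 0.0
    )
    food_share = (
        sum(v in FOOD_VENUES for v in visits) / len(visits) if visits else 0.0
    )
    return {"friend_foodie_share": friend_share, "food_checkin_share": food_share}


for name in ("alice", "dave"):
    print(name, foodie_features(name))
```

Each share lies between 0 and 1, so it can sit next to transaction features in a propensity model. A customer who transacts a lot at restaurants and also scores high on both shares is more plausibly a foodie by choice than by job.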

ALSO SEE: Here is an article which can give you a kick start with Twitter sentiment analysis.
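As a rough illustration of turning tweets into a numeric feature, here is a minimal lexicon-based scorer. The word lists are hypothetical and far too small for real use; an established sentiment lexicon or a trained model would normally replace them.

```python
import re

# Tiny illustrative lexicons (hypothetical, not a real sentiment resource).
POSITIVE = {"love", "great", "tasty", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "stale", "rude", "terrible"}


def sentiment_score(tweet):
    """(#positive - #negative) / #matched words, or 0.0 if nothing matches."""
    words = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


tweets = [
    "Love the tasty bread at Bresco!",
    "Staff were rude and the milk was stale.",
    "Picked up groceries today.",
]
for t in tweets:
    print(f"{sentiment_score(t):+.2f}  {t}")
```

Averaging this score over a customer's recent tweets gives one more per-customer variable for the ‘virtual image’.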

 

Second, customer network information. Social media can reveal the kind of network a customer belongs to. Imagine we have a social media management team that can resolve 10,000 customer complaints a day, but we start getting 100,000 complaints every day on social media. How should we prioritize these complaints? A very simple way is to quantitatively assess the network strength of each customer and address the stronger ones first. For instance, a complaint from person X will be more important than one from person Y if the people X interacts with are more influential than those Y interacts with.
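One way to make “network strength” concrete is sketched below, assuming we already have an influence score for every account (follower count is used here purely as a hypothetical proxy). A complainant’s strength is the total influence of the accounts he interacts with, and the team works through the strongest complainants up to its daily capacity.

```python
import heapq

# Hypothetical influence of each account (e.g. its follower count).
influence = {"p1": 50, "p2": 12000, "p3": 300, "p4": 80000, "p5": 20}
# Hypothetical: the accounts each complaining customer interacts with.
interacts_with = {
    "X": ["p2", "p4"],
    "Y": ["p1", "p5"],
    "Z": ["p3"],
}


def network_strength(customer):
    """Sum of the influence of the accounts a customer interacts with."""
    return sum(influence.get(p, 0) for p in interacts_with.get(customer, []))


def prioritise(complainants, capacity):
    """Pick the `capacity` complainants with the strongest networks."""
    return heapq.nlargest(capacity, complainants, key=network_strength)


print(prioritise(["X", "Y", "Z"], capacity=2))
```

With 100,000 complaints and a capacity of 10,000, the call would be `prioritise(complainants, capacity=10_000)`. Follower counts are a crude proxy; a graph centrality measure such as PageRank (available, for example, as `networkx.pagerank`) could supply the influence scores instead.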

ALSO SEE: Here is an article which can give you a kick start with network analysis.

 

Google API

Google can help us create features in multiple ways. Here we will use Google in two different ways:

First, the direct information which can be extracted from Google. A few examples are as follows:

  1. Google Maps can be used to measure the actual distance between places of interest. In the case of Bresco (refer above), we have the locations of our stores and the addresses of our customers. Using this information together with the customer preferences collected through social networks, we can recommend to each customer the most suitable offer at our nearest outlet. Until now, the most commonly used method has been the centroid-to-centroid distance between store and customer pin codes, which is highly inaccurate because each pin code covers a reasonably large area.
  2. Google Spreadsheets and other shared drives can be accessed directly through APIs. Many public survey results are shared on Google Drive and can be retrieved this way.
  3. Google+ is another social network which can be harnessed to bring out relevant customer information.
  4. Google Trends can be used as an input to many time series models to understand interest in, and the popularity of, different products and topics.
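Google Maps’ distance services return road distances and travel times, and they need an API key. As an offline stand-in (a swapped-in technique, not what Google computes), the sketch below uses the haversine, or great-circle, distance between geocoded coordinates. Even this straight-line measure is far finer-grained than pin-code centroids. The store names and coordinates are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Illustrative store coordinates and one geocoded customer address.
stores = {
    "Bresco Andheri": (19.1197, 72.8464),
    "Bresco Bandra": (19.0596, 72.8295),
    "Bresco Thane": (19.2183, 72.9781),
}
customer = (19.0760, 72.8777)


def nearest_store(point, stores):
    """Return (store_name, distance_km) for the store closest to `point`."""
    name = min(stores, key=lambda s: haversine_km(*point, *stores[s]))
    return name, haversine_km(*point, *stores[name])


print(nearest_store(customer, stores))
```

The nearest store, and the distance to it, can then be joined with the customer’s inferred preferences to choose the offer.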

Second, Google’s own capabilities can be leveraged directly in our analytics projects. Google has long been a leader in data science, and we can use its strong algorithms directly. Below are a few ways:

1. Google can auto-correct spellings. In text mining, this is like a superpower which can be leveraged directly. For example, say I have a list of cricketers from 1970 to 2015 and want to aggregate all the records made by each cricketer. But the information was typed manually and hence requires cleaning. One of the records reads Mahendra Singh Thoni! Should we combine this with Mahendra Singh Dhoni’s record or not? Of course the answer is yes, but we cannot check each record by hand. So we build an automated system which uses the Google API to search for the keyword and picks up the top 5-10 results. If all these results point to a single key (in this case, MS Dhoni), we impute the record with that key. Here is a video which can help you write Python code to bring out all the search links for a keyword.
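The consensus rule described above can be sketched as follows. Calling a search API is left out, since that needs credentials; instead the function takes the titles of the top results as input, and `KNOWN_PLAYERS` and the sample titles are hypothetical. A result “votes” for a canonical key when the key appears in its title, and the misspelt record is mapped only when the required share of results agree (all of them by default, matching the rule in the text).

```python
from collections import Counter

KNOWN_PLAYERS = ["MS Dhoni", "Sachin Tendulkar", "Rahul Dravid"]  # hypothetical keys


def canonical_key(result_titles, known_keys, min_agreement=1.0):
    """Return the known key the search results agree on, else None.

    A title votes for a key if the key appears in it (case-insensitive).
    """
    votes = Counter()
    for title in result_titles:
        for key in known_keys:
            if key.lower() in title.lower():
                votes[key] += 1
                break  # at most one vote per result
    if not result_titles or not votes:
        return None
    key, count = votes.most_common(1)[0]
    return key if count / len(result_titles) >= min_agreement else None


# Hypothetical top-5 result titles for the query "Mahendra Singh Thoni".
titles = [
    "MS Dhoni - career profile",
    "MS Dhoni batting records",
    "MS Dhoni latest news",
    "MS Dhoni biography",
    "Stats: MS Dhoni in ODIs",
]
print(canonical_key(titles, KNOWN_PLAYERS))
```

Lowering `min_agreement` (say to 0.8) would tolerate one stray result out of five, at the cost of occasionally merging records that should stay apart.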


2. Google also knows the popularity of different pages. Using this, we can compare the popularity of pages across countries, which can reveal a few key trends for each country.
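Given popularity scores per country, a simple way to surface “key trends” is to compare each country against the cross-country average rather than just taking its most popular topic. The scores below are invented for illustration. The sketch computes both the raw favourite and the topic with the highest ratio of local score to average score.

```python
from statistics import mean

# Hypothetical interest scores on a 0-100 scale for three topics in three countries.
interest = {
    "India": {"cricket": 88, "football": 35, "tennis": 12},
    "UK": {"cricket": 30, "football": 92, "tennis": 41},
    "US": {"cricket": 4, "football": 55, "tennis": 38},
}


def top_topic(interest):
    """Most popular topic in each country (raw maximum)."""
    return {c: max(s, key=s.get) for c, s in interest.items()}


def distinctive_topic(interest):
    """Topic each country over-indexes on: highest local score / cross-country mean."""
    topics = next(iter(interest.values())).keys()
    avg = {t: mean(s[t] for s in interest.values()) for t in topics}
    return {
        c: max(s, key=lambda t: s[t] / avg[t] if avg[t] else 0.0)
        for c, s in interest.items()
    }


print(top_topic(interest))
print(distinctive_topic(interest))
```

In this toy data the US’s raw favourite is football, but relative to the other countries it over-indexes on tennis, which is the kind of country-specific trend the ratio is meant to expose.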

3. Google’s ability to recognize languages can also be exploited to impute information in countries like Germany or Japan, where information is entered directly in the local language. This text can then be translated using Google Translate to standardize the entire data set.
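Once the language is recognized, standardizing a column usually means translating each distinct value once and reusing the result. The sketch keeps the translation step pluggable: `lookup_translate` and its `TRANSLATIONS` table are hypothetical stand-ins for calls to a service such as Google Translate, so the example runs offline.

```python
# Hypothetical translation table standing in for a translation service.
TRANSLATIONS = {"Milch": "milk", "Brot": "bread", "牛乳": "milk", "パン": "bread"}


def lookup_translate(text):
    """Hypothetical translator: fall back to the original text if unknown."""
    return TRANSLATIONS.get(text, text)


def standardise(values, translate):
    """Translate each distinct value once, then map the whole column."""
    cache = {}
    out = []
    for v in values:
        if v not in cache:
            cache[v] = translate(v)  # one call per distinct value
        out.append(cache[v])
    return out


column = ["Milch", "牛乳", "bread", "Brot", "パン", "Milch"]
print(standardise(column, lookup_translate))
```

Caching per distinct value matters because categorical columns repeat heavily; translating only the unique values keeps service calls proportional to the vocabulary size rather than the row count.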

 

Video Sharing Website

Just as Google leads in search, YouTube is the worldwide leader among video-sharing websites. The YouTube API can be used to find the popularity of videos and thereby the popularity of their topics. All the likes, dislikes and comments can be tied together to understand trends in customer preferences.
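A sketch of tying video statistics together into one engagement feature per topic follows. It assumes statistics shaped like the `statistics` part of a YouTube Data API v3 `videos.list` response, where counts arrive as strings; the videos, topics and numbers are hypothetical. YouTube no longer shows public dislike counts, so this sketch uses likes and comments only.

```python
# Hypothetical video records; "statistics" mimics the shape of the
# YouTube Data API v3 videos.list "statistics" part (counts as strings).
videos = [
    {"topic": "recipes",
     "statistics": {"viewCount": "120000", "likeCount": "4800", "commentCount": "600"}},
    {"topic": "recipes",
     "statistics": {"viewCount": "80000", "likeCount": "2400", "commentCount": "200"}},
    {"topic": "gadgets",
     "statistics": {"viewCount": "300000", "likeCount": "3000", "commentCount": "900"}},
]


def engagement_by_topic(videos):
    """(likes + comments) / views, pooled over all videos of each topic."""
    totals = {}
    for video in videos:
        stats = video["statistics"]
        t = totals.setdefault(video["topic"], {"views": 0, "interactions": 0})
        t["views"] += int(stats.get("viewCount", 0))
        # Missing counts (e.g. comments disabled) are treated as zero.
        t["interactions"] += int(stats.get("likeCount", 0)) + int(stats.get("commentCount", 0))
    return {topic: t["interactions"] / t["views"] for topic, t in totals.items() if t["views"]}


print(engagement_by_topic(videos))
```

Pooling views before dividing stops a single small video with a few enthusiastic viewers from dominating its topic’s score.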

ALSO SEE: Here is an article which will get you kick-started with harnessing YouTube information.

 

End Notes

My objective in writing this article was to ignite interest in upcoming data sources which can be readily used in different industries without much investment. The information sources stated above are easily accessible and carry massive potential to transform the analytics industry.

Did you find the article useful? Share with us all the new sources of information which you have used in your projects. Also share with us any links of related video or article to leverage these data sources.  Do let us know your thoughts about this article in the box below.

If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.


Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industries including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.
