Geo-Searching & Analytics Using AWS Cloud Search

One common challenge faced by analytics professionals is answering geo- and radius-related questions such as:

  • How can I find the cheapest houses within a 25-mile radius?
  • How can I get the top-performing stores within 25 miles of towns with a population of more than 50,000?
  • How can I get the top 50 MSAs for income by population, where an MSA is defined as the area within 50 miles of the central zip code?

In all these scenarios, we need to implement a radius search to get an answer. We can use our favorite analytical package, such as R or SAS, or even PostgreSQL. However, when we move this type of analytics into a live application that requires fast responses, the infrastructure engineering can become tricky.

 

Solution on the cloud

AWS CloudSearch, a cloud search service from Amazon Web Services, can help with some of these problems. AWS CloudSearch is a hosted search platform that can be used to search large collections of data such as web pages, document files, forum posts, or product information. Search indexing technologies such as Lucene have existed for a long time, but AWS CloudSearch relies on the AWS platform, which is used to power Amazon's own shopping search engine. This means that many of the kinks that need to be ironed out when moving from a dev server to live operations have already been worked out.
AWS CloudSearch indexes and searches both structured data and plain text. Some of the features are:

 

  • Full text search
  • Boolean search
  • Prefix searches
  • Range searches
  • Faceting

We can get search results in JSON or XML, sort and filter results based on field values, and sort results alphabetically, numerically, or according to custom expressions.

 

Steps to build a search domain on AWS:

We can follow these broad steps to build a search domain in AWS Cloud Search:

  • Create and configure a search domain.
  • Upload the data you want to search to your domain.
  • Search your data and control the search results.

vHomeInsurance collected detailed data on home insurance and property values across the US and compared it within specific geographic and regional areas. Here is an example: a heat map showing house prices across the US. Consumers and analysts need to understand where to live within those pockets, given the property values, home insurance, and other factors.

[Figure: heat map of house prices across the US]

Example Data set:

For example, if you want to live in Atlanta, you may want to identify the cheapest home insurance in Atlanta and nearby locations. AWS CloudSearch helps you do that using geo-location searching. Here is an example data set of home insurance and property values for California cities, including places close to Los Angeles.

City            Home Insurance  Property Value  Number of Homes  Zipcode  Lat      Long
Los Angeles     $642            $470,000        1419626          90001    33.7866  -118.2987
San Diego       $635            $426,100        861451           92104    32.7397  -117.1293
San Jose        $672            $659,100        592151           95101    37.3435  -121.8887
Sacramento      $606            $242,100        431564           94203    38.5854  -121.4925
San Francisco   $687            $750,900        375861           94103    37.7731  -122.411
San Bernardino  $602            $217,800        240838           92401    34.1054  -117.2912
Fresno          $602            $217,100        232708           93650    36.8419  -119.7952
Ontario         $624            $355,700        195756           91758    34.0635  -117.6503

 

Geo location search:

We index the table above into the search domain using the CloudSearch API. The indexed fields can include home insurance rates, home value, number of homes, city, state, zip code, population, and latitude/longitude.
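To make the indexing step more concrete, here is a minimal sketch that turns two rows of the table into a JSON document batch. It assumes the batch shape used by the 2013-01-01 CloudSearch API (`type`, `id`, `fields`) and a location stored as a "lat,lon" string; the field names, the `zip_` ID prefix, and the `toBatch` helper are hypothetical, and the setup used for this article may well have differed.

```javascript
// Two rows from the example table (dollar amounts stored as plain numbers).
const rows = [
  { city: "Los Angeles", homeInsurance: 642, propertyValue: 470000,
    homes: 1419626, zipcode: "90001", lat: 33.7866, lon: -118.2987 },
  { city: "San Diego", homeInsurance: 635, propertyValue: 426100,
    homes: 861451, zipcode: "92104", lat: 32.7397, lon: -117.1293 },
];

// Hypothetical helper: maps table rows to "add" operations.
// Field names below are illustrative assumptions, not a required schema.
function toBatch(tableRows) {
  return tableRows.map((r) => ({
    type: "add",
    id: `zip_${r.zipcode}`, // assumed ID scheme: one document per zip code
    fields: {
      city: r.city,
      home_insurance: r.homeInsurance,
      property_value: r.propertyValue,
      number_of_homes: r.homes,
      zipcode: r.zipcode,
      location: `${r.lat},${r.lon}`, // assumed "lat,lon" string form
    },
  }));
}

const batch = toBatch(rows);
console.log(JSON.stringify(batch, null, 2));
```

The resulting JSON is what would be uploaded to the domain's document endpoint; the upload call itself is left out because it depends on the SDK and credentials in use.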

For geo-location search in AWS CloudSearch we use the spherical law of cosines. A brief explanation follows.

The law of cosines is often preferable to haversine when calculating the distance between two latitude-longitude points. With 64-bit floating-point arithmetic, it gives well-conditioned results down to distances as small as around 1 metre. In view of this, it is probably worth, in most situations, using either the simpler law of cosines or the more accurate ellipsoidal Vincenty formula in preference to haversine.

Law of cosines:

d = acos( sin φ1 ⋅ sin φ2 + cos φ1 ⋅ cos φ2 ⋅ cos Δλ ) ⋅ R
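// lat1, lon1, lat2, lon2 are in degrees; R is the mean Earth radius in km
var toRad = function (deg) { return deg * Math.PI / 180; };
var φ1 = toRad(lat1), φ2 = toRad(lat2),
    Δλ = toRad(lon2 - lon1), R = 6371;
// d is the great-circle distance in km
var d = Math.acos( Math.sin(φ1)*Math.sin(φ2) + Math.cos(φ1)*Math.cos(φ2) * Math.cos(Δλ) ) * R;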

Lat/Lon in degrees:

d = ACOS( SIN(lat1*PI()/180)*SIN(lat2*PI()/180) + COS(lat1*PI()/180)*COS(lat2*PI()/180)*COS(lon2*PI()/180-lon1*PI()/180)) * 6371;

The formula above can be used to compute the distance from one geo location to every document location in the search domain, and to return only the documents that fall within the specified radius.
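As a quick sanity check on the comparison with haversine, the sketch below computes the Los Angeles to San Diego distance from the example table with both formulas. The function names are illustrative, and the clamp before `Math.acos` is an addition of this sketch: rounding can push the argument slightly above 1 for near-identical points, which would otherwise produce `NaN`.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, as in the formula above
const toRad = (deg) => (deg * Math.PI) / 180;

// Spherical law of cosines; inputs in degrees, result in km.
function cosineDistanceKm(lat1, lon1, lat2, lon2) {
  const p1 = toRad(lat1), p2 = toRad(lat2), dl = toRad(lon2 - lon1);
  const c = Math.sin(p1) * Math.sin(p2) +
            Math.cos(p1) * Math.cos(p2) * Math.cos(dl);
  // Clamp to [-1, 1] so rounding error cannot make acos return NaN.
  return Math.acos(Math.min(1, Math.max(-1, c))) * EARTH_RADIUS_KM;
}

// Haversine, for comparison; inputs in degrees, result in km.
function haversineDistanceKm(lat1, lon1, lat2, lon2) {
  const p1 = toRad(lat1), p2 = toRad(lat2);
  const dp = toRad(lat2 - lat1), dl = toRad(lon2 - lon1);
  const a = Math.sin(dp / 2) ** 2 +
            Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2) ** 2;
  return 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a)) * EARTH_RADIUS_KM;
}

const la = [33.7866, -118.2987]; // Los Angeles row
const sd = [32.7397, -117.1293]; // San Diego row
const dCos = cosineDistanceKm(...la, ...sd);
const dHav = haversineDistanceKm(...la, ...sd);
console.log(dCos.toFixed(3), dHav.toFixed(3), Math.abs(dCos - dHav));
```

At city-to-city distances like this one, the two results should agree to well below a metre, so the simpler formula costs nothing in accuracy here.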

Here is an example search query to find the cheapest areas for home insurance within a 50-mile radius of Los Angeles:

dis_rank="&rank-dis=acos(sin(latitude)*sin(3.141*lat/(1000000*180))%2Bcos(latitude)*cos(3.141*lat/(1000000*180))*cos(longitude-(-3.141*(long-18100000)/(100000*180))))*6371*0.6214" ;

threshold="&t-dis=..radius"

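For example, for Los Angeles: latitude=33.7866, longitude=-118.2987, radius=50 (miles). If you pass these values to the query above, it returns all the documents within 50 miles of Los Angeles. Note that the expression applies sin and cos directly to latitude and longitude, so these two values presumably need to be converted to radians first.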
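To illustrate what the query above is meant to return, here is a small local sketch that applies the same logic to the example table: compute the distance in miles, keep rows at or below the radius, and sort by home insurance. It runs entirely in memory and is not the CloudSearch API; `cheapestWithin` and the field names are hypothetical.

```javascript
const KM_TO_MILES = 0.6214; // same conversion factor as in the rank expression
const toRad = (deg) => (deg * Math.PI) / 180;

// Law-of-cosines distance in miles; inputs in degrees.
function distanceMiles(lat1, lon1, lat2, lon2) {
  const c = Math.sin(toRad(lat1)) * Math.sin(toRad(lat2)) +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
            Math.cos(toRad(lon2 - lon1));
  // Clamp guards against acos(1.0000000000000002) = NaN for identical points.
  return Math.acos(Math.min(1, Math.max(-1, c))) * 6371 * KM_TO_MILES;
}

// Rows from the example table: city, annual home insurance ($), lat, lon.
const docs = [
  { city: "Los Angeles", homeInsurance: 642, lat: 33.7866, lon: -118.2987 },
  { city: "San Diego", homeInsurance: 635, lat: 32.7397, lon: -117.1293 },
  { city: "San Jose", homeInsurance: 672, lat: 37.3435, lon: -121.8887 },
  { city: "Sacramento", homeInsurance: 606, lat: 38.5854, lon: -121.4925 },
  { city: "San Francisco", homeInsurance: 687, lat: 37.7731, lon: -122.411 },
  { city: "San Bernardino", homeInsurance: 602, lat: 34.1054, lon: -117.2912 },
  { city: "Fresno", homeInsurance: 602, lat: 36.8419, lon: -119.7952 },
  { city: "Ontario", homeInsurance: 624, lat: 34.0635, lon: -117.6503 },
];

// Hypothetical helper: rows within radiusMiles of (lat, lon), cheapest first.
function cheapestWithin(rows, lat, lon, radiusMiles) {
  return rows
    .map((d) => ({ ...d, dis: distanceMiles(lat, lon, d.lat, d.lon) }))
    .filter((d) => d.dis <= radiusMiles) // mirrors the "..radius" threshold
    .sort((a, b) => a.homeInsurance - b.homeInsurance);
}

const hits = cheapestWithin(docs, 33.7866, -118.2987, 50);
console.log(hits.map((h) => `${h.city}: $${h.homeInsurance} (${h.dis.toFixed(1)} mi)`));
```

With the table's coordinates, only Ontario and Los Angeles itself fall inside the 50-mile radius, with Ontario the cheaper of the two. A scan like this is fine for eight rows; the point of the hosted service is to avoid doing it by hand at scale.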

The simplicity and consistency of AWS CloudSearch let you do geo analytics on the fly, instead of building a custom search infrastructure and staffing an entire team to support and maintain it.

We hope this article helps you answer geo-radius questions in your own geo-location analytics.

----

This article has been contributed by vHomeInsurance.com. vHomeInsurance.com (www.vhomeinsurance.com) analyzes home insurance rates, home values and other factors to help home owners make better decisions about their insurance.
