Hack Session: Identifying Location using Clustering and Language Model

Turning data into product solutions is everyone’s goal these days. This can be done either by procuring exact data from third-party sources or by applying the right machine learning models to the data already available in the system.

In the first case, you might end up with a data science product of unbeatable accuracy. In the second, apart from a product with good accuracy, you will uncover a mine of insights from your own business ecosystem that no third-party source can provide. Who would have imagined that free-form text written in a local language, sent by customers to their drivers while waiting for their ride to arrive, could be used to extract unparalleled information about pickup points? Even the best map services in the world sometimes have no idea of these minute details, which end up powering a world-class customer pickup experience feature!

This talk will focus on how machine learning and natural language processing were applied to customers’ text data, along with bookings data, to build a product feature that improves the customer experience at GO-JEK.


This is a hack session in which Divya will walk attendees through a toy example of how to build a product feature by clustering geospatial data and combining it with a language model built on the corresponding textual data.

  1. Introduction: data science, the great potential of text data, and how it is being put to best use across industries

  2. Problem statement: explain the product feature that we will build in the session

  3. Data Understanding: the data (geospatial and textual) that will be used in the session

  4. Building the product

    1. Clustering

      • K-means clustering

      • Adaptation of K-means

    2. Language modeling

      • What it is, what’s the need, current applications, etc.

      • Language variability and language ambiguity

      • Basic concepts of NLP

      • Data Pre-Processing – the heart of any language model

      • N-gram language modeling from the beginning
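
The clustering step in the outline above can be illustrated with a minimal sketch of plain K-means on pickup coordinates. This is not the adaptation of K-means covered in the session, which the abstract does not describe. The `pickups` coordinates are made up for illustration, the `kmeans` function and its farthest-first initialisation are assumptions chosen to keep the example deterministic, and plain Euclidean distance on (lat, lon) is only a rough approximation that is reasonable over short distances.

```python
import math


def kmeans(points, k, max_iter=50):
    """Plain Lloyd's K-means on (lat, lon) tuples with Euclidean distance."""
    # Deterministic farthest-first initialisation (an illustrative choice):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Hypothetical pickup coordinates forming two small groups.
pickups = [
    (-6.2001, 106.8001), (-6.2002, 106.8003), (-6.2000, 106.8002),
    (-6.2101, 106.8101), (-6.2102, 106.8099), (-6.2100, 106.8100),
]
centroids, labels = kmeans(pickups, k=2)
print(centroids, labels)
```

Each resulting centroid can be read as a candidate pickup point, and the text messages sent from bookings in that cluster become the corpus for the language-modeling step.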


  • Building a language model from scratch
  • Understanding how to build an excellent product feature using textual and geospatial data
  • Understanding the value of the data you have and how it can be used to drive business
  • Understanding the impact that a data solution like this can have on a logistics service, with a live example of metrics improved by it at GO-JEK
  • Understanding possible use cases for similar solutions in other scenarios across the logistics industry
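
As a rough idea of what building a language model from scratch can look like, here is a minimal bigram (N-gram with N = 2) sketch with add-alpha smoothing. The `messages` are invented English stand-ins for customer texts, the function names are hypothetical, and whitespace tokenisation stands in for the much more careful pre-processing the session covers. It should not be read as the model used at GO-JEK.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + text.lower().split() + ["</s>"]


def train_bigram(messages):
    """Count how often each word follows each preceding word."""
    context_counts = Counter()           # times a word appears as context
    bigram_counts = defaultdict(Counter)  # bigram_counts[prev][cur]
    vocab = set()
    for message in messages:
        tokens = tokenize(message)
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][cur] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(model, prev, cur, alpha=1.0):
    """P(cur | prev) with add-alpha smoothing over the vocabulary."""
    context_counts, bigram_counts, vocab = model
    seen = bigram_counts.get(prev, {}).get(cur, 0)
    return (seen + alpha) / (context_counts.get(prev, 0) + alpha * len(vocab))


# Hypothetical messages from bookings that fall in one pickup cluster.
messages = [
    "waiting at the north gate",
    "please come to the north gate",
    "i am near the lobby",
]
model = train_bigram(messages)
print(bigram_prob(model, "north", "gate"))
print(bigram_prob(model, "north", "lobby"))
```

Phrases that score highly within a cluster's messages, such as "north gate" here, are the kind of signal that could be surfaced as a name for that pickup point.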


Divya Choudhary

Divya Choudhary is a data scientist currently working with GO-JEK, a Jakarta-based technology startup. She is responsible for building algorithms and mathematical models that drive features across GO-JEK’s diverse products.

A computer science engineer with 4 years of work experience, Divya has progressed in her career from analyst to decision scientist to data scientist. The crux of any data science solution lies in having a problem-solving mindset, and Divya is known for her business acumen and problem-solving approach at all the start-ups she has been a part of.
