DataHour: Demystifying Clustering in Topic Modeling Algorithms like BERTopic

18 Jan 2023, 13:01 - 18 Jan 2023, 14:01

About the Event

In a world where customer reviews drive product sales and business decisions, it is crucial to pick out relevant topics from the vast corpus of textual feedback that customers provide, whether on public platforms like Twitter, TrustPilot and Google reviews or through internally collected feedback from email campaigns and surveys. Sifting through that much text manually to find out what customers are talking about is impractical. Enter BERTopic, an unsupervised machine learning technique for topic modeling: it learns a set of topics from a collection of documents (here, customer reviews), where each topic is represented as a distribution over words and each document as a distribution over topics. It identifies the underlying themes in the collection and represents each document in terms of the proportions of those themes that it contains.

Clustering is an integral part of this topic identification process. It groups words into topics based on their statistical co-occurrence patterns in the documents. Using dimensionality reduction techniques like SVD and clustering algorithms like K-means, the model identifies groups of words that are strongly associated with specific themes and uses these groups to define the set of topics it learns.
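To make the idea concrete, here is a minimal sketch of the SVD plus K-means step described above, written with NumPy only. It is an illustration under stated assumptions, not BERTopic's implementation: BERTopic's own default pipeline clusters document embeddings (by default with UMAP and HDBSCAN), whereas this sketch clusters words from a term-document count matrix. The six toy reviews, the choice of two topics, the row normalization and the farthest-point initialization are all assumptions made for the example.

```python
import numpy as np

# Made-up toy reviews (an assumption for illustration, not real Alexa data).
docs = [
    "alexa plays music great sound",
    "great sound music quality",
    "music sound great speaker",
    "setup wifi connection failed",
    "wifi connection setup difficult",
    "connection failed wifi setup",
]

# Term-document count matrix: one row per word, one column per review.
vocab = sorted({word for doc in docs for word in doc.split()})
row_of = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for col, doc in enumerate(docs):
    for word in doc.split():
        counts[row_of[word], col] += 1

# Truncated SVD: keep the top 2 components as low-dimensional word vectors.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
word_vecs = U[:, :2] * S[:2]
# Assumption: scale each word vector to unit length so rare and frequent
# words that co-occur in the same reviews end up close together.
word_vecs = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)


def kmeans(points, k, n_iter=20):
    """Plain K-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centroid.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[np.argmax(nearest)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels


labels = kmeans(word_vecs, k=2)

# Collect the words assigned to each cluster; each cluster is one "topic".
topics = {}
for word, label in zip(vocab, labels):
    topics.setdefault(int(label), []).append(word)

for label, words in sorted(topics.items()):
    print(label, words)
```

On this toy data the audio-related words and the setup-related words never share a review, so they fall into separate clusters. Real review data is far noisier, and the number of components and clusters would need tuning.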

In this DataHour, Abhiram will break down how clustering works in BERTopic with the help of the Amazon Alexa Reviews dataset.

Prerequisites: A basic understanding of NLP and a curiosity to learn data science.

  1. Best articles get published on Analytics Vidhya’s Blog Space

About the Speaker

Abhiram Ravikumar

ML Engineer at Collinson

Abhiram currently works at Collinson as an ML Engineer, having completed a Master's in Data Science at King's College London in January 2022. As part of the Mozilla Tech Speakers program, he has presented talks at tech conferences around the world, including PyCon, MozFest and CodeMash. His LinkedIn Learning course on Rust programming, recorded in Carpinteria, California, has been taken by over 60k participants. Apart from public speaking, he is a seasoned researcher who has published papers and posters at IEEE and ACM research conferences in the fields of NLP, brain-computer interfaces, quantum computing and AI. Previously, after graduating from PES University in Bangalore, he worked at enterprise software company SAP Labs in Bengaluru for over 4 years as a developer in web development, computer vision and robotic process automation (RPA). You can connect with him on LinkedIn and Twitter.
