DataHack Radio #22: Exploring Computer Vision and Data Engineering with Dat Tran

Pranav Dar. Last updated: 06 May, 2019

Introduction

How do computer vision techniques work in an industry setting? How does an organization use data engineering to scale up its operations?

These are questions every aspiring data scientist must be aware of. Dat Tran, Head of Data Science at idealo internet GmbH, is the perfect person to shed light on these questions.

Dat has worked on a variety of data engineering projects before he came to idealo, and now leads a team of data scientists who work on really cool computer vision problems. This is one of my favorite episodes since we launched DataHack Radio – the depth and breadth of topics covered, plus Dat’s incredible knowledge, make this a must-listen.

In this episode of the DataHack Radio podcast, Kunal and Dat cover multiple topics, including:

Dat’s not-so-straightforward journey into data science
How his team uses computer vision at idealo
His rich experience in data engineering
Challenges faced with implementing models and building data pipelines
Advice to aspiring data scientists, and much more!

I have noted down a few highlights from the podcast below, but I strongly recommend listening to the entire conversation. The energy Dat brings to this episode is incredible.

Dat Tran’s Background and Journey into Data Science

Dat’s journey into data science isn’t your run-of-the-mill story. He hadn’t even heard of ‘machine learning’ during his undergrad days, when his focus was on investment banking. But Dat quickly realized it wasn’t the field for him. So what next?

Back to the drawing board – a Master’s degree! During this time, a couple of his friends were starting out in machine learning and it wasn’t long before Dat was drawn into this wonderfully complex field.

He landed a job in the advanced analytics department at Accenture. This was back when ‘Big Data’ was starting to become the ultimate buzzword in the industry – a great time to enter this field. Dat moved to Pivotal Inc. a year later (joining as a data scientist), recognizing that this was a brilliant opportunity to get more hands-on experience in machine learning.

At Pivotal, Dat worked on a variety of projects spanning different industries, including automotive and airlines. He worked there for over two years and credits a lot of his current knowledge and experience to his time at Pivotal. He gave talks at multiple PyData conferences as well during this time – a truly impressive achievement.

Dat is now working as the Head of Data Science at idealo internet GmbH, a successful Berlin-based startup and one of the largest portals in the German e-commerce market.

Data Science at idealo – Focusing on Computer Vision

idealo is a price comparison site (for products as well as hotels) so you can imagine the numerous data science functions the team performs – price prediction, indexing, developing and using a recommendation engine, among other things. Dat’s team, however, focuses on applying computer vision.

A fair question to ask: what role does computer vision have in a price comparison site? Well, idealo has a ton of images of products and hotels.

Dat explained this with an intuitive example. idealo lists approximately 2 million accommodations, with 130 images per accommodation on average. These range from small and mid-sized hotels to the big players (the luxurious 5-star ones).

The pictures of these hotel rooms vary depending on who took them. The non-luxury hotels typically have images taken by owners themselves while the 5-star hotels send images taken by professionals. There is quite a big gap in the image quality between these two categories.

Dat and his team use an array of computer vision concepts to analyze and make use of these images:

Image tagging: The algorithm essentially tags the image depending on the features – bedroom, bathroom, reception, etc.
Image ordering: Then, this algorithm reorders the images in a visually pleasing way
Image upscaling: The team also uses computer vision to upscale images from low to high resolution
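To make the tagging and ordering steps above a little more concrete, here is a minimal sketch of how tagged images might be reordered. The podcast does not describe idealo's actual algorithm, so everything here is an assumption for illustration: the `TaggedImage` type, the `score` field (a stand-in for some visual-quality score) and the round-robin ordering rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedImage:
    filename: str  # hypothetical file name
    tag: str       # label from the tagging step, e.g. "bedroom"
    score: float   # hypothetical visual-quality score in [0, 1]


def order_images(images: list[TaggedImage]) -> list[TaggedImage]:
    """Order images best-first while mixing tags for variety.

    Images are ranked by score within each tag. The gallery is then built
    in rounds: each round takes the next-best image of every tag, and the
    picks within a round are sorted by score.
    """
    by_tag: dict[str, list[TaggedImage]] = {}
    for img in images:
        by_tag.setdefault(img.tag, []).append(img)
    for group in by_tag.values():
        group.sort(key=lambda i: i.score, reverse=True)

    ordered: list[TaggedImage] = []
    round_index = 0
    while len(ordered) < len(images):
        picks = [g[round_index] for g in by_tag.values() if round_index < len(g)]
        picks.sort(key=lambda i: i.score, reverse=True)
        ordered.extend(picks)
        round_index += 1
    return ordered


# Made-up example data, not real idealo images.
gallery = [
    TaggedImage("a.jpg", "bedroom", 0.9),
    TaggedImage("b.jpg", "bedroom", 0.8),
    TaggedImage("c.jpg", "bathroom", 0.7),
    TaggedImage("d.jpg", "reception", 0.4),
]
print([img.filename for img in order_images(gallery)])
# prints ['a.jpg', 'c.jpg', 'd.jpg', 'b.jpg']
```

The round-robin rule is just one simple way to avoid showing five bedroom photos in a row. In a real system, both the tags and the scores would come from trained models.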

Really interesting stuff! It’s a pleasure to see computer vision making inroads in the industry, isn’t it?

Data Engineering Experience

I came across Dat’s PyData talk on YouTube, and it doesn’t take long to realize he is a data engineering expert. The talk is titled ‘How you really get your data science models into production the cool way!’.

idealo uses a variety of tools for data engineering, such as AWS for training models and Kubernetes for putting them into production.

I personally feel data engineering is a very overlooked aspect (by aspiring data scientists) of the overall data science project lifecycle. You will most certainly face questions on model deployment and other aspects of software engineering in your data scientist interview. This section of the podcast will provide you with a bird’s eye view of an industry-ready process.

Challenges Faced in Implementing Data Science and Data Engineering

Data science and data engineering are inextricably linked. For all intents and purposes, you cannot separate them. Dat explained this using the example of a neural network (a convolutional neural network (CNN), to be precise). There are quite a few CNN architectures to choose from, like ResNet, MobileNet, VGG, etc.

The challenge with these CNN models is that they have a very large number of parameters (weights), which makes them quite large. This brings up the age-old debate of balancing accuracy and speed. You can get away with a big, slow model in research, but in a production environment it is a significant obstacle.
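A quick back-of-the-envelope calculation shows why model size becomes a concern. The sketch below counts the learnable parameters of a standard 2D convolution layer and a fully connected layer, then converts the total to memory assuming 32-bit floats. The layer shapes in the usage example are hypothetical and are not taken from the podcast or from any specific architecture.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int,
                  out_channels: int, bias: bool = True) -> int:
    """Learnable parameters in a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Learnable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def size_in_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed to store the parameters (4 bytes each for float32)."""
    return num_params * bytes_per_param / (1024 ** 2)


# Hypothetical layers, chosen only to illustrate the arithmetic.
conv = conv2d_params(3, 3, in_channels=64, out_channels=128)  # 73,856
dense = dense_params(4096, 1000)                              # 4,097,000
total = conv + dense                                          # 4,170,856
print(f"{total:,} parameters, about {size_in_mib(total):.2f} MiB as float32")
```

Two layers already add up to roughly 16 MiB. Deep networks stack many such layers, which is why lighter architectures are attractive when latency and memory matter in production.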

Dat mentioned quite a few common challenges from the data engineering specific side as well, including:

“How can we use a Keras trained model on a TensorFlow backend?”

“Do we need to transform our images into certain formats?”

“How do we benchmark our model results?”
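For the benchmarking question, one simple starting point is to measure accuracy and latency together, since the trade-off between the two is exactly what is at stake. This is only a sketch under stated assumptions: `benchmark` and its arguments are hypothetical names, `predict` stands in for whatever callable wraps the model, and the podcast does not say how Dat's team actually benchmarks.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable, Sequence


def benchmark(predict: Callable[[object], object],
              samples: Sequence[object],
              labels: Sequence[object],
              warmup: int = 3) -> dict[str, float]:
    """Measure accuracy and per-call latency of a prediction function."""
    if len(samples) == 0 or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and equal length")

    # Warm-up calls so one-off costs (caches, lazy initialization) are not timed.
    for sample in samples[:warmup]:
        predict(sample)

    latencies_ms: list[float] = []
    correct = 0
    for sample, label in zip(samples, labels):
        start = time.perf_counter()
        prediction = predict(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(prediction == label)

    return {
        "accuracy": correct / len(samples),
        "median_latency_ms": statistics.median(latencies_ms),
        "max_latency_ms": max(latencies_ms),
    }


# Toy stand-in for a model: "predicts" whether a number is odd.
report = benchmark(lambda x: x % 2, samples=[1, 2, 3, 4], labels=[1, 0, 1, 1])
print(report["accuracy"])  # prints 0.75
```

Running the same function against two candidate models on the same samples gives a like-for-like comparison. Latency numbers depend on the machine, so they are only comparable when measured on the same hardware.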

Keeping Yourself Updated on the Latest Data Science Techniques

“You best learn about these things when you do them yourself.”

As we alluded to earlier, Dat has done most of his data science learning on the job. There is nothing like practical hands-on experience to indelibly ingrain concepts.

Outside of that, there are so many options to learn from these days (everything is a quick Google search away!):

Courses
Blog posts
Podcasts, etc.

A major challenge with these platforms is that we don’t get a structured path or answer to a specific problem. That, again, is why experience is king in data science.

Advice to Aspiring Data Science Professionals

Software engineering is a key facet of data science most aspiring professionals are unaware of. And you simply can’t get away from it in an industry role. So here’s Dat’s advice for you:

“You need kind of an engineering background. Learn the basics – how to write clean code, version control, testing, and move on to data science then.”

And this really, REALLY important point:

“The Machine Learning aspect is a small part of a big software project!”

Knowing mathematics, statistics, machine learning algorithms and even tools like R and Python is good, but these don’t differentiate you from the competition. Everyone else is learning the same thing. So what else is there? It comes down to that one thing again – software engineering.
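As a small illustration of the "clean code and testing" basics in Dat's advice, here is what a tiny, tested preprocessing helper might look like. The function name and the choice of pixel normalization are assumptions made for the example, not something discussed in the podcast.

```python
import unittest


def normalize_pixels(values: list[int]) -> list[float]:
    """Scale 8-bit pixel values (0-255) to floats in the range [0.0, 1.0]."""
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"pixel value out of range: {value}")
    return [value / 255 for value in values]


class NormalizePixelsTest(unittest.TestCase):
    def test_scales_to_unit_range(self) -> None:
        self.assertEqual(normalize_pixels([0, 51, 255]), [0.0, 0.2, 1.0])

    def test_empty_input(self) -> None:
        self.assertEqual(normalize_pixels([]), [])

    def test_rejects_out_of_range_values(self) -> None:
        with self.assertRaises(ValueError):
            normalize_pixels([256])
```

Saved in a file, the tests can be run with `python -m unittest`. The point is less the function itself than the habit: a clear name, type hints, explicit error handling, and tests that pin down the expected behavior before the code goes anywhere near production.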

Dat’s Data Science Hiring Process

Dat uses a straightforward set of pointers and rounds to judge a candidate’s ability:

10 basic machine learning questions: Most people drop off at this stage
A machine learning assignment
On-site interview: This includes working with a member of Dat’s data science team to solve a problem
How well does the candidate write and document code?
Ability to research and the thought process behind it

Future Trends in Machine Learning

Which machine learning functions will see a major improvement and focus in the coming years?

AutoML will continue to gain market share and become an accepted member of the machine learning tool family
Explainable AI: The ability to build interpretable deep learning models will take on far more importance
There will be a far bigger focus on security and governance

End Notes

One of my favorite DataHack Radio episodes so far! Dat brings a ton of enthusiasm and knowledge to the podcast, and it really shines through in the way he explains his role, the challenges his team faces from both a data science and a data engineering perspective, and his advice to aspiring data scientists.

A pleasure listening to him elaborate on relevant industry problems and how to overcome them. What was your favorite part of the episode? Let us know in the comments section below.

Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
