
3 Building Blocks of Machine Learning you Should Know as a Data Scientist

Overview

  • A machine learning system consists of multiple building blocks that need to be managed
  • Learn about the three key building blocks of machine learning you’ll be working with as a data scientist

 

Introduction

How does a machine learning project work? What are the different building blocks that go into making a machine learning or artificial intelligence (AI) system? This is a topic I personally struggled with during my initial days in the field.

I knew how to make machine learning models but I had no clue how a real-world machine learning project actually worked. It was quite a revelation when I went through the process! And over time, I have seen most data science and machine learning beginners struggle to grasp the nuances of a machine learning system.

Remember – it isn’t just about building models! There is a LOT that goes into creating a successful machine learning and AI system. It’s an amalgamation of hardware and software, among other things. So the question is – what are the key building blocks that make up a successful machine learning system?

That’s what we’ll cover in detail in this article. I will give you an overview of these different components in a machine learning or AI system, and then we will understand these components with the help of a self-driving car.

This article and the concepts we’ll cover are part of the free ‘Introduction to AI and ML’ course. I highly recommend checking that out – it’s a great place to get familiar with the various concepts in AI and machine learning.

 

And the Three Key Building Blocks of Machine Learning Are:

  • Machine Learning Building Block #1: Capturing the Input
  • Machine Learning Building Block #2: Processing and Storing the Data
  • Machine Learning Building Block #3: Output or Interaction Unit

 

Machine Learning Building Block #1: Capturing the Input

As you might expect, every machine learning system needs a lot of data to function. Ultimately, it will take decisions based on the data it captures. And it needs to capture data about the environment it is in, the ambient conditions, user inputs, and so on.

Hence, the first building block of any machine learning or AI system is the way it captures data and feeds it into the system as input.

So what does this input look like? It could come from various sources: a camera capturing images, a GPS receiver reporting location, user inputs from mobile applications, and so on. In order to select the right inputs, we need to ask these key questions:

  • What data do we need to capture?
  • How frequently do we need to capture this data?
  • How fast would this data flow?
  • What could be the best way to capture this data?
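
One way to make these questions concrete is to write the answers down as a small specification for each input. The sketch below is only an illustration: the `InputSpec` class, the sensor names, and all the numbers are assumptions made up for this example, not measurements from a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical description of one input to the system."""
    name: str                  # what data do we capture?
    samples_per_second: float  # how frequently do we capture it?
    bytes_per_sample: int      # how large is a single reading?
    source: str                # how do we capture it?

    def bytes_per_second(self) -> float:
        """How fast the data flows: capture frequency times reading size."""
        return self.samples_per_second * self.bytes_per_sample


# Illustrative values only.
camera = InputSpec("front_camera", 30, 1_000_000, "on-board sensor")
gps = InputSpec("gps_location", 1, 64, "on-board sensor")

for spec in (camera, gps):
    print(f"{spec.name}: {spec.bytes_per_second():,.0f} bytes/s via {spec.source}")
```

Even with made-up numbers, a table like this quickly shows which inputs dominate the data flow and therefore deserve the most attention.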

At times, there would be multiple ways to capture the same data. For example, you might rely on sensors in your car to capture weather information, or you might pull it directly from the internet based on the GPS coordinates of your car.

It might make sense to weigh the pros and cons of various ways to capture data before deciding which one you prefer.
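
When two sources can supply the same data, one possible design is to rank them by preference and fall back to the next one when a source is unavailable. This is a minimal sketch under that assumption; `read_weather`, `car_sensor`, and `internet_lookup` are hypothetical names, and the two sources are stand-ins that return fixed values rather than real sensor or network calls.

```python
from typing import Callable, Optional

# A source returns a temperature in Celsius, or None if it is unavailable.
WeatherSource = Callable[[], Optional[float]]


def read_weather(sources: list[tuple[str, WeatherSource]]) -> tuple[str, float]:
    """Try each source in order of preference and return the first reading."""
    for name, source in sources:
        reading = source()
        if reading is not None:
            return name, reading
    raise RuntimeError("no weather source available")


def car_sensor() -> Optional[float]:
    return None  # stand-in: pretend the on-board sensor is not responding


def internet_lookup() -> Optional[float]:
    return 21.5  # stand-in for a lookup based on the car's GPS coordinates


chosen, temperature = read_weather(
    [("car_sensor", car_sensor), ("internet", internet_lookup)]
)
print(chosen, temperature)
```

The order of the list encodes the outcome of weighing the pros and cons: here the on-board sensor is preferred and the internet lookup is used only as a backup.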

 

Machine Learning Building Block #2: Processing and Storing the Data (Edge and Cloud)

Once we capture this data from input units, we will need to either store it or run computations on it. That’s basically the choice it boils down to when we’re working on a machine learning project!

Both of these (processing and storing) can happen either on the device itself, typically called “AI on the Edge”, or on the cloud. Again, we have a few choices in front of us. We need to decide:

  • What data would get stored on the edge?
  • What computations would happen on the edge? Here, you would typically have limitations on the compute environment (trust me, not everyone has the unlimited computational resources of Google!)
  • What would happen on the cloud?

Typically, any critical operation that must keep working without an internet connection or a system upgrade should always happen on the edge.

These would include things like on-the-fly decisions, alerts, or any other form of monitoring you want on the device. More comprehensive data storage and computations happen on the cloud. This is where data scientists typically apply various machine learning techniques to make the system better. All our data lakes, data warehouses, etc. would typically be on the cloud as well.
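
The rule of thumb above can be sketched as a simple placement function. The `Task` class, the `place` function, and the task names are hypothetical and only illustrate the reasoning; a real system would weigh more factors, such as latency, bandwidth, and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical unit of work in the system."""
    name: str
    must_work_offline: bool  # critical even without an internet connection?


def place(task: Task) -> str:
    """Critical operations stay on the edge; everything else goes to the cloud."""
    return "edge" if task.must_work_offline else "cloud"


tasks = [
    Task("obstacle_alert", must_work_offline=True),
    Task("model_retraining", must_work_offline=False),
    Task("long_term_storage", must_work_offline=False),
]

placement = {task.name: place(task) for task in tasks}
print(placement)
```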

 

Machine Learning Building Block #3: Output or Interaction Unit

Finally, there would be an output or interaction unit in a successful AI or machine learning system. This is the unit where the machine learning system would interact with the outside universe and take action.

This could be in the form of a display, voice output, or robotic actions. Usually, the output from our machine learning system would have several design considerations as well.

For example, if a vehicle is not able to decide or read the environment with certainty, key questions need to be answered:

  • What should the system do?
  • Should it stop first or should it alert the user?
  • How frequently should it communicate with the user, and in how much detail?

These are some of the core questions that need to be considered when designing the output layer of any machine learning system.
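
One common way to answer such questions is to tie the action to how confident the system is in its reading of the environment. The sketch below assumes the system exposes a confidence score between 0 and 1; the `choose_action` function and both thresholds are hypothetical values chosen for illustration, not recommendations.

```python
def choose_action(
    confidence: float,
    act_threshold: float = 0.9,    # assumed: above this, act autonomously
    alert_threshold: float = 0.6,  # assumed: above this, keep going but alert the user
) -> str:
    """Map the system's confidence in its reading of the environment to an action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= act_threshold:
        return "proceed"
    if confidence >= alert_threshold:
        return "alert_user"
    return "stop"


for score in (0.95, 0.7, 0.3):
    print(score, choose_action(score))
```

Whether the system should stop before alerting the user, or alert first, is exactly the kind of design decision that such thresholds make explicit.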

 

Case Study: Building Blocks for a Self-Driving Car

Now, let us take an example of a self-driving car and see each of these building blocks in more detail. This will help you gain a more practical understanding of how a machine learning or AI system functions in the real world.

So what would be the first building block or component?

You guessed it – input! Check out drive.ai’s self-driving car:

As you can see here, this autonomous vehicle has a lot of sensors that act as the input to the machine learning system. You can see these sensors on top of the car (in blue color). These are called LiDARs (Light Detection and Ranging). In addition to these, there are other sensors that capture more information, such as the weather, obstacles around the car, and lane markings.

Then there is compute and storage on the car itself, which enables the car to make decisions like:

  • How much to steer?
  • What speed to run at?
  • What are the obstacles in the way?
  • How to handle these obstacles?

There is also a storage and compute layer on the cloud, which is responsible for making the driving algorithm better over time.

And finally, you would see several output components like a screen for showing messages to people around the car. There is also the action taken by the robotic process to drive the vehicle forward. Here’s an illustration of the different layers that are required at this stage:

There are a lot of other nuts and bolts that go into creating a successful self-driving car. But I wanted to take this example to show you how the overall idea behind a real-world machine learning system works and the key building blocks required to run it.
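
To tie the three building blocks together, here is a toy sketch of a single step of such a system: a sensor reading comes in, a decision is made on the car, and the result is recorded for later improvement on the cloud. Every name and number here (`SensorFrame`, `edge_decide`, `CloudLog`, the 10-metre safe distance) is an assumption for illustration and has nothing to do with how any real self-driving car is implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SensorFrame:
    """Building block #1: one snapshot of captured input (hypothetical fields)."""
    obstacle_distance_m: float
    speed_kmh: float


def edge_decide(frame: SensorFrame, safe_distance_m: float = 10.0) -> str:
    """Building block #2 (edge): an on-car decision that needs no internet."""
    if frame.obstacle_distance_m < safe_distance_m:
        return "brake"
    return "keep_speed"


@dataclass
class CloudLog:
    """Building block #2 (cloud): stand-in for storage used to improve the system."""
    records: list[tuple[SensorFrame, str]] = field(default_factory=list)

    def upload(self, frame: SensorFrame, action: str) -> None:
        self.records.append((frame, action))


def run_step(frame: SensorFrame, cloud: CloudLog) -> str:
    """Building block #3: return the action that the output unit would carry out."""
    action = edge_decide(frame)
    cloud.upload(frame, action)  # in practice this would be deferred or batched
    return action


cloud = CloudLog()
print(run_step(SensorFrame(obstacle_distance_m=5.0, speed_kmh=40.0), cloud))
print(run_step(SensorFrame(obstacle_distance_m=50.0, speed_kmh=40.0), cloud))
```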

 

End Notes

Quite fascinating, right? As I said earlier, a machine learning project isn’t just about building models. There is so much more to it that most data science enthusiasts aren’t aware of. This practical knowledge is necessary if you want to land a role as a machine learning specialist.

Here’s a challenge for you. Now that you understand various components of a self-driving car, it is your turn to design the components of an intelligent vacuum cleaner which can navigate the floor on its own and clean the area it navigates. Have fun building that!

And if you have questions or thoughts on this article, I’d love to hear from you.
