Pranav Dar — Updated On March 20th, 2018


  • Amazon SageMaker enables data scientists to build, train and deploy ML models at any scale
  • Amazon’s CTO, Werner Vogels, has unveiled the technology behind this
  • SageMaker uses streaming algorithms since they are infinitely scalable and can consume unlimited data



Computational power is a major challenge for data scientists. You may have all the data you need, yet a lack of computing power is quite often the bottleneck. Data is also rarely static: it keeps accruing over time.

This is why Amazon developed SageMaker, a fully managed service for its Amazon Web Services (AWS) customers that enables developers and data scientists to quickly build, train and deploy machine learning models at any scale. It was launched last year, and Amazon unveiled the technology behind the tool yesterday.

When you input data into SageMaker, it uses a streaming algorithm that only makes a single pass over the data. Streaming algorithms are infinitely scalable, that is, they can consume unlimited amounts of data. For instance, processing the 20th gigabyte and 2000th gigabyte is pretty much the same thing. As Amazon’s CTO Werner Vogels says in his blog post, “the memory footprint of the algorithms is fixed and it is therefore guaranteed not to run out of memory (and crash) as the data grows”.
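To make the "single pass, fixed memory" idea concrete, here is a minimal sketch of a streaming computation in Python. It is not SageMaker's code: it uses Welford's method for a running mean and variance, and the names `RunningStats` and `record_stream` are made up for illustration. The point is only that the state stays the same size no matter how many records arrive.

```python
import random
from typing import Iterator


class RunningStats:
    """Single-pass mean and variance (Welford's method) with constant memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Each record is seen once and then discarded; only three numbers are kept.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; returns 0.0 when fewer than two records were seen.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def record_stream(n: int, seed: int = 0) -> Iterator[float]:
    """Hypothetical data source standing in for records read from storage."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(5.0, 2.0)


stats = RunningStats()
for value in record_stream(100_000):
    stats.update(value)

print(stats.count, round(stats.mean, 2), round(stats.variance, 2))
```

Whether the stream holds a thousand records or a billion, `RunningStats` keeps the same three numbers, which is the property Vogels describes when he says the memory footprint is fixed.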

SageMaker also uses containers to spread the workload of the machine learning tasks across its network. This, in turn, significantly increases the speed at which models are trained and deployed. It also allows models to move between GPUs and CPUs, depending on what works best for a particular model.

SageMaker currently offers production-ready, infinitely scalable algorithms such as:

  • Linear Learner
  • Factorization Machines
  • Neural Topic Modeling
  • Principal Component Analysis (PCA)
  • K-Means clustering
  • DeepAR forecasting
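As a rough illustration of how a linear model can be trained in a streaming fashion, the sketch below fits a line with one stochastic gradient descent step per record. This is a generic textbook technique, not Amazon's Linear Learner implementation; the function names, the synthetic data (y = 2x + 1) and the learning rate are all assumptions chosen for the example.

```python
import random
from typing import Iterable, Iterator, Tuple


def labeled_stream(n: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Hypothetical data source: yields noise-free (x, y) pairs with y = 2x + 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0


def fit_linear_single_pass(
    stream: Iterable[Tuple[float, float]], learning_rate: float = 0.05
) -> Tuple[float, float]:
    """Fit y ~ w*x + b in one pass; the only state kept is (w, b)."""
    w, b = 0.0, 0.0
    for x, y in stream:
        error = (w * x + b) - y          # prediction error on this record
        w -= learning_rate * error * x   # gradient step for the slope
        b -= learning_rate * error       # gradient step for the intercept
    return w, b


w, b = fit_linear_single_pass(labeled_stream(20_000))
print(f"w={w:.3f}, b={b:.3f}")
```

The model never holds more than one record at a time, so feeding it a longer stream costs more time but no more memory.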

You can use SageMaker to train with any deep learning framework, including:

  • TensorFlow
  • MXNet
  • PyTorch
  • Caffe2
  • Chainer
  • Microsoft Cognitive Toolkit


Our take on this

Official benchmark figures are not out yet, but comparisons with Google’s CloudML are likely, since the two services seem to be structured in quite similar ways.

Amazon has yet to release any official research papers on this technology, so we will have to wait to understand how it compares to other services. Have you used this technology yet? Let us know in the comments below.




About the Author

Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
