Databricks and RStudio Launch Platform to Make R Simpler than Ever for Big Data Projects!

Pranav Dar 30 Jun, 2018 • 2 min read

Overview

  • Databricks and RStudio have partnered up to make Big Data tasks easier for data scientists and data engineers
  • The unified platform is provided by Databricks and integrates with RStudio in a matter of seconds, making it well suited to running R code at a much larger scale
  • The team has released a KNN Regression demo – view it in HTML or download the R Markdown file (links below)

 

Introduction

Most people who program in R do so in the wonderful RStudio IDE. It’s a neat and intuitive tool that receives regular maintenance updates. A lot of tools from other languages have tried to copy RStudio’s style, but to no avail. It stands out as one of the best coding tools in the community (not to mention it’s open source!).

But performing Big Data tasks with R has been a little challenging. Sure, a few packages like sparklyr make things easier, but scaling up has been an obstacle for many organizations. This gap is now being addressed through an integrated platform developed by Databricks and RStudio. Databricks was founded by the creators of Apache Spark and has recently been in the news thanks to MLflow, its open source platform for managing the machine learning lifecycle, which is designed to work with any language, tool and algorithm.

The platform, provided by Databricks, integrates seamlessly with RStudio and enables data scientists and data engineers to execute R code at a much larger scale. Both of the popular R packages currently used to connect to and interact with Apache Spark, sparklyr and SparkR, can be used inside RStudio on Databricks. Awesome!

Databricks has also provided a demo of the platform that works through a KNN regression problem. You can either view the HTML version or download the R Markdown file and watch the magic unfold inside RStudio itself.

As mentioned in this Databricks blog post, “R users can get access to the full ETL capabilities of Databricks to provide access to relevant datasets including optimizing data formats, cleaning up data, and joining datasets to provide the perfect dataset for your analytics”.

 

Our take on this

Any data engineer (or, to a certain extent, any data scientist) who currently works with R will love this release. Despite recent advances in R, performing Big Data tasks with it has always been a challenge, and most data engineers prefer working with Python. It helps massively that an R Markdown file is available to get you started. There’s also a free trial available, so you can test the platform out before applying it to your current project.

All the data engineers out there – what do you make of this release? Will it make your current job easier? Let me know in the comments section below.

 


 

Pranav Dar 30 Jun 2018

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
