Introduction to rstudio::conf 2020
I’m a heavy R user. It was the first programming language I learned (thanks to my interest in data science) and it has stayed the course with me. Even after Python’s rapid rise in recent years, I often find myself working with the wonderful ggplot2 library in RStudio.
I switch between Tableau and R for my data visualization projects and I couldn’t be happier with the progress R has made. The subtle changes every year to the traditional packages like ggplot2 and the Tidyverse have kept me coming back.
And what better place to see these changes than the best R conference on the planet? Every year, I eagerly wait for the folks behind rstudio::conf to release the recordings and resources to the community. Last year, I brought the top highlights from rstudio::conf 2019 and the response from our community was overwhelming!
So this year, I’ve picked out my top 11 talks from rstudio::conf 2020 in this article. There were some awesome talks this year, ranging from the current state of the Tidyverse by Hadley Wickham to how you can use R with TensorFlow to build deep learning models. Let’s check them out!
Framework for Picking out these 11 Talks from rstudio::conf 2020
rstudio::conf 2020 had over 40 talks so why these 11 talks in particular? Here’s how I made my selection:
- Fit in the Machine Learning pipeline: Where does the talk fit into a typical machine learning project? I always look out for resources, libraries, and frameworks that will help me expand my skill set and understand how a machine learning project works. These talks are a great way to learn from the thought process of data science experts.
- Relevance: How relevant is the talk in today’s rapidly evolving data science field? I picked the talks that I feel are most relevant to a data scientist’s skill set – data engineering, deep learning, data visualization, etc.
Data Cleaning and Preprocessing Talks at rstudio::conf 2020
Let’s start with the data cleaning and data preprocessing step, shall we? I’m sure you know which libraries are coming up in this section! The Tidyverse, of course.
If you’re new to the Tidyverse and the incredible number of R libraries it provides, I strongly recommend going through the below article:
1. State of the Tidyverse by Hadley Wickham
Hadley Wickham is the most recognizable person in the R universe. He’s the man behind the Tidyverse – the collection of R packages for data cleaning and data manipulation. I can’t thank him enough for creating these packages – they’ve been a godsend!
In his talk at rstudio::conf 2020, Hadley spoke about the latest developments and updates to the slew of R packages under the Tidyverse umbrella. Here are the three key takeaways from this talk:
There’s no other talk you need to listen to here – Hadley Wickham’s session is enough. View the full talk here and share your thoughts in the comments section below.
Data Exploration and Data Visualization Talks at rstudio::conf 2020
Ah, my favorite step in the machine learning pipeline. I’m a huge advocate of data visualization and telling visual stories through data. I fell in love with ggplot the moment I came across it all those years ago and it remains a faithful companion in my visualization journey.
I’ve even designed and created an entire course on data visualization and storytelling here:
2. Effective Visualizations by Miriah Meyer
A very, very interesting talk. Miriah Meyer and her team work on building effective data visualizations in a research lab environment. That’s a topic I haven’t heard much about before.
Miriah talks about how to design these effective visualizations in R, the principles her team follows, and how you can replicate their ideas in your own work.
If you love data visualization and are looking to create innovative work, this talk is for you. And here’s a list of books Miriah recommends reading to become a better visualization expert:
Watch this excellent talk by Miriah here.
3. 3D ggplots with rayshader by Dr. Tyler Morgan-Wall
rayshader is an open-source R package for creating data visualizations. We can create both 2D and 3D visualizations in R using this superb package.
This talk by Dr. Tyler Morgan-Wall illustrates how you can use the rayshader package to create stunning 3D figures and animations in R. Dr. Morgan-Wall talks about how to use principles of cinematography (any movie buffs out there?) to take your audience on a visually appealing tour of your data.
Here’s an example of the power of rayshader:
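As a rough sketch of the workflow (my own illustration, not code from the talk), rayshader’s plot_gg() function takes an ordinary ggplot2 object and turns its color or fill aesthetic into height in a 3D scene. The example below assumes the ggplot2 and rayshader packages are installed and uses the built-in mtcars dataset purely for illustration; the width, height, and scale values are arbitrary choices.

```r
library(ggplot2)
library(rayshader)

# An ordinary 2D scatter plot; the color aesthetic (cyl) becomes height in 3D
mtplot <- ggplot(mtcars) +
  geom_point(aes(x = mpg, y = disp, color = cyl)) +
  scale_color_continuous(limits = c(0, 8))

# Rendering opens an rgl window, so only attempt it in an interactive session
if (interactive()) {
  plot_gg(mtplot, width = 3.5, height = 3.5, scale = 150)  # 2D ggplot -> 3D scene
  render_snapshot()                                        # capture the current view
}
```

The nice part of this design is that the 2D plot stays a normal ggplot object, so you can keep tweaking it with the usual ggplot2 tools before handing it to rayshader.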
rayshader definitely deserves an entire tutorial of its own (I’ll get down to writing one soon!).
Check out the full recording here.
Data Engineering Talks at rstudio::conf 2020
Data engineering is the hottest role in the data science space right now (yes you read that right). And is that really a surprise? Given the amount of data we’re generating these days, we need skilled people to collect that data from multiple sources, store it, figure out a way to get it to the data scientists and also set up the production environment.
This is by no means an easy role and the demand for data engineers is rising multi-fold at the moment. That’s a key reason behind including 4 talks from rstudio::conf 2020.
Also, if you’re curious about what it takes to become a data engineer, I’ve put together a comprehensive list of resources for you:
4. Practical Plumber Patterns by James Blair
Plumber is a popular R package among the data engineering community. As the developers behind Plumber put it:
“Plumber is an R package that converts your existing R code to a web API using a handful of special one-line comments.”
Plumber’s flexible approach allows R processes to be accessed by frameworks outside of the R environment. You can install the Plumber package from CRAN right now with just one line of code: install.packages("plumber")
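To make those “special one-line comments” concrete, here is a minimal sketch (my own illustration, not code from the talk). It assumes the plumber package is installed; the /echo endpoint, the msg parameter, and the temporary file are all made up for this example. In a real project you would save the annotated function in its own file, such as plumber.R, rather than writing it out from a string.

```r
library(plumber)

# A hypothetical API definition: the #* comments tell Plumber how to expose
# the function that follows them (here, as a GET endpoint at /echo)
api_code <- '
#* Echo back the message that was sent
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: ", msg))
}
'

# Write the definition to a temporary file and parse it into a router
api_file <- tempfile(fileext = ".R")
writeLines(api_code, api_file)
api <- plumb(api_file)

# Uncomment to start serving the API locally (this call blocks the R session)
# api$run(port = 8000)
```

With the server running, a request to http://localhost:8000/echo?msg=hello should return the message as JSON.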
James Blair expounds on how Plumber works in this talk and shows useful patterns for developing and working with robust APIs built in R using this package.
Interested in watching the talk? Here you go!
5. MLOps for R with Azure Machine Learning by David Smith
Azure ML is Microsoft’s flagship cloud-based machine learning platform. Data science teams can use Azure ML solutions to build end-to-end machine learning pipelines at scale. Microsoft and Satya Nadella bet big on their cloud services to pull Microsoft up in the tech industry and the strategy has worked so far.
David Smith’s talk at rstudio::conf 2020 focused primarily on four things data scientists can do with Azure ML:
- Carry out machine learning workflows using the authoring experience of their choice, from no-code to code-first options
- Use the Azure Machine Learning R SDK to manage cloud resources and to train models, tune hyperparameters, and log and visualize metrics at scale on Azure compute
- Build machine learning pipelines in R for defining and orchestrating reusable and reproducible machine learning workflows
- Deploy, manage, and monitor their R machine learning models and applications as web services
Watch David Smith’s talk on MLOps with R here.
6. Deploying End-to-End Data Science with Shiny, Plumber, and Pins by Alex Gold
Another Plumber talk? That’s right – and this time we’ll combine it with Shiny, the awesome R package from RStudio for building interactive web apps.
You can read more about how to get started with Shiny in R here:
This talk by Alex Gold, a Solutions Engineer at RStudio, delves into how you can use R to bring your modeling and visualization work into the production environment. That’s a tricky task as experienced data science professionals will attest.
Alex gave us some awesome tips and tricks I’ll surely be using soon!
You should go through Alex’s full session here.
7. Bridging the Gap Between SQL and R by Ian Cook
SQL remains the most popular database language in the world. Did you know that the relational model behind SQL is celebrating its 50th birthday this year (SQL itself followed a few years later, in the mid-1970s)? Yep – and SQL continues to be at the core of working with structured data.
SQL is a language every data scientist should know. Here’s a comprehensive course to learn it:
Learning R can be a frustrating experience if you’re coming directly from SQL. The syntaxes are all over the place and it’s not easy to get the hang of it. As Ian Cook mentions in this talk, the popularity of the sqldf package confirms this.
Here’s the good news – we can directly query an R dataframe without having to move out of the R environment. Ian Cook introduces the tidyquery package that runs SQL queries directly on R dataframes!
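Here is a small sketch of what that looks like, assuming the tidyquery package is installed. The query itself is my own example (not one from the talk) and uses the built-in iris data frame.

```r
library(tidyquery)

# Run a SQL SELECT statement directly against the iris data frame
species_counts <- query(
  "SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species"
)
print(species_counts)

# Print the dplyr code that tidyquery generates for the same statement
show_dplyr(
  "SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species"
)
```

Being able to see the generated dplyr code is handy if you are coming from SQL and want to pick up the Tidyverse way of writing the same query.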
This is a must-watch talk.
Deep Learning in R
Can you do deep learning in R? This was a knock on R by Python users for a long time but the tide is turning. Popular deep learning packages like TensorFlow can now be used within R itself (and with great effect).
Here’s a good tutorial to get started with deep learning in R:
8. What’s New in TensorFlow for R by Daniel Falbel
TensorFlow is the most popular deep learning framework right now (PyTorch users might have something to say about that). It has its flaws (which have been addressed to quite an extent in TensorFlow 2.0), but the TensorFlow community remains a huge one.
This talk by Daniel Falbel explores what’s new in TensorFlow 2.0 as well as how to build data preprocessing pipelines using the tfdatasets package. Daniel also shows how to use pre-trained models with tfhub.
It’s a good starting point if you’re interested in deep learning but don’t want to switch from R.
You can watch the recording here.
9. Deep Learning with R by Paige Bailey
Paige Bailey is a familiar name among the Analytics Vidhya community. She was a guest on our DataHack Radio podcast last year and made quite an impression on our readers.
Paige is the product manager for TensorFlow and Swift for TensorFlow. She also talks about what’s new in TensorFlow 2.0 and explains why this is a great time to get on the TensorFlow bandwagon if you haven’t already.
Paige takes the audience on a tour of building deep learning models using R. This is one of my favorite talks from rstudio::conf 2020!
Here’s the full video of Paige’s talk at rstudio::conf 2020.
Other Relevant Talks from rstudio::conf 2020
I’ve included a couple of other talks from rstudio::conf 2020 that didn’t quite fit in the above sections. These are excellent talks in their own right and I wanted to highlight them for our community.
10. Journalism with RStudio, R, and the tidyverse by Larry Fenn
The Associated Press is among the top media outlets in the world. I found it quite intriguing that they primarily use R and the tidyverse for performing data analysis. Given my own interest in data journalism, this was a much-needed talk.
Larry Fenn, a data journalist at the Associated Press, showed us the power of R for telling stories:
- Using dbplyr to work off a hosted database containing 380 million opioid records to identify “pill mills”
- Using open-sourced AP style templates for R Markdown and ggplot to quickly produce graphics and reports off breaking news
- Using R Markdown and htmlwidgets to give reporters and editors interactive reports to identify reporting leads
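To give a feel for the dbplyr workflow in the first point, here is a minimal sketch. It is not the AP’s code: the shipments table, its columns, and the numbers are invented for illustration, and an in-memory SQLite database (via the DBI and RSQLite packages, assumed to be installed) stands in for the hosted database.

```r
library(dplyr)
library(dbplyr)

# Stand-in for a hosted database: an in-memory SQLite connection
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical shipment records (made-up data, for illustration only)
shipments <- data.frame(
  pharmacy = c("A", "A", "B", "C", "C", "C"),
  pills    = c(1000, 1500, 200, 40000, 52000, 61000)
)
DBI::dbWriteTable(con, "shipments", shipments)

# Build the query lazily; dplyr verbs are translated to SQL by dbplyr
totals <- tbl(con, "shipments") %>%
  group_by(pharmacy) %>%
  summarise(total_pills = sum(pills, na.rm = TRUE)) %>%
  arrange(desc(total_pills))

show_query(totals)          # inspect the SQL that will be sent to the database
top <- collect(totals)      # run the query and pull the results into R
print(top)

DBI::dbDisconnect(con)
```

The heavy lifting happens inside the database, and only the small summary table comes back into R, which is what makes this approach workable for tables with hundreds of millions of rows.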
I urge you to watch Larry’s talk here and use his ideas in your daily projects.
11. Panel: Career Advice for Data Scientists
No data science conference is ever complete without a discussion among the top minds in the industry. This year at rstudio::conf 2020, the panel discussion focused on how to build a career in data science using R. The panelists (mentioned below) discussed topics like the different stages of career growth.
Here are the panelists (hosted by Jen Hecht of RStudio):
- Gabriela de Queiroz – Sr. Machine Learning Manager, AI Developer
- David Keyes – Consultant and Instructor, R for the Rest of Us
- Sydeaka Watson – Senior Data Scientist
Watch the panel discussion here.
I love rstudio::conf! It’s a paradise for R lovers and this year’s conference did not disappoint. I personally loved Paige Bailey’s talk on Deep Learning using R and the talk on data journalism as well.
You can get the code files and PPTs for the talks here.
What was your favorite talk from rstudio::conf 2020? Share your thoughts in the comments section below and let’s get the R community together!