Comprehensive Beginner’s Guide to Jupyter Notebooks for Data Science & Machine Learning
One of the most common questions people ask is which IDE, environment or tool to use while working on data science projects. As you would expect, there is no dearth of options available – from language-specific IDEs like RStudio and PyCharm to editors like Sublime Text or Atom – and the choice can be intimidating for a beginner.
If there is one tool which every data scientist should be comfortable with, it is Jupyter Notebooks (previously known as IPython notebooks). Jupyter Notebooks are powerful, versatile and shareable, and they let you perform data visualization in the same environment.
Jupyter Notebooks allow data scientists to create and share their documents, from code snippets to full-blown reports. They help data scientists streamline their work, be more productive and collaborate easily. For these and several other reasons you will see below, Jupyter Notebooks are one of the most popular tools among data scientists.
In this article, we will introduce you to Jupyter Notebooks and take a deep dive into their features and advantages.
By the time you reach the end of the article, you will have a good idea as to why you should leverage them for your machine learning projects and why Jupyter Notebooks are considered better than other standard tools in this domain!
Are you ready to learn? Let’s begin!
Table of Contents
- What is a Jupyter Notebook?
- How to Install Jupyter Notebook?
- Getting Started!
- Using the Magic Functions
- Interactive Dashboards in Jupyter Notebooks – why not?
- Keyboard Shortcuts – Save time and become even more productive!
- Useful Jupyter Notebook Extensions
- Saving and Sharing your Notebook
- JupyterLab – The evolution of Jupyter Notebooks
- Best Practices and Tips
What is a Jupyter Notebook?
Jupyter Notebook is an open-source web application that allows us to create and share code and documents.
It provides an environment where you can document your code, run it, look at the outcome, visualize data and see the results without leaving the environment. This makes it a handy tool for performing end-to-end data science workflows – data cleaning, statistical modeling, building and training machine learning models, visualizing data, and many, many other uses.
Jupyter Notebooks really shine when you are still in the prototyping phase. This is because your code is written in independent cells, which are executed individually. This allows the user to test a specific block of code in a project without having to execute the code from the start of the script. Many other IDE environments (like RStudio) also do this in several ways, but I have personally found Jupyter’s individual cell structure to be the best of the lot.
As you will see in this article, these Notebooks are incredibly flexible, interactive and powerful tools in the hands of a data scientist. They even allow you to run other languages besides Python, like R, SQL, etc. Since they are more interactive than an IDE platform, they are widely used to display code in a more pedagogical manner.
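To make the idea of independent cells a little more concrete, here is a toy sketch in plain Python. It is not how Jupyter is implemented; it only assumes that every cell runs against one shared namespace, which is the behaviour you observe in a notebook. The names kernel_namespace, cells and run_cell are made up for this illustration.

```python
# A toy model of a notebook kernel: each "cell" is a string of code,
# and all cells execute against one shared namespace.
kernel_namespace = {}

cells = {
    "load": "values = [4, 8, 15, 16, 23, 42]",
    "summarise": "total = sum(values)",
}


def run_cell(name):
    """Execute a single cell by name, reusing whatever earlier cells defined."""
    exec(cells[name], kernel_namespace)


# Run the cells once, top to bottom.
run_cell("load")
run_cell("summarise")
first_total = kernel_namespace["total"]

# Edit only the second cell and re-run it; "load" does not need to run again.
cells["summarise"] = "total = max(values)"
run_cell("summarise")
second_total = kernel_namespace["total"]

print(first_total, second_total)
```

The second run reuses the values list created by the first cell, which is why you can tweak and re-run one block without starting the whole script over.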
How to install Jupyter Notebook
As you might have guessed by now, you need to have Python installed on your machine first. Either Python 2.7 or Python 3.3 (or greater) will do.
For new users, the general consensus is that you should use the Anaconda distribution to install both Python and the Jupyter notebook.
Anaconda installs both these tools and includes quite a lot of packages commonly used in the data science and machine learning community. You can download the latest version of Anaconda from here.
The pip method
If, for some reason, you decide not to use Anaconda, then you need to ensure that your machine is running the latest pip version. How do you do that? If you already have Python installed, pip should already be there. To upgrade to the latest pip version, run the commands below:
# Linux and OSX
pip install -U pip setuptools
# Windows
python -m pip install -U pip setuptools
Once pip is ready, you can go ahead and install Jupyter:
# For Python 2
pip install jupyter
# For Python 3
pip3 install jupyter
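If you want to confirm from Python that the install worked, a small check like the one below can help. This is only a sketch: it assumes the packages expose the importable module names notebook and jupyter_core, and the helper is_installed is a name invented for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module names: "notebook" (the classic Notebook server)
# and "jupyter_core" (the core package behind the jupyter command).
for name in ("notebook", "jupyter_core"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either name shows up as missing, re-run the pip command for the Python version you are actually using.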
You can view the official Jupyter installation documentation here.
Getting Started!
We’ve now learned all about what these notebooks are and how to go about setting them up on our own machines. Time to get the party started!
To run your Jupyter notebook, simply type the below command in your terminal and you’re good to go!
jupyter notebook
Once you do this, the Jupyter notebook will open up in your default web browser, by default at the below URL:
http://localhost:8888/tree
In some cases, it might not open up automatically. A URL will be generated in the terminal/command prompt with the token key. You will need to copy and paste this entire URL, including the token key, into your browser when you are opening a Notebook.
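The token is simply a query parameter on that URL. The sketch below pulls it out with the standard library, in case you ever need to check what you copied. The URL and token value are made up for illustration, and extract_token is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(notebook_url):
    """Return the value of the token query parameter, or None if absent."""
    query = parse_qs(urlparse(notebook_url).query)
    tokens = query.get("token")
    return tokens[0] if tokens else None


# Hypothetical URL of the kind printed in the terminal; the token is made up.
example_url = "http://localhost:8888/?token=abc123def456"
print(extract_token(example_url))
```

If the function returns None, the part after "?token=" was probably lost while copying.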
Once the Notebook is opened, you’ll see three tabs at the top: Files, Running and Clusters. Files basically lists all the files, Running shows you the terminals and notebooks you currently have open, and Clusters is provided by IPython parallel.
To open a new Jupyter notebook, click on the ‘New’ option on the right-hand side of the page. Here, you get four options to choose from:
- Python 3
- Text File
- Folder
- Terminal
In a Text File, you are given a blank slate. Add whatever letters, words and numbers you wish. It basically works as a text editor (similar to the application on Ubuntu). You also get the option to choose a language (there are a plethora of them) so you can write a script in it, and you can find and replace words in the file.
The Folder option does what the name suggests. You can create a new folder to put your documents in, then rename or delete it as required.
The Terminal works exactly like the terminal on your Mac or Linux machine (cmd on Windows). It supports terminal sessions within your web browser. Type python in this terminal and voila! You get a Python prompt, ready for your code.
But in this article, we are going to focus on the notebook so we will select the Python 3 option from the ‘New’ option. You will get the below screen:
You can then start things off by importing the most common Python libraries: pandas and numpy. In the menu just above the code, you have options to play around with the cells: add, edit, cut, move cells up and down, run the code in the cell, stop the code, save your work and restart the kernel.
In the drop-down menu (shown above), you even have four options:
- Code – This is self-explanatory; it is where you type your code
- Markdown – This is where you type your text. You can add your conclusions after running a code, add comments, etc.
- Raw NBConvert – The contents of this cell are not executed; they are passed through unmodified when nbconvert (a command line tool) converts your notebook into another format (like HTML)
- Heading – This is where you add Headings to separate sections and make your notebook look tidy and neat. This has now been folded into the Markdown option itself. Start a line with ‘##’ and whatever you type after that will be taken as a heading
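Under the hood, each of these cell types is just an entry with a cell_type field in the notebook’s JSON file. The hand-written sketch below shows roughly what that looks like for the version 4 file format. The cell contents are made up, and a notebook saved by Jupyter itself will carry more metadata than this.

```python
import json

# A minimal notebook in the version 4 file format, written by hand.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["## Load the data\n", "Notes and conclusions go here."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["numbers = [1, 2, 3]\n", "sum(numbers)"],
        },
        {
            "cell_type": "raw",
            "metadata": {},
            "source": ["Left untouched when the notebook is converted."],
        },
    ],
}

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)

# json.dumps produces the kind of text you would find inside an .ipynb file.
notebook_text = json.dumps(notebook, indent=1)
```

Note that a heading is not a separate cell type here: it is a Markdown cell whose text starts with ‘##’.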
Using Jupyter Notebook’s Magic Functions
The developers have inserted pre-defined magic functions that make your life easier and your work far more interactive. You can run the below command to see a list of these functions (note: the “%” is usually not needed because Automagic is usually turned on):
%lsmagic
You’ll see a lot of options listed and you might even recognise a few! Functions like %clear, %autosave, %debug and %mkdir are some you must have seen previously. Now, magic commands run in two ways:
- Line-wise
- Cell-wise
As the name suggests, line-wise is when you want to execute a single command line while cell-wise is when you want to execute not just a line, but the entire block of code in the entire cell.
In line-wise, all given commands must start with the % character while in cell-wise, all commands must begin with %%. Let’s look at the below example to get a better understanding:
%time a = range(10)
%%timeit
a = range(10)
I suggest you run these commands and see the difference for yourself!
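Magic commands only work inside IPython or a notebook cell. If you want a comparable measurement in an ordinary Python script, the standard library’s timeit module is a reasonable stand-in. The loop count below is an arbitrary choice for this sketch, and the numbers you get will differ from machine to machine.

```python
import timeit

# Time one statement many times, roughly what the timeit magics automate.
# number=100_000 is an arbitrary choice for this sketch.
loops = 100_000
elapsed = timeit.timeit("a = range(10)", number=loops)
per_loop_microseconds = elapsed / loops * 1e6
print(f"{per_loop_microseconds:.3f} microseconds per loop")

# repeat() runs the whole measurement several times; taking the minimum
# reduces the influence of other activity on the machine.
best_of_three = min(timeit.repeat("a = range(10)", number=loops, repeat=3))
print(f"best of 3: {best_of_three / loops * 1e6:.3f} microseconds per loop")
```

A single timed run, as with %time, is quick but noisy; repeated runs, as with %%timeit, give a steadier figure.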
To enable R in Jupyter, you will need the ‘IRKernel’ (a dedicated kernel for R), which is available on GitHub. It’s an 8-step process and has been explained in detail, along with screenshots to guide you, here.
If you are a Julia user, you can use that within Jupyter Notebooks too! Check out this comprehensive article which is focused on learning data science for a Julia user and includes a section on how to leverage it within the Jupyter environment.
Interactive Dashboards in Jupyter Notebooks – Why not?
Before you go about adding widgets, you need to import the widgets package:
from ipywidgets import widgets
The basic types of widgets are your typical text inputs, other input-based widgets, and buttons. See the below example, taken from Dominodatalab, of what an interactive widget looks like:
You can check out a comprehensive guide to widgets here.
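As a small taste of what widgets look like in code, here is a minimal sketch built on the interact helper from ipywidgets. It assumes ipywidgets is installed and that you call show_slider_demo from inside a notebook cell. Both function names are made up for this example.

```python
def describe_square(x):
    """Return a short message about x squared; the slider drives this function."""
    return f"{x} squared is {x * x}"


def show_slider_demo():
    """Call this inside a notebook cell to render an integer slider."""
    # Imported here so the rest of the file works even without ipywidgets.
    from ipywidgets import interact

    # A (min, max) tuple of integers asks interact for an integer slider.
    return interact(describe_square, x=(0, 10))


print(describe_square(3))
```

In a notebook, running show_slider_demo() should display a slider, and moving it re-runs describe_square with the new value.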
Keyboard Shortcuts – Save time and become even more productive!
Shortcuts are one of the best things about Jupyter Notebooks. When you want to run any code block, all you need to do is press Ctrl+Enter. There are a lot more keyboard shortcuts that Jupyter notebooks offer that save us a bunch of time.
Below are a few hand-picked shortcuts that will be of immense use to you when starting out. I highly recommend trying these out as you read them one by one. You won’t know how you lived without them!
A Jupyter Notebook offers two different keyboard input modes – Command and Edit. Command mode binds the keyboard to notebook level commands and is indicated by a grey cell border with a blue left margin. Edit mode allows you to type text (or code) into the active cell and is indicated by a green cell border.
Jump between command and edit mode using Esc and Enter, respectively. Try it out right now!
Once you are in command mode (that is, no cell is currently being edited), you can try out the below shortcuts:
- A will insert a new cell above the active cell, and B will insert one below the active cell
- To delete a cell, press D twice in succession
- To undo a deleted cell, press Z
- Y turns the currently active cell into a code cell
- Hold down Shift + the up or down arrow key to select multiple cells. While in multiple selection mode, pressing Shift + M will merge your selection
- F will pop up the ‘Find and Replace’ menu
When in edit mode (press Enter when in command mode to get into Edit mode), you will find the below shortcuts handy:
- Ctrl + Home to go to the start of the cell
- Ctrl + S will save your progress
- As mentioned, Ctrl + Enter will run your entire cell block
- Alt + Enter will not only run your cell block, it also adds a new cell below
- Ctrl + Shift + F opens the command palette
To see the entire list of keyboard shortcuts, press ‘H’ in command mode or go to Help > Keyboard shortcuts. Keep checking this regularly as new shortcuts are added frequently.
Useful Jupyter Notebook Extensions
Extensions are a great way of enhancing your productivity on Jupyter Notebooks. One of the best tools I have found to install and use extensions is ‘Nbextensions’. It takes two simple steps to install it on your machine (there are other methods as well but I found this the most convenient):
Step 1: Install it from pip:
pip install jupyter_contrib_nbextensions
Step 2: Install the associated JavaScript and CSS files:
jupyter contrib nbextension install --user
Once you’re done with this, you’ll see an ‘Nbextensions’ tab at the top of your Jupyter Notebook home. And voila! There is a collection of awesome extensions you can use for your projects.
To enable an extension, just click on it to activate it. I have mentioned 4 extensions below that I have found most useful:
- Code prettify: It reformats and beautifies the contents of code blocks.
- Printview: This extension adds a toolbar button to call jupyter nbconvert for the current notebook and optionally display the converted file in a new browser tab.
- Scratchpad: This adds a scratchpad cell, which enables you to run your code without having to modify your Notebook. It’s a really handy extension to have when you want to experiment with your code but don’t want to do it on your live Notebook.
- Table of Contents (2): This awesome extension collects all the headers in your Notebook and displays them in a floating window.
These are just some of the extensions you have at your disposal. I highly recommend checking out their entire list and experimenting with them.
Saving and Sharing your Notebook
This is one of the most important and awesome features of a Jupyter Notebook. When I have to do a blog post and my code and comments are in a Jupyter file, I need to first convert them into another format. Remember, these notebooks are stored in JSON format, and that isn’t really helpful when it comes to sharing them. I can’t go about posting the different cell blocks in an email or on the blog, right?
Go to the ‘File’ menu and you’ll see a ‘Download As’ option there:
You can save your Notebook in any of the 7 options provided. The most commonly used is either a .ipynb file so the other person can replicate your code on their machine or the .html one which opens as a web page (this comes in handy when you want to save the images embedded in the Notebook).
You can also use the nbconvert option to manually convert your notebook into a different format like HTML or PDF.
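On the command line, a conversion takes the form jupyter nbconvert --to FORMAT FILE. The sketch below only builds that command from Python so you can inspect it before running anything. The file name analysis.ipynb and the helper nbconvert_command are hypothetical.

```python
import shlex


def nbconvert_command(notebook_path, output_format="html"):
    """Build the argument list for converting one notebook with nbconvert."""
    return ["jupyter", "nbconvert", "--to", output_format, notebook_path]


# "analysis.ipynb" is a hypothetical file name used only for illustration.
command = nbconvert_command("analysis.ipynb", "html")
print(shlex.join(command))
```

To actually perform the conversion, you could pass the list to subprocess.run(command, check=True), assuming Jupyter is installed and on your PATH.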
You can also use JupyterHub, which lets you host notebooks on its server and share them with multiple users. A lot of top-notch research projects use this for collaboration.
JupyterLab – The evolution of Jupyter Notebooks
JupyterLab was launched in February this year and is considered the evolution of Jupyter Notebooks. It allows a more flexible and powerful way of working on projects, but with the same components that Jupyter notebooks have. The JupyterLab environment offers the same notebook experience, just in a more productive interface.
JupyterLab enables you to arrange your work area with notebooks, terminals, text files and outputs – all in one window! You just have to drag and drop the cells where you want them. You can also edit popular file formats like Markdown, CSV and JSON with a live preview to see the changes happening in real time in the actual file.
You can see the installation instructions here if you want to try it out on your machine. The long term aim of the developers is for JupyterLab to eventually replace Jupyter notebooks. But that point is still a bit further away right now.
Best Practices and Tips
While working alone on projects can be fun, most of the time you’ll find yourself working within a team. In that situation, it’s very important to follow guidelines and best practices so that your code and Jupyter Notebooks are annotated properly and stay consistent with those of your team members. Here are a few best-practice pointers you should definitely follow while working on a Jupyter Notebook:
- One of the most important things for any programmer – always ensure you properly add comments for your code!
- Make sure you have the required documentation for your code
- Consider a naming scheme and stick to it throughout your code to ensure consistency. This makes it easier for others to follow along
- Whatever libraries you require for your code, import them at the start of your notebook (and add a comment next to them for what purpose you’re loading them)
- Ensure proper line spacing in your code. You don’t want your loops and functions in the same line – that makes for a maddening experience when it has to be referenced later!
- You’ll find sometimes that your file has become quite code heavy. Check out options on how to hide some of the code you deem not important for later reference. This can be invaluable to make your Notebook look tidier and cleaner
- Check out this notebook on matplotlib to see how beautifully and neatly it can be represented
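Here is one way a cell following the pointers above might look: imports at the top with a note on their purpose, one naming scheme, a documented function and sensible spacing. The data and names are invented sample values, so treat this as an illustration and not a template you must follow.

```python
# Imports live in the first cell, each with a note on why it is needed.
import statistics  # mean and median of the sample
from collections import Counter  # frequency counts for categorical values

# One naming scheme throughout: snake_case, with units in the name.
response_times_ms = [120, 135, 128, 410, 131]
status_codes = [200, 200, 404, 200, 500]


def summarise_response_times(times_ms):
    """Return the mean and median of a list of response times in milliseconds."""
    return {
        "mean_ms": statistics.mean(times_ms),
        "median_ms": statistics.median(times_ms),
    }


summary = summarise_response_times(response_times_ms)
status_counts = Counter(status_codes)

print(summary)
print(status_counts.most_common(1))
```

Anyone opening the notebook later can see at a glance what is loaded, what each variable holds and what the function is for.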
Another bonus tip! When you think of creating a presentation, the first tools to come to mind are PowerPoint and Google Slides. But your Jupyter Notebooks can create slides too! Remember when I said it’s super flexible? I wasn’t exaggerating.
To convert your Notebook into slides, go to ‘View’ -> ‘Cell Toolbar’ and click on ‘Slideshow’. Boom! Each cell now displays a ‘Slide Type’ drop-down option on the right. You will get the below 5 options:
- Slide
- Sub-Slide
- Fragment
- Skip
- Notes
Play around with each option to understand it better. It will change the way you present your code!
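Whatever you pick in the ‘Slide Type’ drop-down is saved in that cell’s metadata inside the notebook file. The sketch below sets the same field on a hand-written cell, assuming the usual layout of a slideshow entry with a slide_type value. The helper set_slide_type and the sample cell are made up for illustration.

```python
# The five slide types, as they are stored in a cell's metadata.
SLIDE_TYPES = ("slide", "subslide", "fragment", "skip", "notes")


def set_slide_type(cell, slide_type):
    """Record the slide type on a notebook cell (a plain dict) and return it."""
    if slide_type not in SLIDE_TYPES:
        raise ValueError(f"unknown slide type: {slide_type!r}")
    cell.setdefault("metadata", {})["slideshow"] = {"slide_type": slide_type}
    return cell


# A hand-written markdown cell used only for illustration.
title_cell = {"cell_type": "markdown", "metadata": {}, "source": ["# Results"]}
set_slide_type(title_cell, "slide")
print(title_cell["metadata"])
```

Once the slide types are set, running jupyter nbconvert with --to slides on the notebook should produce the presentation as an HTML file.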
Do note that this is not an exhaustive list of things you can do with your Jupyter notebook. There is so much more to it and you pick these things up the more you use it. The key, as with so many things, is experimenting with practice.
Check out this GitHub repository which contains a collection of fascinating Jupyter Notebooks.
This guide is just the starting point in your data science journey and I’m glad you are taking it with me! Let me know your take on Jupyter Notebooks and how they have helped you in the comments section below. Also, if you have any questions – let me know!