7 Python Hacks, Tips and Tricks for Data Science Projects

Ram Dewani 28 Apr, 2020


Overview

Python is a superb language for data science, but not everyone is a Python expert.
Here, we present 7 Python hacks that’ll help you navigate your Python code in double-quick time!

Introduction

“Talk is cheap. Show me the code.” ― Linus Torvalds

Let’s start off with a slightly curious question – have you considered building machine learning models using just pen and paper? It’s actually a phenomenal method of learning how an algorithm works underneath. But here’s the catch – we are living in the midst of a data boom. This means it is no longer practical to calculate or design everything by hand.

The data we work with in machine learning projects is huge (and I mean HUGE). To build a machine learning model at that scale, we need two resources working together – the data scientist and the machine! And that’s where the power of Python programming comes to the fore.

Honestly, Python is one of the most versatile languages I’ve worked with. It’s simple to learn and implement and yet is supremely effective at complex machine learning tasks. And the fact that there are a plethora of Python libraries to help us quickly perform these tasks – that’s just the icing on the cake.

While decent programming knowledge is important for data science, you don’t need the expertise of a Python developer. Keep the focus on what matters: building better machine learning models.

In this article, I have put together 7 Python hacks that can save you a lot of time and effort, and make you more effective at your job.

This is the fourth part of my Data Science hacks, tips, and tricks series. I recommend going through the previous articles in the series to become a more efficient data scientist or analyst.

I have also converted my learning into a free course that you can check out:

Data Science Hacks, Tips, and Tricks!

Also, if you have your own Data Science hacks, tips, and tricks, you can share them with the open community on this GitHub repository: Data Science hacks, tips and tricks on GitHub.

We are posting these hacks daily on social media platforms like LinkedIn, Twitter, Facebook. Make sure to follow #avhackoftheday to get your daily dose of freshly brewed data science hacks, tips, and tricks!

Python Data Science Hack #1 – Zen of Python
Python Data Science Hack #2 – Format your code using Black
Python Data Science Hack #3 – Code debugging with Python Debugger
Python Data Science Hack #4 – Generator v/s list comprehension
Python Data Science Hack #5 – Pretty-print data structures
Python Data Science Hack #6 – Track Time for Nested Loops
Python Data Science Hack #7 – Lazy import data science libraries using Pyforest

Python Data Science Hack #1 – The Zen of Python

If you’ve ever looked for tips on how to become a better Python developer, then this is the place. This hack alone gives you 19 tips in one go.

The Zen of Python, by Tim Peters, is a beautifully written set of 20 guiding principles that influence the design of the Python language.

You’ll notice that out of 20 principles, only 19 have been written down. Tim left the 20th principle for Guido van Rossum (author of the Python language) to fill in but it was left empty.

This is included as an easter egg in the Python interpreter, which you can access by typing “import this”.

Note: These are not hard and fast rules, but it is worth making these principles a habit in your own code.

Zen of Python in code
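Here is a minimal sketch of the easter egg in action. Importing the module prints the principles once. The decoding step relies on this.s, the ROT13-encoded text stored inside CPython’s this module; that is an implementation detail rather than a documented API, so treat it as an assumption.

```python
import codecs
import this  # importing the module prints the Zen of Python once

# Assumption: CPython keeps the text ROT13-encoded in `this.s` (an implementation detail).
zen_text = codecs.decode(this.s, "rot13")

# Skip the title line and the blank line after it, then keep the non-empty lines.
aphorisms = [line for line in zen_text.splitlines()[2:] if line.strip()]

print(f"{len(aphorisms)} principles are written down")
print("First one:", aphorisms[0])
```

Counting the lines this way is a quick check of the “19 out of 20” story above.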

Python Data Science Hack #2 – Format your Code using Black

Just imagine living in a magical house that automatically cleans itself when you wake up. The bedsheet is folded, the dishes are done and you can enjoy your morning coffee. Doesn’t that sound awesome?

That is what Black can do with your code! Black calls itself “the uncompromising code formatter”, and it has made life simpler for me as well as for my colleagues reading my code.

Black is an automatic code formatter for Python. You write code in your own style, and Black reformats it into a consistent one. This helps you focus on the content rather than the structure, and it makes code reviews faster. You can check out this cool playground which showcases the power of Black.

Follow these steps to format your code automatically:

Install Black if you haven’t already: “pip install black”
Save your Python file with a .py extension.
Go to the terminal and type: “black [filename.py]”
Congrats! Your file is now formatted.
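To give a feel for what changes, here is a small before-and-after sketch. The function scale is a made-up example, and the “after” version is written by hand to follow Black’s default style (it is not actual Black output), so take it as an illustration only. Black itself is not called here.

```python
# Before: valid but inconsistently formatted code, kept as a string for comparison.
messy_source = "def scale(values,factor = 2):\n    return [ v*factor for v in values ]\n"


# After: the same (hypothetical) function laid out in Black's default style:
# a space after commas, no spaces around a keyword default, spaces around operators.
def scale(values, factor=2):
    return [v * factor for v in values]


# Formatting only changes the layout, so both versions should behave identically.
namespace = {}
exec(messy_source, namespace)
same_behaviour = namespace["scale"]([1, 2, 3]) == scale([1, 2, 3])
print("Same results before and after formatting:", same_behaviour)
```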

Python Data Science Hack #3 – Code debugging with Python Debugger

All of us have experienced moments where we are stuck on a rather persistent error and spend tons of time on StackOverflow trying to figure out a solution. The Python debugger will make your life much easier in these situations.

pdb is an interactive debugger for Python programs. Whenever you come across an error, you can use pdb to inspect your code line by line. In this example, I have used %pdb, which is a magic command in Jupyter notebooks that invokes the standard Python debugger.

Python Debugger
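In a Jupyter notebook, running “%pdb on” in a cell makes the debugger open automatically at the next uncaught exception. Outside a notebook, a rough equivalent is pdb.post_mortem. The sketch below assumes a made-up function, mean_ratio, that fails on a zero, and a hypothetical helper, debug_call; the interactive flag is off by default so the snippet runs without waiting for input.

```python
import pdb


def mean_ratio(total, counts):
    # Fails with ZeroDivisionError as soon as a count is 0.
    return [total / c for c in counts]


def debug_call(func, *args, interactive=False):
    """Run func; on an exception, optionally open pdb at the failing frame."""
    try:
        return func(*args)
    except Exception as exc:
        if interactive:
            # Opens the debugger prompt where the error happened.
            pdb.post_mortem(exc.__traceback__)
        return exc


result = debug_call(mean_ratio, 10, [5, 2, 0])
print("Caught:", type(result).__name__)
```

Set interactive=True to land in the debugger prompt, where commands such as p (print a variable), u and d (move up and down the stack) and q (quit) let you inspect the state at the point of failure.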

Python Data Science Hack #4 – Python Generators or List Comprehensions

A Python generator yields one item at a time and produces each item only on demand. Generators are therefore much more memory efficient than lists. Have you used them in your daily programming yet?

If not, then I’ll try to give you reasons to include it in your practice through this code where I have compared the memory usage and time consumption of list comprehension and Python generators!

Code for Python Generators v/s List Comprehension
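Here is a minimal sketch of such a comparison, assuming a simple sum of squares as the workload. Note that sys.getsizeof reports only the size of the container object itself (for the list, its array of references, not the integers it holds), so the real gap is even larger. Timings vary by machine, so none are quoted here.

```python
import sys
import time

n = 100_000  # arbitrary size chosen for illustration

# List comprehension: builds all n results in memory before summing.
start = time.perf_counter()
squares_list = [i * i for i in range(n)]
list_total = sum(squares_list)
list_seconds = time.perf_counter() - start
list_size = sys.getsizeof(squares_list)

# Generator expression: produces one result at a time while summing.
start = time.perf_counter()
squares_gen = (i * i for i in range(n))
gen_size = sys.getsizeof(squares_gen)
gen_total = sum(squares_gen)
gen_seconds = time.perf_counter() - start

print(f"list:      {list_size} bytes, {list_seconds:.4f} s")
print(f"generator: {gen_size} bytes, {gen_seconds:.4f} s")
```

Keep in mind that a generator can be consumed only once; if you need to iterate over the values several times, a list is still the right choice.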

Python Data Science Hack #5 – Pretty Print Data Structures in Python

pprint is a built-in module in Python. It gives you the ability to print arbitrary data structures in a ‘pretty’ format that is well-formatted and much more readable.

Let’s see it in this example, where we are trying to print a nested dictionary. This is quite a useful Python hack to better understand data structures.

Code for pretty print hack
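A minimal sketch, assuming a made-up nested dictionary that describes an experiment (all names and values are invented for illustration):

```python
from pprint import pformat, pprint

# Hypothetical nested dictionary; the values are made up for this example.
experiment = {
    "model": {"name": "random_forest", "params": {"n_estimators": 200, "max_depth": 8}},
    "data": {"train_rows": 80000, "test_rows": 20000, "features": ["age", "income", "city"]},
    "metrics": {"accuracy": 0.91, "f1": 0.88},
}

# Plain print puts everything on one long line.
print(experiment)

# pprint wraps and indents the structure so the nesting is visible.
pprint(experiment, width=60)

# pformat returns the same text as a string, which is handy for logging.
pretty = pformat(experiment, width=60)
```

The width argument controls where lines wrap, and depth can be used to hide the deeper levels of a very large structure.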

Python Data Science Hack #6 – Track Time for Nested Loops

Most of you must have heard about the tqdm Python library, and you might even be using it to track the progress of long-running ‘for’ loops. We often write complex functions with nested ‘for’ loops, and tqdm can track those too.

Here is how you can track nested loops using tqdm in Python:

Code for tracking time of nested loops
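A minimal sketch, assuming a toy workload of 3 epochs with 50 batches each (the names and the sleep call are placeholders for real work). The try/except is only there so the snippet still runs where tqdm is not installed; in that case no progress bars are shown.

```python
import time

try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm (no progress bars are shown).
    def tqdm(iterable, **kwargs):
        return iterable

epochs, batches = 3, 50  # hypothetical loop sizes
steps_done = 0

# The outer bar tracks epochs; the inner bar tracks batches within each epoch.
for epoch in tqdm(range(epochs), desc="epochs"):
    # leave=False clears the inner bar when it finishes, keeping the output tidy.
    for batch in tqdm(range(batches), desc="batches", leave=False):
        time.sleep(0.001)  # stand-in for real work
        steps_done += 1

print("Total steps:", steps_done)
```

Each bar shows the elapsed time and an estimate of the time remaining, so you can see at a glance how long the outer loop still has to go.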

Python Data Science Hack #7 – Lazy import data science libraries using Pyforest

Have you been in a situation where you feel you’re spending way too much time thinking of which libraries to import at the start of any data science project? It happens to most of us!

I have some awesome news for you – PyForest is the solution for all your library importing woes!

PyForest imports the popular data science libraries into your work environment ONLY when you need them. This is known as lazy import in Python. Don’t worry, it doesn’t import the whole list of libraries at once: a library is imported only when your code first uses it, and libraries you never use are never imported.
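To make the idea of a lazy import concrete, here is a small sketch using only the standard library. It is not PyForest’s implementation; LazyModule is a hypothetical helper written for this illustration, and the statistics module stands in for a heavier library such as pandas.

```python
import importlib


class LazyModule:
    """Hypothetical stand-in that imports a module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing is imported yet

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


stats = LazyModule("statistics")            # no import happens here
loaded_before = stats._module is not None

average = stats.mean([1, 2, 3, 4])          # first use triggers the real import
loaded_after = stats._module is not None

print(loaded_before, average, loaded_after)
```

With PyForest, the idea is that names such as pd or np are available in this deferred way after a single import of the package, so you can skip writing the usual block of import statements.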

Check out this video to understand more about it:

End Notes

In this article, we covered 7 Python data science hacks, tips and tricks to help you become a better and more efficient data scientist. I hope these hacks will help you with day-to-day niche tasks and save you a lot of time.

Let me know your Data Science hacks, tips and tricks in the comments section below!