40 Helpful Python tips to Boost your speed as a Data Scientist

Bahauddin Taha 02 Sep, 2022


This article was published as a part of the Data Science Blogathon

Introduction

Python has become very popular thanks to its simplicity and ease of learning. It is used for data science, machine learning, web development, scripting, automation, and more, and it is one of the most in-demand skills for data scientists. Simplicity is the first of several advantages Python offers in data science. Although some data scientists have a background in computer science or know other programming languages, many come from statistics, mathematics, or other technical disciplines and may not have much programming experience when they enter the industry. Python syntax is easy to read and write, which makes the language quick to learn. In this article, I will introduce more than 40 ideas and methods that can help you speed up your day-to-day data science work.

Alright, let’s get started…..

List comprehensions

A list comprehension lets you loop over the elements of a list and build a new list in a single line.
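
As a minimal illustration (the list "numbers" and the other variable names are made up for this sketch), the first comprehension below squares every element and the second keeps only the even values:

```python
# Hypothetical sample data for illustration
numbers = [1, 2, 3, 4, 5]

# Build a new list by transforming each element
squares = [n ** 2 for n in numbers]

# Add a condition to keep only some of the elements
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(squares)
print(even_squares)
```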

The same thing can be done using dictionaries, sets, and generators. Let’s look at another example, this time with dictionaries:

dictionary = {'first_num': 1, 'second_num': 2,
              'third_num': 3, 'fourth_num': 4}
oddvalues = {key: value for (key, value) in dictionary.items() if value % 2 != 0}
print(oddvalues)

Output: {'first_num': 1, 'third_num': 3}

Enumerate Function

Enumerate is a helpful built-in function for iterating over an object such as a list, dictionary, or file. On each iteration it yields a tuple containing the loop counter (starting at 0 by default) and the value obtained from the object. The loop counter comes in handy when your code depends on the index. Let’s look at an example where the first and last elements are treated differently.

sentence = 'Just do It'
length = len(sentence)
for index, element in enumerate(sentence):
    print('{}: {}'.format(index, element))
    if index == 0:
        print('The first element!')
    elif index == length - 1:
        print('The last element!')

Output: 0: J
The first element!
1: u
2: s
3: t
4:
5: d
6: o
7:
8: I
9: t
The last element!
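
If a counter that begins at 1 is more convenient, enumerate accepts an optional start argument. A minimal sketch, where the list of cities is made up for illustration:

```python
# Hypothetical sample data for illustration
cities = ['Vienna', 'Paris', 'Seoul']

# start=1 makes the counter begin at 1 instead of the default 0
numbered = list(enumerate(cities, start=1))

for position, city in numbered:
    print('{}: {}'.format(position, city))
```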

Files can also be enumerated with the enumerate function. Before breaking out of the loop, we will print the first 10 rows of the CSV file in the example below. We’re not going to replicate the result because it’s too long. You may, however, use it on whatever files you have.

with open('heart.csv') as f:
    for i, line in enumerate(f):
        if i == 10:
            break
        print(line)

Return Multiple Values by a Function

We frequently wish to return more than one value when designing functions. We’ll go through two typical approaches here:

Method 1:

Let’s start with the easiest option: returning a tuple. This technique is often used only when there are two or three values to return. When there are more values in a tuple, it’s simple to lose track of the order of the items.
In the code section below, get_student is an example function that returns a student’s first and last name as a tuple based on the ID number.

# returning a tuple.
def get_student(id_num):
    if id_num == 0:
        return 'Taha', 'Nate'
    elif id_num == 1:
        return 'Jakub', 'Abdal'
    else:
        raise Exception('No Student with this id: {}'.format(id_num))

When we call the function get_student with the number 0, we notice that it returns a tuple with two values: ‘Taha’ and ‘Nate’

Student = get_student(0)
print('first_name: {}, last_name: {}'.format(Student[0], Student[1]))

Output: first_name: Taha, last_name: Nate
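
The returned tuple can also be unpacked straight into named variables, which avoids indexing with [0] and [1]. A small sketch that repeats a trimmed-down get_student so that it runs on its own:

```python
def get_student(id_num):
    # Trimmed-down copy of the function above, repeated so the sketch is self-contained
    if id_num == 0:
        return 'Taha', 'Nate'
    raise Exception('No Student with this id: {}'.format(id_num))

# Unpack the two returned values into separate names
first_name, last_name = get_student(0)
print('first_name: {}, last_name: {}'.format(first_name, last_name))
```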

Method 2:

Returning a dictionary is the second option. Because dictionaries are key, value pairs, we may name the values that are returned, which is more intuitive than tuples.
Method 2 is the same as Method 1, it just returns a dictionary.

# returning a dictionary
def get_data(id_num):
    if id_num == 0:
        return {'first_name': 'Muhammad', 'last_name': 'Taha', 'title': 'Data Scientist', 'department': 'A', 'date_joined': '20200807'}
    elif id_num == 1:
        return {'first_name': 'Ryan', 'last_name': 'Gosling', 'title': 'Data Engineer', 'department': 'B', 'date_joined': '20200809'}
    else:
        raise Exception('No employee with this id: {}'.format(id_num))

It’s easier to refer to a specific value by its key when a result is a dictionary. We are calling the function with id_num = 0

employee = get_data(0)
print('first_name: {},\nlast_name: {},\ntitle: {},\ndepartment: {},\ndate_joined: {}'.format(
    employee['first_name'], employee['last_name'], employee['title'], employee['department'], employee['date_joined']))

Output: first_name: Muhammad,
last_name: Taha,
title: Data Scientist,
department: A,
date_joined: 20200807

Compare multiple numbers just like in Math

If you have a value and wish to compare it to two other values, you may use the following basic mathematical expression: 1<x<30

That’s the kind of algebraic expression we learn in primary school. However, the identical statement may be used in Python as well. Yes, you read that correctly. Until now, you’ve presumably done comparisons in this format:

1<x and x<30

In Python, all you have to do is use the following: 1<x<30

x = 5
print(1<x<30)

Output: True

Turn a string into a list of strings:

Let’s imagine you got the input of a function as a string, but it should be a list like this:

input = [[1, 2, 3], [4, 5, 6]]

Rather than dealing with complex regular expressions, just import the "ast" module and call its literal_eval function:

import ast
def string_to_list(string):
    return ast.literal_eval(string)
string = "[[1, 2, 3],[4, 5, 6]]"
my_list = string_to_list(string)
print(my_list)

Output: [[1, 2, 3], [4, 5, 6]]
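
Unlike the built-in eval, ast.literal_eval only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, and None) and raises an error for anything else, which makes it the safer choice for text you do not control. A minimal sketch; the helper name safe_parse is hypothetical:

```python
import ast

def safe_parse(text):
    # Hypothetical helper: return the parsed literal, or None if the text is not a literal
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

parsed_list = safe_parse("[[1, 2, 3], [4, 5, 6]]")
parsed_call = safe_parse("print('hello')")  # a function call is not a literal

print(parsed_list)
print(parsed_call)
```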

For-Else Method

You normally iterate through a list with a for loop. Python also lets you attach an else clause to the loop, which is rare: most other programming languages don’t support it.

Let’s have a look at how it works. Suppose you want to check whether there is any even number in a list.

number_List = [1, 3, 7, 9, 8]
for number in number_List:
    if number % 2 == 0:
        print(number)
        break
else:
    print("No even numbers!!")

Output: 8

If an even number is found, the number will be printed and the else part will not execute since we pass a break statement. If the break statement never executes then the else block will execute.
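
To see the else branch run, the same loop can be tried on a list without even numbers. A sketch, with the loop wrapped in a hypothetical function first_even so that both cases can be compared:

```python
def first_even(numbers):
    # Hypothetical wrapper around the for-else loop shown above
    for number in numbers:
        if number % 2 == 0:
            result = number
            break
    else:
        # Runs only when the loop finished without hitting break
        result = "No even numbers!!"
    return result

print(first_even([1, 3, 7, 9, 8]))
print(first_even([1, 3, 7, 9]))
```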

Do you want to find the n-largest or n-smallest element from a list?

Using the "heapq" module, you can find the n largest or n smallest elements of a list. Let’s see an example:

import heapq
numbers = [80, 25, 68, 77, 95, 88, 30, 55, 40, 50]
print(heapq.nlargest(5, numbers))
print(heapq.nsmallest(5, numbers))

Output: [95, 88, 80, 77, 68]
[25, 30, 40, 50, 55]

Do you want to pass all values of a list as a function’s arguments?

All elements of a list can be unpacked into separate arguments by putting a "*" in front of the list in the function call.

def Summation(*arg):
    total = 0  # named total to avoid shadowing the built-in sum()
    for i in arg:
        total += i
    return total
result = Summation(*[8,5,10,7])
print(result)

Output: 30

Do you want to repeat a whole string without looping?

Just multiply the string by the number of times you want it to be repeated.

value = "Taha"
print(value * 5)  
print("-" * 21)

Output: TahaTahaTahaTahaTaha
---------------------

Do you want to find the index of an element from a list?

Use ".index" to find the index of an element in a list.

cities= ['Vienna', 'Amsterdam', 'Paris', 'Berlin']
print(cities.index('Berlin'))

Output: 3

Do you want to print multiple elements in the same line?

print("Analytics", end="")
print("Vidhya")
print("Analytics", end=" ")
print("Vidhya")
print('Data', 'science', 'blogathon', '12', sep=', ')

Output: AnalyticsVidhya
Analytics Vidhya
Data, science, blogathon, 12

Do you want to separate big numbers to make it easy to read?

Big numbers are hard to read when they are written as one long run of digits. You can use underscores as digit separators inside a numeric literal to make it easier to read; Python ignores the underscores.

print(5_000_000_000_000)

print(7_543_291_635)

Output: 5000000000000
7543291635
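
The underscores only help when reading the source code; the printed number has no separators. To make the printed output easier to read as well, a format specifier can add them. A short sketch using one of the numbers above:

```python
big_number = 7543291635

# The "," and "_" format specifiers insert a separator every three digits
with_commas = f"{big_number:,}"
with_underscores = f"{big_number:_}"

print(with_commas)
print(with_underscores)
```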

Do you want to reverse the slicing of a list?

When you slice a list or string, you pass the start index, the stop index, and the step size. To slice in reverse order, pass a negative step size. The stop index is excluded, so in the example below the character at index 0 ("D") is not part of the result. Let’s see an example:

sentence = "Data science blogathon"
print(sentence[21:0:-1])
# Step backward two characters at a time
print(sentence[21:0:-2])

Output: nohtagolb ecneics ata

nhaobenisaa

What is the difference between “is” and “==”?

If you want to check whether two variables point to the same object, use "is".

But if you want to check whether two variables hold equal values, use "==".

list1 = [7, 9, 4]
list2 = [7, 9, 4]
print(list1 == list2) 
print(list1 is list2)
list3 = list1
print(list3 is list1)

Output: True
False
True

The first statement is True because list1 and list2 hold the same values, so they are equal. The second statement is False because list1 and list2 are two separate objects in memory, and the third statement is True because list1 and list3 both point to the same object in memory.

How to merge 2 dictionaries in a single line of code?

first_dct = {"London": 1, "Paris": 2}
second_dct = {"Tokyo": 3, "Seoul": 4}
merged = {**first_dct, **second_dct}
print(merged)

Output: {'London': 1, 'Paris': 2, 'Tokyo': 3, 'Seoul': 4}
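
When both dictionaries contain the same key, the value from the dictionary unpacked last wins. Python 3.9 and later also offer the | operator for the same merge. A small sketch with made-up dictionaries:

```python
# Hypothetical sample data: both dictionaries contain the key "Paris"
first_dct = {"London": 1, "Paris": 2}
second_dct = {"Paris": 20, "Tokyo": 3}

# The value from the right-most dictionary replaces the earlier one
merged = {**first_dct, **second_dct}
print(merged)

# Equivalent merge on Python 3.9 or newer:
# merged = first_dct | second_dct
```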

How to identify whether a string starts with a specific letter or not?

If you need to know whether a string starts with a specific letter, you can use indexing, which is the common approach. But you can also use a string method called "startswith": it tells you whether the string starts with the character or word that you pass to it.

sentence = "Analytics Vidhya"
print(sentence.startswith("b"))
print(sentence.startswith("A"))

Output: False
True
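
startswith also accepts a tuple of prefixes and returns True if the string starts with any of them. A brief sketch; the prefixes are chosen only for illustration:

```python
sentence = "Analytics Vidhya"

# A tuple of prefixes: True if the string starts with any one of them
starts_with_vowel = sentence.startswith(("A", "E", "I", "O", "U"))
starts_with_data = sentence.startswith(("Data", "data"))

print(starts_with_vowel)
print(starts_with_data)
```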

How to get the Unicode code point of a character?

If you need to know the Unicode code point of a character, pass the character to the built-in function "ord". Let’s see an example:

print(ord("T"))
print(ord("A")) 
print(ord("h")) 
print(ord("a"))

Output: 84
65
104
97
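
The built-in chr does the reverse and turns a code point back into a character. A short sketch of the round trip, with illustrative variable names:

```python
code_point = ord("T")
character = chr(code_point)  # chr() turns a code point back into a character

# Round-trip a whole word through ord() and chr()
codes = [ord(letter) for letter in "Taha"]
word = "".join(chr(code) for code in codes)

print(code_point, character)
print(codes, word)
```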

How to get the key-value pair of a dictionary?

If you want to access the keys and values of a dictionary separately while looping over it, you can do that using the dictionary method "items()".

cities = {'London': 1, 'Paris': 2, 'Tokyo': 3, 'Seoul': 4}
for key, value in cities.items():
    print(f"Key: {key} and Value: {value}")

Output: Key: London and Value: 1
Key: Paris and Value: 2
Key: Tokyo and Value: 3
Key: Seoul and Value: 4

Do you know Boolean values can be utilized in mathematical operations?

False is considered as 0 and True is considered as 1

x = 9
y = 3
outcome = (x - False)/(y * True)
print(outcome)

Output: 3.0
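
A common use of this in data work is counting how many items satisfy a condition by summing booleans. A short sketch with a made-up list of values:

```python
# Hypothetical sample data for illustration
values = [8, 15, 25, 30, 34, 67, 90, 5, 12]

# Each comparison yields True (1) or False (0), so the sum is a count
count_above_20 = sum(value > 20 for value in values)

# Dividing by the length gives the share of matching items
share_above_20 = count_above_20 / len(values)

print(count_above_20)
print(share_above_20)
```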

How to add value in a specific position of a list?

The "append" method always adds a value at the last position of a list. If you want to add a value at a specific position instead, use the "insert" method.

Syntax:

list_name.insert(position, value)

Let’s see an example.

cities = ["London", "Vienna", "Rome"]
cities.append("Seoul")
print("After append:", cities)
cities.insert(0, "Berlin")
print("After insert:", cities)

Output: After append: ['London', 'Vienna', 'Rome', 'Seoul']
After insert: ['Berlin', 'London', 'Vienna', 'Rome', 'Seoul']

Filter() function

The filter function does what its name says: it filters an iterable using the function you pass to it, keeping only the items for which that function returns True. It returns an iterator.

Syntax:

filter(function, iterable)

Let’s see an example with filter function:

mixed_number = [8, 15, 25, 30,34,67,90,5,12]
filtered_value = filter(lambda x: x > 20, mixed_number)
print(f"Before filter: {mixed_number}") 
print(f"After filter: {list(filtered_value)}")

Output: Before filter: [8, 15, 25, 30, 34, 67, 90, 5, 12]
After filter: [25, 30, 34, 67, 90]
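
Because filter returns an iterator rather than a list, its results can be consumed only once. A small sketch of that behaviour, reusing the list from the example above:

```python
mixed_number = [8, 15, 25, 30, 34, 67, 90, 5, 12]
filtered_value = filter(lambda x: x > 20, mixed_number)

first_pass = list(filtered_value)   # consumes the iterator
second_pass = list(filtered_value)  # nothing is left to iterate over

print(first_pass)
print(second_pass)
```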

How to create a function that accepts any number of arguments?

You can create a function without fixing the number of parameters in advance. Prefix a parameter name with "*" and the function collects any number of positional arguments into a tuple. Let’s see an example:

def multiplication(*arguments):
    mul = 1
    for i in arguments:
        mul = mul * i
    return mul
print(multiplication(3, 4, 5))
print(multiplication(5, 8, 10, 3))
print(multiplication(8, 6, 15, 20, 5))

Output: 60
1200
72000

How to iterate over two or more lists at a single time?

You can iterate over a single list using enumerate function, but when you have two or more lists, you can also iterate over them using the “zip()” function.

capital = ['Vienna', 'Paris', 'Seoul',"Rome"]
countries = ['Austria', 'France', 'South Korea',"Italy"]
for cap, country in zip(capital, countries):
    print(f"{cap} is the capital of {country}")

Output: Vienna is the capital of Austria
Paris is the capital of France
Seoul is the capital of South Korea
Rome is the capital of Italy
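
Note that zip stops as soon as the shortest list runs out. If the lists may differ in length, itertools.zip_longest from the standard library keeps going and fills the gaps. A sketch with a deliberately shortened second list; the fill value 'unknown' is an arbitrary choice:

```python
from itertools import zip_longest

# Hypothetical sample data: the second list is one item shorter
capital = ['Vienna', 'Paris', 'Seoul', 'Rome']
countries = ['Austria', 'France', 'South Korea']

# zip() stops at the end of the shortest list
pairs = list(zip(capital, countries))

# zip_longest() continues and fills the missing values with a placeholder
padded_pairs = list(zip_longest(capital, countries, fillvalue='unknown'))

print(pairs)
print(padded_pairs)
```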

How to alter the cases of letters in a sentence?

If you want to change the case of letters, i.e. upper case to lower case and lower case to upper case, you can do that using a string method called "swapcase". Let’s see an example:

sentence = "Data Science Blogathon."
changed_sen = sentence.swapcase()
print(changed_sen)

Output: dATA sCIENCE bLOGATHON.

How to check the size of the memory used by an object?

To check the memory used by an object, first import the sys module, then call its "getsizeof" function. It returns the size of the object in bytes; the exact number depends on the Python version and platform.

import sys
mul = 5*6
print(sys.getsizeof(mul))

Output: 28
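
Note that getsizeof reports only the memory of the object itself, not of the objects it refers to. A sketch illustrating this with a made-up string and list; exact byte counts vary by Python version and platform, so none are shown:

```python
import sys

big_string = "x" * 10_000   # hypothetical large object
container = [big_string]    # list holding a reference to it

string_size = sys.getsizeof(big_string)
container_size = sys.getsizeof(container)

# The list is reported as far smaller than the string it contains,
# because only the list's own structure is measured
print(string_size, container_size)
```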

Map() function:

The map() function applies a function to every item of an iterable and returns an iterator of the results.

Syntax:

map(function, iterable)

values_list = [8, 10, 6, 50]
quotient = map(lambda x: x/2, values_list)
print(f"Before division: {values_list}")
print(f"After division: {list(quotient)}")

Output: Before division: [8, 10, 6, 50]
After division: [4.0, 5.0, 3.0, 25.0]

How to reverse a whole string?

To reverse a string you can use the slicing method. Let’s see an example:

value = "Analytics Vidhya"
print("Reverse is:", value[::-1])

Output: Reverse is: ayhdiV scitylanA

How to find out the execution time of a block?

When you train a machine learning or deep learning model, or simply run a block of code, you can check how long it took to run. In a Jupyter notebook or IPython, put the magic command "%%time" at the top of the cell. It shows the amount of time the cell took to run. Let’s see an example:

%%time
sentence = "Data Science Blogathon."
changed_sen = sentence.swapcase()
print(changed_sen)

Output: dATA sCIENCE bLOGATHON.
Wall time: 998 µs
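
Outside a notebook, a similar measurement can be sketched with time.perf_counter from the standard library. The variable names start and elapsed are illustrative, and the measured time will differ from run to run:

```python
import time

start = time.perf_counter()

sentence = "Data Science Blogathon."
changed_sen = sentence.swapcase()

elapsed = time.perf_counter() - start  # elapsed wall-clock time in seconds
print(changed_sen)
print(f"Elapsed: {elapsed:.6f} s")
```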

How to drop the left or right character of a string?

There are two string methods called "rstrip()" and "lstrip()": "rstrip" drops characters from the right end of a string and "lstrip" drops characters from the left end. By default both remove whitespace, but you can pass a string of characters instead, and every character in it is stripped from that end until a different character is found.

sentence1 = "Data Science Blogathon       "
print(f"After removing the right space: {sentence1.rstrip()}") 
sentence2 = "        Data Science Blogathon"
print(f"After removing the left space: {sentence2.lstrip()}")
sentence3 = "Data Science Blogathon .,bbblllg"
print("After applying rstrip:", sentence3.rstrip(".,blg"))

Output: After removing the right space: Data Science Blogathon
After removing the left space: Data Science Blogathon
After applying rstrip: Data Science Blogathon
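
Because the argument is treated as a set of characters and not as a fixed suffix, rstrip can remove more than intended. When an exact suffix should be removed, the string method removesuffix (available from Python 3.9) is one alternative. A sketch with a made-up file name:

```python
filename = "report.txt"

# rstrip treats ".txt" as a set of characters, so the final "t" of "report" goes too
stripped = filename.rstrip(".txt")

# removesuffix (Python 3.9+) removes the exact suffix only
without_suffix = filename.removesuffix(".txt")

print(stripped)
print(without_suffix)
```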

How to count the number of times an item appears in a list?

You can count the number of times an element appears in a list with a for loop. But you can do it more easily by calling the list method "count". Here is an example:

cities= ["Amsterdam", "Berlin", "New York", "Seoul", "Tokyo", "Paris", "Paris","Vienna","Paris"]
print("Paris appears", cities.count("Paris"), "times in the list")

Output: Paris appears 3 times in the list

How to find the index of an element in a tuple or list?

You can find the index of an element in a tuple or list just by calling a simple method called “index” on that tuple or list. Here is an example:

cities_tuple = ("Berlin", "Paris", 5, "Vienna", 10)
print(cities_tuple.index("Paris")) 
cities_list = ['Vienna', 'Paris', 'Seoul',"Amsterdam"]
print(cities_list.index("Amsterdam"))

Output: 1
3

How to remove every element from a list or set?

You can remove all elements from a list or set by applying a method called “clear” on that list or set.

cities_list = ['Vienna', 'Paris', 'Seoul',"Amsterdam"]
print(f"Before removing from the list: {cities_list}")
cities_list.clear()
print(f"After removing from the list: {cities_list}")
cities_set = {'Vienna', 'Paris', 'Seoul',"Amsterdam"}
print(f"Before removing from the set: {cities_set}")
cities_set.clear()
print(f"After removing from the set: {cities_set}")

Output: Before removing from the list: ['Vienna', 'Paris', 'Seoul', 'Amsterdam']
After removing from the list: []
Before removing from the set: {'Vienna', 'Amsterdam', 'Seoul', 'Paris'}
After removing from the set: set()

How to join 2 sets?

To join two sets, you can use the method called "union()". It returns a new set containing the elements of both sets.

set1 = {'Vienna', 'Paris', 'Seoul'}
set2 = {"Tokyo", "Rome",'Amsterdam'}
print(set1.union(set2))

Output: {'Vienna', 'Tokyo', 'Seoul', 'Amsterdam', 'Rome', 'Paris'}

How to sort values of a list based on their frequency?

First, use "Counter" from the collections module to count the frequency of each value, then call the "most_common" method on the result to sort the values by frequency, from most to least frequent.

from collections import Counter
count = Counter([7, 6, 5, 6, 8, 6, 6, 6])
print(count)
print("Sort values according to their frequency:", count.most_common())

Output: Counter({6: 5, 7: 1, 5: 1, 8: 1})
Sort values according to their frequency: [(6, 5), (7, 1), (5, 1), (8, 1)]

How to drop the duplicate values from a list?

First, convert the list into a set; this removes the duplicate values because a set cannot contain duplicates. Then convert the set back to a list. Note that a set is unordered, so the original order of the list may not be preserved.

cities_list = ['Vienna', 'Paris', 'Seoul',"Amsterdam","Paris","Amsterdam","Paris"]
cities_list = set(cities_list)
print("After removing the duplicate values from the list:",list(cities_list))

Output: After removing the duplicate values from the list: ['Vienna', 'Amsterdam', 'Seoul', 'Paris']
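
If the original order matters, dict.fromkeys is one way to drop duplicates while keeping the first occurrence of each value in place, since dictionaries preserve insertion order from Python 3.7 on. A minimal sketch; the name unique_cities is illustrative:

```python
cities_list = ['Vienna', 'Paris', 'Seoul', 'Amsterdam', 'Paris', 'Amsterdam', 'Paris']

# Dictionary keys are unique and keep their insertion order
unique_cities = list(dict.fromkeys(cities_list))

print(unique_cities)
```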

How to make a single sentence from all the elements of a list?

By using a method called “join” you can join all the single elements of a list and make a single string or sentence.

words_list = ["Data", "science", "Blogathon"]
print(" ".join(words_list))

Output: Data science Blogathon
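
join only works on strings and raises a TypeError if the list contains other types, so numbers need to be converted first. A short sketch with a made-up mixed list:

```python
# Hypothetical mixed list: calling " ".join(values) directly would raise TypeError
values = ["Blogathon", 12, 2022]

# Convert every item to a string before joining
sentence = " ".join(str(value) for value in values)
print(sentence)
```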

How to return multiple values from a function at a single time?

Yes, you can do this in Python. A function can return several values at once, separated by commas; they are packed into a tuple that the caller can unpack into separate variables. Let’s see an example:

def calculation(number):
    mul = number*2
    div = number/2
    summation = number+2
    subtract = number-2
    return mul, div, summation, subtract
mul, div, summation, subtract = calculation(10)
print("Multiplication:", mul)
print("Division:", div)
print("Summation:", summation)
print("Subtraction:", subtract)

Output: Multiplication: 20
Division: 5.0
Summation: 12
Subtraction: 8

How to find out the difference between two lists?

First, convert the lists into sets, then call the "symmetric_difference" method on one set with the other as the argument. This returns the elements that appear in exactly one of the two lists.

cities_list1 = ['Vienna', 'Paris', 'Seoul',"Amsterdam", "Berlin", "London"]
cities_list2 = ['Vienna', 'Paris', 'Seoul',"Amsterdam"]
cities_set1 = set(cities_list1)
cities_set2 = set(cities_list2)
difference = list(cities_set1.symmetric_difference(cities_set2))
print(difference)

Output: ['Berlin', 'London']
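
When only the elements missing from one particular list are needed, the set method difference is the one-sided counterpart. A sketch contrasting the two, using made-up sets in which each side has one city the other lacks:

```python
# Hypothetical sample data: each set has one city the other lacks
set1 = {'Vienna', 'Paris', 'Berlin'}
set2 = {'Vienna', 'Paris', 'London'}

only_in_first = set1.difference(set2)             # in set1 but not in set2
in_exactly_one = set1.symmetric_difference(set2)  # in one set but not in both

print(only_in_first)
print(sorted(in_exactly_one))
```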

How to convert two different lists into a single dictionary?

First, apply the zip function to the two lists, then convert its output into a dictionary. The first list supplies the keys and the second supplies the values.

number = [1, 2, 3]
cities = ['Vienna', 'Paris', 'Seoul']
result = dict(zip(number, cities))
print(result)

Output: {1: 'Vienna', 2: 'Paris', 3: 'Seoul'}

How to get the n largest or n smallest elements of a list?

First import the "heapq" module, then call its "nlargest" and "nsmallest" functions, passing the value of n and the list. This way you can get the n largest and n smallest elements of a list.

import heapq
numbers = [100, 20, 8, 90, 86, 95, 9, 66, 28, 88]
print(heapq.nlargest(3, numbers))
print(heapq.nsmallest(3, numbers))

Output: [100, 95, 90]
[8, 9, 20]
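
Both functions also take an optional key argument, which helps when the list holds records rather than plain numbers. A sketch with made-up records; the field names are assumptions for illustration:

```python
import heapq

# Hypothetical records for illustration
scores = [
    {'name': 'A', 'score': 88},
    {'name': 'B', 'score': 95},
    {'name': 'C', 'score': 66},
    {'name': 'D', 'score': 90},
]

# key tells heapq which field to compare
top_two = heapq.nlargest(2, scores, key=lambda record: record['score'])
top_names = [record['name'] for record in top_two]

print(top_names)
```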

End Notes

Thank you for sticking with me all the way to the end. Hope this article helped you learn something new.

About me

I am an undergraduate student, studying Computer Science, with a strong interest in data science, machine learning, and artificial intelligence. I like diving into data in order to uncover trends and other useful information. You can connect with me on Linkedin.

