Everything you Should Know About Data Structures in Python

Overview

  • Data structures in Python are a key concept to learn before we dive into the nuances of data science and model building
  • Learn about the different data structures Python offers, including lists, tuples and much more

 

Introduction

A Data Structure sounds like a very straightforward topic – and yet a lot of data science and analytics newcomers have no idea what it is. When I quiz these folks about the different data structures in Python and how they work, I’m met with a blank stare. Not good!

Python is an easy programming language to learn, but we need to get our basics clear before we dive into the attractive machine learning coding bits. That’s because behind every data exploration task we perform, every analytics step we take, there is a basic element of storage and organization of the data.

And this is a no-brainer – it’s so much easier for us to extract information when we store our data efficiently. We save ourselves a ton of time thanks to our code running faster – who wouldn’t want that?

And that’s why I implore you to learn about data structures in Python.

In this article, we will explore the basic in-built data structures in Python that will come in handy when you are dealing with data in the real world. So whether you’re a data scientist or an analyst, this article is equally relevant for you.

Make sure you go through our comprehensive FREE Python course if you’re new to this awesome programming language.

 

Table of Contents

  • What are Data Structures in Python?
  • Data Structure #1: Lists in Python
    • Creating Lists
    • Accessing List elements
    • Appending values in Lists
    • Removing elements from Lists
    • Sorting Lists
    • Concatenating Lists
    • List comprehensions
    • Stacks & Queues using Lists
  • Data Structure #2: Tuples in Python
    • Creating Tuples in Python
    • Immutability of Tuples
    • Tuple assignment
    • Changing Tuples values
  • Data Structure #3: Dictionary in Python
    • Generating Dictionary
    • Accessing keys and values
  • Data Structure #4: Sets in Python
    • Add and Remove elements from Sets
    • Sets Operations

 

What are Data Structures?

Data structures are a way of storing and organizing data efficiently. This will allow you to easily access and perform operations on the data.

There is no one-size-fits-all kind of model when it comes to data structures. You will want to store data in different ways to cater to the need of the hour. Maybe you want to store all types of data together, or you want something for faster searching of data, or maybe something that stores only distinct data items.

Luckily, Python has a host of in-built data structures that help us organize our data easily. It is therefore worth getting acquainted with these first, so that when we are dealing with data, we know exactly which data structure will serve our purpose.

 

Data Structure #1: Lists in Python

Lists in Python are the most versatile data structure. They are used to store heterogeneous data items, from integers to strings or even another list! They are also mutable, which means that their elements can be changed even after the list is created.

 

Creating Lists

Lists are created by enclosing elements within [square] brackets and each item is separated by a comma:
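As a minimal sketch (the variable name and values are our own, chosen only for illustration), a list mixing several types could be created like this:

```python
# A list can hold items of different types, including another list
mixed = [1, "two", 3.0, [4, 5]]

print(mixed)        # [1, 'two', 3.0, [4, 5]]
print(type(mixed))  # <class 'list'>
```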

Since each element in a list has its own distinct position, having duplicate values in a list is not a problem:
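A short illustration with made-up values shows that repeated items are kept, each at its own position:

```python
# Duplicate values are allowed because each one sits at a distinct index
scores = [10, 20, 10, 30, 10]

print(scores)            # [10, 20, 10, 30, 10]
print(scores.count(10))  # 3
```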

 

Accessing List elements

To access elements of a list, we use Indexing. Each element in a list has an index related to it depending on its position in the list. The first element of the list has the index 0, the next element has index 1, and so on. The last element of the list has an index of one less than the length of the list.

But indexes don’t always have to be positive, they can be negative too. What do you think negative indexes indicate?

While positive indexes return elements from the start of the list, negative indexes return values from the end of the list. This saves us from the trivial calculation which we would have to otherwise perform if we wanted to return the nth element from the end of the list. So instead of trying to return List_name[len(List_name)-1] element, we can simply write List_name[-1].

Using negative indexes, we can return the nth element from the end of the list easily. If we wanted to return the first element from the end, or the last index, the associated index is -1. Similarly, the index for the second last element will be -2, and so on. Remember, the 0th index will still refer to the very first element in the list.

But what if we wanted to return a range of elements between two positions in the lists? This is called Slicing. All we have to do is specify the start and end index within which we want to return all the elements – List_name[start : end].

One important thing to remember here is that the element at the end index is never included. Only elements from start index till index equaling end-1 will be returned.
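The indexing and slicing rules above can be sketched as follows (the list `letters` is just an illustrative example):

```python
letters = ["a", "b", "c", "d", "e"]

# Positive indexes count from the start, beginning at 0
print(letters[0])                 # a
print(letters[len(letters) - 1])  # e

# Negative indexes count from the end, beginning at -1
print(letters[-1])                # e
print(letters[-2])                # d

# Slicing returns elements from start up to, but not including, end
print(letters[1:4])               # ['b', 'c', 'd']
```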

 

Appending values in Lists

We can add new elements to an existing list using the append() or insert() methods:

  • append() – Adds an element to the end of the list
  • insert() – Adds an element to a specific position in the list which needs to be specified along with the value
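Here is a small sketch of both methods on an illustrative list of our own:

```python
fruits = ["apple", "cherry"]

# append() adds the new element at the end
fruits.append("date")
print(fruits)  # ['apple', 'cherry', 'date']

# insert() takes the target index first, then the value
fruits.insert(1, "banana")
print(fruits)  # ['apple', 'banana', 'cherry', 'date']
```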

 

Removing elements from Lists

Removing elements from a list is as easy as adding them and can be done using the remove() or pop() methods:

  • remove() – Removes the first occurrence from the list that matches the given value
  • pop() – This is used when we want to remove an element at a specified index from the list. However, if we don’t provide an index value, the last element will be removed from the list
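The two methods can be tried out like this (the list contents are only for illustration):

```python
colors = ["red", "green", "blue", "green", "yellow"]

# remove() deletes only the first matching value
colors.remove("green")
print(colors)   # ['red', 'blue', 'green', 'yellow']

# pop(index) removes and returns the element at that index
removed = colors.pop(1)
print(removed)  # blue

# pop() with no argument removes and returns the last element
last = colors.pop()
print(last)     # yellow
print(colors)   # ['red', 'green']
```

Note that pop() hands back the removed element, while remove() returns nothing.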

 

Sorting Lists

You will often need to sort the elements of a list, so it is important to know about the sort() method. It lets you sort list elements in-place in either ascending or descending order:
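A minimal sketch with sample numbers of our own:

```python
numbers = [5, 2, 9, 1]

# sort() rearranges the list itself and returns None
numbers.sort()
print(numbers)  # [1, 2, 5, 9]

# reverse=True sorts in descending order
numbers.sort(reverse=True)
print(numbers)  # [9, 5, 2, 1]
```

If you need to keep the original list untouched, the built-in sorted() function returns a new sorted list instead.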

But where things get a bit tricky is when you want to sort a list containing string elements. How do you compare two strings? Well, strings are compared using the Unicode code points of their characters (for plain English text, these match the ASCII values). Each character has an integer value associated with it, and we use these values to sort the strings. One consequence is that uppercase letters sort before lowercase letters.

On comparing two strings, we just compare the integer values of each character from the beginning. If we encounter the same characters in both the strings, we just compare the next character until we find two differing characters. It is, of course, done internally so you don’t have to worry about it!
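To make the character-by-character comparison concrete, here is a small sketch using illustrative words; ord() is the built-in function that returns the integer value of a character:

```python
words = ["banana", "apple", "Cherry", "apricot"]
words.sort()

# "Cherry" comes first because uppercase "C" has a smaller value than "a"
print(words)               # ['Cherry', 'apple', 'apricot', 'banana']
print(ord("C"), ord("a"))  # 67 97

# "apple" and "apricot" share "ap", so the third characters decide
print(ord("p"), ord("r"))  # 112 114
```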

 

Concatenating Lists

We can even concatenate two or more lists by simply using the + symbol. This will return a new list containing elements from both the lists:
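For instance, with two small lists of our own:

```python
first = [1, 2, 3]
second = [4, 5]

# + builds a new list and leaves both originals unchanged
combined = first + second

print(combined)  # [1, 2, 3, 4, 5]
print(first)     # [1, 2, 3]
```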

 

List comprehensions

A very interesting application of Lists is List comprehension which provides a neat way of creating new lists. These new lists are created by applying an operation on each element of an existing list. It will be easy to see their impact if we first check out how it can be done using the good old for-loops:
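As an illustrative task (our own choice, not necessarily the one originally shown), suppose we want the square of every number in a list. With a for-loop, that could look like this:

```python
numbers = [1, 2, 3, 4]

# Build the new list one element at a time
squares = []
for n in numbers:
    squares.append(n ** 2)

print(squares)  # [1, 4, 9, 16]
```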

Now, we will see how we can concisely perform this operation using list comprehensions:
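Taking squaring each number as an illustrative operation, a list comprehension fits the whole thing on one line:

```python
numbers = [1, 2, 3, 4]

# [expression for item in iterable]
squares = [n ** 2 for n in numbers]
print(squares)       # [1, 4, 9, 16]

# An optional condition filters the items before the expression is applied
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares)  # [4, 16]
```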

See the difference? List comprehensions are a useful asset for any data scientist because you have to write concise and readable code on a daily basis!

 

Stacks & Queues using Lists

A list is an in-built data structure in Python. But we can use it to create user-defined data structures. Two very popular user-defined data structures built using lists are Stacks and Queues.

Stacks are a list of elements in which the addition or deletion of elements is done from the end of the list. Think of it as a stack of books. Whenever you need to add or remove a book from the stack, you do it from the top. It uses the simple concept of Last-In-First-Out.

Queues, on the other hand, are a list of elements in which the addition of elements takes place at the end of the list, but the deletion of elements takes place from the front of the list. You can think of it as a queue in the real-world. The queue becomes shorter when people from the front exit the queue. The queue becomes longer when someone new adds to the queue from the end. It uses the concept of First-In-First-Out.
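Both ideas can be sketched with a plain list, as below. The names are illustrative. This is only one simple way to do it, not the only one.

```python
# Stack: add and remove from the same end (Last-In-First-Out)
stack = []
stack.append("book 1")
stack.append("book 2")
stack.append("book 3")
top = stack.pop()
print(top)     # book 3
print(stack)   # ['book 1', 'book 2']

# Queue: add at the end, remove from the front (First-In-First-Out)
queue = []
queue.append("person 1")
queue.append("person 2")
queue.append("person 3")
served = queue.pop(0)
print(served)  # person 1
print(queue)   # ['person 2', 'person 3']
```

One caveat: pop(0) has to shift every remaining element, so it gets slow on long lists. For larger queues, collections.deque from the standard library, with its popleft() method, is the usual alternative.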

Now, as a data scientist or an analyst, you might not be employing this concept every day, but knowing it will surely help you when you have to build your own algorithm!

 

Data Structure #2: Tuples in Python

Tuples are another very popular in-built data structure in Python. These are quite similar to Lists except for one difference – they are immutable. This means that once a tuple is generated, no value can be added, deleted, or edited.

We will explore this further, but let’s first see how you can create a Tuple in Python!

 

Creating Tuples in Python

Tuples can be generated by writing values within (parentheses), with each element separated by a comma. But even if you write a bunch of values without any parentheses and assign them to a variable, you will still end up with a tuple! Have a look for yourself:
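A minimal sketch, using an illustrative tuple of planet names:

```python
# With parentheses
planets = ("Mercury", "Venus", "Earth")
print(type(planets))       # <class 'tuple'>

# Without parentheses: the commas alone pack the values into a tuple
also_a_tuple = 1, 2, 3
print(type(also_a_tuple))  # <class 'tuple'>

# A one-element tuple needs a trailing comma
single = (42,)
print(len(single))         # 1
```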

Ok, now that we know how to create tuples, let’s talk about immutability.

 

Immutability of Tuples

Anything that cannot be modified after creation is immutable in Python. Every object in Python is either mutable or immutable.

Lists, dictionaries, sets (we will be exploring these in the further sections) are mutable objects, meaning they can be modified after creation. On the other hand integers, floating values, boolean values, strings, and even tuples are immutable objects. But what makes them immutable?

Everything in Python is an object. So we can use the in-built id() function, which returns an integer that uniquely identifies an object for as long as it exists (in CPython, this is the object’s memory address). This is known as the identity of the object. Let’s create a list and determine the location of the list and its elements:
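A small sketch with an illustrative list; the actual numbers printed by id() will differ from run to run, so no specific values are shown:

```python
nums = [1, 2, 3]

# id() returns the identity of the object it is given
print(id(nums))     # identity of the list
print(id(nums[0]))  # identity of its first element

# The list and its element are two separate objects
print(id(nums) == id(nums[0]))  # False
```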

As you can see, both the list and its element have different locations in memory. Since we know lists are mutable, we can alter the value of its elements. Let’s do that and see how it affects the location values:
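The experiment can be sketched like this (illustrative values; we compare identities instead of printing the raw numbers):

```python
nums = [1, 2, 3]
list_id_before = id(nums)
element_id_before = id(nums[0])

# Replace the first element of the list
nums[0] = 100

print(id(nums) == list_id_before)        # True: still the same list object
print(id(nums[0]) == element_id_before)  # False: the slot now holds a different object
```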

The location of the list did not change but that of the element did. This means that a new object was created for the element and saved in the list. This is what is meant by mutable. A mutable object is able to change its state, or contents, after creation but an immutable object is not able to do that.

But we can call tuples pseudo-immutable because even though they are immutable, they can contain mutable objects whose values can be modified!
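Here is a minimal sketch of that behavior, using a made-up tuple that holds a string and a list:

```python
record = ("scores", [90, 85])

# The list inside the tuple is still mutable
record[1].append(77)
print(record)  # ('scores', [90, 85, 77])

# But the tuple's own slots cannot be reassigned
try:
    record[0] = "marks"
except TypeError as err:
    print("TypeError:", err)
```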

In other words, we can change the values of a mutable object, such as a list, contained within a tuple, even though the tuple itself cannot be changed to hold a different object.

 

Tuple assignment

Tuple packing and unpacking are useful operations that let you bundle several values into a tuple, or assign the elements of a tuple to several variables, in a single line.

We already saw tuple packing when we made our planet tuple. Tuple unpacking is just the opposite: assigning values to variables from a tuple:
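A short sketch, with an illustrative tuple of planet names defined here so the example runs on its own:

```python
# Packing: three values bundled into one tuple
planets = ("Mercury", "Venus", "Earth")

# Unpacking: one variable per element, matched by position
first, second, third = planets

print(first)   # Mercury
print(second)  # Venus
print(third)   # Earth
```

The number of variables on the left must match the number of elements in the tuple, otherwise Python raises a ValueError.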

It is very useful for swapping values in a single line. Honestly, this was one of the first things that got me excited about Python, being able to do so much with such little coding!
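The swap looks like this (variable names are illustrative):

```python
a, b = 1, 2

# The right-hand side is packed into a tuple, then unpacked into a and b
a, b = b, a

print(a, b)  # 2 1
```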

 

Changing Tuple values

Although I said that tuple values cannot be changed, you can actually make changes to it by converting it to a list using list(). And when you are done making the changes, you can again convert it back to a tuple using tuple().
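A minimal sketch of the round trip, with made-up values:

```python
days = ("Mon", "Tue", "Wed")

# Convert to a list, change it, then convert back
temp = list(days)
temp[1] = "Tuesday"
days = tuple(temp)

print(days)  # ('Mon', 'Tuesday', 'Wed')
```

Strictly speaking, the original tuple was never modified: the name days now points to a brand new tuple.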

This change, however, is expensive as it involves making a copy of the tuple. But tuples come in handy when you don’t want others to change the content of the data structure.

 

Data Structure #3: Dictionary in Python

Dictionary is another Python data structure for storing heterogeneous objects. Dictionaries are mutable and, since Python 3.7, they preserve the order in which items were inserted. In older versions of Python, dictionaries were unordered, so when you accessed the elements, they might not come back in exactly the order you inserted them in.

But what sets dictionaries apart from lists is the way elements are stored in it. Elements in a dictionary are accessed via their key values instead of their index, as we did in a list. So dictionaries contain key-value pairs instead of just single elements.

 

Generating Dictionary

Dictionaries are generated by writing keys and values within { curly } brackets, with each key separated from its value by a colon. And each key-value pair is separated by a comma:
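For example, with made-up keys and values:

```python
# key: value pairs, separated by commas
person = {"name": "Asha", "age": 30, "languages": ["Python", "R"]}

print(person)
# {'name': 'Asha', 'age': 30, 'languages': ['Python', 'R']}
print(len(person))  # 3
```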

Using the key of the item, we can easily extract the associated value of the item:
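A small sketch with an illustrative dictionary:

```python
person = {"name": "Asha", "age": 30}

# Square brackets look up the value stored under a key
print(person["name"])      # Asha
print(person["age"])       # 30

# get() returns None instead of raising KeyError when the key is missing
print(person.get("city"))  # None
```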

These keys are unique. If you write a dictionary in which the same key appears more than once, only the last value assigned to that key is kept:
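This can be checked with a made-up dictionary that repeats one key:

```python
# "apple" appears twice; the later value overwrites the earlier one
prices = {"apple": 10, "banana": 5, "apple": 12}

print(prices)           # {'apple': 12, 'banana': 5}
print(prices["apple"])  # 12
```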

Dictionaries are very useful to access items quickly because, unlike lists and tuples, a dictionary does not have to iterate over all the items finding a value. Dictionary uses the item key to quickly find the item value. This concept is called hashing.

 

Accessing keys and values

You can access the keys from a dictionary using the keys() method and the values using the values() method. These we can view using a for-loop or turn them into a list using list():
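Both approaches are sketched below on an illustrative dictionary:

```python
stock = {"pens": 12, "books": 7}

# Loop over the keys one at a time
for key in stock.keys():
    print(key)

# Or turn the keys and values into lists
print(list(stock.keys()))    # ['pens', 'books']
print(list(stock.values()))  # [12, 7]
```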

We can even access these values simultaneously using the items() method which returns the respective key and value pair for each element of the dictionary.
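For example, again with a made-up dictionary:

```python
stock = {"pens": 12, "books": 7}

# Each item is a (key, value) tuple, which we unpack in the loop header
for item, count in stock.items():
    print(item, count)

print(list(stock.items()))  # [('pens', 12), ('books', 7)]
```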

 

Data Structure #4: Sets in Python

Sometimes you don’t want multiple occurrences of the same element in your list or tuple. It is here that you can use a set data structure. Set is an unordered, but mutable, collection of elements that contains only unique values.
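A minimal sketch with illustrative values; the printed order is left out on purpose, since it is not something to rely on:

```python
# Curly braces with plain values (no colons) create a set
numbers = {3, 1, 3, 2, 1}
print(numbers)               # duplicates are gone
print(len(numbers))          # 3
print(numbers == {1, 2, 3})  # True

# set() turns any iterable into a set of its unique elements
letters = set(["b", "a", "b"])
print(len(letters))          # 2
```

Note that empty curly braces, {}, create an empty dictionary. Use set() for an empty set.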

The values may not come back in the same order as they were entered in the set. This is because sets are unordered.

 

Add and Remove elements from a Set

To add values to a set, use the add() method. It lets you add any hashable value, which rules out mutable objects such as lists and dictionaries:
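A short sketch, using a made-up set of tags:

```python
tags = {"python", "data"}

tags.add("sets")
tags.add("python")  # already present, so nothing changes
print(len(tags))    # 3

# An immutable tuple can be added
tags.add((1, 2))
print(len(tags))    # 4

# A mutable list cannot
try:
    tags.add([1, 2])
except TypeError as err:
    print("TypeError:", err)
```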

To remove values from a set, you have two options to choose from:

  • The first is the remove() method which gives an error if the element is not present in the Set
  • The second is the discard() method which removes elements but gives no error when the element is not present in the Set
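Both methods can be compared side by side, using an illustrative set:

```python
primes = {2, 3, 5, 7}

primes.remove(7)    # 7 is present, so it is removed
primes.discard(5)   # 5 is present, so it is removed
primes.discard(11)  # 11 is absent: nothing happens

try:
    primes.remove(11)  # 11 is absent: raises KeyError
except KeyError:
    print("11 is not in the set")

print(primes == {2, 3})  # True
```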

If the value does not exist, remove() will give an error but discard() won’t.

 

Set operations

Using Python Sets, you can perform operations like union, intersection and difference between two sets, just like you would in mathematics.

Union of two sets gives values from both the sets. But the values are unique. So if both the sets contain the same value, only one copy will be returned:
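For example, with two small made-up sets (sorted() is used only to print the result in a predictable order):

```python
a = {1, 2, 3}
b = {3, 4, 5}

# The | operator and the union() method give the same result
print(sorted(a | b))        # [1, 2, 3, 4, 5]
print(a.union(b) == a | b)  # True
```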

Intersection of two sets returns only those values that are common to both the sets:
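With two small made-up sets:

```python
a = {1, 2, 3}
b = {3, 4, 5}

# The & operator and the intersection() method give the same result
print(a & b)                       # {3}
print(a.intersection(b) == a & b)  # True
```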

Difference of a set from another gives only those values from the first set that are not present in the second set:
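With two small made-up sets, note that the order of the operands matters (sorted() is used only to print the result in a predictable order):

```python
a = {1, 2, 3}
b = {3, 4, 5}

# a - b keeps the values of a that are not in b
print(sorted(a - b))  # [1, 2]

# b - a keeps the values of b that are not in a
print(sorted(b - a))  # [4, 5]

# The difference() method is equivalent to the - operator
print(a.difference(b) == a - b)  # True
```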

 

End Notes

Isn’t Python a beautiful language? It provides you with so many options to handle your data more efficiently. And learning about data structures in Python is a key aspect of your own learning journey.

This article should serve as a good introduction to the in-built data structures in Python. And if it got you interested in Python, and you are itching to know more about it in detail and how to use it in your everyday data science or analytics work, I recommend going through the following articles and courses:
