A Beginners’ Guide to Data Structures in Python

Ayushi Gupta 06 Jun, 2023 • 18 min read

Overview

  • Role of Data Structures in doing Analysis using Python
  • Getting to know about Tuple, List, Dictionary, Set, Array, Series and Data Frame
  • Major data structures explained in detail


Introduction

Most of us on a data science journey end up learning Python. Why? Largely for predictive analytics, which Python supports in three ways: statistical methods, machine learning methods, and deep learning methods, depending on the complexity of the business problem at hand. When learning to code in the language, we cannot skip its data structures, because they give us a way to store the data we use for our analysis. Every tool or language has its own set of data types and data structures. Python has many, but seven data structures are used most often, and this article covers all seven. Let’s dive right into our discussion!

Data Structures in Python

In Python, we have data types like str, float, int, bool etc., which tell us the type of the variables we store our information/data in. We can also group variables of the same or different data types into a structure known as a data structure. Python offers several of them, each with its own characteristics.

While some data structures are inbuilt, i.e. available in core Python (example: tuple), others require the installation of specific packages (example: DataFrame). We can also create our own data structures, known as User Defined Data Structures. Some of the most common ones are Stack, Tree, Queue, Linked List, and Graph.

In today’s article we will be talking about the following data structures in detail:

  1. Tuple – inbuilt in python
  2. List – inbuilt in python
  3. Dictionary – inbuilt in python
  4. Set – inbuilt in python
  5. ndarray – from NumPy package
  6. Series – from Pandas package
  7. DataFrame – from Pandas package

We will not cover the User Defined Data Structures in today’s discussion.

Basic Inbuilt Data Structures

Let us start with the basic data structures that are inbuilt in Python and do not require any external libraries/packages to be installed before use. These are Tuple, List, Dictionary and Set. They have a few properties in common.

  1. All four are 1D (one-dimensional) i.e. data can be stored either in the form of a single row or a single column.
  2. All four are heterogeneous in nature so we can store any type of data in these i.e. the entries can have the same or different data types. Don’t worry this point will be more clear moving forward.

Let’s clear up some of the confusion first!

In python, we store the individual values in variables of various data types.

For example, var1 = 20 so variable var1 contains an integer value 20 and var1 is known as an object of int type or int data type.

We store multiple values in data structures. In this case, the variables having multiple values are the objects of different classes.

For example: var2 = [1,2,3,4,5] so variable var2 contains a list(a data structure) and var2 is known as an object of the list class.

Don’t worry about it! These concepts will be clear by the end of this article.
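To make the terminology above concrete, here is a small sketch using the built-in type() and isinstance() functions. The variable names are illustrative only and are not used elsewhere in the article.

```python
# Illustrative names; any values would do.
single_value = 20
many_values = [1, 2, 3, 4, 5]

single_type = type(single_value).__name__   # name of the class of single_value
many_type = type(many_values).__name__      # name of the class of many_values

print(single_type, many_type)
print(isinstance(many_values, list))  # True: many_values is an object of the list class
```

The same check works for every data structure covered in this article.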

Tuple

In general, a tuple is a finite ordered list of elements. Since we are dealing with Tuple as a data structure in Python its elements are variables/values of different data types. We can also give a data structure as an element in a tuple. Let’s start slow and deal with the creation of tuples in Python.

How to create a tuple?

We can use different ways to do so!

Method 1: Without using parentheses ()

tuple1 = 1,2,3,4
tuple1

type(tuple1)

Method 2 : Using parentheses ()

tuple2 = (1,2,3,4) 
tuple2

type(tuple2)

Here we took homogeneous data for creating a tuple, i.e. all the elements are integers. Let’s try with heterogeneous data as well, taking elements with integer, float, string and bool data types.

tuple3= (1,5.675,'Tuple',False)
tuple3

print(type(tuple3))

We can also pass a tuple inside a tuple. This process is known as the nesting of tuples.

tuple4= (1,5.675,'Tuple',False,tuple3) 
tuple4

Not just tuples we can also store other data structures like List, Set and Dictionary inside a tuple, however, we will discuss this later in the article.

Method 3 : Using a type conversion function – tuple()

In this method, we convert some other data structure or the range object into a tuple using a function.

tuple5 = tuple(range(10,51,10))
tuple5

print(type(tuple5))

Accessing elements of a tuple

Note: Whenever it comes to accessing the data stored in a data structure i.e taking out the elements of a data structure we use square brackets [ ]. Also here we need to know about a term called index. In all the data structures the index starts from 0.

To take out a single element we specify its index in the brackets, its syntax being :

T1[index]

To take out multiple elements whether consecutive or with a certain step pattern we use slicers, its syntax being :

T1[start : end : step]

It returns the elements of T1 with indices from start to end - 1 when the step is a positive number, and from start down to end + 1 when the step is a negative number.

Accessing elements by their position in this way is known as indexing.

t=(200,'Python',True,800,'Accessing')
print(t[1]) #prints second element
print(t[-1]) #prints last element
print(t[::]) #prints all the elements
print(t[1:4]) #prints elements with index 1 to 3
print(t[:2]) #prints elements with index 0 to 1
print(t[1:]) #prints elements with index 1 to 4
print(t[0:4:2]) #prints elements with index 0,2
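The examples above all use a positive step. As a minimal sketch of what a negative step does, the snippet below reuses the same values under a hypothetical name, sample.

```python
# Same values as the tuple above; the name `sample` is illustrative.
sample = (200, 'Python', True, 800, 'Accessing')

reversed_all = sample[::-1]       # walk the whole tuple backwards
backwards_part = sample[3:0:-1]   # indices 3, 2, 1 (stops before index 0)
every_other = sample[::2]         # indices 0, 2, 4

print(reversed_all)
print(backwards_part)
print(every_other)
```

With a negative step the start index should be the larger one, otherwise the slice comes back empty.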

Modifying the data inside a tuple

I have a tuple t, where t = (200, 'Python', True, 800, 'Accessing'). I wish to change the value of the third element to 50 (this is referred to as updating or modifying the data).

t[2]=50

And we get an ERROR! Why? A tuple is immutable. What does that mean? The data inside a tuple cannot be changed. Once the data is stored inside a tuple it cannot be modified or updated in the same memory location. This characteristic of a tuple is often helpful in the cases where we do not wish to change our data once entered due to security reasons.

However, if we ever wish to change the elements inside a tuple we can redefine it with an updated value.

t = (200,'Python',50,800,'Accessing')

It is not the same tuple t, it gets a new id.
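A small sketch of both points, assuming nothing beyond core Python: item assignment on a tuple raises a TypeError, and redefining the variable creates a new object. The names point and original are illustrative.

```python
point = (200, 'Python', True, 800, 'Accessing')
original = point                 # keep a second reference to the first tuple

try:
    point[2] = 50                # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

point = (200, 'Python', 50, 800, 'Accessing')   # rebinds the name to a new tuple
same_object = point is original

print(same_object)               # False: a different object with a different id
print(original[2])               # the first tuple is unchanged
```

Comparing with `is` (or comparing id() values) is how we can tell "same object" apart from "equal contents".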

Performing the Operations on a tuple

To get the list of all the methods and the attributes associated with a data structure or an object of the data structure we use the dir() function.

print(dir(tuple)) #tuple is a data structure
print(dir(t)) #t is an object of the tuple class (tuple created above)

As we can see there are a lot of attributes and methods associated with the tuple, let’s discuss a few of them.

t6=(200,'Python',True,800,200)

To count the number of occurrences of a particular element in a tuple: count

t6.count(200)

To get the position of the first occurrence of a particular element: index

t6.index(200)

To get the length of the tuple i.e. number of elements: len

len(t6)

List

The next data structure in our list is the list. It has many more functions and properties associated with it as compared to tuples. Let’s get started with this long section about the list data structure.

How to create a list?

There are 2 methods!

Method 1: Using square brackets [ ]

list1=[3,60,900,12000]
print(list1)
print(type(list1))

Just like tuples, we can store data of various types here as well.

list2=[0,44.5,'List',False]
print(list2)
print(type(list2))

We can also pass a list inside a list. This process is known as the nesting of lists.

list3=[5,99.99,True,'Python',(1,2,'Learn',6.9),['Learning',77,'True']]
list3

Let’s check another thing! Wrapping a list in parentheses does not turn it into a tuple, so list4 below is still a list.

list4=([7,8,9,'Creation'])
list4

Method 2 : Using list() i.e. a type conversion function

list5 =list(range(100,200,6))
print(list5)

Accessing elements of a list

This is done in the exact same way as we access elements of a tuple so let’s directly move to the example!

l=[3,True,100.0,'Access',(60,70,'List',99.99),[False,True,'Elements',4000]]
print(l[0]) #prints first element
print(l[-2]) #prints second last element
print(l[::]) #prints all the elements
print(l[4][2]) #prints the element 'List'
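print(l[:-2:-1]) #step -1 from the end, stopping before index -2: a list with only the last element
print(l[:3:-1]) #step -1 from the end, stopping before index 3: elements with index 5 and 4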

Modifying the data inside a list

Considering the above list l, we want to modify the last element and set it to ‘Updated’

l[-1]='Updated'
print(l)

And it gets updated! This is something different from tuples!

Unlike a tuple, a list can be updated, and this is the main difference between the two. Lists are mutable objects, i.e. objects that can be updated in the same memory location. Immutable objects (a tuple, for example) have their use cases, but mutable objects are extremely useful too and are in fact used more often in data science because they can be updated in place.
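To see what "updated in the same memory location" means, here is a minimal sketch that compares the id of a list before and after changing it. The list skills is a made-up example.

```python
skills = ['Python', 'SQL', 'Excel']
id_before = id(skills)

skills[-1] = 'Updated'      # in-place item assignment
skills.append('R')          # in-place growth

id_after = id(skills)
print(skills)
print(id_before == id_after)   # True: still the same object
```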

Performing the Operations on a list

First, let’s check out what are the various operations we can perform on a list.

print(dir(list))

Consider the following list :

list6=[55,(-6,-5,-4),7,'Index',['A','B','C']]

Find the index of an element

To determine the index of element 7 in the list :

list6.index(7)

Adding elements to a list

We can add the elements at any position in the list, however, to add elements at the end we have special functions.

list6.append(69) # adds one element (value : 69) at the end
print(list6)
t6=(200,'Python',True,800,200)
list6.append(t6) #adds one element (object : t6 , a tuple) at the end
print(list6)

We have another method named extend, let’s try that out!

list6.extend(78) #want to add the value 78 at the last position

We get an error! Why? The extend method works only on iterables, i.e. objects we can iterate over, and the integer 78 is not one.

l7=[1,2,3,4,5]
list6.extend(l7) #want to add the values of list l7 as individual values in list6 
print(list6)
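The difference between append and extend is easiest to see side by side. The sketch below uses small throwaway lists rather than list6.

```python
base = [1, 2]

appended = list(base)
appended.append([3, 4])     # the whole list becomes one nested element

extended = list(base)
extended.extend([3, 4])     # each item of the iterable is added separately

try:
    extended.extend(78)     # an int is not iterable
except TypeError:
    extend_failed = True
else:
    extend_failed = False

print(appended)   # [1, 2, [3, 4]]
print(extended)   # [1, 2, 3, 4]
```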

Removing elements from a list

We can remove any element from a list by specifying the value we want to remove. To remove the last element we have a specific method, pop(). It returns the last element of the list and removes it permanently from the list. Given an index, it can also remove the element at that position.

list6.pop() #to remove the last element

print(list6)

list6.pop(4) #returns the element with index 4 and removes it permanently

print(list6)

Let’s remove some specific elements by mentioning them. We will use the remove() function for this which removes the first occurrence of the passed value permanently from the list.

list6.remove(7) #removes the first occurrence of 7 from the list
print(list6)

Sorting the elements of a list

For sorting the elements in a list we use the sort() method. Remember that it works only when the elements can be compared with one another, e.g. a list of numbers only or a list of strings only; a list mixing numbers and strings raises a TypeError.

list7 = [97,22,-3,4,-5,61]
list7.sort()
print("list7 =", list7)
list8 = ['python','excel','R','tableau']
list8.sort()
print("list8 =", list8)
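Here is a short sketch of what happens with mixed data, along with two options that are handy in practice: the built-in sorted() function, which returns a new list, and the key and reverse arguments. The list names are illustrative.

```python
mixed = [97, 'excel', -3]
try:
    mixed.sort()                 # int and str cannot be compared with <
except TypeError:
    sort_failed = True
else:
    sort_failed = False

words = ['python', 'excel', 'R', 'tableau']
case_insensitive = sorted(words, key=str.lower)   # new list, words is untouched
descending = sorted([97, 22, -3, 4, -5, 61], reverse=True)

print(sort_failed)
print(case_insensitive)
print(descending)
```

Without key=str.lower, uppercase letters sort before lowercase ones, which is why 'R' comes first in list8 above.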

By now we have covered the major operations that we frequently perform on lists. As mentioned earlier, let’s see how lists can be used when creating tuples.

Storing a list inside a tuple

list0=[1,2.5,'list',True,(1,'Demo',3),['Py',6,'Thon']]
tuple7 = (1 , 10.5, 'Python' , True , tuple5 , list0)
print(tuple7)

Converting a list to a tuple

list0=[1,2.5,'list',True,(1,'Demo',3),['Py',6,'Thon']]
tuple8 = tuple(list0)
tuple8

Accessing elements from a nested tuple

tuple8[4][1]

Important to note: All the basic data structures are iterable!

Since a list is iterable, we can use this property to perform mathematical operations on the individual elements of the list and to extract elements from a list based on certain conditions. This will help us to display the elements, update the existing list and create a new list as well. We will be using the methods discussed above.

Displaying the elements of a list

demo_list = [1,20,40,60,80,100]
#Print all the values of the demo_list.
for elt in demo_list:
    print(elt)

Creating a New List

Create a new list with values 50 less than the values in demo_list.

demo_list2 = [ ]
for elt in demo_list:  #Subtracting 50 from each element
    demo_list2.append(elt-50)
demo_list2

Create a new list with values from demo_list2 which are greater than 10.

demo_list3 = [ ]
for elt in demo_list2:  #taking values greater than 10
    if elt>10:
        demo_list3.append(elt)
demo_list3

List Comprehensions

We can do the tasks above more easily with a list comprehension, which behaves just like a loop, i.e. it is an iterative statement. A list comprehension is a single expression, so it suits short transformations and filters best.

Create a new list with values 50 less than the values in demo_list.

demo_list2 = [elt - 50 for elt in demo_list]

Create a new list with values from demo_list2 which are greater than 10.

demo_list3=[elt for elt in demo_list2 if elt>10]

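We can even combine these 2 operations in just 1 list comprehension. Note that the if clause sees the original elements of demo_list, so the condition has to be written on the shifted value.

demo_list_new = [elt - 50 for elt in demo_list if elt - 50 > 10]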
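As a sanity check, the sketch below runs the two-step loop version and a single comprehension on the same input and compares the results. The variable names are illustrative, and the comprehension filters on the shifted value so that it matches the two-step result.

```python
values = [1, 20, 40, 60, 80, 100]

# Two-step version with explicit loops
shifted = []
for v in values:
    shifted.append(v - 50)
kept = []
for s in shifted:
    if s > 10:
        kept.append(s)

# One comprehension: the `if` clause sees the original element v,
# so the condition is written on the shifted value v - 50.
combined = [v - 50 for v in values if v - 50 > 10]

print(kept)
print(combined)
print(kept == combined)
```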

I was always confused about the terms, for example whether a list is a data structure, an object, or a class! Now that this is clear to me, let me share it with you. A list is a data structure. Python also has a list class, and any variable storing a list is an object of the list class.

Dictionary

The next data structure we move on to is a dictionary which is a bit different from list and tuple.

How to create a dictionary?

To do so we use the curly braces { } providing key-value pairs inside it.

dict1={1: 'Python', 2:'SQL' ,3 :'Excel'}
print(dict1)
type(dict1)

Here 1,2,3 are the keys and ‘Python’, ‘SQL’, ‘Excel’ are the values.

There is an important thing to remember: keys must be hashable, which in practice means immutable objects like int, str, or a tuple of immutable elements. For values, anything can be given.
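A quick sketch of this rule, using made-up keys: int, str and tuple keys are accepted, while a list used as a key raises a TypeError.

```python
valid = {1: 'int key', 'two': 'str key', (3, 4): 'tuple key'}

try:
    invalid = {[5, 6]: 'list key'}   # lists are mutable, hence unhashable
except TypeError as err:
    key_error_text = str(err)
    print("TypeError:", key_error_text)

print(valid[(3, 4)])
```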

Data Structures as values in the dictionary

list_demo = [4.1,'Excel',99]
dict2= { 1: 6 , 2: 'Python', 3: False, 4: 6.876 , 5: (1,'SQL',7.7) , 6: list_demo} 
print(dict2)

Nesting a dictionary inside a dictionary

dict3= { 1: 6 , 2: 'Python', 3: False, 4: 6.876 , 5: (1,'SQL',7.7) , 6: [4.1,'Excel',99] , 7: dict1} 
print(dict3)

Accessing elements of a dictionary

dict1[0]

This raises a KeyError, because dictionaries work differently! To access an element, put its key inside the [ ], not a positional index.

dict1[1]
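When we are not sure whether a key exists, the get() method and the in operator avoid the error. A minimal sketch with an illustrative dictionary:

```python
courses = {1: 'Python', 2: 'SQL', 3: 'Excel'}

first = courses[1]                       # lookup by key, not by position
missing = courses.get(0, 'not found')    # get() returns a default instead of raising
has_zero = 0 in courses                  # membership test checks the keys

try:
    courses[0]
except KeyError:
    raised = True
else:
    raised = False

print(first, missing, has_zero, raised)
```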

Modifying the data inside a dict

A dictionary in python is a mutable object and we can modify the data easily.

Updating value for an existing key

dict1[2]= 'Tableau'  
print(dict1)

Assigning a new key-value pair

dict1[4]='R'
print(dict1)

Operations on dictionaries

d= { 1: 6 , 2: 'Python' , 3: False, 4: 6.876 , 5: (1,'SQL',7.7) , 6: [4.1,'Excel',99]}
print(dir(d))

print(d.keys())
print(d.values())
print(d.items())
print(len(d))

Removing elements from dictionary

print(d.pop(4))
print(d)

The complete key-value pair is gone!

Here too we have a special method, popitem(), which removes and returns the last key-value pair, i.e. the most recently inserted one.

print(d.popitem())
print(d)

Set

This data structure is an easy one. We have been working with sets since our school days. In Python, sets are closely related to dictionaries and can be thought of as dictionaries that store only keys, with no values. Sets here follow their basic mathematical properties: they contain no duplicates and their elements have no defined order, so a set may print its elements in a different order from the one in which they were entered.

How to create a set?

Method1 : We use the curly braces { } for this.

set1={100,65,32,99}
print(type(set1))
set2={12,88.7,'Creating Set',(10,20,20)}
print(set2)

Method2: We use the Type conversion function set()

set3 = set([5,20,1,666,-22])
print(set3)

Because sets are unordered, their elements have no positions assigned, so we cannot access them by index.
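The sketch below illustrates these properties on a made-up set: duplicates are dropped, indexing fails, membership tests work, and sorted() gives an ordered list when we need one.

```python
numbers = {5, 20, 1, 666, -22, 20, 5}   # duplicates are dropped

unique_count = len(numbers)
has_twenty = 20 in numbers              # membership test

try:
    numbers[0]                          # sets do not support indexing
except TypeError:
    indexing_failed = True
else:
    indexing_failed = False

ordered = sorted(numbers)               # sorted() returns a new ordered list

print(unique_count, has_twenty, indexing_failed)
print(ordered)
```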

Operations on sets

s={10,20,30,40,50 } 
print(dir(s))

Display individual elements from the set

for elt in s:
    print(elt)

s.add(70) #adds 70 to set s
print(s)
s.remove(50) #removes 50 from s
print(s)

Operations on 2 sets

s3= {100,200,500,1100,80}
s4= {80,200,77,123,99}
print("Intersection :" , s3.intersection(s4)) # displays the common elements
print("Union :", s3.union(s4)) # displays elements from both s3 and s4
print("s3 - s4 :", s3.difference(s4)) # displays s3-s4
print("Symmetric difference :", s3.symmetric_difference(s4)) # displays elements in exactly one of the two sets (common elements excluded)

Note that the above commands do not lead to a permanent change in any of the sets. Just for your reference, I am mentioning a command to make a permanent change!

s3.difference_update(s4)   
print(s3)

This updates the set s3 and the set s3 gets the values of the difference s3-s4.
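The same operations are also available as operators. The sketch below uses fresh sets a and b (illustrative names) with the same values as s3 and s4.

```python
a = {100, 200, 500, 1100, 80}
b = {80, 200, 77, 123, 99}

common = a & b          # same as a.intersection(b)
either = a | b          # same as a.union(b)
only_a = a - b          # same as a.difference(b)
not_shared = a ^ b      # same as a.symmetric_difference(b)

a -= b                  # in-place, same as a.difference_update(b)

print(common, only_a)
print(a)
```

The plain operators return new sets, while the augmented form (-=) changes the set on the left.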

Now let’s move on to the data structures defined in particular packages. To use them we need to first install and then import the respective packages.

ndarray

Defined in the numpy library of python, ndarray is an n-dimensional homogeneous data structure. So unlike the inbuilt data structures in python here we cannot store the data of varied types. In a single array, all the elements should have the same data type.

To work on ndarray we need to import the numpy library! Let’s do that.

import numpy as np

How to create an ndarray?

Method 1 – By type conversion

If we have a list with us we can easily type convert it to an array in python using the array function of numpy.

Consider the list l1,

l1=[20,25,30,35,40]
arr1= np.array(l1)
print("arr1 =" , arr1)
print(type(arr1))

This array function can take many other arguments and I would encourage you to check them out. We can also specify the datatype we want the elements to be of.

arr2=np.array(l1,dtype=np.int8)
print("arr2 =" , arr2)
print(type(arr2))

The output is the same but the difference is in the memory allocation! We will discuss this later in the article.
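As a rough sketch of that memory difference, the itemsize and nbytes attributes report the bytes used per element and in total. The default integer size depends on the platform, so only the int8 figure is stated in the comments. The last line also shows that mixed numeric input is converted to one common dtype.

```python
import numpy as np

values = [20, 25, 30, 35, 40]
default_arr = np.array(values)                 # platform default integer type
small_arr = np.array(values, dtype=np.int8)    # 1 byte per element

small_bytes = small_arr.nbytes                 # 5 elements * 1 byte = 5
default_bytes = default_arr.nbytes             # 5 * default_arr.itemsize

mixed = np.array([1, 2.5, 3])                  # ints are upcast so all elements share one dtype

print(small_arr.itemsize, small_bytes)
print(default_arr.itemsize, default_bytes)
print(mixed.dtype)
```

Keep in mind that int8 can only hold values from -128 to 127, so a smaller dtype is a trade-off, not a free saving.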

Method 2 – From numpy methods i.e. methods inbuilt in numpy

For using the various methods and attributes defined in numpy let’s first determine them.

print(dir(np))

Let’s create some arrays using these methods!

arr3= np.zeros(10) #gives 1d array of 10 values each 0.0 
arr4= np.zeros((2,5) , dtype=int) #gives array of shape (2,5) each value 0
arr5= np.ones((2,5) , dtype=int) #gives array of shape (2,5) each value 1
arr6= np.full((4,3),'A') #gives array of shape (4,3) each value 'A'
arr7= np.arange(5,50,2.5) #float values from 5 up to (but not including) 50 with 2.5 as gap
arr8= np.linspace(2,20 ,10) #10 equidistant float numbers from 2 to 20 (both included)
arr9= np.random.random(20) #array of 20 random numbers in the interval [0, 1)
arr10= np.random.randint(25,150,10) #1d array of 10 random values between 25 and 149
print("arr3 =" , arr3)
print("arr4 =" , arr4)
print("arr5 =" , arr5)
print("arr6 =" , arr6)
print("arr7 =" , arr7)
print("arr8 =" , arr8)
print("arr9 =" , arr9)
print("arr10 =" , arr10)

Structure of ndarray

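Let’s look at the structure of the ndarrays we created!

print(arr7.nbytes) #total bytes used by the elements of arr7
print(arr6.dtype) #data type of the elements
print(arr4.shape) #number of rows and columns
print(arr8.ndim) #number of dimensions
print(arr3.size) #total number of elements
print(arr5.T) #transpose of arr5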

Accessing the array elements

Similar to the list and tuple here we can access the elements inside a 1D array (ndarray with n=1) using the index and by slicing. The rules remain the same!

Let’s consider 1D array arr7!

print(arr7)
print(arr7[4]) #displays 5th element
print(arr7[2:5]) #displays element with index 2,3 and 4

arr11= np.linspace(2,20 ,10 , dtype=int).reshape(5,2)
print(arr11)
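print(arr11[2]) #third row
print(arr11[:,0]) #first column
print(arr11[4,1]) #element in the fifth row, second column
print(arr11[2,::]) #third row again, all columns
print(arr11[::,1]) #second column, all rows
print(arr11[1:3,0:1]) #rows with index 1 and 2, column 0, kept as a 2D array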

We are done with the easy part: let’s try to take out elements conditionally!

Display the array of elements of arr11 less than 5

print(arr11[arr11<5])

The mutability of numpy arrays

The numpy arrays are mutable and hence can be updated easily.

Update the elements of arr11 which are less than 5 to 1.

arr11[arr11<5]=1
print(arr11)
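Conditional selection works through a boolean mask. The sketch below builds a small array with the same values as arr11 under a hypothetical name, grid, and also shows np.where as one way to get an updated copy without changing the original.

```python
import numpy as np

grid = np.arange(2, 22, 2).reshape(5, 2)   # values 2, 4, ..., 20
mask = grid < 5                            # boolean array, same shape as grid
selected = grid[mask]                      # 1D array of the matching values

capped = np.where(grid < 5, 1, grid)       # new array; grid itself is unchanged

print(mask)
print(selected)
print(capped)
```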

So far we have looked at data structures that are not designed for structured (tabular, labelled) data. To deal with structured data we have structures defined in the pandas library of Python. Let’s import it!

import pandas as pd

There are 2 of them: Series and DataFrame

Series

A Series in python is a 1-dimensional homogeneous data structure. It is just like the inbuilt data structures in terms of the dimension and like the ndarray from numpy in case of the type of data it can contain.

print(dir(pd.Series))

How to create a Series?

First, let’s type convert an existing data structure into a Series. Consider the list l1,

l1=[10,20,30,40,50,60,70]
s1=pd.Series(l1)
print(s1)

When I was learning about this, I used to wonder why the first column shows the values 0, 1, 2, … These are the index labels, in fact the default index. We can also supply our own index, known as a user-defined index.

s2=pd.Series(l1,range(1,8))
print(s2)

To determine the values and index we will use the attributes!

s2.index

s2.values

How to access elements from a Series?

To access elements from a Series we use .iloc and .loc along with the index.

.iloc[ ] always selects by integer position (0, 1, 2, …, which matches the default index), while .loc[ ] always selects by index label, i.e. the user-defined index if you supplied one and the default index otherwise.

When accessing elements using slicers

In case of .iloc[start:end:step] we get elements with index from start to end-1 while in case of .loc[start:end:step] we get elements with index from start to end.

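#Here we get the same output, because both select by position
s1.iloc[2]  #gives 3rd element
s2.iloc[2] #gives 3rd element
#Here we get different output, because the index labels differ
s1.loc[2] #element with index label 2 in the default index (3rd element)
s2.loc[2] #element with index label 2 in the user-defined index (2nd element)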
s2.iloc[1:4]

s2.loc[1:4]
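The difference between the two slicers is easier to see with the results side by side. A minimal sketch, using an illustrative Series marks with the same values and index as s2:

```python
import pandas as pd

marks = pd.Series([10, 20, 30, 40, 50, 60, 70], index=range(1, 8))

by_position = marks.iloc[1:4]   # positions 1, 2, 3 (end excluded)
by_label = marks.loc[1:4]       # labels 1, 2, 3, 4 (end included)

print(by_position.tolist())
print(by_label.tolist())
```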

We are done with the easy part: let’s try to take out elements conditionally! For this we can pass a boolean condition to .loc[ ]; .iloc[ ] does not accept a boolean Series like this.

Display all elements between 10 and 40

s2.loc[(s2>=10) & (s2<=40)]

DataFrame

A DataFrame in Python is a 2-dimensional heterogeneous data structure: different columns can hold different data types. It is unlike all the structures we have studied till now. Each column of a DataFrame is a Series. It is used to store structured data and is laid out like a table with rows and columns.

print(dir(pd.DataFrame))

How to create a DataFrame?

Method 1 – From other data structures like dict, Series

Let’s create a data frame from a dictionary! Remember that the lists used as the dictionary’s values must all have the same length.

d1 = {'Name' :['Ben','David','Ross'] ,
        'Marks':[99 , 65 , 80]} #creating the dictionary
df1 = pd.DataFrame(d1)
print(df1)
print(type(df1))
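Two things are worth checking here: a single column of a DataFrame is a Series, and lists of unequal length are rejected. The sketch below uses illustrative names (records, frame) and the same sample data.

```python
import pandas as pd

records = {'Name': ['Ben', 'David', 'Ross'], 'Marks': [99, 65, 80]}
frame = pd.DataFrame(records)

marks_column = frame['Marks']            # a single column is a Series
column_dtypes = frame.dtypes             # each column keeps its own dtype

try:
    pd.DataFrame({'Name': ['Ben', 'David'], 'Marks': [99]})
except ValueError:
    length_mismatch = True
else:
    length_mismatch = False

print(type(marks_column))
print(column_dtypes)
print(length_mismatch)
```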

Method 2 – Loading the datasets from existing data stored in databases or files and storing it as a DataFrame

df2=pd.read_csv('C:/Users/ACER/Desktop/courses.csv')
print(df2)

Accessing elements from a data frame

Similar to a Series data structure here too we use the attributes .loc and .iloc to access the elements.
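A minimal sketch of both attributes on a small, made-up data frame: .iloc takes positions, .loc takes labels (and also accepts a boolean condition for the rows).

```python
import pandas as pd

frame = pd.DataFrame({'Name': ['Ben', 'David', 'Ross'], 'Marks': [99, 65, 80]})

first_row = frame.iloc[0]                 # row at position 0, returned as a Series
one_cell = frame.loc[1, 'Name']           # row label 1, column 'Name'
high_scorers = frame.loc[frame['Marks'] > 70, 'Name']   # conditional selection

print(first_row)
print(one_cell)
print(high_scorers.tolist())
```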

A lot of operations can be performed on data frames and we can do a lot of analysis by storing the data in it. To learn about these operations I strongly recommend you to read this article: Exploratory Data Analysis in Python

Frequently Asked Questions

Q1. What is data structure in Python?

A. In Python, a data structure is a way of organizing and storing data in memory to efficiently perform operations on it. Python provides several built-in data structures, including lists, tuples, dictionaries, sets, and more. These data structures have different characteristics and are suited for various tasks, such as storing collections of items, mapping key-value pairs, or maintaining unique elements.

Q2. What is tuple in Python?

A. In Python, a tuple is an ordered, immutable collection of elements enclosed in parentheses (). Tuples can contain elements of different data types and are often used to store related pieces of information. They are similar to lists, but unlike lists, tuples cannot be modified once created. Tuples are commonly used for returning multiple values from a function or for grouping related data together.

EndNotes

It was quite a long discussion on data structures in Python where we talked about their characteristics, performed various operations on them and got to know about the various methods and attributes associated with each of them.

I hope this article helped clear up your basic Python concepts. There is more to learn, so let me recommend a comprehensive article on Python Programming for Data Science so that you grow in all dimensions.

I strongly recommend going through the following article :

A Complete Python Tutorial to Learn Data Science from Scratch

You can connect with me on LinkedIn.

 
