Introduction to Pandas DataFrame.query() function

Shipra Saxena 09 Feb, 2021 • 4 min read

Overview

  • Sometimes during EDA(Exploratory Data Analysis) selecting or filtering data frames can be a tedious task
  • Pandas DataFrame query() enables users to analyze and filter the data just like where clause in SQL

 

Introduction

Python is one of the dominantly used programming languages by data scientists. The reason for the proliferation of this language is that it is easy to use and enjoys strong community support, with dedicated libraries for data analysis and predictive modeling. Pandas is one such library that is widely accepted by the data science community for performing data cleaning and analysis.

pandas dataframe query

Sometimes during EDA(Exploratory Data Analysis) selecting or filtering data frames can be a tedious task. Hence pandas provide some functions that make this task a cakewalk even for beginners. The query is a highly useful function. It enables users to analyze and filter the data just like where clause in SQL. Also, it helps in compiling several conditions simultaneously in a memory-efficient manner as it does not uses temporary variables.

In this article, we are going to see a very interesting pandas data frame function Query.

 

Table of Contents

  1. Syntax
  2. Filter Employees of IT Department
  3. Employees in IT Department and Not Living in India
  4. Implementation of OR condition
  5. Using Environment Variable
  6. Why use Query?

 

Syntax

Let’s have a look at the syntax before jump into the query.

pandas.DataFrame.query(expr,inplace=false,**kwargs)
expr:The condition to filter the data.
inplace: weather the query modify the original data or return a modified copy.
**kwargs: Any other keyword argument you want to pass

Now we will look at some examples of how this amazing function eases your Analysis.

Let’s first create a Dataframe for the demo

import pandas as pd
import numpy as np
df= pd.DataFrame({'Name':['Vikram','John', 'Alex','Paul','Andrew','Rafel' ],
'Age':[28,39,21,50,35,43],
'Department':['HR', 'Finance','IT','HR','IT','IT'],
'Country':['USA','India','Germany','USA','India','India']
})
print(df.head())

Output

pandas query demo data

 

 

 

Now, let’s Filter the dataset using the query function provided by pandas for its DataFrame objects.

 

Filter employees of IT department

In this example want to select only the employees who work for the IT department.

import pandas as pd
import numpy as np 
df= pd.DataFrame({'Name':['Vikram','John', 'Alex','Paul','Andrew','Rafel' ],
                  'Age':[28,39,21,50,35,43], 
                  'Department':['HR', 'Finance','IT','HR','IT','IT'],
                  'Country':['USA','India','Germany','USA','India','India'] })
print(df.query('Department == "IT"'))

Here is the required result. We have all the IT department employees present in the dataset.

pandas query filter

 

Employees in-department IT and not living in India

Let’s take another example of satisfying two conditions simultaneously using ‘and’.

import pandas as pd
import numpy as np 
df= pd.DataFrame({'Name':['Vikram','John', 'Alex','Paul','Andrew','Rafel' ],
                  'Age':[28,39,21,50,35,43], 
                  'Department':['HR', 'Finance','IT','HR','IT','IT'],
                  'Country':['USA','India','Germany','USA','India','India'] })
print(df.query('Department == "IT" and Country != "India" '))

In the dataset, there is only one employee who satisfies both requirements.

not living in india

 

Let’s see the implementation of the OR condition

Here we present another example showing how can you implement the ‘OR’ clause. that means selecting the records satisfying either the first condition or the other.

import pandas as pd
import numpy as np 
df= pd.DataFrame({'Name':['Vikram','John', 'Alex','Paul','Andrew','Rafel' ],
                         'Age':[28,39,21,50,35,43], 
                         'Department':['HR', 'Finance','IT','HR','IT','IT'],
                        'Country':['USA','India','Germany','USA','India','India'] })
print(df.query('Age>= 40 or Country =="India"'))

Output

or

 

Using Environment variable

Sometimes we want to use the environment variables in the expression, we can do this using @ before the variable name. Just like SQL where you can use a subquery to reference for selecting rows where conditions are satisfied. The ‘@’ method saves the value that you want to compare against. as shown in the example below.

In the example Location is a local variable. To use it in a query expression you just need to put @ before it and it’s done.

import pandas as pd
import numpy as np

 df= pd.DataFrame({'Name':['Vikram','John', 'Alex','Paul','Andrew','Rafel' ], 
                   'Age':[28,39,21,50,35,43],
                   'Department':['HR', 'Finance','IT','HR','IT','IT'], 
                   'Country':['USA','India','Germany','USA','India','India'] }
Location= " India"                              
print(df.query( Country == @Location'))

Output

Environment variable

Note: Always remember if there is any space in your column name, but that between a pair of backtick (“) like strings while using in expression or replace the space with “_”

 

Why use Query?

For a data scientist or software programmer time and space, complexities are important aspects while considering the ways to perform a task. The query method is highly efficient for larger datasets. As it saves the memory also it offers cleaner syntax to the user.

 

End Notes

To summarize, here we have seen pandas.query a highly useful function to slice and dice the dataframe.  If you are an absolute beginner in python, I will recommend you to enroll in our Python course(Free) to enhance your fundamentals in python.

In the end, I have a task for you. Just pick a dataset and try different methods of filtering the data and compare them for time and memory efficiency. It will be great if you want to share your findings in the comments.

Also, comment below if you have any doubt.

 

Shipra Saxena 09 Feb 2021

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear

Related Courses