Learn everything about Analytics

Home » Query a MongoDB Database using PyMongo!

Query a MongoDB Database using PyMongo!

Overview

  • We will discuss how you can query a MongoDB database using the PyMongo library.
  • We will cover basic aggregation operations in MongoDB.

 

Introduction

Following the global spread of the internet, we are generating data at an unprecedented pace now. Because performing any kind of analysis would require us to collect/query the necessary data from the database, it becomes of utmost importance that we choose the right tool to query the data. Consequently, we cannot imagine ourselves using SQL to work with this volume of data as every single query will be expensive.

Query MongoDB using PyMongo

Query a MongoDB database using PyMongo

This is precisely where MongoDB comes in. MongoDB is an unstructured database that stores data in the form of documents. Additionally, MongoDB is able to handle huge volumes of data very efficiently and is the most widely used NoSQL database as it offers rich query language and flexible and fast access to data.

In this article, we will see multiple examples of how to query a MongoDB database using PyMongo. Further, we will see how to use the comparison operators and logical operators, regular expression, and the basics of aggregation pipelines.

This article is in continuation of the MongoDB in Python Tutorial for Beginners where we covered the challenges of unstructured databases, installation steps, and basic operations of MongoDB. So if you are a complete beginner in MongoDB, I would recommend you go through that article first.

 

Table of Contents

  1. What is PyMongo?
  2. Installation Steps
  3. Insert the data into the database
  4. Query the database
    1. Filter based on Fields
    2. Filter based on Comparison Operators
    3. Filter based on Logical Operators
    4. Regular Expressions
    5. Aggregation Pipelines
  5. End Notes

 

What is PyMongo?

PyMongo is a Python library that enables us to connect with MongoDB. Moreover, this is the most recommended way to work with MongoDB and Python.

Also, we have chosen Python to interact with MongoDB because it is one of the most commonly used and considerably powerful languages for data science. PyMongo allows us to retrieve the data with dictionary-like syntax.

In case you are a beginner in Python, I will recommend you to enroll in this free course: Introduction to Python.

 

Installation Steps

Installing PyMongo is simple and straightforward. Here, I am assuming you already have Python 3 and MongoDB installed. The following command will help you to install PyMongo:

pip3 install pymongo

 

Insert the data into the Database

Now let us set things up before query a MongoDB database using PyMongo. Firstly we will insert the data into the database. The following steps will help you in this-

  1. Importing the libraries and connect to mongo client

    Start the MongoDB server on your machine. I am assuming it is running file at localhost:27017.

    Let’s start by importing some of the libraries that we are going to use. By default, MongoDB server runs at port 27017 on the local machine. Then, we will connect to the MongoDB client using the pymongo library.

    Then get the database instance of the database sample_db. In case it doesn’t exist, MongoDB will create one for you.

     

  2. Create the Collections from the JSON files

    We will use the data from a meal delivery company that operates in multiple cities. Additionally, they have various fulfillment centers in these cities for dispatching meal orders to their customers. You can download the data and the code here.

    1. weekly_demand:
      • id: unique ID for each document
      • week: Week Number
      • center_id: Unique ID for fulfillment center
      • meal_id: Unique ID for Meal
      • checkout_price: Final price including discount, taxes & delivery charges
      • base_price: Base price of the meal
      • emailer_for_promotion: Emailer sent for promotion of meal
      • homepage_featured: Meal featured at the homepage
      • num_orders: (Target) Orders Count
    2. meal_info:
      • meal_id: Unique ID for the meal
      • category: Type of meal (beverages/snacks/soups….)
      • cuisine: Meal cuisine (Indian/Italian/…)

    Then we will create two collections in the sample_db database:

    Query MongoDB using PyMongo - weekly demand collection

    Query MongoDB using PyMongo - meal info collection

  3. Insert data into Collections

    Now, the data we have is in the JSON format. Then we will get the instance of the collection, read the data file, and insert the data using the insert_many function.

     

Finally, we have 456548 documents in the weekly_demand_collection and 51 documents in the meal info collection. Now, let’s have a look at one document from each of these collections.

weekly_demand_collection

Query MongoDB using PyMongo - weekly demand collection find one

meal_info_collection

Query MongoDB using PyMongo - meal info collection

Now, our data is ready. Let’s move on to query this database.

 

Query the Database

We can query a MongoDB database using PyMonfo with the find function to get all the results satisfying the given condition and also using the find_one function which will return only one result satisfying the condition.

The following is the syntax of the find and find_one:

your_collection.find( {<< query >>} , { << fields>>} )

You can query the database using the following filtering techniques-

  1. Filter based on fields

    For instance, you have hundreds of fields and you want to see only a few of them. You can do that by just putting all the required field names with value 1. For example-

     

    Query MongoDB using PyMongo - Filter based on fields

    On the other hand, if you want to discard a few fields only from the complete document you can put the field names equal to 0. Therefore, only those fields will be excluded. Please note that you cannot use a combination of 1s and 0s to get the fields. Either all should be one or all should be zero.

    Query MongoDB using PyMongo - Filter based on fields

  2. Filter with a condition

    Now, in this section, we will provide a condition in the first braces and fields to discard in the second. Consequently, it will return the first document with center_id is equal to 55 and meal_id is equal to 1885 and will also discard the fields _id and week.

     

     

    Query MongoDB using PyMongo - Filter with a condition

  3. Filter based on Comparison Operators

    The following are the nine comparison operators in the MongoDB.

    NAME DESCRIPTION
    $eq It will match the values that are equal to a specified value.
    $gt It will match the values that are greater than a specified value.
    $gte It will match all the values that are greater than or equal to a specified value.
    $in It will match any of the values specified in an array.
    $lt It will match all the values that are less than a specified value.
    $lte It will match all the values that are less than or equal to a specified value.
    $ne It will match all the values that are not equal to a specified value.
    $nin It will match none of the values specified in an array.

    The following are some examples of using these comparison operators-

    1. Equal to and Not Equal to

      We will find all the documents where center_id is equal to 55 and homepage_featured is not equal to 0. Since we are going to use the find function, it will return the cursor for that command. Further, use a for loop to iterate through the results of the query.

       equal to not equal to

    2. In the List and Not in the List

      For instance, you need to match an element with multiple elements. In that case, instead of using the $eq operator multiple times, we can use the $in operator. We will try to find out all the documents where center_id is either 24 or 11.

      in the list and nott in the list

      Then we, find all the documents where center_id is not present in the specified list. The following query will return all the documents where center_id is not 24 and also not 11.

       in the list and not in the list

    3. Less than and Greater than

      Now, let us find all the documents where center_id is 55 and checkout_price is greater than 100 and less than 200. Use the following syntax for this-

      Query MongoDB using PyMongo - Less than and greater than

  4. Filter based on Logical Operator

    NAME DESCRIPTION
    $and It will join query clauses with a logical AND and returns all documents that match both the conditions.
    $not It will invert the effect of a query and returns documents that do not match the query expression.
    $nor It will join the query clauses with a logical NOR and return all documents that fail to match the clauses.
    $or It will join the query clauses with a logical OR and return all documents that match the conditions of either clause.

    The following examples illustrate the use of logical operators-

    1. AND Operator

      The following query will return all the documents where the center_id is equal to 11 and also the meal_id is not equal to 1778. The subqueries for the and operator will come inside a list.

      Query MongoDB using PyMongo - And operator

    2. OR Operator

      The following query will return all the documents where either the center_id is equal to 11 or the meal_id is either 1207 or 2707. Futher, the subqueries for the or operator will come inside a list.

      Query MongoDB using PyMongo - or operator

       

  5. Filter with Regular Expressions

    Regular Expressions are of great use when you have text fields and you want to search for documents with a specific pattern. In case you want to learn more about regular expressions, I highly recommend you go through this article: Beginners Tutorial for Regular Expressions in Python.

    It can be used with the operator $regex and we can provide value to the operator for the regex pattern to matc. We will use the meal_info collection for this query and then we will find out the documents where the cuisine field starts with character C.

     

    Let’s take another example of regular expressions. We will find out all the documents in which category starts from the character “S” and the cuisine ends with “ian“.

  6. Aggregation Pipelines

    MongoDB’s aggregation pipeline provides a framework to perform a series of data transformations on a dataset. The following is its syntax:

    your_collection.aggregate( [ { <stage1> }, { <stage2> },.. ] )
    

    The first stage takes the complete set of documents as input, and from there each subsequent stage takes the previous transformation’s result set as input to the next stage and produces the output.

    There are around 10 transformations available in the MongoDB aggregate out of which we will see $match and $group in this article. We will discuss each of the transformations in detail in the upcoming article of the MongoDB.

    For example, in the first stage, we will match the documents where center_id is equal to 11 and in the next stage, it will count the number of documents with center_id equal to 11. Note that we have assigned the $count operator the value equalling total_rows in the second stage which is the name of the field that we want in the output.

    Now, let’s take another example where the first stage is the same as before i.e. center_id is equal to 11 and in the second stage, we want to calculate the average of the field num_orders for the center_id 11 and the unique meal_ids for the center_id 11.

    Query MongoDB using PyMongo

     

End Notes

The unfathomable amount of data generated today makes it necessary to find better alternatives like this to query data. To summarize, in this article, we learned how to query a MongoDB database using PyMongo. Also, we understood how to apply various filters as per the situation required.

In case you want to learn more about querying data, I recommend the following course – Structured Query Language (SQL) for Data Science

In the upcoming article, we will discuss the aggregation pipelines in detail.

I encourage you to try things on your own and share your experiences in the comments section. Additionally, if you face any problem with any of the above concepts, feel free to ask me in the comments below.

 

 

 

You can also read this article on our Mobile APP Get it on Google Play