K Sumanth — May 2, 2022

 This article was published as a part of the Data Science Blogathon.

Introduction to MongoDB

In the first part of this article, we will look at the MongoDB architecture and how MongoDB can be used for data science. We will first see how MongoDB differs from a relational database, and then look at the features that make it unique and scalable compared to other databases.

We will also look at some of the operations that can be performed on data in MongoDB. We will then connect to MongoDB from Python and perform actions such as creating a database, creating a collection, and inserting a document. Finally, we will look at querying and indexing MongoDB using Python.

In the second and final article of this series, we will look at how to work with MongoDB Atlas, how to visualize data with it, and how to connect to the cloud.

What & Why of Mongo DB?

First, let us understand MongoDB and NoSQL before working with its features. A NoSQL (non-SQL) database lets us access data using fields (we will look at what fields are later in this article). While a relational database stores data in tables, a non-relational database such as MongoDB stores data in collections, documents, and fields. The following diagram shows the structural differences between a relational database and MongoDB.

 

SQL VS NO SQL (MONGO DB) COMPARISON

Now that we have seen the basic structural differences between an RDBMS and MongoDB, let us look at the advantages of MongoDB over relational databases and answer our primary question: why MongoDB?

Several features of MongoDB make it popular in the industry, especially when dealing with big data in a variety of formats. Today, data is growing at tremendous speed, with terabytes generated in very short periods of time. Much of this data comes from sources such as social media and banking; it does not fit neatly into rows and columns, and it changes very quickly. Relational databases are not well suited to handling such huge volumes of data in a variety of formats. Hence we use NoSQL databases like MongoDB, in which data is stored in BSON (a binary form of JSON) as fields, documents, and collections.

Features of Mongo DB

  1. High Performance
  2. Ad-hoc Queries
  3. Indexing
  4. Aggregation Framework
  5. GridFS
  6. Sharding
  7. Document Oriented
  8. Replication
  9. BSON Format
  10. Capped Collection
  11. Mongo DB Management Services
FEATURES OF MONGO DB

Now, let us look in detail at these features:

High Performance

MongoDB is an open-source database with high performance. Indexing and replication (both discussed below) help it respond quickly. It is often claimed to be up to 100 times faster than an average relational database, though the actual difference depends heavily on the workload. Because it is a document-oriented database, MongoDB makes documents easy to access through indexes, which makes it a good choice for big data and real-time applications.

Ad-hoc Queries

When we design the schema of a database, we do not know in advance all the queries we will run. Ad-hoc queries are queries that are not known while structuring the database. MongoDB supports such queries, which is part of what makes it special: a query is built at run time, so its conditions and values can change every time it is executed.

Indexing

Let us now look at indexing in MongoDB. An index is a special data structure used to locate documents in a collection without traversing every document in it. MongoDB uses the index to limit the number of documents that need to be examined for a query.

INDEXING IN MONGO DB
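To make the idea concrete, here is a minimal Python sketch that contrasts a full scan with an index lookup. It is only an illustration: MongoDB's real indexes are B-tree structures maintained by the server, and the sample documents and function names below are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample documents; the values are invented for illustration.
documents = [
    {"_id": 1, "item": "Orange", "qty": 25},
    {"_id": 2, "item": "Pine Apple", "qty": 85},
    {"_id": 3, "item": "Orange", "qty": 100},
]

def find_by_scan(docs, field, value):
    """Collection scan: every document is examined."""
    return [d for d in docs if d.get(field) == value]

def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index

def find_by_index(docs, index, value):
    """Index lookup: only the matching positions are visited."""
    return [docs[p] for p in index.get(value, [])]

item_index = build_index(documents, "item")
print(find_by_index(documents, item_index, "Orange"))
```

Both functions return the same documents; the difference is that the index lookup does not have to touch the documents that do not match.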

Aggregation Framework

MongoDB’s Aggregation Framework is a way to query data from MongoDB by passing the documents of a collection through a pipeline of stages.

AGGREGATION STAGES

                                                          

As the above diagram shows, there are several stages between input and output, which work like a waterfall model: the output of one stage is fed as input to the next stage until the desired output is achieved.

For instance, if we want to view the number of cars of a specific model purchased in a specific time period, we can apply an aggregation operation to our collection. The syntax is as follows.

db.<Collection Name>.aggregate()

An aggregation operation must be defined as an array, in which each value is a stage.

( [ {Stage 1}, {Stage 2}, {Stage 3} ] )

The result of each stage, based on its aggregation condition, is passed (piped) to the next stage. Thus the final output consists only of what is being queried.

The following are the most common stages in the Aggregation Framework.

$project: Passes documents with only the specified fields, or newly computed fields, to the next stage.
$match: Filters documents, reducing the number of documents passed to the next stage.
$group: Groups documents by a distinct key.
$sort: Returns the documents in sorted order.
$skip: Skips a given number of documents.
$limit: Limits the number of documents sent to the next stage.
$unwind: Deconstructs an array field, producing one document per array element. This stage increases the number of documents for the next stage.

Now, let us understand the different stages in the Aggregation Framework with an example. Consider a collection named mydbcars whose documents hold information about cars, such as the car name, car specifications, company name, and so on. We want to find out which car sold the most in a particular time period. The following aggregate query does this.

Aggregate Query

db.mydbcars.aggregate([
{ $match : { TAGS : "MySQL" , 'POSTED' : { '$gte' : ISODate("2018-12-31T23:59:59") }} },
{ $project : {Car: 1 ,Purchases:1}},
{"$group" : {_id:"$Car", HITS:{ $sum:"$Purchases"}}},
{$sort: {HITS: -1}}])

Stages Explained

Now, let us look at the different stages involved and what each stage does.

Stage 1

{ $match : {TAGS: "MySQL", 'POSTED': {'$gte': ISODate("2018-12-31T23:59:59")}} }

In this stage, the $match operator scans for all the documents which match the condition. Only those documents that match the condition are passed onto the next stage.

Stage 2

{ $project: {Car: 1, Purchases:1} }

In this stage, only the fields mentioned in the $project operator are passed on to the next stage.

Stage 3

{ "$group": {_id:"$Car", HITS: {$sum: "$Purchases"}} }

In this stage, the $group operator groups the documents by the specified expression and outputs one document per group to the next stage. Each output document contains an _id field that holds the distinct group-by key, and a HITS field that holds the sum of Purchases for that group.

Stage 4

{ $sort: {HITS: -1} }

In this stage, the $sort operator sorts the documents in the given sort order. Here 1 specifies ascending order and -1 specifies descending order.
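To see what each of the four stages does to the data, the following Python sketch walks a few documents through the same steps by hand. It is a simplified imitation of the pipeline rather than how MongoDB executes it, and the sample documents (car names, purchase counts, dates) are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical documents; field names follow the query above, values are invented.
cars = [
    {"Car": "Model A", "Purchases": 3, "TAGS": "MySQL", "POSTED": datetime(2019, 1, 5)},
    {"Car": "Model B", "Purchases": 7, "TAGS": "MySQL", "POSTED": datetime(2019, 2, 1)},
    {"Car": "Model A", "Purchases": 6, "TAGS": "MySQL", "POSTED": datetime(2019, 3, 9)},
    {"Car": "Model B", "Purchases": 4, "TAGS": "MySQL", "POSTED": datetime(2018, 6, 1)},
]
cutoff = datetime(2018, 12, 31, 23, 59, 59)

# Stage 1 ($match): keep only the documents that satisfy the condition.
matched = [d for d in cars if d["TAGS"] == "MySQL" and d["POSTED"] >= cutoff]

# Stage 2 ($project): keep only the listed fields.
# (MongoDB would also keep _id by default; it is omitted here for brevity.)
projected = [{"Car": d["Car"], "Purchases": d["Purchases"]} for d in matched]

# Stage 3 ($group): one output document per distinct Car, summing Purchases.
totals = defaultdict(int)
for d in projected:
    totals[d["Car"]] += d["Purchases"]
grouped = [{"_id": car, "HITS": hits} for car, hits in totals.items()]

# Stage 4 ($sort): descending by HITS, as with {HITS: -1}.
result = sorted(grouped, key=lambda d: d["HITS"], reverse=True)
print(result)  # [{'_id': 'Model A', 'HITS': 9}, {'_id': 'Model B', 'HITS': 7}]
```

The 2018 document is dropped by the first stage, so it does not contribute to the total for Model B.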

GridFS

Now, let us look at the GridFS feature of MongoDB. GridFS is the specification for storing and retrieving large files such as images, audio files, and video files. It works like a file system, but its data is stored in MongoDB collections. It can store files that exceed the document size limit of 16 MB.

GridFS divides a file into chunks and stores each chunk in a separate document; by default, each chunk has a maximum size of 255 kB.

GridFS uses two collections, fs.files and fs.chunks, as shown in the image below, to store the file’s metadata and the chunks respectively. Each chunk is identified by its unique _id.

GRID FS IN MONGO DB

                                                              

For more details on GridFS, see the MongoDB manual page on GridFS.
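The chunking idea can be sketched in a few lines of Python. This is only an illustration of how a file is split, not the GridFS implementation: real fs.chunks documents also carry an _id and a files_id, and the 600 KiB payload below is a made-up example.

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in bytes

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return simplified chunk documents: a sequence number `n` and the bytes."""
    return [
        {"n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]

payload = bytes(600 * 1024)  # a hypothetical 600 KiB file
chunks = split_into_chunks(payload)
print(len(chunks), [len(c["data"]) for c in chunks])  # 3 [261120, 261120, 92160]
```

Only the last chunk is smaller than the chunk size; joining the chunks in order of `n` gives back the original file.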

Sharding

Sharding is a method for distributing data across multiple servers or machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations. Databases with large data sets can challenge the capacity of a single server: in practice, it is not feasible to store exponentially growing data on a single machine. Querying a large amount of data stored on a single server results in high resource utilization and may not provide satisfactory read and write throughput.

SHARDING ARCHITECTURE
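As a rough illustration of how data can be spread across machines, the sketch below assigns each document to a shard based on a hash of its key. This is a deliberate simplification: MongoDB's hashed sharding assigns ranges of hash values (chunks) to shards and rebalances them, rather than using a simple modulo. The shard names and customer ids are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the shard key value (simplified)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

placement = {}
for customer_id in ["c-001", "c-002", "c-003", "c-004", "c-005", "c-006"]:
    placement.setdefault(shard_for(customer_id), []).append(customer_id)
print(placement)
```

Because the hash is stable, the same key always lands on the same shard, so a query for one key only needs to visit one machine.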

                                                              

Document Oriented

MongoDB is a document-oriented database. An RDBMS stores data in tables, rows, and columns, whereas a NoSQL database like MongoDB stores data in fields instead. Different documents can store different types of data, and similar documents are grouped into collections. Each document has a unique key, its Object ID.

Replication

Replication, as the name suggests, is the copying of the same data onto multiple servers by MongoDB. This is achieved using a replica set: a group of mongod processes that maintain the same data set. Replication prepares us for contingencies. When the primary node goes down for some reason, one of the other nodes becomes the primary node, so operations continue smoothly.

1. Replication Keeps our Data Safe.

2. Makes data available (24*7)

3. Disaster recovery is made possible by Replication.

4. No downtime for Maintenance.

 

 

REPLICATION IN MONGO DB

 

BSON Format

MongoDB stores documents (objects) in a format called BSON. BSON stands for “Binary JSON”, but it also contains extensions that allow the representation of data types that are not part of JSON. Its key advantage compared to XML and JSON is efficiency. Like JSON, BSON is “schemaless”.

Capped Collection

Capped collections are fixed-size circular collections that support high-throughput operations and insert and retrieve documents in insertion order. By circular, we mean that when the fixed size allocated to the collection is exhausted, the collection makes room for new documents by removing the oldest ones, without any explicit command.
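The circular behaviour can be imitated in Python with a bounded deque. This is only an analogy: a real capped collection is sized in bytes (optionally with a maximum document count), whereas the limit of 3 documents below is an assumption chosen to keep the example short.

```python
from collections import deque

# A capped "collection" limited to 3 documents (hypothetical limit).
capped_log = deque(maxlen=3)

for n in range(1, 6):
    capped_log.append({"event": n})  # the oldest entry is dropped automatically

print(list(capped_log))  # [{'event': 3}, {'event': 4}, {'event': 5}]
```

After five inserts only the three most recent documents remain, still in insertion order.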

Mongo DB Management Services

Using MongoDB Management Services, we can track our databases or machines and also back up our data. It provides custom alerts, with which we can discover issues before the MongoDB instance is affected.

Common Terminology in Mongo DB

Now let us look at some of the common terms used in MongoDB to familiarize ourselves with it.

1. Document

2. Collection

3. Date Operator

4. Delete Command

Document

A document is the basic unit for storing data in MongoDB. It is the heart of MongoDB, where the data is stored in a JSON-like format. A MongoDB document is an ordered set of key-value pairs, in which every key has an associated value.

MongoDB documents are composed of field and value pairs and have the following structure.

{
   field1: value1,
   field2: value2,
   field3: value3,
   fieldN: valueN
}

The value of a field can be of any BSON data type; BSON is a binary representation of JSON documents.

Collection:

Now, let us look at what collections are. Mongo DB stores documents in collections. Collections are very much similar to Tables in Relational Database.

COLLECTIONS IN MONGO DB

         

Date Operator

We use the Date operator to see the current date in MongoDB.

Date(): Returns the current date as a string.

new Date(): Returns the current date as a Date object.

Delete Command: A delete command is used to delete documents from a MongoDB collection. A delete command takes the following format.

 

{
   delete: <collection>,
   deletes: [
      {
        q : <query>,
        limit : <integer>,
        collation: <document>,
        hint: <document|string>
      },
      ...
   ],
   comment: <any>,
   let: <document>, // Added in MongoDB 5.0
   ordered: <boolean>,
   writeConcern: { <write concern> }
}

Now let us look at the corresponding shell methods.

db.collection.deleteOne(): Deletes a single document that matches the filter.

db.collection.deleteMany(): Deletes all the documents that match the filter.

You may visit the following page of the MongoDB documentation for detailed information on the delete command.

https://www.mongodb.com/docs/manual/reference/command/delete/

Mongo DB Environment Setup

Now let us look at how to set up Mongo DB on our local machine. The following steps will take you through the process.

Step 1

Before downloading MongoDB onto our local machine, we must know which Windows version we are using. To find out, open the command prompt and execute the commands shown in the screenshot below.

Mongo DB Environment Setup

Step 2

After finding out our Windows architecture, we can download the appropriate MongoDB setup file from the link below.

Mongo DB Community Server

 

As seen in the above screenshot, go to MongoDB Community Server, select your platform and package from the drop-downs, and click Download. The above link takes you directly to the download page.

Step 3

Now run the downloaded file and follow the installation steps.

MONGO DB SETUP INSTALLATION

 

We can choose a custom directory for the MongoDB data and log files. Otherwise, by default, MongoDB is installed in C:\Program Files\MongoDB\Server\3.6.

We can install MongoDB at any location of our choice. All MongoDB data is stored in a specified data directory, which we need to create manually. To do so, run the following command in the command prompt.

md \data\db

Command Prompt MongoDB

Start Mongo DB

To start MongoDB, we need to execute the following command in the command prompt.

"C:\Program Files\MongoDB\Server\3.6\bin\mongod.exe"

This starts the main MongoDB process. A “waiting for connections” message at the bottom of the command prompt means that the process has started.

Connect to the Mongo DB

In order to connect to the Mongo DB server, we need to run the following command on our Command Prompt.

"C:\Program Files\MongoDB\Server\3.6\bin\mongo.exe"

Now we need to create some directories by executing the following commands in the command prompt:

mkdir c:\data\db
mkdir c:\data\log

This creates a folder named data with two folders, db and log, inside it. All of our database and log files will be stored in these two folders.

Command Prompt

Create a Mongo DB Configuration File

A MongoDB configuration file must be created; otherwise the system throws an error and the service does not start. The contents of the configuration file are as follows.

systemLog:
    destination: file
    path: c:\data\log\mongod.log
storage:
    dbPath: c:\data\db
Create the Mongo DB Service

We need to execute the following command to create the Mongo DB service.

sc.exe create MongoDB binPath= "\"C:\Program Files\MongoDB\Server\3.6\bin\mongod.exe\" --service --config=\"C:\Program Files\MongoDB\Server\3.6\mongod.cfg\"" DisplayName= "MongoDB" start= "auto"

If the service is created successfully, a success message is displayed.

Start the Mongo DB Service

In order to start the service, we have to just write the following command in the command prompt.

net start MongoDB

Similarly in order to stop the service we have to run the following command.

net stop MongoDB

Mongo DB Data Types

Now, let us look at different Data Types of Mongo DB.

MongoDB stores data in BSON, a binary-encoded form of JSON. The following are the different data types in MongoDB. Each data type corresponds to a unique number, which can be used to identify it with the $type operator.

DATATYPES IN MONGO DB

 

Now, let us look at some of the data types which are not so common.

Undefined: This data type stores undefined values.

Object ID: This data type stores the unique key of a document. Every document in MongoDB has an _id field.

Regular Expression: This data type stores regular expressions in MongoDB.

Min & Max Key: Min key compares lower than every other BSON value, and Max key compares higher than every other BSON value.
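As a quick reference, the sketch below lists a subset of the BSON type numbers and shows how one of them could be placed in a filter document for the $type operator. The helper name `type_filter` and the field name `qty` are assumptions made for this example.

```python
# A subset of BSON type aliases and their numbers, as used by $type.
BSON_TYPE_NUMBERS = {
    "double": 1, "string": 2, "object": 3, "array": 4, "binData": 5,
    "undefined": 6, "objectId": 7, "bool": 8, "date": 9, "null": 10,
    "regex": 11, "int": 16, "timestamp": 17, "long": 18, "decimal": 19,
    "minKey": -1, "maxKey": 127,
}

def type_filter(field, alias):
    """Build a filter document that selects documents by the BSON type of a field."""
    return {field: {"$type": BSON_TYPE_NUMBERS[alias]}}

print(type_filter("qty", "int"))  # {'qty': {'$type': 16}}
```

A dictionary like this can be passed as the filter argument of a find call.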

Mongo DB Create and Drop Database

Now let us look at creating and dropping a database in MongoDB. In MongoDB, we do not need to create a database explicitly with a CREATE command, as we do in an RDBMS. MongoDB creates the database automatically the first time we store values in one of its collections.

The ‘use’ command can be used to create a database in MongoDB. It is followed by the name of the database. If a database with that name already exists, MongoDB switches to it; otherwise, a new database with that name is created once we store data in it.

The code for creating the Test database is as below.

use my_Test_db

Here ‘use’ is the command for creating a new database in MongoDB and my_Test_db is the name of the database.

Also, if we want to view all the existing DataBases in Mongo DB we can use the following command or syntax.

show dbs

In order to know our current database, we can use the following command.

db

It will show us the name of the current DB as an output.

Dropping a Database in Mongo DB

Dropping a database in MongoDB is very similar to the DROP command in SQL, which is used to drop a database or a table. To use the drop command, we must first switch to the database we want to drop.

The following is an example command for dropping a database in Mongo DB.

db.dropDatabase()
{ "dropped": "my_project_db", "ok": 1}

The output after dropping the database.

 

This way it will drop the existing database in which we are currently residing.

Mongo DB Operations on Document and Regular Expressions

Now, let us look at inserting, deleting, updating, and querying MongoDB documents, along with the syntax, one by one. Most of these operations are very similar to SQL operations.

Insert Values in Mongo DB Document

This query is used when we want to insert a document into a MongoDB collection. The following is the code to insert one document at a time.

db.collection.insertOne()

Let us look at the same with an example below where we will try to insert a MongoDB document in the collection named Fruits. The code to insert a document is as below.

db.Fruits.insertOne(
{ item: "Orange", qty: 100, tags: ["Orange"], size: { h: 10, w: 5.5, uom: "cm" } }
)

In the above example, insertOne() returns a document that includes the newly inserted document’s _id value.

To retrieve the document that we have just inserted we can write the following code.

db.Fruits.find( { item: "Orange" } )

Similarly, if we want to insert many documents at a time then we can write the following code.

db.collection.insertMany()

Let us look at the above concept of inserting many documents with an example code below.

db.Fruits.insertMany([
{ item: "Orange", qty: 25, tags: ["Orange"], size: { h: 10, w: 5, uom: "cm" } },
{ item: "Pine Apple", qty: 85, tags: ["Fruit"], size: { h: 10.0, w: 9.5, uom: "cm" } },
{ item: "Water Melon", qty: 25, tags: ["Green", ], size: { h: 20, w: 15, uom: "cm" } }
])

In the above code, insertMany() returns a document that includes the _id values of the newly inserted documents.

Now if we want to retrieve the inserted values in the document we may use the below command.

db.Fruits.find( {} )

Now we will look at how to query a MongoDB document. Here we will use the Fruits Collection.

db.Fruits.insertMany([
{ item: "Orange", qty: 25, tags: ["Orange"], size: { h: 10, w: 5, uom: "cm" }, status:"Shipped"},
{ item: "Pine Apple", qty: 85, tags: ["Fruit"], size: { h: 10.0, w: 9.5, uom: "cm" },status:"Ordered"},
{ item: "Water Melon", qty: 25, tags: ["Green", ], size: { h: 20, w: 15, uom: "cm" },status:"Shipped"},
]);

Now we shall look at how to select all documents in a collection. In order to do the same, we can pass an empty document to the find method.

The following is the code to select all the documents from our collection.

db.Fruits.find( {} )

This is very much similar to Select * From Table in SQL.

Specifying Equality Condition

Now, we shall look at how to specify equality conditions when querying a MongoDB collection. To do so, we use <field>: <value> expressions in the query filter document.

{ <field1>: <value1>, … }

The following is the example code to do the same with our Fruits Collection.

db.Fruits.find( { status: "Shipped" } )

This is very similar to the SQL command that selects all the records from a table where status = “Shipped”.

Specify AND Condition in Mongo DB Collection

The following is the code to specify AND Condition on our Fruits Collection.

db.Fruits.find( { status: "Shipped", qty: { $lt: 30 } } )

Using the above code, we retrieve all the documents from the Fruits collection where the status is “Shipped” and the quantity is less than 30.

Now, let’s try to specify an OR condition similar to the one in SQL. The following is the code to specify OR conditions.

db.Fruits.find( { $or: [ { status: "delivered" }, { qty: { $lt: 30 } } ] } )

This is very much similar to the OR operator in SQL.
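To check which documents the AND and OR filters above would select, the following Python sketch evaluates them against the three Fruits documents. The `matches` function is a toy written for this article that understands only equality, $lt, and $or; it is not part of MongoDB or PyMongo.

```python
# The Fruits documents from above, reduced to the fields used in the filters.
fruits = [
    {"item": "Orange", "qty": 25, "status": "Shipped"},
    {"item": "Pine Apple", "qty": 85, "status": "Ordered"},
    {"item": "Water Melon", "qty": 25, "status": "Shipped"},
]

def matches(doc, query):
    """Evaluate a small subset of MongoDB filters: equality, $lt and $or."""
    for field, condition in query.items():
        if field == "$or":
            # At least one of the sub-filters must match.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict) and "$lt" in condition:
            if field not in doc or not doc[field] < condition["$lt"]:
                return False
        elif doc.get(field) != condition:
            return False
    return True  # every top-level condition held (implicit AND)

and_query = {"status": "Shipped", "qty": {"$lt": 30}}
or_query = {"$or": [{"status": "delivered"}, {"qty": {"$lt": 30}}]}

print([d["item"] for d in fruits if matches(d, and_query)])  # ['Orange', 'Water Melon']
print([d["item"] for d in fruits if matches(d, or_query)])   # ['Orange', 'Water Melon']
```

With this data both filters select the same two documents: no document has the status "delivered", so the OR filter matches only through its quantity condition.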

Update and Delete MongoDB documents: Now let us look at how to update and delete single and multiple MongoDB documents.

The following is the code to update the single Mongo DB document for the Fruits Collection.

db.Fruits.updateOne(
{ item: "Orange" },
{
$set: { "size.uom": "cm", status: "Ordered" },
$currentDate: { lastModified: true }
}
)

Here the first document whose item is equal to Orange will be updated.

Now, let us look at how to update Multiple MongoDB documents. The following is the code to update Multiple Mongo DB documents.

db.Fruits.updateMany(
{ "qty": { $lt: 50 } },
{
$set: { "size.uom": "in", status: "Shipped" },
$currentDate: {lastModified: true }
}
)

Here every document in the Fruits collection whose quantity is less than 50 is updated: its size.uom is set to “in”, its status is set to “Shipped”, and its lastModified field is set to the current date.

Now, let us try to understand how to delete a document from MongoDB using the same Fruits Collection example.

The syntax to delete all the documents in the collection is as below; the empty filter matches every document.

db.Fruits.deleteMany({})

Deleting a Mongo DB Document Based on Condition

Now, let us see how to delete a document from the same Fruits collection based on a condition. The following is the code for deleting a MongoDB document based on a condition.

db.Fruits.deleteOne( { status: "Shipped" } )

Once the above code is executed, the first document whose status is Shipped will be deleted from the Fruits collection. Note that the match is case-sensitive, so the value must be written exactly as it is stored.

Now, let us delete many documents at a time that match a certain condition. The following is the code for deleting many documents.

db.Fruits.deleteMany({ status : "Shipped" })

This deletes all the documents in the Fruits collection whose status is Shipped. It comes in handy when we want to delete multiple documents from a collection based on a condition.

Here, we have seen multiple queries that can be executed on a MongoDB collection like Insert, update, and delete also we have seen the queries to modify the collections based on a condition.

MongoDB Regular Expressions

Now let us look at regular expressions in MongoDB. First, let us understand what MongoDB regular expressions are and where we use them. Regular expressions are used to match patterns, which is useful for finding strings within documents. Sometimes we may not know exactly what field value to search for in the documents of a collection. In such cases, we can use regular expressions to find the documents that match a pattern.

Mongo DB $ Regex

This regex operator is used to search for strings in a collection. To understand it better, let us take an example collection named Item with two fields, as below.

ITEM_ID   ITEM_NAME
1         Laptop
2         C.P.U
3         Tablet
4         Table
5         Camera
6         Mouse
7         Monitor
8         Tube Light

Now we will try to find the items whose names contain Mo. (To match only the names that start with Mo, the pattern would be "^Mo".)

db.Item.find({ITEM_NAME: {$regex: "Mo" }}).forEach(printjson)

MongoDB $options

When we want to provide additional options to the regex operator, we can use the $options keyword. For instance, suppose we want to find all documents that have “mo” in their item name, regardless of case. The following is the code to do so.

db.Item.find({ITEM_NAME:{$regex: "mo",$options:'i'}}).forEach(printjson)

In the above code, $options with the ‘i’ parameter specifies that we want a case-insensitive search. Thus, in our example, we are searching for item names containing mo, whether in lower case or upper case. This comes in handy when we want to search for a pattern without regard to case.
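The effect of the case-insensitive option can be tried out locally with Python's re module on the same item names. This is only an approximation for experimenting: MongoDB uses its own regular expression engine on the server, although simple patterns like these behave the same way.

```python
import re

item_names = ["Laptop", "C.P.U", "Tablet", "Table", "Camera",
              "Mouse", "Monitor", "Tube Light"]

# Case-sensitive search for "mo": no item name contains a lowercase "mo".
case_sensitive = [n for n in item_names if re.search("mo", n)]

# Case-insensitive search, comparable to $options: 'i'.
case_insensitive = [n for n in item_names if re.search("mo", n, re.IGNORECASE)]

# Anchored pattern: only names that start with "Mo".
anchored = [n for n in item_names if re.search("^Mo", n)]

print(case_sensitive)    # []
print(case_insensitive)  # ['Mouse', 'Monitor']
print(anchored)          # ['Mouse', 'Monitor']
```

Without the case-insensitive flag the lowercase pattern finds nothing, which is why the option matters when the stored capitalization is not known.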

Fetching Last n Documents From a Mongo DB Collection

Now, let us look at how to fetch the last n documents from a MongoDB collection. The following is the code for fetching the Last 2 documents from our Mongo DB collection.

db.Item.find().sort({_id:-1}).limit(2).forEach(printjson)

Here limit(2) restricts the output to the number of records we want, in this case 2, and -1 sorts the documents in descending order of _id, so we get the last 2 documents.
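The same sort-then-limit logic can be sketched in plain Python. The sketch assumes that the _id values increase in insertion order (here they are simply the numbers 1 to 8, which is an assumption made for this example).

```python
item_names = ["Laptop", "C.P.U", "Tablet", "Table", "Camera",
              "Mouse", "Monitor", "Tube Light"]

# Hypothetical documents whose _id grows with insertion order.
items = [{"_id": n, "ITEM_NAME": name} for n, name in enumerate(item_names, start=1)]

def last_n(docs, n):
    """Sort by _id in descending order, then keep the first n documents."""
    return sorted(docs, key=lambda d: d["_id"], reverse=True)[:n]

print(last_n(items, 2))
# [{'_id': 8, 'ITEM_NAME': 'Tube Light'}, {'_id': 7, 'ITEM_NAME': 'Monitor'}]
```

Note that the most recently inserted document comes first in the result.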

Mongo DB Backup and Restore Options

If we want to take the Backup of our data from Mongo DB we can use the mongodump command.

The following is the syntax to backup our data using mongodump.

>mongodump

For this, we first have to start our server; we can start it on localhost and port number 27017. Then go to the command prompt on your machine and type mongodump. The host and port can also be given explicitly, as below.

mongodump --host data-analytics.training --port 27017

If we want to back up only a collection out of a database we can use the following command.

mongodump --collection Fruits --db Data_Analytics

Here the collection of fruits is being backed up from the Data_Analytics database.

Mongo DB Restore

Similarly, if we want to restore all the data from a backup directory back to the Mongo DB database we can use the following command.

>mongorestore

Once we run the above command on our command prompt the output is as below.

Mongo DB Restore

Hence we can use the mongodump command to back up our data in MongoDB, and the mongorestore command to restore it.

Python and MongoDB

Prerequisites

We first need to download and install Python on our machine from the official Python download page.

Now we shall start using MongoDB using Python. MongoDB provides us with an official Python driver called PyMongo. PyMongo provides us with a rich set of tools that we can use to communicate with a MongoDB server. It provides functionality to query, retrieve, write and delete data and also to run database commands.

PyMongo Installation

To start using PyMongo, we first need to install it in our Python environment. Run the following command in the Anaconda Prompt to install PyMongo.

pip install pymongo

Once we have installed PyMongo in our Python environment, we can open a Python interactive session and run the following import.

import pymongo

If the above code runs without any exception in the Python shell, our installation works fine.

Establishing a Connection With a Database

To establish a connection to a database, we need to create a MongoClient instance. To do this, go back to your Python interactive session (a Jupyter Notebook also works) and import MongoClient from pymongo.

The following is the code for establishing a connection with the database in python.

from pymongo import MongoClient
client = MongoClient()
client

The code above will establish a connection to the default host (localhost) and Port (27017).

Working with MongoDB Databases, Collections and Documents Using Python:

Now that we have established our connection to a database by creating a Mongo Client instance we will start working with Databases, Collections, and Documents In the Mongo DB database. We will look at each one of them in some detail below.

Once we have a connection established with an instance of MongoClient we can access any database managed by the specified MongoDB server. The following is the code for the same.

db = client.test
db

In this case, ‘test’ is the name of the database we will be working with. If this database does not exist then Mongo DB creates it for us but only when we perform the first operation on the database.

In order to store data in our database using PyMongo, we need to first create our Mongo DB documents. In python, we will use dictionaries to create documents. The following is the code to store data using documents.

document_1 = {
     "title": "Working With MongoDB in Python",
     "author": "K.sumanth",
     }

Once we have created the document using python’s dictionary, we need to specify which collection we want to use. The following is the code to create a collection named test.

test = db.test
test

Here, test is an instance of Collection and represents a physical collection of documents in our database. We can insert documents into test by calling .insert_one() on it, as below.

result = test.insert_one(document_1)
result

In the above code, insert_one() takes document_1, inserts it into the test collection, and returns the result.

If many documents are to be inserted into the database, we can use insert_many(). The code for inserting many documents is as below.

document_2 = {
     "title": "P.C.A And its Underlying Mathematical Principles",
     "author": "K.Sumanth",
     }
document_3 = {
     "title": "Working with different libraries in Python",
     "author": "K.Sumanth",
     }

new_result = test.insert_many([document_2, document_3])
print(f"Multiple documents: {new_result.inserted_ids}")

To retrieve documents from a collection, we can use .find(). The code for retrieving documents using find() is as below.

import pprint
for doc in test.find():
     pprint.pprint(doc)

The following is the output of the above code in a Jupyter Notebook.

Jupyter Notebook

We can also use find_one() to retrieve a single document. If we want to retrieve the first publication by this author then we can use the following code.

sumanth_author = test.find_one({"author": "K.sumanth"})

PyMongo also provides methods for replacing, updating, and deleting documents from a database. You may look at the documentation for collection for a better understanding of these concepts.
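As a starting point, here is a minimal sketch of two such operations using PyMongo's update_many() and delete_one() collection methods. The helper function names are made up for this example, and each function expects a collection object such as the test collection created above.

```python
def rename_author(collection, old_name, new_name):
    """Set a new author name on every document written by old_name."""
    result = collection.update_many(
        {"author": old_name},             # filter: which documents to update
        {"$set": {"author": new_name}},   # update: which fields to change
    )
    return result.modified_count

def delete_by_title(collection, title):
    """Delete the first document that has the given title."""
    result = collection.delete_one({"title": title})
    return result.deleted_count
```

With a running server, these could be called as rename_author(test, "K.sumanth", "K.Sumanth") or delete_by_title(test, "Working With MongoDB in Python"); each returns the number of documents affected.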

Closing Connections

We can close the connection by calling .close() on the MongoClient instance. The following is the code for closing the connection.

client.close()

Pros and Cons of Using Mongo DB

Finally, let us look at the pros and cons of Mongo DB in some detail:

Advantage of Mongo DB

1. Performance Levels
2. High Speed and Higher Availability
3. Simplicity
4. Easy Environment and Quick Set-up
5. Flexible Database
6. Sharding
7. Scalability
8. Technical Support
 

Now, let us look at each one of them in some detail.

Performance Levels: Mongo DB can be termed a high-speed database, as it returns query results quickly when the data is modeled and indexed well.

High Speed and Higher Availability: Mongo DB, as we know, is a document-oriented database, and documents are easy to access through indexes. It also has features such as replication, which increases data availability, and GridFS, which stores files that are larger than the document size limit.

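Mongo DB is sometimes said to perform up to 100 times faster than an average relational database, although the actual difference depends on the workload, the data model, and the indexes.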

Simplicity: Mongo DB offers a simple query syntax, in which queries are written as documents, that many users find easier than SQL.

Easy Environment and Quick Setup: Mongo DB is quick and simple to install and run. It also works well with JavaScript frameworks, which has made users more confident in selecting NoSQL structures.

Flexible Database: Mongo DB is a schema-less database, which lets it store any type of data in a separate document. This is very important for businesses: as a business evolves, its data keeps growing and can be of different types, each of which can easily be stored in its own Mongo DB document.

Sharding: It is the process of dividing a large data set and distributing it across multiple servers that are connected to the application. If a server can no longer handle its share of the data because of its size, sharding divides the data further automatically, without pausing the activity.
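
To make the idea of distributing data across servers more concrete, here is a deliberately simplified, pure-Python sketch of routing documents to shards by hashing a shard key. This is not how Mongo DB implements sharding internally (Mongo DB splits shard-key ranges into chunks and balances them itself); the function names, the shard names, and the choice of MD5 are all assumptions made for illustration.

```python
import hashlib


def pick_shard(shard_key_value, shard_names):
    """Map a shard-key value to one shard name using a stable hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    # The same key value always lands on the same shard
    return shard_names[int(digest, 16) % len(shard_names)]


def distribute(documents, shard_key, shard_names):
    """Group documents by the shard that their shard-key value maps to."""
    placement = {name: [] for name in shard_names}
    for doc in documents:
        placement[pick_shard(doc[shard_key], shard_names)].append(doc)
    return placement


# Usage with hypothetical shard names:
# docs = [{"author": "K.Sumanth", "title": "A"}, {"author": "Someone", "title": "B"}]
# print(distribute(docs, "author", ["shard0", "shard1", "shard2"]))
```

A stable hash is used instead of Python's built-in hash(), because hash() of a string changes between interpreter runs and would send the same key to different shards.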

Scalability: Sharding allows Mongo DB to scale horizontally, which is a great advantage when we have to handle large data, since we can distribute it across several machines.

Technical Support: MongoDB Inc. provides professional technical support. Whenever there is a problem, we can reach out directly to the Mongo DB client support team.

Limitations of Using Mongo DB

  1. Joins are not Supported
  2. High Memory Usage
  3. Limited Data Size
  4. Limited Nesting
  5. Transactions
  6. Indexing
  7. Duplication of Data

In spite of having so many advantages, Mongo DB also has some limitations. Let us discuss them here in some detail.

Joins are not Supported: Mongo DB does not support joins the way a relational database does, which makes it harder to access data from different collections at the same time. Joins can still be performed, either with the $lookup stage of the aggregation framework or by coding them manually in the application, but the manual route requires multiple queries and a lot of code, which makes the job tedious and the query execution slow.
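
Although Mongo DB has no SQL-style JOIN clause, its aggregation framework provides a $lookup stage that performs a left outer join with another collection. The following sketch only builds such a pipeline as a Python list. The helper name build_author_lookup, the authors collection, and the field names name and author_details are hypothetical and do not come from this article's data.

```python
def build_author_lookup(from_collection="authors", local_field="author",
                        foreign_field="name", as_field="author_details"):
    """Build an aggregation pipeline with a single $lookup stage.

    All default names are illustrative assumptions, not part of this article's data.
    """
    return [
        {
            "$lookup": {
                "from": from_collection,        # collection to join with
                "localField": local_field,      # field in the input documents
                "foreignField": foreign_field,  # field in the joined collection
                "as": as_field,                 # output array holding the matches
            }
        }
    ]


# Usage, assuming the `test` collection from above and an `authors` collection:
# for doc in test.aggregate(build_author_lookup()):
#     print(doc["title"], doc["author_details"])
```

Each output document carries the matching documents from the other collection in an array, so an article with no matching author simply gets an empty list.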

High Memory Usage: Because joins are limited, data is often duplicated across documents. This redundancy increases the storage requirement and takes up a lot of unnecessary space in memory.

Limited Data Size: A single document is limited to a maximum size of 16 MB.

Limited Nesting: In Mongo DB, documents cannot be nested more than 100 levels deep.
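
If we build deeply nested dictionaries in Python, we can check their depth before inserting them. The following is a small sketch under the assumption that every embedded document or array adds one level; the function names and the constant MAX_NESTING_LEVELS are hypothetical helpers, not part of PyMongo.

```python
MAX_NESTING_LEVELS = 100  # the nesting limit mentioned above


def nesting_depth(value):
    """Return how many levels of dicts and lists are nested inside `value`."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # plain values such as strings and numbers add no level


def fits_nesting_limit(document):
    """Check whether a document stays within the nesting limit."""
    return nesting_depth(document) <= MAX_NESTING_LEVELS


# Usage:
# doc = {"title": "A", "meta": {"tags": ["python", {"lang": "en"}]}}
# print(nesting_depth(doc), fits_nesting_limit(doc))
```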

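Transactions: A transaction groups several operations so that either all of them take effect or none of them do. In Mongo DB, an operation on a single document is atomic, and multi-document ACID (Atomicity, Consistency, Isolation, and Durability) transactions are available from version 4.0 onwards. The majority of applications do not require multi-document transactions, although there are a few that may need them. When related data is spread over several documents and updated without a transaction, a failure partway through can leave the data inconsistent, which is one of the drawbacks to keep in mind with Mongo DB.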

Indexing: Mongo DB is known for its high-speed performance with the right indexes. If the indexes are missing or incorrectly implemented, Mongo DB will perform at a very low speed, and fixing them will take a lot of time. This is one more major limitation of Mongo DB.
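
To check whether a query actually uses an index, we can create the index and then ask the server for its query plan. The following is a minimal sketch using create_index() on a collection and explain() on a cursor. The helper names are hypothetical, collection is assumed to be a PyMongo collection such as test, and the layout of the explain output is assumed to be the one returned by a standalone server or replica set (it can differ between server versions and on sharded clusters).

```python
def ensure_author_index(collection):
    """Create an ascending index on the `author` field and return its name."""
    # 1 means ascending order (the same value as pymongo.ASCENDING)
    return collection.create_index([("author", 1)])


def winning_plan(collection, query):
    """Return the plan that the query planner chose for `query`.

    Assumes the explain output has a queryPlanner.winningPlan entry.
    """
    explanation = collection.find(query).explain()
    return explanation["queryPlanner"]["winningPlan"]


# Usage, assuming the `test` collection from above:
# print(ensure_author_index(test))
# print(winning_plan(test, {"author": "K.Sumanth"}))
```

In the printed plan, a stage named IXSCAN indicates that an index was used, while COLLSCAN indicates that the whole collection was scanned.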

Duplication of Data: Because the relations between documents are not well defined, the same data often ends up stored in more than one place. This makes data sets harder to handle, and if the copies are updated without a transaction they can drift out of sync and leave the data inconsistent.

Conclusion

Finally, we have come to the last part of this article. The following are some of the key takeaways.

  • We understood the various features of Mongo DB that come in handy when working in a Big Data environment.
  • We have seen how to install Mongo DB on our local machines.
  • We have also looked at how to work with Mongo DB and performed various operations on our database using PyMongo.
  • We have also looked at how to take a backup of our data from the Mongo DB database and how to restore data back to it.

 The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

About the Author

K Sumanth
