Aryan Garg — Published On October 30, 2022 and Last Modified On November 4th, 2022

This article was published as a part of the Data Science Blogathon.

Introduction

Databases are collections of data that computers can access. They are used to store information such as customer records, employee records, and product information.

Databases can be divided into two types: relational and non-relational. Relational databases store data in tables that are linked together by key values. Non-relational databases store data in more flexible formats, such as documents or key-value pairs, that do not require a fixed table schema.

Databases are essential because they allow businesses to store and access large amounts of data quickly and easily. They also enable companies to share data with other businesses and organizations.

Databases
Source – www.aceinfoway.com

What is File System for Data Storage:

In a file system, data is stored on a storage device such as a computer’s hard disk. The file system is a way of organizing files and directories on that device. It is responsible for keeping track of where files are located on the disk and how they are organized.

The file system is divided into the directory structure and the file data. The directory structure is a hierarchy of directories containing information about the files in each. The file data is the actual data that is stored in the files.

Each file in the file system has a unique name and a path that specifies the file’s location in the directory structure. The path of a file is the sequence of directories that leads from the root directory to the file. For example, the path of a file named “myfile.txt” in the “Documents” directory would be “Documents/myfile.txt.” The file system also contains metadata about each file, such as the file’s size, owner, and permissions.
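The idea of a path plus per-file metadata can be sketched in Python with the standard pathlib module. This is only an illustration: the helper name describe_file and the temporary directory are assumptions made for the example, not part of any particular file system.

```python
import tempfile
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Return the path components and basic metadata of an existing file."""
    info = path.stat()  # metadata that the file system keeps for this file
    return {
        "name": path.name,                           # file name
        "parent": path.parent.name,                  # directory that holds it
        "size_bytes": info.st_size,                  # size recorded by the file system
        "permissions": oct(info.st_mode & 0o777),    # permission bits
    }


# Build a throwaway "Documents/myfile.txt" so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    file_path = Path(root) / "Documents" / "myfile.txt"
    file_path.parent.mkdir()
    file_path.write_text("hello")
    details = describe_file(file_path)

print(details["parent"] + "/" + details["name"])
print(details["size_bytes"], details["permissions"])
```

The permission bits that are printed depend on the operating system, so they are shown here without an expected value.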

There are many different file systems, but they all essentially perform the same primary function: they allow you to store and organize your files. Some file systems are designed for specific operating systems, while others are cross-platform.

One of the most widely used file systems is FAT32. It was introduced by Microsoft with Windows 95 and can be read and written by Windows, macOS, and Linux, which makes it a common choice for removable drives. FAT32 is simple and broadly compatible. However, it has disadvantages as well: it has no file permissions, it has no journaling to help recover from crashes, and a single file cannot be larger than 4 GB. NTFS is another popular file system. It was developed by Microsoft and is the default file system on modern versions of Windows. NTFS supports permissions and journaling, which makes it more secure and reliable than FAT32, but it is also more complex internally and less widely supported outside Windows.

Difference between Database & File System:

There are many differences between databases and file systems. Still, the most fundamental difference is that databases store data in a structured way that the system itself understands, while file systems store files without knowing anything about the structure of their contents.

Databases are typically organized around data models, which define the structure of the data and the relationships between different pieces of data. This structure makes it easy to query the data and find the information you need.

On the other hand, a file system stores each file as a sequence of bytes and does not interpret its contents. This makes it difficult to query the data and find the needed information without writing custom code. Another difference between databases and file systems is that databases are typically designed to be accessed by many users simultaneously. File systems do allow several users or programs to open the same file, but they offer only basic protection against conflicting writes. Databases usually have locking and concurrency control to prevent data corruption, while file systems typically provide little more than simple file locks.

Databases
Source – https://www.educba.com/dbms-vs-file-system/

This article will discuss the different types of databases we encounter daily.

Let’s get started, 😉

Relational Databases

Relational databases store data in tables. A table is a data structure that organizes data in rows and columns. Rows represent records, and columns represent fields.

Data in a relational database is usually queried with SQL (Structured Query Language), the standard language for defining, querying, and modifying relational data.

Relational databases are a powerful way to store data. They are easy to query, and we can use them to store large amounts of data.
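As a minimal sketch of tables, rows, columns, and key-linked records, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Two tables linked by a key: each order row points at a customer row.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           product TEXT NOT NULL
       )"""
)

# Each tuple is one row (record); each position is one column (field).
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, product) VALUES (?, ?)",
    [(1, "Laptop"), (1, "Mouse"), (2, "Keyboard")],
)

# A SQL query that joins the two tables through the key.
rows = conn.execute(
    """SELECT c.name, o.product
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.id
       ORDER BY c.name, o.product"""
).fetchall()
conn.close()

for name, product in rows:
    print(name, product)
```

The query returns one row per order, with the customer name looked up through the customer_id key.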

Where can we use Relational Databases:

1. Websites: SQL databases can be used to store data for a website. This data can include user information, such as names and contact details, and website content, such as articles or blog posts.
2. Mobile apps: SQL databases can store the data behind mobile apps. This data can include user information, such as names and contact details, and app content, such as articles or blog posts.
3. Businesses: SQL databases can be used to store the data a business runs on. This data can include customer information, such as names and contact details, as well as product information, such as inventory levels and pricing.

Limitations:

SQL databases are harder to scale horizontally than many NoSQL databases. Distributing relational data efficiently across multiple servers is difficult because queries and transactions often touch several tables at once. This can be a problem when dealing with large amounts of data or when trying to provide high availability. Another limitation of SQL databases is that they are not as flexible as NoSQL databases regarding schema changes. It can be challenging to change your data model without taking your database offline.

relational database
Source – www.pragimtech.com

The paragraph above used the term “scaling a database horizontally”. It means distributing the load across multiple servers. Let’s dive deeper into it and learn more about database scaling.

Database Scaling:

Most people are familiar with the concept of scaling a database vertically. It simply means adding more resources to an existing database server to increase performance. It might involve adding more RAM, a faster CPU, or more storage.

Horizontal scaling is a bit different. Rather than adding more resources to a single server, horizontal scaling involves adding more servers to a database cluster. We can do it by adding read-only replicas of the database on different servers. Or, it can involve sharding the database, which means splitting the data into smaller pieces and storing each piece on a different server.

Which approach is best depends on the situation. Vertical scaling can be easier to implement and can be less expensive. But it eventually reaches a point where adding more resources to a single server is no longer feasible. At that point, horizontal scaling becomes a necessary solution.
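To make sharding concrete, here is a small Python sketch that routes each key to one of several shards by hashing it. The ShardedStore class and shard_for function are hypothetical names for this illustration, and each shard is just an in-process dictionary standing in for a separate server.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store that splits its data across several shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one database server.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Only the shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=3)
for user_id in ("user:1", "user:2", "user:3", "user:4"):
    store.put(user_id, "profile of " + user_id)

print(store.get("user:3"))
print([len(shard) for shard in store.shards])  # how the keys were spread out
```

Simple modulo hashing like this remaps most keys when the number of shards changes, which is why production systems often use schemes such as consistent hashing instead.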

scaling
Source – www.geeksforgeeks.org

NoSQL Databases

NoSQL databases are a type of database that does not use the traditional table-based relational database model. Instead, NoSQL databases are designed to be scalable and flexible, and they use a variety of data models, including key-value stores, document databases, column-oriented databases, and graph databases.
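Two of these data models can be sketched with plain Python structures. The snippet is only an illustration of the shapes involved; the names kv_store, documents, and find are made up for the example and do not correspond to any real database API.

```python
import json

# Key-value model: the store maps a key to a value it treats as opaque.
kv_store = {}
kv_store["user:42"] = json.dumps({"name": "Asha", "city": "Pune"})

# Document model: the store understands the fields inside each document,
# and documents in the same collection do not need identical fields.
documents = [
    {"_id": 1, "name": "Asha", "city": "Pune", "tags": ["admin"]},
    {"_id": 2, "name": "Ben", "city": "Delhi"},  # no "tags" field here
]


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Key-value lookup: fetch by key, then decode the value in the application.
user = json.loads(kv_store["user:42"])

# Document lookup: filter on a field inside the documents.
pune_users = find(documents, city="Pune")

print(user["name"])
print([doc["_id"] for doc in pune_users])
```

The key-value store can only fetch by key, while the document collection can be filtered by any field, which is the main practical difference between the two models.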

Where can we use NoSQL Databases:

1. Storing large amounts of data: NoSQL databases are well suited to storing large amounts of data. They are designed to scale horizontally, meaning more nodes can be added to the system as needed.
2. Handling high traffic: NoSQL databases are also suitable for handling high traffic. This is because many of them are designed to be highly available, meaning they can keep running even if one or more nodes go down.
3. Processing real-time data: NoSQL databases are also suitable for processing data in real time. This is because the load can be spread across many nodes, so throughput can grow along with the volume of incoming data.

Limitations:

NoSQL databases have many benefits, but they also have some limitations. One significant limitation is that many of them are not well-suited for transactions that span multiple records. Many NoSQL databases also offer limited or no support for joins, so combining related records often has to be done in application code. Another limitation is that NoSQL databases can be less consistent than traditional relational databases, and they often do not enforce data integrity rules such as foreign key constraints. Finally, NoSQL databases can be more difficult to query than relational databases, because they often do not support a standard query language such as SQL.

Nosql database
Source – https://www.educba.com/what-is-nosql-database/

Graph Databases

A graph database is a database that represents and stores data using graph structures with nodes, edges, and properties, enabling semantic queries. The key idea is that the graph directly links data elements in the store. Because relationships are stored alongside the data, related items can be linked directly, and in many situations they can be retrieved with a single operation. Graph databases serve a similar purpose to relational databases, but they use a distinct data model.

There are many different types of graph databases, but they all have three common features:
Nodes – the entities or objects in the graph. In a social network, nodes might represent people.
Edges – the relationships between the nodes. In a social network, edges might represent friendships.
Properties – information attached to the nodes and edges. In a social network, properties might represent names, ages, and locations.
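The three building blocks above can be sketched in a few lines of Python. The SocialGraph class and its method names are hypothetical and exist only for this illustration; real graph databases expose their own query languages and APIs.

```python
from collections import defaultdict


class SocialGraph:
    """Toy in-memory graph with nodes, undirected edges, and properties."""

    def __init__(self) -> None:
        self.nodes = {}                 # node id -> node properties
        self.edges = defaultdict(set)   # node id -> ids of neighbouring nodes
        self.edge_props = {}            # frozenset({a, b}) -> edge properties

    def add_person(self, person_id: str, **props) -> None:
        self.nodes[person_id] = props

    def add_friendship(self, a: str, b: str, **props) -> None:
        # A friendship is stored in both directions because it is mutual.
        self.edges[a].add(b)
        self.edges[b].add(a)
        self.edge_props[frozenset((a, b))] = props

    def friends_of_friends(self, person_id: str) -> set:
        """People two hops away who are not already direct friends."""
        direct = self.edges[person_id]
        two_hops = set()
        for friend in direct:
            two_hops |= self.edges[friend]
        return two_hops - direct - {person_id}


graph = SocialGraph()
graph.add_person("alice", age=30, location="Pune")
graph.add_person("bob", age=28, location="Delhi")
graph.add_person("carol", age=35, location="Mumbai")
graph.add_person("dave", age=41, location="Pune")
graph.add_friendship("alice", "bob", since=2019)
graph.add_friendship("bob", "carol", since=2021)
graph.add_friendship("carol", "dave", since=2020)

print(graph.friends_of_friends("alice"))  # {'carol'}
```

Following edges from node to node like this is what a graph query does, which is why relationship-heavy questions such as friend suggestions map naturally onto this model.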

Where can we use Graph Databases:

Graph databases are a good fit wherever the relationships between items matter as much as the items themselves. Here are some examples of where you might use a graph database:
1. Social media networks use graph databases to map out relationships between users.
2. Fraud detection systems use graph databases to identify behavior patterns that may indicate fraudulent activity.
3. Recommendation engines use graph databases to find similar items and recommend them to users.
4. Networking and IT systems use graph databases to map out and visualize complex relationships between data.

Limitations:

There are a few potential limitations of graph databases to be aware of:
1. They can be more complex to query than other databases. This is because you often have to traverse the graph to find the data you’re looking for, which can be time-consuming.
2. They can be more challenging to scale than other databases. Because each node in the graph is connected to other nodes, splitting the graph across multiple servers can quickly become complicated.
3. They can be harder to operate reliably than other types of databases. When the data in a graph database is spread across many different servers, it becomes challenging to keep track of everything.

Databases
Source – singhnaveen.medium.com

Some Examples of Famous Databases

There are many examples of famous databases, but here are a few of the most popular:

1. MySQL:

MySQL is a free and open-source database management system. Some of the biggest websites in the world, including Facebook, Twitter, and YouTube, use it, and it is a standard option for online apps.

MySQL is written in C and C++ and runs on most operating systems. It is one of the most widely used database management systems for PHP applications.

MySQL is a relational database management system (RDBMS). Relational databases use tables to hold data. Tables function similarly to folders in a file system in that each table contains a collection of related data.

We can use MySQL for a variety of purposes. It has many features and is simple to use.

mysql
Source – https://aws.amazon.com/rds/mysql/what-is-mysql/

2. PostgreSQL:

With over 30 years of ongoing development, PostgreSQL is a powerful open-source object-relational database system with a solid reputation for dependability, feature robustness, and speed.

PostgreSQL is an excellent choice for a database for several reasons:

It is free and open source.

It has a strong community of developers and users.

It is very stable and has a track record of handling high-traffic websites.

It supports a wide variety of data types, including geographic data.

It has many advanced features, including full-text search and an advanced system for managing permissions.

postgresql
Source – https://ubiq.co/database-blog/postgresql-create-database/

3. Oracle Database:

Oracle Database is a robust, reliable, and scalable relational database management system. It is the industry’s leading database, with over 400,000 customers worldwide. Oracle Database is used by some of the world’s largest organizations, including many Fortune 500 companies.

Oracle Database is available in various editions, each designed to meet your organization’s specific needs. Oracle Database Standard Edition is the entry-level edition of Oracle Database, providing a cost-effective way to deploy Oracle Database in small to medium-sized environments. Oracle Database Enterprise Edition is the most comprehensive edition of Oracle Database, providing all the features and options available, including high-end features such as Oracle Real Application Clusters and Oracle Active Data Guard.

We recommend starting with Oracle Database Standard Edition if you’re new to Oracle Database. Once you’re familiar with the features and capabilities of Oracle Database, you can upgrade to Oracle Database Enterprise Edition to take advantage of its additional features and options.

oracle database
Source – www.fujitsu.com

4. Cassandra Database:

Cassandra is a NoSQL database that provides high availability and linear scalability. It is a distributed database designed to handle large amounts of data across many commodity servers with no single point of failure. Cassandra is an open-source project of the Apache Software Foundation.

Cassandra features a column-family data model that allows for a flexible data schema. For data analytics, Cassandra can be integrated with external frameworks such as Hadoop MapReduce. Cassandra is written in Java, and client drivers are available for other languages such as C++, Python, and Go.

Cassandra is used by some of the largest companies in the world, including Netflix, Facebook, and eBay. Cassandra has been proven to scale to handle massive data sets with very high availability.

cassandra database
Source – https://cassandra.apache.org/_/cassandra-basics.html

5. MongoDB:

MongoDB is a powerful document-oriented database system. It supports indexes on document fields, which makes data retrieval quick and easy. MongoDB also supports horizontal scaling through sharding, allowing it to handle large-scale data.

mongo db
Source – storyperday.com

6. Microsoft SQL Server:

Microsoft SQL Server is a relational database management system created by Microsoft. It is a software product known as a database server. Its main job is to store and retrieve data when other software programs require it; those programs may run on the same computer or on a different machine over a network (including the Internet). Microsoft SQL Server is a separate product from the Windows Server operating system, although it is commonly deployed on it.

SQL Server 2016 introduces a new deployment option: Stretch Database. Stretch Database dynamically stretches your on-premises SQL Server databases to Azure. By dynamically extending the warm and cold transactional data in your databases to Azure, Stretch Database enables your applications to continue to access their data, even when it is moved to Azure.

SQL Server 2016 also introduces several other exciting new features:

• PolyBase enables you to query data stored in Hadoop clusters using the same Transact-SQL language used to query relational data in SQL Server.

• Always Encrypted protects your most sensitive data, such as credit card and banking information.

SQL server
Source – https://www.iperiusbackup.net/en/sql-server-how-to-create-a-full-database-backup/

Conclusion

In this article, we have talked about the various types of databases, their uses, and their limitations. There are more types of databases beyond these, like Map-Reduce Databases or XML Databases; we will discuss them in upcoming articles.

We have seen above that no database is perfect. All of them have their benefits and limitations. An ideal database must be able to store data without any loss or corruption. It must be able to retrieve data accurately. It must be able to update data without any loss or corruption. Lastly, it must be able to delete data without loss or corruption of the remaining data.

Key takeaways of this article:
1. Firstly, we discussed databases and file systems and identified why databases are often a better choice than file systems for structured data.
2. After that, we discussed database types such as SQL, NoSQL, and graph databases, along with their uses and limitations.
3. Then we discussed some famous examples of databases used in industry today, like MongoDB, MySQL, Oracle DB, etc.
4. Finally, we concluded the article by discussing the conditions a perfect database must satisfy.

That is all for today. I hope you have enjoyed the article. If you have any doubts or suggestions, feel free to comment below. You can also connect with me on LinkedIn. I will be delighted to connect with you.

Do check my other articles also.

Thanks for reading, 😊


The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
