Shanthababu Pandian — May 10, 2022
Advanced Azure Cloud Computing Data Engineering

This article was published as a part of the Data Science Blogathon.

Introduction to Cosmos DB

Hello, Data Engineers! I hope this simple article helps you better understand Azure Cosmos DB and its key features. Recently, many customers have been looking to migrate their data into Cosmos DB.

Before diving in, let’s look at Data Engineering, starting with data and its journey:

  • Every byte of data has its own story. In the modern digital world, data travels from where it is generated to the users who consume it for analytics and machine learning.
  • What makes this journey special today is what happens to the data before and after it passes through Extract, Transform, Load (ETL), ELT, or ELTL processes.
  • In Data Engineering, database migration from on-premises to the cloud has become widespread and, in the current landscape, practically unavoidable.
  • Done well, the migration preserves data integrity, reduces storage costs with a good ROI, and minimizes interruption to day-to-day business operations with minimal manual effort.
  • It also upgrades the underlying services and boosts the efficiency and effectiveness of modern data warehouse systems.

Database Migration Strategy

Data migration is the set of processes for selecting, preparing, extracting, and transforming data and permanently transferring it from one storage system to another for the long term. Here, the target storage system is the cloud. Why the cloud? I hope you all know!

Data migration is an essential topic, and organizations carry it out as part of a well-chosen data management strategy.

When It Is Required

  • Upgrading or consolidating database servers
  • Moving to a new computing environment (On-premises into Cloud)

Data Migration Strategies (DMS): There are two distinct methodologies. One is Big Bang migration, where all the data is moved in a single operation, and the other is Trickle migration, where the data is moved in phases.
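The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the source and target are in-memory lists, and the function names (`big_bang_migrate`, `trickle_migrate`) are hypothetical, not part of any Azure tool.

```python
from typing import Iterator, List


def big_bang_migrate(source: List[dict], target: List[dict]) -> int:
    """Copy every record in one pass; the source is assumed frozen meanwhile."""
    target.extend(dict(record) for record in source)
    return 1  # number of migration passes


def batches(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def trickle_migrate(source: List[dict], target: List[dict], batch_size: int) -> int:
    """Copy records in small batches so each phase can be verified separately."""
    passes = 0
    for batch in batches(source, batch_size):
        target.extend(dict(record) for record in batch)
        passes += 1
    return passes


# Hypothetical sample data: ten customer records.
source = [{"id": i, "name": f"customer-{i}"} for i in range(10)]

big_bang_target: List[dict] = []
trickle_target: List[dict] = []

big_bang_passes = big_bang_migrate(source, big_bang_target)
trickle_passes = trickle_migrate(source, trickle_target, batch_size=3)

print(big_bang_passes, trickle_passes)  # 1 4
```

Both approaches end with the same data in the target. The trade-off is one long cutover window versus several short, individually verifiable phases.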

Data Migration – Risks and Challenges:

How does data migration work, and what are the challenges behind it?

Here is a quick list of the main risks:

  • Data Loss
  • Compatibility Problems
  • Longer Downtime
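To guard against data loss, a common precaution is to compare the source and target after the copy. The sketch below assumes both sides can be read as lists of Python dictionaries keyed by `_id`. The helper names `fingerprint` and `compare` are hypothetical, and the sample documents are made up for illustration.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def fingerprint(doc: dict) -> str:
    """Hash a document's canonical JSON form (keys sorted, no extra spaces)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare(source: Iterable[dict], target: Iterable[dict], key: str = "_id") -> Dict[str, List]:
    """Report source documents that are missing from, or differ in, the target."""
    src = {doc[key]: fingerprint(doc) for doc in source}
    dst = {doc[key]: fingerprint(doc) for doc in target}
    return {
        "missing": sorted(k for k in src if k not in dst),
        "changed": sorted(k for k in src if k in dst and src[k] != dst[k]),
    }


source_docs = [
    {"_id": 1, "city": "Chennai"},
    {"_id": 2, "city": "Pune"},
    {"_id": 3, "city": "Delhi"},
]
target_docs = [
    {"_id": 1, "city": "Chennai"},
    {"_id": 2, "city": "pune"},  # value altered during the copy
]

report = compare(source_docs, target_docs)
print(report)  # {'missing': [3], 'changed': [2]}
```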

Factors Influencing Data Migration: Multiple factors influence a data migration (DM); here are the key ones.

  • Amount of Data Involved
  • Workload Type
    • Migration Speed
    • Online migrations
    • Offline migrations
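The first and third factors, data volume and migration speed, together give a rough idea of how long a transfer will take. The following back-of-the-envelope calculation is a sketch only: the function name and the `efficiency` factor (the share of nominal bandwidth actually achieved) are assumptions for illustration, and real migrations are also limited by source and target throughput.

```python
def estimate_transfer_hours(data_gb: float, throughput_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate transfer time in hours for `data_gb` gigabytes over a link of
    `throughput_mbps` megabits per second, using decimal units."""
    if throughput_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("throughput must be positive and efficiency in (0, 1]")
    data_megabits = data_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = data_megabits / (throughput_mbps * efficiency)
    return seconds / 3600


# 500 GB over a 100 Mbps link at an assumed 80% efficiency.
print(round(estimate_transfer_hours(500, 100), 1))  # 13.9
```

An estimate like this helps decide between an offline migration, which must fit inside a downtime window, and an online one.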

Data Migration Planning: A DM should always be planned, and the plan consists of a series of steps carried out one after another.

Data Migration Planning

 

Data migration and data integration are related terms. Data integration is the process of collecting and combining data sets from different sources, followed by cleansing, organizing, and preparing them for actionable analytics and machine learning solutions.

Coming to the core of this article: how feasible is data migration from NoSQL into Azure, how complex is it, and what are its benefits? Does MS Azure provide a specific component for it?

THE ANSWER IS YES! And in ONE WORD, that is Azure Cosmos DB.

For this discussion, let’s take MongoDB as the source. Why MongoDB?

For the following reasons:

  • MongoDB is a scalable, flexible, cross-platform, document-oriented NoSQL database.
  • It can be used for high-volume data storage in big data analytics.
  • It uses collections and documents instead of traditional tables and rows.
  • It stores data as JSON-like documents with optional schemas.
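To make the document model concrete, here is a small sketch of one JSON-like document and the rows a relational design would need for the same data. The order document and the `to_rows` helper are made up for illustration.

```python
import json

# A single JSON-like document: nested fields and arrays live together.
order_document = {
    "_id": "order-1001",
    "customer": {"name": "Asha", "city": "Chennai"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_rows(doc: dict) -> list:
    """Flatten the document the way a relational design would: one row per item."""
    return [
        (doc["_id"], doc["customer"]["name"], item["sku"], item["qty"])
        for item in doc["items"]
    ]


rows = to_rows(order_document)

print(json.dumps(order_document, indent=2))
print(rows)
```

The document keeps the order, its customer, and its items in one unit, whereas the relational form repeats the order and customer values on every row.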

Before getting into the details, let’s touch on the key Azure components that support this DM at a very large scale.

Azure – Key Components

Active Directory

  • Azure Active Directory (Azure AD) is an enterprise identity service that provides single sign-on and multi-factor authentication to help protect your users from 99.9 per cent of cybersecurity attacks.

Azure Storage Account

  • An Azure storage account contains all of your Azure Storage data objects: blobs, file shares, queues, tables, and disks. The storage account provides a unique namespace for your Azure Storage data that is accessible from anywhere in the world over HTTP or HTTPS.

Azure SQL Server/Azure SQL Database

  • MS Azure SQL Database is a managed cloud database.
  • A cloud database runs on a cloud computing platform and is accessed as a service. Managed database services take care of the backup, high availability, and scalability of the database.

Azure Blob Storage

  • Azure Blob Storage is an object storage solution for the cloud, and it is optimized for storing massive amounts of unstructured data.

Azure Cosmos DB

  • Cosmos DB is a NoSQL data store available in Azure, designed to be globally scalable and highly available with very low latency.

Exploring Azure Cosmos DB

  • Azure Cosmos DB is a fully managed, globally distributed NoSQL database offered as PaaS.
  • It offers single-digit millisecond response times along with automatic and instant scalability.
  • It provides low-latency reads (10 ms) and writes (15 ms) globally, and it is a multi-model database: key-value, document, graph, and column-family.
  • It guarantees speed at any scale.
  • Business continuity is assured with SLAs.
  • It offers high availability (99.99%) and enterprise-grade security.
  • As a fully managed service, it takes database administration off your hands with automatic management, updates, and patching.
  • It handles capacity management through cost-effective serverless and automatic scaling options.
  • Schema-agnostic indexing enables automatic indexing without schema and index management.
  • It scales elastically from 1,000 to 100 million requests per second.

Turnkey Global Distribution 

 


Cosmos Consistency

In cloud data storage, consistency describes how closely the copies of data held on different nodes are kept in sync at a given point in time.

  • Since Cosmos DB is globally distributed, data can be written simultaneously to databases spread across the available regions. Data is stored on the primary node and replicated to the secondary nodes.
  • Thanks to this replication, users can access the data from the nearest region, which improves accessibility.

Cosmos DB provides Multiple Levels of Consistency

  • Eventual Consistency – Data is written to the primary node and eventually propagated to the read-only secondary nodes, so reads may briefly return stale data until the replicas sync up.
  • Strong Consistency – Reads always return the latest committed copy of the data, at the cost of higher latency.
  • Consistent Prefix – Users read updates in the same order in which they were written (A, B, C… or 1, 2, 3…), never out of order.
  • Session Consistency – Within a session, the client that updates the data sees its own changes immediately, while other clients may take some time to see them.
  • Bounded Staleness – Reads may lag behind writes, but only by a bounded number of versions or a bounded time interval.
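The differences between these levels are easier to see in a toy model. The sketch below is not how Cosmos DB is implemented; it simply assumes one primary log and one secondary replica that applies writes in order, and the class and method names are hypothetical.

```python
from typing import List, Optional


class ReplicatedRegister:
    """Toy model: one primary log and one lagging secondary replica."""

    def __init__(self) -> None:
        self.log: List[str] = []  # writes committed on the primary, in order
        self.replicated = 0       # how many log entries the secondary has applied

    def write(self, value: str) -> None:
        self.log.append(value)

    def replicate(self, count: int = 1) -> None:
        """Ship up to `count` pending writes to the secondary, in log order."""
        self.replicated = min(len(self.log), self.replicated + count)

    def read_strong(self) -> Optional[str]:
        """Always the latest committed write."""
        return self.log[-1] if self.log else None

    def read_eventual(self) -> Optional[str]:
        """Whatever the secondary has applied so far; may be stale."""
        return self.log[self.replicated - 1] if self.replicated else None

    def read_bounded(self, max_lag: int) -> Optional[str]:
        """Secondary read that may trail the primary by at most `max_lag` writes."""
        lag = len(self.log) - self.replicated
        if lag > max_lag:
            self.replicate(lag - max_lag)  # catch up just enough to honor the bound
        return self.read_eventual()


reg = ReplicatedRegister()
for value in ["A", "B", "C", "D"]:
    reg.write(value)
reg.replicate(1)  # the secondary has only applied "A"

print(reg.read_strong())    # D
print(reg.read_eventual())  # A
print(reg.read_bounded(1))  # C
```

Because the secondary applies writes strictly in log order, every read in this model also satisfies the consistent prefix guarantee: a reader may see A, then C, then D, but never C before A.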

Cosmos SLAs for Availability

  • Azure Cosmos DB backs its service with SLAs that cover throughput, latency at the 99th percentile, consistency, and high availability.

Azure Cosmos DB Resource Model

  • The Cosmos DB account is the fundamental unit of global distribution and high availability.
  • You can have up to 50 accounts under one Azure subscription.
  • Azure regions can be added to or removed from the account at any point in time.


Cosmos Containers 

  • A Cosmos DB database, in turn, consists of a set of containers.
  • Containers serve as the logical units of distribution and scalability.
  • Depending on the API, a container is exposed as a collection, a table, or a graph; internally, they are all containers.
  • Containers are schema-agnostic and provide the scope for a query.
  • Data that resides in a container is automatically indexed on ingestion.
  • Automatic indexing lets users query the data without the hassle of schema or index management, especially in globally distributed environments.
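The idea of schema-agnostic, automatic indexing can be illustrated with a toy container that indexes every path of every document as it is ingested. This is a simplified sketch and not Cosmos DB's actual index. The `Container` class, its methods, and the path notation are assumptions for illustration, and `insert` assumes each `id` is added only once.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple


def paths(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, scalar) pairs for every leaf in a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        for child in value:
            yield from paths(child, f"{prefix}/[]")
    else:
        yield prefix, value


class Container:
    """Toy container: documents of any shape, indexed on every path at ingestion."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def insert(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc
        for path, leaf in paths(doc):
            self.index[(path, leaf)].add(doc["id"])

    def find(self, path: str, value: Any) -> List[str]:
        """Return the ids of documents whose `path` holds `value`."""
        return sorted(self.index.get((path, value), set()))


container = Container()
# Three documents with three different shapes; no schema is declared anywhere.
container.insert({"id": "1", "name": "Asha", "address": {"city": "Chennai"}})
container.insert({"id": "2", "name": "Ravi", "tags": ["vip", "beta"]})
container.insert({"id": "3", "address": {"city": "Chennai"}, "tags": ["vip"]})

print(container.find("/address/city", "Chennai"))  # ['1', '3']
print(container.find("/tags/[]", "vip"))           # ['2', '3']
```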

With a few clicks, or programmatically, you can add or remove the geographical regions associated with your Cosmos DB account.

Azure Cosmos DB’s High Availability

  • It provides high availability.
  • It replicates data across all the regions configured in the Cosmos DB account.
  • It maintains four replicas of the data within each configured region.
  • This feature is available in all regions where Azure is available (48 regions).
  • An Azure Cosmos DB account can be associated with any number of Azure regions, and the data is automatically and transparently replicated to all of them. (Think about the cost.)
  • Cosmos DB is available in all five distinct Azure cloud environments: Azure public, Azure China 21Vianet, Azure Germany, Azure Government (US), and Azure Government for DoD.
  • Advice: Having an Azure Cosmos DB account in more than 2 regions certainly improves the availability of your application and provides low latency across the associated regions.

API and Data Model Supported by Cosmos DB

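  • Cosmos DB exposes several APIs for creating and managing databases, each suited to a data model: the SQL (Core) API and the API for MongoDB for documents, the Cassandra API for column-family data, the Gremlin API for graphs, and the Table API for key-value data.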

Migrate MongoDB to Azure Cosmos DB

Azure provides the following options for this migration:

  • Migrate MongoDB to Azure Cosmos DB API for MongoDB, online or offline, using Azure Database Migration Service (DMS)
  • Use Azure Data Factory

Azure Database Migration Service

  • Azure DMS is a fully managed service that enables seamless migrations from multiple database sources into Azure data platforms with minimal downtime (online migrations).
  • The Data Migration Assistant generates assessment reports that provide recommendations to guide us before performing a migration.


Migrate MongoDB to Azure Cosmos DB API for MongoDB (Offline/Online)

  • Offline
    • One-time migration
    • Perform an offline migration
  • Online
    • Minimal downtime
    • Perform an online migration
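The two modes can be contrasted with a small sketch. It assumes the source can be represented as a dictionary of documents and that changes made during the copy are available as an ordered list. The function names and the change format are hypothetical and do not reflect how Azure DMS works internally.

```python
from typing import Dict, List, Optional, Tuple

# (operation, document id, new document or None for deletes)
Change = Tuple[str, str, Optional[dict]]


def offline_migrate(source: Dict[str, dict]) -> Dict[str, dict]:
    """One-time copy; assumes writes to the source are stopped for the duration."""
    return {doc_id: dict(doc) for doc_id, doc in source.items()}


def online_migrate(snapshot: Dict[str, dict], changes: List[Change]) -> Dict[str, dict]:
    """Bulk-copy a snapshot, then replay changes made while the copy was running."""
    target = offline_migrate(snapshot)
    for operation, doc_id, doc in changes:
        if operation == "upsert" and doc is not None:
            target[doc_id] = dict(doc)
        elif operation == "delete":
            target.pop(doc_id, None)
    return target


snapshot = {"1": {"status": "new"}, "2": {"status": "new"}}
changes: List[Change] = [
    ("upsert", "1", {"status": "shipped"}),
    ("delete", "2", None),
    ("upsert", "3", {"status": "new"}),
]

print(online_migrate(snapshot, changes))
# {'1': {'status': 'shipped'}, '3': {'status': 'new'}}
```

The offline path needs the application to pause writes until the copy finishes. The online path lets the application keep running and only needs a short pause at cutover, once the replayed changes have caught up.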

The process is divided into pre-migration and post-migration steps, described below.

Pre-migration

  1. Ensure that you set up Azure Cosmos DB to fulfil your application’s post-migration requirements.
  2. Plan out exactly how you will execute the migration.
  • Follow these steps to perform the pre-migration process:
    • Discover your existing MongoDB resources and create an artefact to track them
    • Assess the readiness of your existing MongoDB resources for the data migration process
    • Map your existing MongoDB resources to new Azure Cosmos DB resources (end to end)
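The tracking artefact from the discovery step can be as simple as a list of records, one per source collection. The sketch below shows one possible shape. The `ResourceMapping` fields, the `pending` helper, and the sample values are hypothetical and would be adapted to your own environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceMapping:
    """One row of the tracking artefact: a source collection and its planned target."""
    source_database: str
    source_collection: str
    document_count: int
    target_database: str
    target_container: str
    ready: bool = False  # flipped to True once the readiness assessment passes
    notes: str = ""


def pending(inventory: List[ResourceMapping]) -> List[str]:
    """Names of source collections that still block the migration."""
    return [f"{m.source_database}.{m.source_collection}" for m in inventory if not m.ready]


inventory = [
    ResourceMapping("shop", "orders", 1_200_000, "shop", "orders", ready=True),
    ResourceMapping("shop", "customers", 85_000, "shop", "customers",
                    notes="uses an operator that still needs a compatibility check"),
]

print(pending(inventory))  # ['shop.customers']
```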

Post-migration

  • After you migrate, you can connect to Azure Cosmos DB and manage the data.
  • Follow these steps to perform a thorough post-migration:
    • Set the consistency level
    • Connect (cut over) your application
    • Tune for optimal performance

Azure Database Migration Service

  • Azure DMS is designed to streamline the process of migrating on-premises databases into Azure.
  • Offline and Online migration
  • Database Migration Service pricing
  • Near-zero downtime Migration
  • Automate your database migrations
  • Simplify your migration
  • Move widely used databases

Common Scenarios

  • SQL Server ⟶ Azure SQL Database
  • SQL Server ⟶ Azure SQL Managed Instance
  • MongoDB ⟶ Azure Cosmos DB
  • MySQL ⟶ Azure Database for MySQL
  • PostgreSQL ⟶ Azure Database for PostgreSQL
  • DB2 ⟶ Azure SQL Database
  • Oracle ⟶ Azure Database for PostgreSQL (requires ora2pg)

Conclusion

Guys! So far, we have discussed a very interesting topic, and I trust it will help you understand Azure Cosmos DB and the scope of database migration. When you are building a database migration strategy and architecture as a data engineer in a cloud migration project, you must assess the cloud components from various angles and prepare solutions that are feasible, sustainable, and cost-effective. Throughout this article, we have covered the following:

  • Database Migration and its necessity, process planning,
  • An overview of Azure Data engineering components and their importance
  • Specifically, Azure Cosmos DB and its capabilities like
    • Consistency
    • Availability
    • Resource Model
    • List of API
    • Data Model Support
    • Migrate MongoDB to Azure Cosmos DB Steps.

This is only a small part of Data Engineering and its scope with respect to Cosmos DB; there is much more. I will bring more concepts in my next articles. Until then! Bye! Cheers! Shanthababu.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
