Setup Mage AI with Postgres to Build and Manage Your Data Pipeline

Adarsh Balan 03 Oct, 2024
8 min read

Introduction

Imagine yourself as a data professional tasked with building an efficient data pipeline to streamline processes and deliver real-time insights. Sounds challenging, right? That's where Mage AI comes in. Unlike many tools that demand elaborate setup and constant coding, Mage AI offers a clear, undemanding step-by-step installation and an intuitive interface. In this article, I'll walk you through connecting Mage AI to PostgreSQL so you can build your first data pipeline. Let's go through the steps that will make your data processing even better!

Learning Outcomes

  • Understand how to configure Mage AI for seamless integration with PostgreSQL.
  • Learn to upload raw data to PostgreSQL and create schemas using pgAdmin4.
  • Master the process of building and managing data pipelines in Mage AI.
  • Explore how to set up automated triggers and scheduling for data pipelines.
  • Gain insights into Mage AI’s advanced features like real-time processing and monitoring.

This article was published as a part of the Data Science Blogathon.

What is Mage AI?

Mage AI is an open-source tool that simplifies the integration of growing data workflows. With its clean design and app-like interface, data engineers and analysts can build data pipelines with a few clicks, largely eliminating the need for boilerplate coding. Features such as drag-and-drop pipeline building, built-in data transformation, and broad data-source compatibility make importing, analyzing, and manipulating big data much easier. This lets users spend their time on analytics instead of worrying about the underlying infrastructure. Mage AI also supports Python scripting for defining custom transformations, which makes it suitable for both technical and non-technical users.

Benefits of Using Mage AI with PostgreSQL

Let us look into the benefits of using Mage AI with PostgreSQL.

  • Streamlined Data Management: Mage AI simplifies data pipeline creation with its drag-and-drop interface, making it easy to load, transform, and export data from PostgreSQL without manual coding.
  • Enhanced Automation: Automate recurring data tasks, like ETL processes, by setting up triggers and scheduled pipelines, reducing the need for constant manual interventions.
  • Seamless Integration: Mage AI integrates smoothly with PostgreSQL, enabling users to manage large datasets efficiently and perform complex data operations within the same workflow.
  • Customizable Transformations: Leverage Python scripting in Mage AI to perform custom data transformations on PostgreSQL data, allowing flexibility for advanced data processing.
  • Scalable and Reliable: Mage AI efficiently manages pipelines, ensuring smooth handling of both small and large datasets, while PostgreSQL’s scalability supports business growth without performance bottlenecks.
  • User-Friendly: The intuitive interface makes it accessible to users with varying levels of technical expertise, enabling quicker learning and faster deployment of data solutions.
Setup Mage AI with Postgres to Build and Manage Your Data Pipeline

Setting up Mage AI with Postgres allows you to seamlessly build and manage powerful data pipelines, automating workflows and simplifying complex data tasks for efficient insights. Let us look into the steps required to set up Mage AI with Postgres.

Step 1: Preparing Your Postgres Database

Before diving into Mage AI, upload your raw data files to Postgres using pgAdmin4, and create the correct schema for each file. Here’s how to get started:

Upload Raw Files to Postgres via pgAdmin4

  • Open pgAdmin4 and connect to your Postgres server.
  • Create a new database or use an existing one.
  • Ensure that you add the correct schema for each raw data file.
  • Upload/Export your data files to the appropriate tables within this schema.
The following Python snippet detects a CSV file's encoding and generates a CREATE TABLE statement matching the inferred column types:

import pandas as pd
import chardet

# Open the file in binary mode and read a sample for encoding detection
with open("expensemaster.csv", 'rb') as file:
    sample = file.read(10000)  # Read the first 10,000 bytes as a sample

# Detect the encoding
detected = chardet.detect(sample)
print(detected['encoding'])

# Use the detected encoding to read the CSV
try:
    df = pd.read_csv("expensemaster.csv", encoding=detected['encoding'])
except (UnicodeDecodeError, TypeError):
    # If reading fails (or detection returned None), fall back to latin-1,
    # which accepts any byte sequence
    df = pd.read_csv("expensemaster.csv", encoding="latin-1")

# Map pandas dtypes to Postgres column types
dtype_mapping = {
    'object': 'TEXT',
    'int64': 'BIGINT',
    'float64': 'DOUBLE PRECISION',
    'datetime64[ns]': 'TIMESTAMP',
    'bool': 'BOOLEAN'
}

# Default to TEXT for any dtype not present in the mapping
column_definitions = ', '.join(
    f'"{col}" {dtype_mapping.get(str(df[col].dtype), "TEXT")}' for col in df.columns
)

# Generate the CREATE TABLE SQL
table_name = 'expensemaster'
create_table_sql = f'CREATE TABLE {table_name} ({column_definitions});'
print(create_table_sql)

Click refresh on "Tables" in pgAdmin4 to see the newly created table.


Start the Postgres Service

Make sure the Postgres service is running. You can check this in pgAdmin4 or by using the psql terminal.

Step 2: Gathering Postgres Configuration Details

You’ll need specific details to configure Mage AI with Postgres. Here’s what you need and how to find it:

  • POSTGRES_DBNAME: The name of your Postgres database.
  • POSTGRES_SCHEMA: The schema where your data files are uploaded.
  • POSTGRES_USER: The username for your Postgres database.
  • POSTGRES_PASSWORD: The password for your Postgres database.
  • POSTGRES_HOST: The host IP address of your Postgres server.
  • POSTGRES_PORT: Usually 5432 for Postgres.

Step 3: Installing Mage AI Using Docker in VS Code

To install Mage AI, we will use Docker Extension in Visual Studio Code (VS Code). Ensure you have Docker Desktop and the Docker extension for VS Code installed.

Install Docker Desktop

Download and install Docker Desktop from the official Docker website, then launch it.

Install the Docker Extension for VS Code:

  • Open VS Code and go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of the window or by pressing Ctrl+Shift+X.
  • Search for “Docker” and install the Docker extension by Microsoft.

Pull the Mage AI Docker Image

  • Open a terminal in VS Code and navigate to your project folder.
  • Run the following command to pull the latest Mage AI Docker image:
docker pull mageai/mageai:latest

Run the Mage AI Docker Image

  • Once the Mage AI image is pulled, go to the Docker tab in VS Code.
  • Find the Mage AI image and run it. This will create a new container.
  • Right-click on the newly created container and select “Open in Browser.”
  • The Mage AI interface should now load in your default web browser.
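Alternatively, the container can be started from the terminal. The command below follows Mage's documented Docker invocation; `my_project` is a placeholder name for your project:

```shell
# Start Mage on port 6789, mounting the current folder as the project directory
docker run -it -p 6789:6789 -v $(pwd):/home/src mageai/mageai \
  /app/run_app.sh mage start my_project
```

Once the container is running, open http://localhost:6789 in your browser to reach the Mage AI interface.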

Step 4: Configuring Mage AI to Connect with Postgres

Configure the database connection in io_config.yaml:

  • Navigate to the All Files section of your pipeline.
  • Locate and open the io_config.yaml file.
  • Add your Postgres connection details as follows:
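The `default` profile in io_config.yaml might look like the sketch below; the values shown are placeholders, so substitute the details you gathered in Step 2:

```yaml
version: 0.1.1
default:
  POSTGRES_CONNECT_TIMEOUT: 10
  POSTGRES_DBNAME: your_database
  POSTGRES_SCHEMA: public
  POSTGRES_USER: postgres
  POSTGRES_PASSWORD: your_password
  POSTGRES_HOST: 192.168.1.10
  POSTGRES_PORT: 5432
```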

Allow Mage AI to Access the Postgres Database

  • To grant access to the database on your IP address, you need to modify the pg_hba.conf file.
  • Locate the pg_hba.conf file at C:\Program Files\PostgreSQL\16\data.
  • Open the file and add a row under the # IPv4 local connections section, as shown in Fig. 4.
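Assuming your machine's IP address is 192.168.1.10 (a placeholder), the added row might look like this:

```
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             192.168.1.10/32         scram-sha-256
```

Restart the Postgres service after saving the file so the change takes effect.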

Step 5: Creating Your First Data Pipeline

Now that Mage AI is configured to connect with Postgres, we can create our first data pipeline. We’ll start by setting up data loader blocks for each dataset and using the drag-and-drop feature to connect them in a flowchart.

Create Data Loader Blocks

  • For each dataset, create a separate data loader block.
  • In the Mage AI interface, drag and drop a data loader block onto the canvas for each dataset you need to load from Postgres.
  • Configure each data loader block with the appropriate connection details and query to fetch the data from Postgres.
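Inside a data loader block, Mage generates a Python template similar to the sketch below. It runs only within a Mage project (the @data_loader decorator is injected by Mage at runtime), and the query and profile name are illustrative:

```python
from os import path
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

@data_loader
def load_data_from_postgres(*args, **kwargs):
    # Illustrative query; replace with the table you uploaded in Step 1
    query = 'SELECT * FROM public.expensemaster'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    # Read connection details from io_config.yaml and run the query
    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)
```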

Connect the Data Loader Blocks to the Transformer block

Use the drag-and-drop feature to connect the data loader blocks in the flowchart to the next transformer code block. This visual representation helps in understanding the data flow and ensuring all steps are connected correctly.
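A transformer block receives the upstream loaders' output. Mage's generated template looks roughly like this sketch; the drop-missing-values step is purely illustrative:

```python
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def transform(data, *args, **kwargs):
    # 'data' is the output of the first upstream block;
    # outputs of additional upstream blocks arrive via *args
    data = data.dropna()  # illustrative transformation
    return data
```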


Creating Data Exporter Blocks

  • In the Mage AI interface, after configuring your data loader and transformation blocks, add a data exporter block to the canvas.
  • Under the Python templates, choose “Postgres” as the destination for the data.
  • Provide the necessary connection details to your Postgres database. Write the code to export the transformed data back to the PostgreSQL database.
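The exporter block's template mirrors the loader; the schema and table names below are illustrative, and the code runs only inside a Mage project:

```python
from os import path
from pandas import DataFrame
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter

@data_exporter
def export_data_to_postgres(df: DataFrame, **kwargs) -> None:
    schema_name = 'public'                    # illustrative schema
    table_name = 'expensemaster_transformed'  # illustrative table
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    # Write the transformed DataFrame back to Postgres
    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        loader.export(
            df,
            schema_name,
            table_name,
            index=False,
            if_exists='replace',  # overwrite the table on each run
        )
```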

Step 6: Creating Triggers and Scheduling Pipelines

Mage AI offers the ability to create triggers for running your pipeline and scheduling it for regular execution. This ensures your data is always up-to-date without manual intervention.

Creating a Trigger

  • In Mage AI, you can set up triggers to run your pipeline based on specific events or conditions. For example, you can trigger a pipeline to run whenever new data is added to your Postgres database.
  • To create a trigger, navigate to the pipeline settings and configure the trigger conditions as needed.

Scheduling the Pipeline

  • Mage AI supports scheduling pipelines to run at regular intervals. This can be done through the scheduling settings in the Mage AI dashboard.
  • You can specify the frequency (daily, weekly, etc.) and the time for the pipeline to run.

Additional Features of Mage AI

Mage AI provides several powerful features to automate and enhance your data pipelines:

  • Integration with Multiple Data Sources: Mage AI accepts many kinds of data inputs, including databases, cloud storage, and APIs, so you can construct diverse and extensive data flows.
  • Advanced Transformation Capabilities: Because blocks are written in Python, Mage AI lets you implement custom transformations with simple decorators, making it straightforward to express a wide range of data transformation logic.
  • Scalability: Mage AI is built for big data, handling increasing volumes as your datasets grow.
  • Monitoring and Alerts: Mage AI provides robust monitoring and alerting, letting you track pipeline runs and receive notifications on failures.
  • User-Friendly Interface: The graphical layout of data pipelines means users can manipulate and transform their data without worrying about complicated coding.

Together, these features make Mage AI a tool that automates your data workflows and manages the data infrastructure, so you don't need to spend much time on either.

Conclusion

Today, information is a valuable asset, making data management essential for organizations. This article provided clear guidance on configuring Mage AI with PostgreSQL, helping you build a robust data pipeline that not only streamlines multiple processes but also significantly boosts productivity. Pairing Mage AI with a robust database such as PostgreSQL enables users to handle and analyze data, and to make the right decisions, in the shortest time possible. As organizations step up their efforts in data-driven methodologies and frameworks, technologies such as Mage AI are poised to become the dominant way of managing data.

Frequently Asked Questions

Q1. What is Mage AI?

A. Mage AI is an open-source tool designed to simplify the process of building and managing data workflows. It provides a user-friendly interface and automation features that help data professionals create pipelines without extensive coding knowledge.

Q2. Why use PostgreSQL with Mage AI?

A. PostgreSQL is a powerful, open-source relational database management system known for its robustness and scalability. When paired with Mage AI, it allows users to efficiently store, retrieve, and manipulate large datasets, making it an ideal choice for data pipelines.

Q3. Do I need programming skills to use Mage AI?

A. While some familiarity with programming concepts can be helpful, Mage AI is designed to be user-friendly and accessible to users with varying levels of technical expertise. Many tasks can be accomplished through its intuitive interface.

Q4. Can I integrate other data sources with Mage AI?

A. Yes, Mage AI supports integration with various data sources, allowing users to build comprehensive data pipelines that pull in data from multiple platforms, enhancing the overall data ecosystem.

Q5. Is Mage AI free to use?

A. Mage AI is an open-source tool, which means it is free to use. However, users may incur costs associated with hosting, storage, and other related services, depending on their infrastructure choices.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

