Intelligent Document Processing with Azure Form Recognizer

Harun Raseed 31 Mar, 2023


Introduction

Intelligent document processing (IDP) is a technology that uses artificial intelligence (AI) and machine learning (ML) to automatically extract information from unstructured documents such as invoices, receipts, and forms. IDP combines optical character recognition (OCR) with AI and ML algorithms to extract data and insights from documents, reducing the need for manual data entry and improving accuracy. Azure Form Recognizer is one service that offers these capabilities.

In this article, we will see how to implement IDP using the Azure Form Recognizer service and build an end-to-end pipeline that automates document extraction and data visualization using Azure Functions and Power BI.

Learning Objectives

Create an Azure Form Recognizer service and use its built-in models.
Prepare, label, train, and analyse a custom model based on your own requirements.
Integrate the Form Recognizer output with Azure Functions and automate the process.

This article was published as a part of the Data Science Blogathon.

Getting Started with Azure Form Recognizer

Azure Form Recognizer is a service that uses machine learning to automate document processing and data extraction. It can analyze structured and unstructured documents such as invoices, receipts, and forms and extract their data in a matter of seconds.

In this section, we will see how to create an Azure Form Recognizer service from the Azure portal:

First, log in to your Azure portal (portal.azure.com).
Once you’re logged in, click on the “Create a resource” button on the left-hand side of the screen.
In the “New” pane, type “Form Recognizer” into the search box and press enter.
Select the “Form Recognizer” service from the results.
In the “Form Recognizer” pane, click on the “Create” button.
In the “Create Form Recognizer” pane, fill out the required fields such as subscription, resource group, name of the service, pricing tier, and location.
Next, under the “Features” tab, select the type of form that you want to recognize, such as receipts, invoices, business cards, or custom forms.
Once you select your form type, click the “Review + create” button.
Review your settings and then click on the “Create” button to create your Form Recognizer service.

With the Form Recognizer resource created above, we can process documents and receipts for which prebuilt models are already available in the service.

For example, the following prebuilt models are available with the Azure Form Recognizer service:

Invoices
Receipts
Business Cards
Identity Documents
Health Insurance Cards
US Tax Documents (W-2, 1098, 1098-E, 1098-T)
Contracts
Vaccination Cards

But consider an input document that is a claim form from an insurance company. No prebuilt model is available for claim forms, so we will create a custom model that reads and extracts the information from scanned and handwritten claim forms submitted by policyholders.

Creating a Custom Model From Azure Form Recognizer

Creating a custom model involves four major steps:

Prepare
Label
Train
Analyse

For the Prepare step, we need a minimum of five sample files (claim forms). We then label those files and train the model on them before analysing new documents.

To create a custom model, we need to follow the steps below.

Select Custom Model from the Azure Form Recognizer Studio
Create a new project, give it an appropriate project name and description, and click Continue.

In the next pop-up, choose the Azure subscription and resource group where you created your Azure Form Recognizer resource, choose the latest API version from the list, and click Continue.

Next, we need to connect to the source dataset container. Choose the subscription, resource group, storage account, and container where you are going to keep your training dataset. Click Continue to review and create the project.

Prepare

As discussed earlier, we need a minimum of five sample files for labeling and training the model. Upload those files to the container, or upload them directly in the UI by drag and drop.

Label

Once the sample files are available, we can start labeling them.

After labeling at least five files, click the Train button.

Train

Training asks us to choose between two build modes, Template and Neural.

Template: This mode is for structure-based extraction. It takes only 1-5 minutes to train the model and supports 164 languages.

Neural: This mode is for structured, semi-structured, and unstructured extraction. It takes around 30 minutes to train a model and supports only English-language documents.

In our case, the claim forms are structured/semi-structured, so we can choose Template mode and click Train.

Within a few seconds to minutes, the model is created and appears in the Models section.

Analyse

Now we can start analysing new claim forms.
Upload any new claim form we want to process and click the Analyse button.
The document is analysed with the model we created, and the extracted information is displayed in the Studio.

Test and evaluate your custom model: After the training is complete, you can test your custom model by submitting test documents to it and reviewing the output.
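
When the output is reviewed in code rather than in the Studio, it can help to flatten the extracted fields and flag the low-confidence ones. The sketch below assumes a result in which each field carries a text value (`content`, or `text` in older API versions) and a `confidence` score. The field names, the helper name `summarize_fields`, and the 0.8 threshold are hypothetical and only serve as an illustration.

```python
def summarize_fields(fields, min_confidence=0.8):
    """Split extracted fields into accepted values and ones needing review."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # A field the model could not find may come back empty
        if not field:
            needs_review[name] = None
            continue
        value = field.get("content", field.get("text"))
        confidence = field.get("confidence", 0.0)
        if value is not None and confidence >= min_confidence:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review


# Hand-written sample for illustration, not real service output
sample_fields = {
    "PolicyNumber": {"content": "PN-10234", "confidence": 0.97},
    "ClaimAmount": {"content": "1,250.00", "confidence": 0.55},
    "ClaimantName": None,
}

accepted, needs_review = summarize_fields(sample_fields)
print(accepted)
print(needs_review)
```

Fields that land in `needs_review` are good candidates for additional labeled samples before retraining.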

Problem Statement

Organizations that deal with huge volumes of documents are facing a significant challenge in processing a large number of scanned and handwritten documents and forms received from their customers. These documents and forms contain a vast amount of critical information, such as personal details, medical history, and damage assessment reports, which must be accurately extracted and processed for efficient claim processing. However, manual processing has become increasingly time-consuming, error-prone, and resource-intensive due to the sheer volume of documents. This has led to delays in claims processing, increased operational costs, and dissatisfied policyholders.

To address this challenge, they need a solution that can automate the document processing and data extraction processes, improve accuracy and reduce the overall processing time.

Proposed Solution For Real-Time Use Case

To overcome the above-mentioned problem, we can use the Azure Form Recognizer service for intelligent document processing, along with other data engineering methodologies, to process documents at large scale with a lower operational cost. We can also extract the data from the documents, transform it, and visualize it in a dashboard, which helps organizations analyze their KPIs and make business decisions.

Automation Process

We can create a custom model based on our requirements, but the challenging part is how to consume the extracted data, because we cannot consume it directly from the UI. We also need to automate the process: whenever a new scanned document lands in the storage container, it should be processed by Form Recognizer, and the extracted information should be saved as a file. That file should then be visualized in Power BI.

The architecture below explains how we are going to address this challenge.

Once the claim forms land in the storage container, an Azure Function with a Blob trigger fires (as soon as it detects new blob activity) and runs code that calls Azure Form Recognizer, extracts the data, processes it with simple Pandas code, and saves it as a csv file in the blob storage container. Once the csv file is in the output container, we can connect that storage path to Power BI and visualize the data.
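
The Blob trigger itself is declared in the function's `function.json` file. As a rough sketch, the snippet below builds that configuration in Python and prints it as JSON. The binding name `myblob` matches the parameter of the function shown later in this article, while the container name `input` and the `AzureWebJobsStorage` connection setting are assumptions to adapt to your own storage account.

```python
import json

# Sketch of a function.json for a blob-triggered Python function.
# "input" (container) and "AzureWebJobsStorage" (app setting) are assumed names.
function_config = {
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "name": "myblob",          # must match the parameter name in main()
            "type": "blobTrigger",
            "direction": "in",
            "path": "input/{name}",    # fire for every new blob in "input"
            "connection": "AzureWebJobsStorage",
        }
    ],
}

print(json.dumps(function_config, indent=2))
```

The printed JSON can be saved as `function.json` next to the function's `__init__.py`.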

Below are sample API calls for the different prebuilt and custom models. They are shown with API version v2.0, whereas the function code later in this article calls v2.1.

Layout API:

https://{endpoint}/formrecognizer/v2.0/layout/analyze

Receipt API:

https://{endpoint}/formrecognizer/v2.0/prebuilt/receipt/analyze[?includeTextDetails]

Custom Model API:

https://{endpoint}/formrecognizer/v2.0/custom/models/{modelId}/analyze[?includeTextDetails]

With the help of these sample APIs, we can embed our form recognizer service into various other services based on our requirements. In this particular scenario, we are going to create an Azure Function App, call the Form Recognizer using the API, and process our document.
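
As a small illustration of how these URLs are assembled, the hypothetical helper below builds the analyze URL for the layout, receipt, and custom model endpoints listed above. The function name, its parameters, the default version string, and the example resource and model names are assumptions; check which API version your resource supports.

```python
def build_analyze_url(endpoint, kind, model_id=None, version="v2.1",
                      include_text_details=False):
    """Return the analyze URL for a layout, receipt, or custom model call."""
    base = endpoint.rstrip("/") + "/formrecognizer/" + version
    if kind == "layout":
        url = base + "/layout/analyze"
    elif kind == "receipt":
        url = base + "/prebuilt/receipt/analyze"
    elif kind == "custom":
        if not model_id:
            raise ValueError("model_id is required for custom models")
        url = base + "/custom/models/" + model_id + "/analyze"
    else:
        raise ValueError("unknown kind: " + kind)
    # includeTextDetails is only listed for the receipt and custom endpoints
    if include_text_details and kind != "layout":
        url += "?includeTextDetails=true"
    return url


endpoint = "https://myformrecognizername.cognitiveservices.azure.com/"
print(build_analyze_url(endpoint, "layout"))
print(build_analyze_url(endpoint, "custom", model_id="my-model-id",
                        include_text_details=True))
```

Stripping the trailing slash from the endpoint avoids a double slash in the final URL.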

In the below code snippet, we will call the Layout API and process a document that is in pdf format, extract the document’s content, convert it into a csv file, and push it to a separate container (output container).

Whenever a file lands in the input container, the blob trigger will call the Layout API and process the document, and the Azure function will push the converted csv file into the output container.

import json
import logging
import os
import time

import azure.functions as func
import numpy as np
import pandas as pd
import requests
from azure.storage.blob import BlobServiceClient

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")

    # This is the call to the Form Recognizer endpoint.
    # Replace the endpoint and key placeholders with your own values.
    endpoint = "https://myformrecognizername.cognitiveservices.azure.com"
    apim_key = "***************************"
    post_url = endpoint + "/formrecognizer/v2.1/layout/analyze"
    source = myblob.read()

    headers = {
        # Request headers
        'Content-Type': 'application/pdf',
        'Ocp-Apim-Subscription-Key': apim_key,
    }

    # File name of the incoming blob, reused later to name the output csv
    text1 = os.path.basename(myblob.name)

    resp = requests.post(url=post_url, data=source, headers=headers)

    if resp.status_code != 202:
        logging.error("POST analyze failed:\n%s", resp.text)
        return
    logging.info("POST analyze succeeded:\n%s", resp.headers)
    get_url = resp.headers["operation-location"]

    # The Layout API is asynchronous, so wait before fetching the result
    wait_sec = 25
    time.sleep(wait_sec)

    resp = requests.get(url=get_url, headers={"Ocp-Apim-Subscription-Key": apim_key})
    resp_json = json.loads(resp.text)
    status = resp_json["status"]

    if status != "succeeded":
        logging.error("GET Layout results failed with status: %s", status)
        return
    logging.info("Layout analysis succeeded")
    results = resp_json

    # This is the connection to the blob storage, with the Azure Python SDK.
    # Replace the placeholder connection string with your own.
    blob_service_client = BlobServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=storageaccountname;AccountKey={***key***}==;EndpointSuffix=core.windows.net")
    container_client = blob_service_client.get_container_client("output")

    # The code below converts the JSON result into tabular data.
    # Please note that you need to adjust the code below to your form structure.
    # It probably won't work out-of-the-box for your specific form.
    pages = results["analyzeResult"]["pageResults"]

    def make_page(p):
        # Collect every cell on page p, tagged with the index of its table
        res = []
        for table_num, tab in enumerate(pages[p]["tables"]):
            for cell in tab["cells"]:
                res.append({**cell, "table_num": table_num})
        h = pd.DataFrame(res)
        h = h.drop(columns=["boundingBox", "elements"], errors="ignore")
        num_table = len(pages[p]["tables"]) - 1
        return h, num_table, p

    h, num_table, p = make_page(0)

    if num_table < 0:
        logging.warning("No tables found on the first page; nothing to upload")
        return

    # Rebuild each table as a grid of cell texts.
    # Note: b is overwritten per table, so only the last table is uploaded.
    for k in range(num_table + 1):
        new_table = h[h.table_num == k]
        row_table = pages[p]["tables"][k]["rows"]
        col_table = pages[p]["tables"][k]["columns"]
        b = pd.DataFrame(np.full((row_table, col_table), "", dtype=object))
        for i, j, text in zip(new_table["rowIndex"], new_table["columnIndex"], new_table["text"]):
            b.loc[i, j] = text

    # Here is the upload to the blob storage (overwrites output from earlier runs)
    tab1_csv = b.to_csv(header=False, index=False)
    name1 = os.path.splitext(text1)[0] + '.csv'
    container_client.upload_blob(name=name1, data=tab1_csv, overwrite=True)
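
The fixed 25-second wait in the function above can be too short for large documents and needlessly long for small ones. One alternative, sketched here under the assumption that the status endpoint reports `notStarted`, `running`, `succeeded`, or `failed`, is to poll until a terminal status is reached. The helper name `wait_for_result`, its parameters, and the injected `fetch` callable are hypothetical; inside the function, `fetch` would wrap the `requests.get` call on the `operation-location` URL and return the parsed JSON.

```python
import time


def wait_for_result(fetch, interval_sec=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until the operation succeeds, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("analysis failed")
        sleep(interval_sec)  # still notStarted or running
    raise TimeoutError("analysis did not finish in time")


# Simulated status responses, standing in for the real GET request
responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"pageResults": []}},
])

final = wait_for_result(lambda: next(responses), sleep=lambda s: None)
print(final["status"])
```

Passing `fetch` and `sleep` in as arguments keeps the helper easy to test without calling the service.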

Data Visualization

Once the csv file is created in the Output container, we can create a visualization in Power BI Desktop using the Azure Blob Storage Connector.

Once we connect to the storage account, we can create a simple visualization in Power BI from the csv file available in the output container.

Sample Visualization from the Processed Data

The report above can then be published to the Power BI service with a scheduled dataset refresh to keep the reporting close to real time.

Conclusion

The intelligent document processing solution built with Azure Form Recognizer, Azure Functions, and Power BI gives industries a powerful tool to automate data extraction and analysis from a wide range of forms, documents, and receipts. It offers numerous benefits, including increased efficiency, accuracy, and cost savings, by reducing manual data entry and errors and providing timely insights for better decision-making.

The key takeaways from this article are:

Intelligent Document Processing (IDP) is a technology that automates the extraction of data from documents using machine learning algorithms.
Azure Form Recognizer is a cloud-based IDP service offered by Microsoft Azure that can extract structured data from various types of documents, such as invoices, receipts, and forms.
Azure Functions is a serverless computing service offered by Microsoft Azure that enables developers to run code in response to events and triggers without the need to manage infrastructure.
By combining Azure Form Recognizer with Azure Functions, developers can create intelligent document processing workflows that automatically extract data from documents and integrate it into other applications or systems.
Finally, we also discussed how to implement the end-to-end architecture, including Power BI for visualization.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.