Microsoft Azure Form Recognizer is an AI service that uses Optical Character Recognition (OCR) to extract text and structure from documents and images. Integrating it with Dynamics 365 is easier than you might expect.
Here is how the end result will look.
We will see:
- How to set up the Form Recognizer Azure service.
- How to train a custom model on our own dataset.
- How to process documents to extract text and structure.
Our dataset consists of invoice images in .jpg format.
We will need to extract the following information from each .jpg file.
- Invoice Number
- Invoice Date
- Invoice Line: Description
- Invoice Line: Quantity
- Invoice Line: Unit Price
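To make the target output concrete, here is a minimal sketch of the extracted data as Python dataclasses. The class and attribute names are our own choice and are not part of the Form Recognizer API. The invoice lines are a list because the number of lines varies from invoice to invoice, and `total()` is a derived helper, not one of the extracted fields.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass
class InvoiceLine:
    # One row of the invoice's line-item table
    description: str
    quantity: Decimal
    unit_price: Decimal


@dataclass
class Invoice:
    # Header fields extracted once per invoice
    invoice_number: str
    invoice_date: date
    lines: List[InvoiceLine] = field(default_factory=list)

    def total(self) -> Decimal:
        # Derived value (quantity * unit price summed over all lines), not extracted by the model
        return sum((line.quantity * line.unit_price for line in self.lines), Decimal("0"))


# Hypothetical values, only to show the shape of the data
example = Invoice(
    invoice_number="INV-001",
    invoice_date=date(2021, 1, 15),
    lines=[InvoiceLine("Consulting hours", Decimal("2"), Decimal("10.50"))],
)
print(example.total())  # 21.00
```

Using `Decimal` rather than `float` for quantities and prices avoids binary rounding surprises in money calculations.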
We will split the data into two subsets: "Train Dataset" and "Test Dataset".
Train Dataset - data used to train the machine learning model.
Test Dataset - data used to test the trained model on files it has not seen during training.
In our example, we have 4 invoice images in the train dataset and 1 image in the test dataset.
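The split happens before anything is uploaded: only the train files go into the storage container, and the test file is kept aside for the analysis step. Below is a minimal sketch of a deterministic split, assuming hypothetical file names. Sorting first makes the result repeatable; a random split would work just as well for larger datasets.

```python
from typing import List, Tuple


def split_dataset(files: List[str], test_count: int = 1) -> Tuple[List[str], List[str]]:
    """Sort the files and hold out the last `test_count` of them as the test dataset."""
    if not 0 < test_count < len(files):
        raise ValueError("test_count must leave at least one file in each subset")
    ordered = sorted(files)
    return ordered[:-test_count], ordered[-test_count:]


# Hypothetical file names for our five invoice images
invoices = [f"invoice{i}.jpg" for i in range(1, 6)]
train_files, test_files = split_dataset(invoices)
print(train_files)  # ['invoice1.jpg', 'invoice2.jpg', 'invoice3.jpg', 'invoice4.jpg']
print(test_files)   # ['invoice5.jpg']
```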
We will need the following Azure components:
Form Recognizer - OCR service that extracts text and structure from the images.
Storage account - Azure storage for the training data.
Resource group - container that holds the Form Recognizer and storage account resources.
Let's start by creating a new resource group.
In the resource group, create a Form Recognizer resource.
In the same resource group, create a storage account.
After creating all required components, your resource group should contain the following items.
Navigate to the storage account and create a new container called "train-data".
Open the container and upload the training dataset.
From the container, navigate to "Shared access tokens" and generate a new token.
Copy and save the container's generated "Blob SAS URL".
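A Blob SAS URL is the container URL followed by a query string that carries the token, with `sp` holding the granted permissions, `sr=c` marking a container-level token, and `sig` holding the signature. Because the labeling tool saves label files back into the container, we assume it needs Read, Write, Delete and List permissions; adjust `REQUIRED_PERMISSIONS` if your setup differs. This is a minimal sketch for sanity-checking a copied URL, not an official validator.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: the labeling tool reads, writes, deletes and lists blobs in the container
REQUIRED_PERMISSIONS = set("rwdl")


def check_container_sas_url(url: str, container: str = "train-data") -> list:
    """Return a list of problems found in a container Blob SAS URL (empty if none)."""
    problems = []
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.scheme != "https":
        problems.append("URL should use https")
    if parsed.path.strip("/") != container:
        problems.append(f"URL path should be the '{container}' container")
    if query.get("sr", [""])[0] != "c":
        problems.append("sr should be 'c' (container-level SAS)")
    missing = REQUIRED_PERMISSIONS - set(query.get("sp", [""])[0])
    if missing:
        problems.append("missing permissions: " + "".join(sorted(missing)))
    if "sig" not in query:
        problems.append("signature (sig) is missing")
    return problems
```

Usage: paste the copied URL into `check_container_sas_url(...)`; an empty list means the URL has the expected shape. The account name and token values in any example you try are, of course, your own.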
Navigate back to the storage account, open its CORS settings and, on the "Blob service" tab, allow the https://fott-2-1.azurewebsites.net/ site as an origin.
Do the same for the "File service", "Queue service" and "Table service" tabs.
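Each CORS rule in the storage account is just a handful of settings: allowed origins, allowed methods, allowed headers, exposed headers and a max age. The sketch below models one rule as a Python dataclass (our own model, not the Azure SDK) with a simplified check of whether a browser request from a given origin and method would be permitted. We write the origin without a trailing slash because that is the form a browser sends in its `Origin` header; the broad method list and the 200-second max age are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Origin as a browser sends it in the Origin header: scheme + host, no trailing slash
FOTT_ORIGIN = "https://fott-2-1.azurewebsites.net"


@dataclass
class CorsRule:
    """Mirrors the columns of one CORS rule row in the storage account's CORS settings."""
    allowed_origins: List[str]
    allowed_methods: List[str]
    allowed_headers: List[str] = field(default_factory=lambda: ["*"])
    exposed_headers: List[str] = field(default_factory=lambda: ["*"])
    max_age_seconds: int = 200  # arbitrary choice for this sketch

    def allows(self, origin: str, method: str) -> bool:
        # Simplified check: exact origin match (or "*") plus an allowed HTTP method
        origin_ok = "*" in self.allowed_origins or origin in self.allowed_origins
        return origin_ok and method.upper() in self.allowed_methods


# The same rule is added on the Blob, File, Queue and Table service tabs.
# Allowing this broad set of methods is an assumption, not a verified minimum.
fott_rule = CorsRule(
    allowed_origins=[FOTT_ORIGIN],
    allowed_methods=["DELETE", "GET", "HEAD", "MERGE", "OPTIONS", "POST", "PUT"],
)

print(fott_rule.allows(FOTT_ORIGIN, "put"))            # True
print(fott_rule.allows("https://example.com", "GET"))  # False
```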
Navigate to the "Keys and Endpoints" section of the Form Recognizer resource and save the "Endpoint" and "Key 1" values.
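When you later call the service from your own code, it is safer to keep the endpoint and key out of the source. Here is a minimal sketch that reads them from environment variables; the variable names `FORM_RECOGNIZER_ENDPOINT` and `FORM_RECOGNIZER_KEY` are hypothetical, so use whatever fits your deployment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FormRecognizerConfig:
    endpoint: str  # the "Endpoint" value from "Keys and Endpoints"
    key: str       # the "Key 1" value


def load_config(env: Mapping[str, str] = os.environ) -> FormRecognizerConfig:
    """Read the endpoint and key from (hypothetically named) environment variables."""
    try:
        endpoint = env["FORM_RECOGNIZER_ENDPOINT"]
        key = env["FORM_RECOGNIZER_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing.args[0]} is not set") from None
    # Strip a trailing slash so API paths can be appended with a single "/"
    return FormRecognizerConfig(endpoint=endpoint.rstrip("/"), key=key)
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test without touching the real environment.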
Navigate to https://fott-2-1.azurewebsites.net/ (the Form Recognizer sample labeling tool). Click the "Application Settings" icon at the bottom left and add a new security token name and key.
Save the token name and key; you will need them again if you re-open the project later.
Click the "Connections" icon on the left side and add a new connection. In the "SAS URI" field, specify the "Blob SAS URL" of the container.
Click "Save Connection".
Click on the left-side home icon and click on "Use Custom to train a model with labels and get key value pairs".
Click on "New Project".
In the "Project Settings" window add the following information.
- Project Name
- Security Token - the project security token created earlier in "Application Settings".
- Source Connection - the connection to the storage account container, created earlier.
- Form recognizer data - the Form Recognizer endpoint and key saved earlier from the "Keys and Endpoints" section.
Click "Save Project".
In the new window, add the "Invoice Number" and "Invoice Date" tags from the top-right side.
Add a new table tag from the top-right side called "Invoice Lines" with the type "Row dynamic". Add "Description", "Quantity" and "Unit Price" column tags.
Click "Save".
We need to annotate every training file, that is, map each created tag to the corresponding text on the invoice form.
To do that, click the invoice number text on the form, then click the "Invoice Number" tag on the top-right. Do the same for the invoice date.
The values should appear under the tag names.
Click "Click to assign labels" under the "Invoice Lines" table tag. For each invoice line, assign the "Description", "QTY" and "Unit Price" values from the invoice form to the "Description", "Quantity" and "Unit Price" columns.
Click "Done".
Repeat the annotation for every file in the train dataset.
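The labeling tool stores each file's annotations in the container next to the image, as a `<file name>.labels.json` file (alongside a `<file name>.ocr.json` file with the OCR output). Before training, a quick way to spot a file you forgot to annotate is to compare the blob names. A minimal sketch, assuming you have a listing of the container's blob names; the supported-extension list is our assumption.

```python
from typing import Iterable, List

# Assumed set of document formats used for training
DOCUMENT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".pdf", ".tif", ".tiff")


def unlabeled_files(blob_names: Iterable[str]) -> List[str]:
    """Return training documents that have no matching '<name>.labels.json' blob."""
    names = set(blob_names)
    documents = sorted(n for n in names if n.lower().endswith(DOCUMENT_EXTENSIONS))
    return [doc for doc in documents if doc + ".labels.json" not in names]


# Hypothetical listing of the "train-data" container
blobs = [
    "invoice1.jpg", "invoice1.jpg.labels.json", "invoice1.jpg.ocr.json",
    "invoice2.jpg", "invoice2.jpg.ocr.json",
    "Invoices.fott",
]
print(unlabeled_files(blobs))  # ['invoice2.jpg']
```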
After labeling all data, click on the left-side "Train" icon.
Give a name to the model and click "Train".
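The "Train" button starts a training job on the Form Recognizer service. To see what that involves, or to script it later, here is a sketch that builds the equivalent REST request with the Python standard library. It assumes the v2.1 "Train Custom Model" operation (the version the fott-2-1 site name suggests) and only builds the request; sending it requires your real endpoint, key and Blob SAS URL.

```python
import json
import urllib.request


def build_train_request(endpoint: str, key: str, sas_url: str) -> urllib.request.Request:
    """Build (but do not send) a v2.1 'Train Custom Model' request that uses the label files."""
    body = {
        "source": sas_url,  # the container's Blob SAS URL
        "sourceFilter": {"prefix": "", "includeSubFolders": False},
        "useLabelFile": True,  # train on the *.labels.json files created while annotating
    }
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/formrecognizer/v2.1/custom/models",
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

To actually train, pass the request to `urllib.request.urlopen`. The service replies with a `Location` header pointing at the new model; the last path segment is the model ID, and a GET on that URL (with the same key header) reports the training status until it becomes "ready".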
Click the "Analyze" icon and select a new file from the test dataset.
Save the Model Id. We will use it in the next article to call the API from JavaScript.
Click "Run Analysis".
As we can see, the model successfully recognized the tagged information.
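"Run Analysis" calls the trained model for us. From code, the v2.1 flow is: POST the image to the model's `analyze` endpoint, read the `Operation-Location` header from the 202 response, poll that URL with GET until `status` is "succeeded", then read `analyzeResult.documentResults[0].fields`. The sketch below builds the analyze request and pulls our tags out of a result document. We assume the "Row dynamic" table comes back as an array field whose items are objects keyed by column tag; check this against the JSON your own model returns.

```python
import urllib.request
from typing import Any, Dict, Optional


def build_analyze_request(endpoint: str, key: str, model_id: str, image: bytes) -> urllib.request.Request:
    """Build (but do not send) a v2.1 'Analyze With Custom Model' request for a JPEG image."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/formrecognizer/v2.1/custom/models/{model_id}/analyze",
        data=image,
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "image/jpeg"},
        method="POST",
    )


def _text(field: Optional[Dict[str, Any]]) -> Optional[str]:
    # A field the model could not find may be missing or null in the result
    return field.get("text") if field else None


def extract_invoice(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the tagged fields out of a succeeded analyze-result JSON document (assumed shape)."""
    fields = result["analyzeResult"]["documentResults"][0]["fields"]
    lines_field = fields.get("Invoice Lines") or {}
    lines = []
    for row in lines_field.get("valueArray", []):
        columns = row.get("valueObject") or {}
        lines.append({name: _text(columns.get(name)) for name in ("Description", "Quantity", "Unit Price")})
    return {
        "Invoice Number": _text(fields.get("Invoice Number")),
        "Invoice Date": _text(fields.get("Invoice Date")),
        "Invoice Lines": lines,
    }
```

Reading the `text` property keeps the values exactly as printed on the invoice; converting them to dates and numbers is left to the caller.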
We have successfully tested our model, and it is ready to use. In the next article, we will discuss how to call the Azure Form Recognizer API from a Dynamics 365 JavaScript web resource. Stay tuned! :)
In this section, we will discuss how to re-open the existing model project if you have closed the https://fott-2-1.azurewebsites.net site.
- Navigate to https://fott-2-1.azurewebsites.net.
- Click the "Application Settings" icon at the bottom left and add the security token name and key that you saved earlier. Click "Save Settings".
- Click the "Connections" icon on the left side and add a new connection. In the "SAS URI" field, specify the "Blob SAS URL" of the container that you saved earlier.
- Click the home icon on the left side and click "Use Custom to train a model with labels and get key value pairs". Click "Open Cloud Project".
- Click the cloud connection you created (in our example, "Blob-Connection").
- Click the .fott file.
And that will open the project.