How to Import Data from Azure Blob Storage into Azure Search Service and Search with OCR Enabled


Sathish Nadarajan
SharePoint MVP
Published On :   21 Oct 2018




I recently came across an interesting requirement involving OCR. After investigating several solutions, I implemented it with the Azure Search Service, and thought of sharing the approach with the community.

In this article, let us see how to use the Azure Search Service to read and search content. The raw data is stored in Azure Blob Storage.

For a better understanding of how Blob Storage is created, please read the earlier article on creating Azure Blob Storage.

I am assuming that we have Azure Blob Storage available in the Azure Portal. For completeness, the steps to create it are listed here.

1. Go to the Azure Portal and add a new resource: Storage Account - blob, data lake.

clip_image002

2. Once the validation succeeds, click Create.

clip_image004

3. Click on Blobs.

clip_image006

4. Add a container.

clip_image008

5. Give a name and click OK.

clip_image010

6. Upload Files.

clip_image012

clip_image014

With this, we are done with the container.
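The upload in step 6 can also be scripted. The following is only a minimal sketch, not the method used in this article: it prepares a Put Blob REST request using nothing but the Python standard library. The account name, container name, and file name are made-up placeholders, and it assumes that a SAS token with write permission has already been generated for the container.

```python
import urllib.parse
import urllib.request


def build_put_blob_request(account, container, blob_name, data, sas_token):
    """Prepare (but do not send) a Put Blob request for a block blob."""
    # Blob names may contain spaces; slashes act as virtual folders and are kept.
    path = f"/{urllib.parse.quote(container)}/{urllib.parse.quote(blob_name)}"
    url = f"https://{account}.blob.core.windows.net{path}?{sas_token.lstrip('?')}"
    request = urllib.request.Request(url, data=data, method="PUT")
    # The Blob service requires the blob type header on Put Blob.
    request.add_header("x-ms-blob-type", "BlockBlob")
    return request


# Hypothetical values, for illustration only.
request = build_put_blob_request(
    "mystorageaccount",
    "scanned-docs",
    "invoices/invoice 01.pdf",
    b"%PDF-1.4 ...",
    "sv=placeholder&sig=placeholder",
)
print(request.get_method(), request.full_url)
# To actually upload, pass the request to urllib.request.urlopen(request).
```

Nothing is sent until the request is passed to urlopen, so the URL and headers can be checked first.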

Now, let us go back to the Azure Portal home page and create an Azure Search Service.

1. Click Create a resource.

2. Search for Azure Search and Click on the resource.

clip_image016

3. Click on Create.

clip_image018

4. Give the appropriate inputs and Click Create.

clip_image020

5. Once it is created, we have two resources: the storage account and the search service.

clip_image022

6. Go to Search Service.

clip_image024

7. In the Search Service, there are a few important concepts to note:

a. Data Source

b. Index

c. Skillset

d. Indexer

8. We can discuss these concepts in detail in another article. For now, let us see how to create them.

9. Click on Import Data.

clip_image026

10. As mentioned earlier, there are four things that we are going to create.

clip_image028

11. Let us create the Data Source. Select Azure Blob Storage.

clip_image030

12. Give a name and click on the Storage Container.

clip_image032

13. Select the Storage Account which we created.

clip_image034

14. Select the Container.

clip_image036

15. If we want to index only a specific folder, enter the folder name on this screen. Otherwise, leave it empty, and the indexer will crawl all the files and folders in the container.

clip_image038
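For reference, the data source configured in steps 11 to 15 corresponds to a small JSON definition in the Search REST API. The sketch below builds that body in Python; the names and the connection string are placeholders, and sending it (a POST to https://<service>.search.windows.net/datasources?api-version=<version> with an admin api-key header) is left out on purpose.

```python
import json


def build_blob_data_source(name, connection_string, container, folder=None):
    """Return the JSON body for an Azure Search blob data source."""
    container_def = {"name": container}
    if folder:
        # "query" limits indexing to one virtual folder; leave it out to crawl everything.
        container_def["query"] = folder
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": container_def,
    }


# Hypothetical names; the connection string is a placeholder, not a real key.
body = build_blob_data_source(
    "ocr-datasource",
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>",
    "scanned-docs",
    folder="invoices",
)
print(json.dumps(body, indent=2))
```

Leaving the folder argument out matches the empty folder box in the portal.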

16. Now, create the skillset. Enter the skillset name and select the option to enable OCR.

clip_image040
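Behind the OCR checkbox, a skillset is a list of skills with inputs and outputs. The sketch below shows roughly what an OCR-plus-merge skillset looks like as a REST body. It is an approximation under my own assumptions (English as the default language, a hypothetical skillset name), not an export of what the portal generated here.

```python
import json


def build_ocr_skillset(name):
    """Return a skillset body: OCR on extracted images, then merge into one text field."""
    ocr_skill = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "defaultLanguageCode": "en",  # assumption: documents are in English
        "detectOrientation": True,
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "text"}],
    }
    merge_skill = {
        # Inserts the OCR text of each image back into the document text.
        "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
        "context": "/document",
        "insertPreTag": " ",
        "insertPostTag": " ",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
            {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
        ],
        "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
    }
    return {
        "name": name,
        "description": "OCR the images and merge the text into merged_content",
        "skills": [ocr_skill, merge_skill],
    }


skillset = build_ocr_skillset("ocr-skillset")  # hypothetical name
print(json.dumps(skillset, indent=2))
```

The merge skill is what produces the merged_content field that appears in the index.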

17. Now, create the index. Give it an appropriate name and make sure that the fields have the appropriate attributes, such as “Retrievable” and “Searchable”, set based on our requirement.

18. One important thing: don’t mark the content and merged_content fields as “Filterable”, “Sortable”, or “Facetable”. Since they may contain very large text, these attributes should not be selected for those fields. Keep them “Searchable” so that the keyword search in the later steps can match the extracted text.

clip_image042
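The field attributes ticked in the portal map to boolean properties on each field of the index definition. Here is a minimal sketch of such a definition, with the filterable, sortable and facetable attributes switched off for the two large text fields. The field list is trimmed and the index name is hypothetical. The searchable flag is left as a parameter; the sketch assumes it stays on so that keyword queries can match the extracted text.

```python
import json


def large_text_field(name, searchable=True):
    """A string field for large text: never filterable, sortable or facetable."""
    return {
        "name": name,
        "type": "Edm.String",
        "searchable": searchable,
        "filterable": False,
        "sortable": False,
        "facetable": False,
        "retrievable": True,
    }


def build_index(name):
    """Return a trimmed index body with a key, a file name and the two text fields."""
    return {
        "name": name,
        "fields": [
            # The blob path serves as the document key in this sketch.
            {"name": "metadata_storage_path", "type": "Edm.String", "key": True,
             "searchable": False, "retrievable": True},
            {"name": "metadata_storage_name", "type": "Edm.String",
             "searchable": True, "retrievable": True},
            large_text_field("content"),
            large_text_field("merged_content"),
        ],
    }


index = build_index("ocr-index")  # hypothetical name
print(json.dumps(index, indent=2))
```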

19. Now create the Indexer.

clip_image043

20. After selecting all the parameters properly, click OK.

clip_image045
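The indexer is the piece that ties the other three objects together. As a rough sketch, an indexer definition that reads from a blob data source, runs a skillset, and writes into an index could look like the following. All four names are hypothetical, and the image extraction setting is my assumption of what OCR needs in order to receive images.

```python
import json


def build_indexer(name, data_source, index, skillset):
    """Return an indexer body linking a data source, a skillset and a target index."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        "fieldMappings": [
            # Blob paths contain characters that are not valid in a document key,
            # so the path is base64-encoded before it is used as the key.
            {"sourceFieldName": "metadata_storage_path",
             "targetFieldName": "metadata_storage_path",
             "mappingFunction": {"name": "base64Encode"}},
        ],
        "outputFieldMappings": [
            # Copy the merged OCR text produced by the skillset into the index.
            {"sourceFieldName": "/document/merged_content",
             "targetFieldName": "merged_content"},
        ],
        "parameters": {
            "configuration": {
                "dataToExtract": "contentAndMetadata",
                # Extract embedded images so that the OCR skill has input.
                "imageAction": "generateNormalizedImages",
            }
        },
    }


indexer = build_indexer("ocr-indexer", "ocr-datasource", "ocr-index", "ocr-skillset")
print(json.dumps(indexer, indent=2))
```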

21. We can now see the Search Service as shown below.

clip_image047

22. Go to the indexer; it will be in the In Progress state.

clip_image048

23. It will take a few minutes to index the content, depending on the size of the blob files.
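Instead of refreshing the portal, the progress can also be read from the indexer status endpoint. The sketch below only builds the URL and interprets a response. The service name, the indexer name and the API version are assumptions, and the two sample responses are hand-written and trimmed, not real output.

```python
def indexer_status_url(service, indexer, api_version="2020-06-30"):
    """URL of the Get Indexer Status operation (GET, with an api-key header)."""
    return (f"https://{service}.search.windows.net/indexers/{indexer}"
            f"/status?api-version={api_version}")


def summarize_last_run(status):
    """Return (state, items processed, items failed) for the most recent run."""
    # lastResult can be missing or null when the indexer has not run yet.
    last = status.get("lastResult") or {}
    return (last.get("status"), last.get("itemsProcessed", 0), last.get("itemsFailed", 0))


# Hand-written sample responses, trimmed to the fields used here.
running = {"status": "running",
           "lastResult": {"status": "inProgress", "itemsProcessed": 3, "itemsFailed": 0}}
finished = {"status": "running",
            "lastResult": {"status": "success", "itemsProcessed": 12, "itemsFailed": 0}}

print(indexer_status_url("mysearchservice", "ocr-indexer"))
print(summarize_last_run(running))
print(summarize_last_run(finished))
```

A simple script could call this every few seconds and stop once the state is no longer inProgress.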

24. Once the indexer has run successfully, we can search the content.

clip_image050

25. Enter the keywords and click Search.

clip_image052

26. We will get the results in JSON format.

clip_image054
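The same keyword search can be issued outside the portal as a plain GET request against the index. The following is a minimal sketch using only the Python standard library. The service name, index name, key and API version are placeholders or assumptions, and the response used at the end is a hand-written sample in the shape the service returns, not real output.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(service, index, keywords, api_key, api_version="2020-06-30"):
    """Prepare (but do not send) a keyword search against an index."""
    query = urllib.parse.urlencode(
        {"api-version": api_version, "search": keywords},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than +
    )
    url = f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"
    return urllib.request.Request(url, headers={"api-key": api_key})


def extract_hits(response_body):
    """Pull (score, file name) pairs out of a search response."""
    payload = json.loads(response_body)
    return [(doc["@search.score"], doc.get("metadata_storage_name"))
            for doc in payload["value"]]


request = build_search_request("mysearchservice", "ocr-index", "invoice total", "<query-key>")
print(request.full_url)

# Hand-written sample response, for illustration only.
sample = '{"value": [{"@search.score": 1.5, "metadata_storage_name": "invoice 01.pdf"}]}'
print(extract_hits(sample))
# To run the query for real: urllib.request.urlopen(request).read()
```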

In this article, we saw how to use Azure Blob Storage and the Azure Search Service with OCR enabled. In an upcoming article, we will see how to do the same programmatically.

Happy Coding,

Sathish Nadarajan.
