Pushing documents(blobs) for indexing - Azure Sear

Posted 2019-08-25 21:52

Question:

I've been working with Azure Search + Azure Blob Storage for a while, and I'm having trouble indexing incremental changes when new files are uploaded.

How can I refresh the index after uploading a new file to my blob container? Here are the steps I follow after uploading a file (I'm using the REST service to perform these actions). I'm using Microsoft Azure Storage Explorer [link].

Through this app I uploaded my new file to a folder that already existed. After that, I used the HTTP REST API to issue a 'Run' indexer command, as you can see in this [link].

The indexer shows that my new file was added successfully, but when I search, the content of this new file is not found.

Does anybody know how to add this new file to the index, and how to find it by searching for its content?

I'm following Microsoft tutorials, but for this issue, I couldn't find a solution.

Thanks, guys!

Answer 1:

Assuming everything is set up correctly, you don't need to do anything special: new blobs will be picked up and indexed the next time the indexer runs according to its schedule, or when you run the indexer on demand.

However, when you run the indexer on demand, successful completion of the Run Indexer API call means only that the request to run the indexer has been submitted; it does not mean that the indexer has finished running. To determine when the indexer has actually finished running (and to see the errors, if any), you should use the Indexer Status API.
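The run-then-poll pattern described above might look like the following minimal Python sketch. It assumes the `/indexers/{name}/run` and `/indexers/{name}/status` REST paths and a `lastResult.status` field in the status response. The helper names, the `fetch_status` callable, and the `API_VERSION` value are placeholders for illustration and are not from the original answer.

```python
import time
from typing import Callable, Dict

API_VERSION = "2019-05-06"  # assumption: substitute the api-version your service uses


def indexer_url(service: str, indexer: str, action: str,
                api_version: str = API_VERSION) -> str:
    """Build the URL for an indexer action ('run' or 'status')."""
    return (f"https://{service}.search.windows.net/indexers/{indexer}/{action}"
            f"?api-version={api_version}")


def wait_for_indexer(fetch_status: Callable[[], Dict],
                     poll_seconds: float = 5.0,
                     max_polls: int = 60) -> Dict:
    """Poll the status endpoint until the last run is no longer in progress.

    fetch_status is a hypothetical callable that performs the GET on the
    status URL (sending the api-key header) and returns the parsed JSON body.
    """
    for _ in range(max_polls):
        last = fetch_status().get("lastResult") or {}
        # Keep polling while there is no result yet or the run is still going.
        if last and last.get("status") != "inProgress":
            return last
        time.sleep(poll_seconds)
    raise TimeoutError("indexer did not finish within the polling budget")
```

A caller would POST to `indexer_url(service, name, "run")`, then pass a function that GETs `indexer_url(service, name, "status")` to `wait_for_indexer` and inspect the returned status and error list. Note that right after a run request the status response may still describe the previous run, so comparing the run's start time before and after is probably a safer check than the status value alone.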

If you still have questions, please let us know your service name and indexer name and we can take a closer look at the telemetry.



Answer 2:

I'll try to describe how I figured out this issue.

First, I created a data source with this command:

POST https://[service name].search.windows.net/datasources?api-version=[api-version]

https://docs.microsoft.com/en-us/rest/api/searchservice/create-data-source.

Secondly, I created the Index:

POST https://[service name].search.windows.net/indexes?api-version=[api-version]

https://docs.microsoft.com/en-us/rest/api/searchservice/create-index

Finally, I created the indexer. The problem occurred at this step, because this is where all the configuration is set.

POST https://[service name].search.windows.net/indexers?api-version=[api-version]

https://docs.microsoft.com/en-us/rest/api/searchservice/create-indexer

Once all of this is done, the indexer starts indexing all content automatically (as long as there is content in the blob storage).
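The three creation calls above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the data source body assumes the usual blob shape (`type` set to `azureblob`, a connection string under `credentials`, and a container name), so check it against the Create Data Source reference before relying on it.

```python
from typing import Dict, List, Tuple


def blob_data_source(name: str, connection_string: str, container: str) -> Dict:
    """Assumed request body for a blob data source (verify against the docs)."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def creation_requests(service: str, api_version: str) -> List[Tuple[str, str]]:
    """Return the (method, URL) pairs in the order used above:
    data source first, then index, then indexer."""
    base = f"https://{service}.search.windows.net"
    return [
        ("POST", f"{base}/{collection}?api-version={api_version}")
        for collection in ("datasources", "indexes", "indexers")
    ]
```

The order matters because the indexer definition refers to both the data source and the index by name, so both need to exist before the indexer is created.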

Now comes the crucial part. While the indexer is trying to extract the text from your files, problems can occur when a file type is not indexable. In particular, there are two properties you must pay attention to: excluded extensions and indexed extensions (excludedFileNameExtensions and indexedFileNameExtensions in the indexer configuration).

If you don't specify the types properly, the indexer throws an error. The error message (which in my opinion is not good, and was rather misleading) then says that to avoid this error you should set the indexer to '"dataToExtract" : "storageMetadata"'.

This setting means you index only the metadata and no longer the content of your files, so you cannot search by content and retrieve results.

Further down, the same message says that to avoid these issues you can set two properties (which is what solved the problem):

"failOnUnprocessableDocument" : false,"failOnUnsupportedContentType" : false

Now everything is working properly. I appreciate your help, @Eugene Shvets, and I hope this is useful for someone else.
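For reference, the indexer settings discussed in this answer might be assembled like the minimal Python sketch below. The function name and its arguments are hypothetical. The field names `dataSourceName`, `targetIndexName`, and `parameters.configuration`, and the value `contentAndMetadata` for `dataToExtract`, are my understanding of the Create Indexer request body and should be checked against the documentation linked above.

```python
import json
from typing import Dict, Optional, Sequence


def blob_indexer_definition(name: str,
                            data_source: str,
                            target_index: str,
                            indexed_extensions: Optional[Sequence[str]] = None,
                            excluded_extensions: Optional[Sequence[str]] = None) -> Dict:
    """Build an indexer body that keeps content extraction on and
    skips blobs it cannot process instead of failing the whole run."""
    configuration = {
        # Extract file content, not only storage metadata.
        "dataToExtract": "contentAndMetadata",
        # The two settings that resolved the problem described above.
        "failOnUnprocessableDocument": False,
        "failOnUnsupportedContentType": False,
    }
    # Extension filters are passed as comma-separated strings, e.g. ".pdf,.docx".
    if indexed_extensions:
        configuration["indexedFileNameExtensions"] = ",".join(indexed_extensions)
    if excluded_extensions:
        configuration["excludedFileNameExtensions"] = ",".join(excluded_extensions)
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {"configuration": configuration},
    }


if __name__ == "__main__":
    body = blob_indexer_definition("my-indexer", "my-datasource", "my-index",
                                   excluded_extensions=[".png", ".jpg"])
    print(json.dumps(body, indent=2))
```

Leaving `dataToExtract` at a content-extracting value is what keeps the file text searchable; switching it to `storageMetadata`, as the error message suggested, silences the error but drops the content from the index.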