Is it possible to upload a document to a blob storage and do the following:
- Grab contents of document and add to index.
- Grab key phrases from contents in point 1 and add to index.
I want the key phrases then to be searchable.
I have code that uploads documents to blob storage, which works perfectly, but the only way I know of to get this content indexed is by using "Import data" within the Azure Search service, which creates an index with predefined fields - as below:
This works great when I only need those fields, and the index gets updated automatically every 5 minutes, but it becomes a problem when I want a custom index.
However, the only fields I DO want are the following:
- fileId
- fileText (this is the content of the document)
- blobURL (to allow downloading of the document)
- keyPhrases (which are to be pulled from fileText - I have code that does this as well)
The only issue I have is that I need to retrieve the document content (fileText) in order to get the keyPhrases, but to my understanding I can only do this if the document content is already in an index for me to access.
I have very limited knowledge of Azure and am struggling to find anything similar to what I want to do.
The code that I am using to upload a document to my blob storage is as follows:
public CloudBlockBlob UploadBlob(HttpPostedFileBase file)
{
    string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
    string blobStorageKey = ConfigurationManager.AppSettings["BlobStorageKey"];
    string blobStorageName = ConfigurationManager.AppSettings["BlobStorageName"];
    string blobStorageURL = ConfigurationManager.AppSettings["BlobStorageURL"];
    string UserID = User.Identity.GetUserId();
    string UploadDateTime = DateTime.Now.ToString("yyyyMMddhhmmss");

    try
    {
        // Save the uploaded file to a temporary location on the server.
        var path = Path.Combine(Server.MapPath("~/App_Data/Uploads"), UserID + "_" + UploadDateTime + "_" + file.FileName);
        file.SaveAs(path);

        var credentials = new StorageCredentials(searchServiceName, blobStorageKey);
        var client = new CloudBlobClient(new Uri(blobStorageURL), credentials);

        // Retrieve a reference to the container. (Create it in the management portal, or call container.CreateIfNotExists().)
        var container = client.GetContainerReference(blobStorageName);

        // Retrieve a reference to the blob that will hold the uploaded file.
        var blockBlob = container.GetBlockBlobReference(UserID + "_" + UploadDateTime + "_" + file.FileName);

        // Create or overwrite the blob with the contents of the local file.
        using (var fileStream = System.IO.File.OpenRead(path))
        {
            blockBlob.UploadFromStream(fileStream);
        }

        System.IO.File.Delete(path);
        return blockBlob;
    }
    catch (Exception e)
    {
        var r = e.Message;
        return null;
    }
}
I hope I haven't given too much information, but I don't know how else to explain what I am looking for. If I am not making sense, please let me know so that I can fix my question.
I am not looking for handout code, just looking for a shove in the right direction.
I would appreciate any help.
Thanks!
We can use Azure Search to index documents via the Azure Search REST API or the .NET SDK. Based on your description, I created a demo with the .NET SDK (the Microsoft.Azure.Search NuGet package) and tested it successfully. Here are my detailed steps:
Create custom index field model
[SerializePropertyNamesAsCamelCase]
public class TomTestModel
{
    [Key]
    [IsFilterable]
    public string fileId { get; set; }

    [IsSearchable]
    public string fileText { get; set; }

    public string blobURL { get; set; }

    [IsSearchable]
    public string keyPhrases { get; set; }
}
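The index-creation code below uses a SearchServiceClient, so that has to be created first. A minimal sketch, assuming the search service name and an admin API key are kept in app settings (the setting names here are illustrative):

// Illustrative setting names; adjust to match your configuration.
string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
string adminApiKey = ConfigurationManager.AppSettings["SearchServiceAdminApiKey"];

// Admin client used to create/delete indexes and to get index clients for uploading documents.
var serviceClient = new SearchServiceClient(searchServiceName, new SearchCredentials(adminApiKey));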
Create the index with the custom fields
var definition = new Index()
{
    Name = "tomcustomindex",
    Fields = FieldBuilder.BuildForType<TomTestModel>()
};

// Create the index (recreate it if it already exists).
if (serviceClient.Indexes.Exists(definition.Name))
{
    serviceClient.Indexes.Delete(definition.Name);
}
var index = serviceClient.Indexes.Create(definition);
Upload documents to the index. For more information about indexing operations with the .NET SDK, please refer to the documentation.
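A minimal sketch of the upload, assuming your existing code has already extracted the document text and key phrases (the field values below are placeholders):

var indexClient = serviceClient.Indexes.GetClient("tomcustomindex");

var documents = new[]
{
    new TomTestModel
    {
        fileId = "1",
        fileText = "full text extracted from the uploaded document",
        blobURL = "https://<account>.blob.core.windows.net/<container>/<blob>",
        keyPhrases = "phrase one, phrase two"
    }
};

try
{
    // MergeOrUpload adds new documents and updates existing ones with the same key.
    indexClient.Documents.Index(IndexBatch.MergeOrUpload(documents));
}
catch (IndexBatchException e)
{
    // Some documents failed to index; IndexingResults holds the per-document status.
    Console.WriteLine("Failed to index: {0}", string.Join(", ",
        e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
}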
Check the search results from the Search explorer in the Azure portal.
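To confirm from code as well that the keyPhrases field is searchable, a quick query sketch (the search text is a placeholder):

var results = indexClient.Documents.Search<TomTestModel>("some key phrase");
foreach (var result in results.Results)
{
    Console.WriteLine("{0} -> {1}", result.Document.fileId, result.Document.blobURL);
}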