Full text indexing on large files (more than 32k)

Posted 2019-04-15 21:15

Is it possible to use Azure Search on blobs larger than 32 KB? I have around 500 GB of text files stored as blobs on Azure. The average blob size is around 1 MB. I was excited to try Azure Search for full-text search on these files. However, it looks like an index field of type Edm.String cannot be more than 32 KB. I couldn't find this exact limit documented anywhere; I got it from an error message in the portal.

Is there any out-of-the-box solution on Azure that I can use to add full-text search on blobs? Does the Azure team plan to remove the 32 KB field size limit?

1 answer
Emotional °昔
#2 · 2019-04-15 21:42

Two different limits are potentially relevant here:

  1. Azure Search limits how many characters it will extract from a blob, depending on the pricing tier. For the free tier, that limit is 32*1024 (32,768) characters. For the Standard S1 and S2 pricing tiers, it's 4 million characters.

  2. Separately, there's a limit on the size of a single term in the search index - it also happens to be 32KB. If the content field in your search index is marked as filterable, facetable or sortable then you'll hit this limit (regardless of whether the field is marked as searchable or not). Typically for large searchable content you want to enable searchable and sometimes retrievable but not the rest. That way you won't hit limits on content length from the index side.
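To make the two limits concrete, here is a minimal Python sketch, not an official sample. It builds an index definition in the shape used by the Azure Search REST API, with the large content field marked searchable and retrievable only, and adds two small checks. The index name, the field names (`id`, `content`), the tier keys and the helper functions are all hypothetical and chosen for illustration; the numeric limits are simply the ones quoted above.

```python
import json

# Extraction limits quoted in the answer above (characters per blob).
# The tier keys are hypothetical labels, not official SKU identifiers.
EXTRACTION_LIMITS = {
    "free": 32 * 1024,
    "standard_s1": 4_000_000,
    "standard_s2": 4_000_000,
}

# Attributes that, per the answer above, trigger the 32 KB single-term limit.
SINGLE_TERM_ATTRIBUTES = ("filterable", "facetable", "sortable")


def build_index_definition(index_name):
    """Return an index definition dict (hypothetical index and field names)."""
    return {
        "name": index_name,
        "fields": [
            {
                "name": "id",
                "type": "Edm.String",
                "key": True,
                "searchable": False,
                "retrievable": True,
            },
            {
                # Large blob text: searchable and retrievable, nothing else.
                "name": "content",
                "type": "Edm.String",
                "searchable": True,
                "retrievable": True,
                "filterable": False,
                "facetable": False,
                "sortable": False,
            },
        ],
    }


def fields_at_risk(index_definition, large_fields):
    """List large-text fields that still enable a single-term attribute.

    An attribute that is not set explicitly is conservatively treated as
    enabled, so the definition has to opt out of each one.
    """
    risky = []
    for field in index_definition["fields"]:
        if field["name"] not in large_fields:
            continue
        if any(field.get(attr, True) for attr in SINGLE_TERM_ATTRIBUTES):
            risky.append(field["name"])
    return risky


def exceeds_extraction_limit(char_count, tier):
    """True if a blob with char_count characters is over the tier's limit."""
    return char_count > EXTRACTION_LIMITS[tier]


if __name__ == "__main__":
    definition = build_index_definition("blob-text-index")
    print(json.dumps(definition, indent=2))
    print("Fields at risk:", fields_at_risk(definition, ["content"]))
```

For the blobs in the question (about 1 MB each, so roughly a million characters of plain text), `exceeds_extraction_limit(1_000_000, "free")` is true while the same call with `"standard_s1"` is false, which matches the first limit described above.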

We realize that the first limit especially isn't documented now; we'll reflect this in our Quotas and Limits page soon.
