Is there any way to search the content of binary files such as PPT, PDF, etc., other than
converting them to XHTML with the xdmp:document-filter() function and searching on that?
Basically, no. You have to pull the readable text out of the binary format so that MarkLogic can index it. You can extract that text with xdmp:document-filter(), or with functions like xdmp:pdf-convert() and xdmp:word-convert(), but there is no way to index binary nodes directly. HTH!
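
For what it's worth, the extract-then-search approach might look roughly like the sketch below. It is only an illustration: the URIs and the search term are made up, and storing the filtered XHTML next to the binary under a ".xhtml" suffix is just one possible convention, not something MarkLogic requires.

```xquery
xquery version "1.0-ml";

(: Hypothetical URIs, for illustration only :)
let $binary-uri := "/docs/slides.ppt"
let $text-uri := fn:concat($binary-uri, ".xhtml")

(: Extract the metadata and readable text of the binary as XHTML :)
let $filtered := xdmp:document-filter(fn:doc($binary-uri))

(: Store the XHTML as its own document so that it gets indexed :)
return xdmp:document-insert($text-uri, $filtered)
;

xquery version "1.0-ml";

(: Second statement, run after the insert above has committed.
   The word "budget" is an arbitrary example term. :)
for $hit in cts:search(fn:doc(), cts:word-query("budget"))
return xdmp:node-uri($hit)
```

The search returns the URI of the extracted XHTML document, and with a naming convention like the one assumed here you can map that back to the original binary.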