I want to convert this PDF file, compiled with LaTeX (using the XeLaTeX engine so that I can use an Arabic font), into an image-only PDF, so that I can upload it to the web and prevent copying and pasting of its content.
Since I am looking for freeware to do that, I came across two powerful tools for the job, namely ImageMagick and Ghostscript. All I need is to convert one text PDF to an image PDF in one go, preferably with batch processing if possible (to convert many PDFs at once).
I run this command on the command line and it works fine for PDFs written in English:
convert someenglish.pdf output.pdf
Now when I do the same for an Arabic PDF I get this error:
convert.exe: PDFDelegateFailed `[ghostscript library] -q -dQUIET -dSAFER -dBATCH
-dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sD
EVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile
=C:/Users/doctorate/AppData/Local/Temp/magick-65203BNMxTDhXtkF%d" "-fC:/Users/doctorate/Ap
pData/Local/Temp/magick-65206AK54hOoKA62" "-fC:/Users/doctorate/AppData/Local/Temp/ma
gick-6520hDn-KMyTyxy2"': **** Error reading a content stream. The page may be
incomplete.
**** Incorrect object count in object stream.
Error: /rangecheck in resolveobjectstream
Operand stack:
78424 10 1 10 --dict:7/15(L)-- 26 --nostringval-- 35 --nostri
ngval-- --dict:2/2(L)-- --dict:2/2(L)-- --dict:2/2(L)-- --dict:2/2(L)--
--dict:4/4(L)-- --dict:4/4(L)-- --dict:4/4(L)-- --dict:4/4(L)-- --dict
:4/4(L)-- --dict:3/3(L)-- --dict:2/2(L)-- --nostringval-- --dict:7/7(L)-
- --dict:10/10(L)-- --nostringval-- --nostringval-- Type Font Subtyp
e CIDFontType2 BaseFont MYCROL+(AH
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-
- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- fa
lse 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_
pop 1966 1 3 %oparray_pop --nostringval-- --nostringval-- --nostri
ngval-- --nostringval-- --nostringval-- --nostringval-- --nostringval--
--nostringval-- --nostringval--
Dictionary stack:
--dict:1193/1684(ro)(G)-- --dict:1/20(G)-- --dict:82/200(L)-- --dict:82
/200(L)-- --dict:116/127(ro)(G)-- --dict:280/300(ro)(G)-- --dict:24/32(L)-
-
Current allocation mode is local
GPL Ghostscript 9.15: Unrecoverable error, exit code 1
@ error/pdf.c/InvokePDFDelegate/263.
convert.exe: no images defined `test.pdf' @ error/convert.c/ConvertImageCommand/
3210.
Question
What am I missing here? I am not a programmer, so please consider this in your answer. I would be very grateful if you could show how to do this as a batch process.
Notes
Windows 7 32bit
Ghostscript version 9.15
Image quality is not an issue for me; even 72 dpi will be fine.
I want to strike a balance between the size of the output and the clarity of the text. I just want the text to be readable on the web, not to run OCR on it, so the image doesn't need to be very sharp. Output size matters more (the smaller the better), and honestly I have no idea whether converting the PDF pages to PNG or to JPEG would work better in this case.
I don't want to burst a PDF into multiple serially named PNGs or JPEGs; I simply want one PDF converted to another PDF whose pages are images, with no text left that can be copied and pasted.
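For illustration only, the batch step described in these notes could be sketched in Python roughly as follows. This is a sketch under assumptions, not a tested solution: the function names (`build_command`, `plan_batch`, `convert_folder`), the `_image` output suffix, and the default quality of 60 are hypothetical choices, and it assumes the ImageMagick `convert` executable is the one found on the PATH. It uses the standard ImageMagick options `-density`, `-compress` and `-quality`. It does not address the Ghostscript error shown above; the Arabic PDF would presumably still fail in the same way.

```python
import subprocess
from pathlib import Path


def build_command(src, dst, dpi=72, quality=60, exe="convert"):
    """Return the ImageMagick argument list that rasterises src into dst."""
    return [
        exe,                       # assumed to be ImageMagick's convert
        "-density", str(dpi),      # render resolution, given before the input
        str(src),
        "-compress", "jpeg",       # store each page as a JPEG image
        "-quality", str(quality),  # lower value = smaller file, blurrier text
        str(dst),
    ]


def plan_batch(folder, suffix="_image"):
    """Pair every PDF in folder with an output name such as name_image.pdf."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    # Skip files that a previous run already produced.
    return [(p, p.with_name(p.stem + suffix + ".pdf"))
            for p in pdfs if not p.stem.endswith(suffix)]


def convert_folder(folder, **options):
    """Run the conversion for each PDF and return the inputs that failed."""
    failed = []
    for src, dst in plan_batch(folder):
        result = subprocess.run(build_command(src, dst, **options))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Calling `convert_folder(r"C:\pdfs", dpi=72, quality=60)` (the folder name is a placeholder) would write one image-only PDF next to each input and return the list of inputs for which `convert` reported an error. JPEG is used here only as one plausible choice for small files; whether PNG gives smaller or clearer output for text pages is exactly the open question above.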
Update
I tried to make a minimal working example (MWE) PDF that mimics the original PDF and found that the problem arises when a certain Arabic font called (AH) Manal Black is included. Running pdffonts from the command line on this MWE PDF gives:
Syntax Error (18062): Illegal character ')'
Syntax Error (18076): Dictionary key must be a name object
Syntax Error (18085): Dictionary key must be a name object
Syntax Error (18248): Illegal character ')'
Syntax Error (18248): Dictionary key must be a name object
Syntax Error (18253): Dictionary key must be a name object
Syntax Error (18599): Illegal character ')'
Syntax Error (18599): Dictionary key must be a name object
Syntax Error (18607): Dictionary key must be a name object
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
GAKHDJ+(AH                           CID TrueType      yes yes yes      5  0
HTCSVQ+Amiri-Regular                 CID TrueType      yes yes yes      7  0
When I exclude this Arabic font while compiling the document with the XeLaTeX engine, the convert command works just fine, as it does for the English PDFs. So the problem is most probably linked to how this font is parsed.
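To check other PDFs for the same symptom, the pdffonts output could be scanned for font names like the truncated `GAKHDJ+(AH` above. The following Python sketch is only an illustration: the helper names `font_names` and `suspicious_fonts` are hypothetical, it assumes the text layout shown above (a header line, a dashed separator, then one font per row), and it assumes that a parenthesis or other PDF delimiter character in the name column is a sign of the problem, which is a guess based on the "Illegal character ')'" messages rather than a confirmed diagnosis.

```python
# Characters that delimit PDF syntax and should not appear raw in a name.
PDF_DELIMITERS = set("()<>[]{}/%")


def font_names(pdffonts_output):
    """Extract the first column (font name) from pdffonts text output."""
    names = []
    in_table = False
    for line in pdffonts_output.splitlines():
        if line.startswith("Syntax Error"):
            continue              # diagnostics may be mixed into the text
        if line.startswith("----"):
            in_table = True       # font rows follow the dashed separator
            continue
        if in_table and line.strip():
            names.append(line.split()[0])
    return names


def suspicious_fonts(pdffonts_output):
    """Return the font names that contain a PDF delimiter character."""
    return [name for name in font_names(pdffonts_output)
            if PDF_DELIMITERS & set(name)]
```

Given the output above as a string, `suspicious_fonts` would flag only the `(AH` font and leave Amiri-Regular alone.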
Update: A minimal working example is here: https://www.dropbox.com/s/qdeuzips0ivas4q/mwe_ar.pdf?dl=0