I need to convert a PDF to grayscale if it contains colors.
For this purpose I found a script that determines whether the PDF is already grayscale:
convert "source.pdf" -colorspace RGB -unique-colors txt:- 2> /dev/null \
| egrep -m 2 -v "#([0-9|A-F][0-9|A-F])\1{3}" \
| wc -l
This counts how many colors with differing R, G and B values (that is, colors that are not gray) are present in the document.
If the PDF is not already a grayscale document, I proceed with the conversion using Ghostscript:
gs \
-sOutputFile=temp.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
source.pdf < /dev/null
If I open the output document in a PDF viewer, it is displayed without colors, as expected. But if I run the first script on the newly generated document, it reports that the document still contains some colors. How can I convert a document to true grayscale? I need this because if I print the document on a color printer, the printer will mix gray from colored inks instead of using black.
I value ImageMagick in general very much -- but I don't trust convert to count the colors correctly with the command you're using...
May I suggest a different method to discover whether a PDF page uses color? It is based on a (relatively new) Ghostscript device called inkcov (you need Ghostscript v9.05 or newer). It displays the CMYK ink coverage for each single page (for RGB colors, it does a silent conversion to CMYK internally).
First, generate an example PDF with the help of Ghostscript:
gs \
-o test.pdf \
-sDEVICE=pdfwrite \
-g5950x2105 \
-c "/F1 {100 100 moveto /Helvetica findfont 42 scalefont setfont} def" \
-c "F1 (100% 'pure' black) show showpage" \
-c "F1 .5 .5 .5 setrgbcolor (50% 'rich' rgbgray) show showpage" \
-c "F1 .5 .5 .5 0 setcmykcolor (50% 'rich' cmykgray) show showpage" \
-c "F1 .5 setgray (50% 'pure' gray) show showpage"
While all the pages do appear to the human eye to not use any color at all, pages 2 and 3 do indeed mix their apparent gray values from color.
Now check each page's ink coverage:
gs -o - -sDEVICE=inkcov test.pdf
[...]
Page 1
0.00000 0.00000 0.00000 0.02230 CMYK OK
Page 2
0.02360 0.02360 0.02360 0.02360 CMYK OK
Page 3
0.02525 0.02525 0.02525 0.00000 CMYK OK
Page 4
0.00000 0.00000 0.00000 0.01982 CMYK OK
(A value of 1.00000 maps to 100% ink coverage for the respective color channel. So 0.02230 in the first line of the result means 2.23% of the page area is covered by black ink.) Hence the result given by Ghostscript's inkcov is exactly the expected one:
- pages 1 + 4 don't use any of C (cyan), M (magenta), Y (yellow) colors, but only K (black).
- pages 2 + 3 do use ink of C (cyan), M (magenta), Y (yellow) colors; page 3 uses no K (black) at all, while page 2 (RGB gray, converted to CMYK internally) also shows K coverage.
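If you want to script this check instead of reading the numbers by eye, here is a minimal sketch. It assumes the inkcov output format shown above (four coverage values followed by the word CMYK on each page line); the function name color_pages is my own and not part of Ghostscript.

```shell
# color_pages: read inkcov output on stdin and print the number of every
# page whose C, M or Y coverage is nonzero.
# Assumption: each page produces one line of the form
#   C M Y K CMYK OK
# Other lines ("Page N", banners) are ignored, so it works with or without -q.
color_pages() {
    awk '$5 == "CMYK" {
             page++                              # count page lines in order
             if ($1 + $2 + $3 > 0) print page    # any C, M or Y ink at all?
         }'
}
```

Typical use would be: gs -o - -sDEVICE=inkcov test.pdf | color_pages. An empty result means no page reported C, M or Y coverage.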
Now let's convert all pages of the original PDF to use the DeviceGray colorspace:
gs \
-o temp.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-sProcessColorModel=DeviceGray \
test.pdf
...and check for the ink coverage again:
gs -q -o - -sDEVICE=inkcov temp.pdf
0.00000 0.00000 0.00000 0.02230 CMYK OK
0.00000 0.00000 0.00000 0.02360 CMYK OK
0.00000 0.00000 0.00000 0.02525 CMYK OK
0.00000 0.00000 0.00000 0.01982 CMYK OK
Again, exactly the expected result in case of successful color conversions! (BTW, your convert command returns 2 for me for both files, the [original] test.pdf as well as the [gray-converted] temp.pdf -- so this command cannot be right...)
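Putting the pieces together for the workflow in the question (convert only if the file uses color, then verify), a rough sketch could look like the following. It assumes Ghostscript v9.05 or newer is on the PATH as gs; the function names uses_color and gray_if_needed and the status messages are made up for illustration.

```shell
#!/bin/sh
# uses_color FILE: succeed (exit 0) if any page of FILE reports nonzero
# C, M or Y ink coverage according to the inkcov device.
uses_color() {
    gs -q -o - -sDEVICE=inkcov "$1" |
        awk '$5 == "CMYK" && ($1 + $2 + $3) > 0 { found = 1 }
             END { exit !found }'
}

# gray_if_needed SRC DST: write a DeviceGray copy of SRC to DST, but only
# when SRC actually uses color. Afterwards, re-check DST with inkcov.
gray_if_needed() {
    src=$1 dst=$2
    if uses_color "$src"; then
        gs -q -o "$dst" \
           -sDEVICE=pdfwrite \
           -sColorConversionStrategy=Gray \
           -sProcessColorModel=DeviceGray \
           "$src" || return 1
        if uses_color "$dst"; then
            echo "warning: $dst still reports C/M/Y coverage" >&2
            return 2
        fi
        echo "converted: $dst"
    else
        echo "already gray: $src"
    fi
}
```

For example: gray_if_needed source.pdf temp.pdf. The second inkcov pass is there so that a file which still carries color after conversion is reported rather than silently accepted.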
Maybe your document contains transparent figures. Try passing the option -dHaveTransparency=false to your Ghostscript conversion command. The full list of options for the pdfwrite device can be found at http://ghostscript.com/doc/current/Ps2pdf.htm#Options
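To show where the option would go, here is a sketch of the conversion command with it added. The wrapper name gray_flat is hypothetical; the other options are the ones already used in this thread. As far as I understand the documentation, the option only has an effect together with CompatibilityLevel 1.4 or higher, so that one is kept from the question's command.

```shell
# gray_flat SRC DST: grayscale conversion with transparency disabled.
# Assumption: gs is on the PATH; whether flattening helps depends on the
# document actually containing transparency.
gray_flat() {
    gs -q -o "$2" \
       -sDEVICE=pdfwrite \
       -sColorConversionStrategy=Gray \
       -sProcessColorModel=DeviceGray \
       -dCompatibilityLevel=1.4 \
       -dHaveTransparency=false \
       "$1"
}
```

For example: gray_flat source.pdf temp.pdf, followed by another inkcov run on temp.pdf to see whether the C, M and Y columns are now zero.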