Is it possible to extract fonts that are embedded in a PDF file to an external ttf file using some utility or script?
If the fonts used in a PDF file (embedded or not) are present on the system, I can use the pdf2swf and swfextract tools from swftools to determine the names of the fonts used in the PDF. Then I can compile the respective system font(s) at run-time and load them into my AIR application.
BUT if the fonts used in the PDF are absent in the system there are two possibilities:
2.1. If they are absent from the PDF file as well (not embedded), we can only use a similar system font based on the font name.
2.2. If they are embedded in the PDF file, I want to know whether it is possible at all to extract them to an external ttf file, so that I can compile each of them to a separate swf file at run-time.
I know it's been a while since you asked this, but I figured I might be able to help.
I don't know if there is any utility that will allow you to extract the Font files, but you can do it manually.
Basically, a PDF file is a set of objects written in a text-like syntax, mixed with binary stream data. You can open it with a text editor (a hex editor is safer, since the streams are binary) and look for the fonts.
The fonts are specified in FontDescriptor objects, e.g.:
<</Type/FontDescriptor/FontName/ABCDEE+Algerian ... /FontFile2 24 0 R>>
This says that the font program for the font named Algerian is stored in object 24. Search the document for the line "24 0 obj". Right after that line comes the stream dictionary with the properties of the font stream (including its /Length), and the font data itself starts after the "stream" keyword and runs up to "endstream".
This stream contains the ttf file (the /FontFile2 key means a TrueType font program), compressed when the stream dictionary lists /Filter /FlateDecode. To decompress it you can use this method:
using System.IO;
using System.IO.Compression;

private static byte[] DecodeFlateDecodeData(byte[] data)
{
    using (var outputStream = new MemoryStream())
    using (var compressedDataStream = new MemoryStream(data))
    {
        // Skip the two-byte zlib header: DeflateStream expects raw deflate data
        compressedDataStream.ReadByte();
        compressedDataStream.ReadByte();
        using (var deflateStream = new DeflateStream(compressedDataStream, CompressionMode.Decompress))
        {
            var decompressedBuffer = new byte[1024];
            int read;
            // Copy the decompressed data to the output in 1 KB chunks
            while ((read = deflateStream.Read(decompressedBuffer, 0, decompressedBuffer.Length)) != 0)
            {
                outputStream.Write(decompressedBuffer, 0, read);
            }
        }
        // ToArray returns a copy of everything written to the MemoryStream
        return outputStream.ToArray();
    }
}
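For anyone who would rather script the whole manual procedure, here is a rough Python sketch of the same idea: find the /FontFile2 reference, locate the matching "N G obj", and inflate the stream. The function name `extract_truetype_fonts` is made up for illustration. It assumes the descriptors are stored uncompressed with /FontName ahead of /FontFile2 (as in the example above) and that the stream uses a single /FlateDecode filter; PDFs that pack their objects into object streams, or that are encrypted, will not work with this. Python's zlib reads the two-byte header itself, so nothing has to be skipped here.

```python
import re
import zlib

# Matches descriptors like: /FontName/ABCDEE+Algerian ... /FontFile2 24 0 R
DESCRIPTOR = re.compile(
    rb"/FontName\s*/([^\s/<>\[\]()]+)[^>]*?/FontFile2\s+(\d+)\s+(\d+)\s+R", re.S
)


def extract_truetype_fonts(pdf_bytes):
    """Return {font_name: font_bytes} for Flate-compressed /FontFile2 streams."""
    fonts = {}
    for match in DESCRIPTOR.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        obj_num, gen_num = match.group(2), match.group(3)
        # Find "N G obj", its dictionary, and the bytes between stream/endstream
        obj = re.search(
            rb"(?<!\d)" + obj_num + rb"\s+" + gen_num
            + rb"\s+obj\b(.*?)stream\r?\n(.*?)endstream",
            pdf_bytes,
            re.S,
        )
        if obj is None or b"/FlateDecode" not in obj.group(1):
            continue  # not found or not Flate-compressed: skipped in this sketch
        # decompressobj tolerates the end-of-line bytes left before "endstream"
        fonts[name] = zlib.decompressobj().decompress(obj.group(2))
    return fonts
```

Each value in the returned dictionary can then be written to disk in binary mode as a .ttf file.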
I hope this helps you... or helps somebody else
It's a late answer, but I found a way to do this using freely available Windows programs. It doesn't require scripting, compiling or cygwin. It's a few steps, but not as bad as it looks.
Install mupdf:
http://mupdf.googlecode.com/files/mupdf-0.8.15-windows.zip
Then copy your PDF to mupdf's installation folder. Let's say it's called whatever.pdf.
Open a DOS/command prompt and navigate to your mupdf install folder.
example: cd C:\Program Files\mupdf
...If that goes smoothly, your prompt should now look like this: C:\Program Files\mupdf>
Now type the following command:
pdfextract whatever.pdf
Afterwards, within the mupdf program folder, you'll have one or more font files. They'll have names like ABCDEF+Fontname-12.cff. Right now they're in the .cff format, which can't be installed directly, but we'll fix that. I recommend renaming each one to something less awkward, for example whatever.cff
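If there are a lot of extracted files, the renaming can be scripted. Below is a small Python sketch under a couple of assumptions: the awkward part of the name is the subset tag (six uppercase letters and a plus sign, like ABCDEF+ above), and the files sit in one folder. The names `friendly_name` and `rename_extracted_fonts` are made up for this example.

```python
import re
from pathlib import Path

# A subset tag is six uppercase letters followed by "+", e.g. "ABCDEF+"
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def friendly_name(filename):
    """Drop a leading subset tag from an extracted font file name."""
    return SUBSET_TAG.sub("", filename)


def rename_extracted_fonts(folder):
    """Rename every .cff file in folder; return the new names."""
    renamed = []
    for path in sorted(Path(folder).glob("*.cff")):
        target = path.with_name(friendly_name(path.name))
        # Leave the file alone if nothing changes or the new name is taken
        if target != path and not target.exists():
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling `rename_extracted_fonts` on the mupdf folder would turn ABCDEF+Fontname-12.cff into Fontname-12.cff, provided no file with that name exists yet.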
More DOS, sorry. You need a tool called cfftot1.exe. Here's a link:
ftp://tug.org/texlive/Contents/live/bin/win32/cfftot1.exe
...Copy it to your mupdf folder. Then type this:
cfftot1 whatever.cff whatever.pfb
You now have an almost usable font file called whatever.pfb. I say 'almost' because PFB font files usually come with a second file, a PFM file, which contains spacing information. Without this file the font won't install and the spacing will be screwed up. But the font will still open in font editors like FontLab. You can save the font from there to TTF or OTF. You can also try fixing the spacing yourself.
If you don't have a font editor, you can use CrossFont. CrossFont can take the PFB and generate the necessary PFM file so you can at least install and use the font.
link - http://crossfont.en.softonic.com/
That's it.
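If the PDF produced several .cff files, the cfftot1 step can be looped instead of typed once per font. This is only a sketch: it assumes cfftot1 is on the PATH (or copied into the folder you run it from) and reuses the same "input.cff output.pfb" form of the command shown above. The names `cfftot1_command` and `convert_all` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def cfftot1_command(cff_path, tool="cfftot1"):
    """Build the command line: cfftot1 <name>.cff <name>.pfb"""
    cff_path = Path(cff_path)
    return [tool, str(cff_path), str(cff_path.with_suffix(".pfb"))]


def convert_all(folder, tool="cfftot1"):
    """Run cfftot1 on every .cff file in folder; return the .pfb paths."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} was not found on the PATH")
    converted = []
    for cff in sorted(Path(folder).glob("*.cff")):
        command = cfftot1_command(cff, tool)
        # check=True raises CalledProcessError if the conversion fails
        subprocess.run(command, check=True)
        converted.append(command[2])
    return converted
```

The PFM caveat still applies to every .pfb file this produces.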
A few years ago I designed a special font. It took me about a year of on-and-off work. One day my Maxtor HDD died and there was no way I could recover my work. But I had the font embedded in some PDF files for my clients. Then I had the idea to extract the fonts from these files. After a year or so of looking online for an answer, I put together a method to extract fonts from PDF. I have presented this method on my blog at http://pdffontextract.blogspot.com . Since I came up with this solution many alternatives have emerged, but there is nothing wrong with diversity. I made this post to help others who need to recover their lost work. Have fun, and if you need any help don't hesitate to contact me.
The link to get the cfftot1.exe has changed to ftp://tug.org/texlive/Contents/live/bin/i386-linux/
Minor update - some PDFs contain fonts embedded in another unique format, as .CID files.
This format is made for fonts that support a lot of characters (e.g. Asian language fonts) and that don't map the glyphs to letters in the typical way.
You can still get usable fonts out of a .CID file, you just need to add a step to my answer above.
Run your PDF through a program called PStill (GPStill). The website is here:
http://www.wizards.de/~frank/pstill.html
When choosing your input, change the dropdown from Postscript File to PDF File.
Your output PDF will have _new appended to it.
If you need to unlock a PDF, you can use Advanced PDF Password Recovery from Elcomsoft.
What this step does is convert the CID fonts embedded in the PDF to PFA Type 1 fonts. So after running pdfextract, instead of a bunch of useless .CID files, you have .PFA files that can be imported into FontLab and possibly CrossFont. Be aware that the letters probably won't be mapped correctly, so you really want something like FontLab to move them around so that e.g. typing A on your keyboard doesn't result in the letter R.
As always, if the font was only embedded as a subset, you won't get the whole font, just a limited set of letters.
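Before going through all these steps, it can help to check what the PDF actually embeds. In a font descriptor, /FontFile points to a Type 1 program, /FontFile2 to TrueType, and /FontFile3 to a CFF-based program (including the CID flavour); a name that starts with six uppercase letters and a plus sign marks a subset. The Python sketch below reports both. The function name `embedded_font_report` is made up, and it assumes uncompressed descriptors with /FontName ahead of the FontFile key, so PDFs that store their objects in object streams will come back empty.

```python
import re

# Embedded font program keys used in PDF font descriptors
FONT_PROGRAM_KINDS = {
    b"FontFile": "Type 1",
    b"FontFile2": "TrueType",
    b"FontFile3": "CFF-based (Type1C, CIDFontType0C or OpenType)",
}

DESCRIPTOR = re.compile(
    rb"/FontName\s*/([^\s/<>\[\]()]+)[^>]*?/(FontFile[23]?)\s+\d+\s+\d+\s+R", re.S
)
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def embedded_font_report(pdf_bytes):
    """Map each embedded font name to (font program kind, is_subset)."""
    report = {}
    for match in DESCRIPTOR.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        kind = FONT_PROGRAM_KINDS[match.group(2)]
        report[name] = (kind, bool(SUBSET_TAG.match(name)))
    return report
```

Fonts that are referenced but not embedded have no FontFile key at all, so they simply do not show up in the report.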