How do I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?
Is there a tool or a library I can use to process a .ttf or a .eot file and build a list of code points (like U+0123, U+1234, etc.) provided by the font?
I just had the same problem, and made a HOWTO that goes one step further, baking a regexp of all the supported Unicode code points.
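For what it's worth, the "baking a regexp" step could look roughly like the sketch below. This is not the code from the HOWTO; the helper names `to_ranges` and `to_char_class` are made up for illustration, and it assumes you already have the code points as a list of integers.

```python
import re


def to_ranges(codepoints):
    """Collapse code points into inclusive (start, end) runs."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def _escape(cp):
    # Python's re module understands \uXXXX and \UXXXXXXXX in str patterns.
    return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"


def to_char_class(codepoints):
    """Build a regex character class matching exactly the given code points."""
    ranges = to_ranges(codepoints)
    if not ranges:
        raise ValueError("need at least one code point")
    parts = []
    for start, end in ranges:
        if start == end:
            parts.append(_escape(start))
        else:
            parts.append(f"{_escape(start)}-{_escape(end)}")
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    supported = re.compile(to_char_class([0x41, 0x42, 0x43, 0x61, 0x1F600]))
    print(supported.pattern)
    print(bool(supported.fullmatch("B")), bool(supported.fullmatch("b")))
```

Collapsing consecutive code points into ranges keeps the pattern short for fonts that cover whole Unicode blocks.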
If you just want the array of code points, you can pull it out while peeking at your `ttx` XML in Chrome devtools, after running `ttx -t cmap myfont.ttf` and, probably, renaming `myfont.ttx` to `myfont.xml` to invoke Chrome's XML mode. (This also relies on `fonttools` from gilamesh's suggestion; `sudo apt-get install fonttools` if you're on an Ubuntu system.)

You can do this on Linux in Perl using the Font::TTF module.
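If you would rather not go through a browser, the file written by `ttx -t cmap` can also be read with a few lines of Python. This is only a sketch: it assumes the dump contains `<map code="0x..." name="..."/>` elements nested under subtable elements that carry `platformID` and `platEncID` attributes, and the function names are my own.

```python
import xml.etree.ElementTree as ET


def is_unicode_subtable(elem):
    """True for subtables whose codes are Unicode code points."""
    platform = elem.get("platformID")
    encoding = elem.get("platEncID")
    # Platform 0 is Unicode; platform 3 (Windows) is Unicode for encodings 1 and 10.
    return platform == "0" or (platform == "3" and encoding in ("1", "10"))


def codepoints_from_root(root):
    """Collect the sorted, de-duplicated code points from a parsed ttx dump."""
    found = set()
    for subtable in root.iter():
        if not is_unicode_subtable(subtable):
            continue
        for entry in subtable.iter("map"):
            code = entry.get("code")
            if code is not None:  # skip entries without a plain code attribute
                found.add(int(code, 16))
    return sorted(found)


def codepoints_from_ttx(path):
    """Same as codepoints_from_root, but reads a .ttx file from disk."""
    return codepoints_from_root(ET.parse(path).getroot())


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for cp in codepoints_from_ttx(sys.argv[1]):
            print(f"U+{cp:04X}")
```

Filtering on the platform and encoding IDs matters because a font may also carry legacy subtables (Mac Roman, for instance) whose codes are not Unicode.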
The Linux program xfd can do this. It's provided in my distro as 'xorg-xfd'. To see all characters for a font, you can run it in a terminal and pass it the font name (`-fa` takes an Xft/fontconfig font name, `-fn` a core X font name).
The character code points for a TTF/OTF font are stored in the `cmap` table. You can use `ttx` to generate an XML representation of the `cmap` table.

You can run the command `ttx.exe -t cmap MyFont.ttf` (just `ttx` on Linux) and it should output a file `MyFont.ttx`. Open it in a text editor and it should show you all the character codes it found in the font.

Here is a method using the FontTools module (which you can install with something like `pip install fonttools`). The script takes the font path as its argument.
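A minimal sketch of such a script, not necessarily the one originally posted: it assumes a plain .ttf/.otf file (it does not attempt .eot) and uses `TTFont.getBestCmap()` from fontTools to pick the font's preferred Unicode subtable. The helper names are my own.

```python
import sys
import unicodedata


def load_unicode_cmap(font_path):
    """Return {code point: glyph name} from the font's preferred Unicode cmap."""
    # Imported here so the formatting helpers below work without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        # getBestCmap() returns None when the font has no Unicode subtable.
        return font.getBestCmap() or {}
    finally:
        font.close()


def format_codepoint(cp):
    """Format an integer as U+XXXX (at least four hex digits)."""
    return f"U+{cp:04X}"


def describe(cmap):
    """Yield one tab-separated line per code point: U+XXXX, glyph name, Unicode name."""
    for cp in sorted(cmap):
        name = unicodedata.name(chr(cp), "<no name>")
        yield f"{format_codepoint(cp)}\t{cmap[cp]}\t{name}"


if __name__ == "__main__" and len(sys.argv) > 1:
    for line in describe(load_unicode_cmap(sys.argv[1])):
        print(line)
```

Saved as, say, `list_codepoints.py` (a hypothetical file name), it would be run as `python list_codepoints.py MyFont.ttf`.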
If you ONLY want to "view" the fonts, the following might be helpful (if your terminal supports the font in question):
An unsafe, but easy way to view:
Thanks to Janus (https://stackoverflow.com/a/19438403/431528) for the answer above.
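If you already have the code points as integers (from any of the methods above), a small Python helper can lay them out in rows for viewing in the terminal. This is a sketch of my own, not the command from this answer; `render_rows` is a made-up name, and whether the glyphs actually display still depends on your terminal's font.

```python
def render_rows(codepoints, per_row=16):
    """Return the printable characters for the code points, grouped into rows."""
    rows = []
    row = []
    for cp in sorted(set(codepoints)):
        ch = chr(cp)
        # Skip control characters, surrogates and other non-printables
        # so they cannot garble the terminal.
        if not ch.isprintable():
            continue
        row.append(ch)
        if len(row) == per_row:
            rows.append(" ".join(row))
            row = []
    if row:
        rows.append(" ".join(row))
    return rows


if __name__ == "__main__":
    # Basic Latin capitals as a stand-in for a real font's code points.
    for line in render_rows(range(0x41, 0x5B), per_row=13):
        print(line)
```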