How to use non-ASCII characters in Matlab figures

Posted 2019-01-18 12:34

Question:

I include Matlab-drawn figures in LaTeX documents. My usual workflow is as follows:

  1. A Matlab script creates the figure(s),
  2. I tweak whatever needs tweaking in the visual figure editor,
  3. The figure is saved as .fig (for future modification) and .eps (for inclusion in LaTeX),
  4. I convert the .eps files to .pdf,
  5. The PDF files are referenced in the LaTeX source code.

To the point: when I use non-ASCII characters in axis labels, legends, titles, etc. (to be exact, Polish national characters such as 'ą', 'ę', 'ś', 'ć'), the encoding in the Matlab figure editor is fine and the characters display properly. After exporting to .eps, they are all wrong (for example, "Głębokość" turns into "G³êbokoœæ").
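
The garbling in this example is consistent with one specific mismatch: the label being written as Windows-1250 (Central European) bytes and then rendered with a Western (Windows-1252) character set. That diagnosis is an assumption, not something confirmed for R2008a, but a short Python check reproduces the exact string:

```python
# Assumption: the label is stored as Windows-1250 bytes and the EPS
# output is rendered as if those bytes were Windows-1252.
original = "Głębokość"

raw_bytes = original.encode("cp1250")   # bytes under the Central European code page
garbled = raw_bytes.decode("cp1252")    # the same bytes read with the Western code page

print(garbled)  # G³êbokoœæ
```

If this reading is right, the characters themselves survive the export; only the encoding used to interpret them is wrong.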

Is there a way to do this properly, either by tuning Matlab options or by changing my workflow?

Note: I found that exporting to .png or other non-vector formats handles the character encoding properly, but I would like to avoid that; I'm asking for a way to "keep it vector". Exporting directly to .pdf has the same effect as .eps, i.e. it produces wrong results.

PS. Matlab is R2008a, the LaTeX files are compiled with pdflatex, and the .eps files are converted with epstopdf from MiKTeX 2.9 (all under Windows 7).

Answer 1:

You could have a look at psfrag; that's what I usually use when I want to use Matlab figures in LaTeX. You basically just put tags into the figure in Matlab and replace those tags with LaTeX text afterwards. The biggest benefit is that this allows you to have identical symbols in text and figures.

Edit: when looking for the psfrag-URL, I found a Matlab script to simplify this: LaPrint.



Answer 2:

Another possible solution would be to use matlab2tikz. It creates a TikZ/pgfplots source file that can be included directly in your LaTeX source, which means that LaTeX's own facilities are used for font rendering. You can also edit the generated file directly to tweak the labels and such. Unfortunately, it doesn't work for all MATLAB figures.



Answer 3:

char(2048) will be shown by `print -depsc` as 'à ',
char(5064) as 'á',
char(28808) as 'ç',
char(37000) as 'é',
char(32904) as 'è', ...
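
These code points do not look arbitrary. One possible explanation (an assumption; the answer does not state it) is that `print -depsc` writes each character as its UTF-8 byte sequence and the font then shows those bytes through a Latin-1 encoding, so the lead byte becomes the visible glyph. A small Python check of that arithmetic:

```python
# Code points from the list above, paired with the glyph reported for each.
reported = {2048: "à", 5064: "á", 28808: "ç", 37000: "é", 32904: "è"}

def lead_byte_as_latin1(code_point):
    """Return the first UTF-8 byte of a code point, read as a Latin-1 character."""
    utf8 = chr(code_point).encode("utf-8")
    return utf8[:1].decode("latin-1")

results = {cp: lead_byte_as_latin1(cp) for cp in reported}
for cp, glyph in results.items():
    # Show the full UTF-8 byte sequence next to the glyph its lead byte maps to.
    print(cp, chr(cp).encode("utf-8").hex(), glyph)
```

For 2048 the UTF-8 bytes are e0 a0 80; in Latin-1, 0xA0 is a no-break space, which would account for the trailing space in 'à '.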

For other characters in the Latin-1 charset, look at the output of this script:

for j = 0:4*64
    clf; subplot(1,1,1); plot(eye(2));
    leg = '';
    for i = 4*(j+1)-1:-1:max(1,4*j)
        str = ['     ', num2str(i*64)];
        leg(i,:) = [str(end-4:end), ':', char(64*i+(0:63))];
    end
    title(leg, 'interpreter', 'none');
    print('-depsc', ['ascii', num2str(j), '.ps']);
end

I am using pdflatex, so psfrag is not an option, and pdfrack seems to be broken.



Answer 4:

For exporting a Matlab figure with non-ASCII ISO-8859-1 characters, there is no problem on Windows, but on Linux with a UTF-8 locale there is a Matlab bug and a workaround. The question here targets characters that are not in ISO-8859-1, which is trickier. Here is a solution that I posted on a related question.

If fewer than 256 distinct characters are needed (so that they fit an 8-bit format), ideally all from one standard encoding set, then one solution is to:

  1. Convert the octal code into the Unicode character;
  2. Save the file into the target encoding standard (in an 8-bit format);
  3. Add the encoding vector for the target encoding set.
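
Steps 1 and 2 can be sketched on a single PostScript string. This is a Python 3 illustration only; the sample string and the helper name `octal_escapes_to_bytes` are made up for the example, and it assumes the label was written to the EPS as octal-escaped UTF-8 bytes:

```python
import re

def octal_escapes_to_bytes(ps_bytes):
    """Replace PostScript octal escapes (\\ddd) with the bytes they denote."""
    return re.sub(rb"\\([0-3][0-7]{2})",
                  lambda m: bytes([int(m.group(1), 8)]),
                  ps_bytes)

# Hypothetical label as it might appear in the EPS: UTF-8 bytes, octal-escaped.
escaped = rb"(G\305\202\304\231boko\305\233\304\207)"

text = octal_escapes_to_bytes(escaped).decode("utf-8")  # step 1: back to Unicode
latin2 = text.encode("iso8859_2")                       # step 2: one byte per character

print(text)
print(latin2)
```

Step 3 is still required afterwards: without the Latin-2 encoding vector, the font would keep mapping those byte values to Latin-1 glyphs.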

For example, if you want to export Polish text, you need to convert the file into ISO-8859-2. Here is an implementation with Python (multi-platform):

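#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: the 'string_escape' codec and str.decode() do not exist in Python 3.
import sys, codecs

src = sys.argv[1]  # path to the .eps file written by Matlab
# Output file; everything written to it is encoded as ISO-8859-2 (Latin-2).
fo = codecs.open(src[:-4] + '_latin2.eps', 'w', 'latin2')
# Read the EPS, turning backslash escapes (including octal codes) into raw bytes.
with codecs.open(src, 'r', 'string_escape') as fi:
    data = fi.readlines()
with open('ISOLatin2Encoding.ps') as fenc:
    for line in data:
        # Interpret the bytes as UTF-8 and switch to the new encoding vector.
        fo.write(line.decode('utf-8').replace('ISOLatin1Encoding', 'MyEncoding'))
        # Insert the Latin-2 encoding vector right after the page setup.
        if line.startswith('%%EndPageSetup'):
            fo.write(fenc.read())
fo.close()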

Save this as eps_lat2.py; then running the command python eps_lat2.py file.eps, where file.eps is the EPS file created by Matlab, creates file_latin2.eps with Latin-2 encoding. The file ISOLatin2Encoding.ps contains the encoding vector:

/MyEncoding
% The first 144 entries are the same as the ISO Latin-1 encoding.
ISOLatin1Encoding 0 144 getinterval aload pop
% \22x
    /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
    /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
% \24x
    /nbspace /Aogonek /breve /Lslash /currency /Lcaron /Sacute /section
    /dieresis /Scaron /Scedilla /Tcaron /Zacute /hyphen /Zcaron /Zdotaccent
    /degree /aogonek /ogonek /lslash /acute /lcaron /sacute /caron
    /cedilla /scaron /scedilla /tcaron /zacute /hungarumlaut /zcaron /zdotaccent
% \30x
    /Racute /Aacute /Acircumflex /Abreve /Adieresis /Lacute /Cacute /Ccedilla
    /Ccaron /Eacute /Eogonek /Edieresis /Ecaron /Iacute /Icircumflex /Dcaron
    /Dcroat /Nacute /Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis /multiply
    /Rcaron /Uring /Uacute /Uhungarumlaut /Udieresis /Yacute /Tcedilla /germandbls
% \34x
    /racute /aacute /acircumflex /abreve /adieresis /lacute /cacute /ccedilla
    /ccaron /eacute /eogonek /edieresis /ecaron /iacute /icircumflex /dcaron
    /dcroat /nacute /ncaron /oacute /ocircumflex /ohungarumlaut /odieresis /divide
    /rcaron /uring /uacute /uhungarumlaut /udieresis /yacute /tcedilla /dotaccent
256 packedarray def
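
As a sanity check, the slots in this vector can be compared with Python's own `iso8859_2` codec. The sketch below copies the upper rows (codes 0xA0 to 0xFF) and checks the Polish lowercase letters; the table mapping glyph names to characters is written by hand for this example and is not part of the answer:

```python
# Glyph names for codes 0xA0..0xFF, copied from the vector above.
UPPER_ROWS = """
nbspace Aogonek breve Lslash currency Lcaron Sacute section
dieresis Scaron Scedilla Tcaron Zacute hyphen Zcaron Zdotaccent
degree aogonek ogonek lslash acute lcaron sacute caron
cedilla scaron scedilla tcaron zacute hungarumlaut zcaron zdotaccent
Racute Aacute Acircumflex Abreve Adieresis Lacute Cacute Ccedilla
Ccaron Eacute Eogonek Edieresis Ecaron Iacute Icircumflex Dcaron
Dcroat Nacute Ncaron Oacute Ocircumflex Ohungarumlaut Odieresis multiply
Rcaron Uring Uacute Uhungarumlaut Udieresis Yacute Tcedilla germandbls
racute aacute acircumflex abreve adieresis lacute cacute ccedilla
ccaron eacute eogonek edieresis ecaron iacute icircumflex dcaron
dcroat nacute ncaron oacute ocircumflex ohungarumlaut odieresis divide
rcaron uring uacute uhungarumlaut udieresis yacute tcedilla dotaccent
""".split()

# Hand-written mapping from glyph names to characters (Polish lowercase only).
POLISH = {"aogonek": "ą", "cacute": "ć", "eogonek": "ę", "lslash": "ł",
          "nacute": "ń", "oacute": "ó", "sacute": "ś", "zacute": "ź",
          "zdotaccent": "ż"}

def slot_of(glyph_name):
    """Code position of a glyph name in the upper half of the vector."""
    return 0xA0 + UPPER_ROWS.index(glyph_name)

# Names whose slot disagrees with the byte Python produces for that character.
mismatches = [name for name, ch in POLISH.items()
              if slot_of(name) != ch.encode("iso8859_2")[0]]
print(len(UPPER_ROWS), mismatches)
```

An empty mismatch list means that, for these letters, each byte produced by the re-encoding step lands on the glyph of the same name in the vector.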

Here is another implementation on Linux with Bash:

#!/bin/bash
name=$(basename "$1" .eps)
ascii2uni -a K "$1" > /tmp/eps_uni.eps
iconv -t ISO-8859-2 /tmp/eps_uni.eps -o "$name"_latin2.eps
sed -i -e '/%EndPageSetup/ r ISOLatin2Encoding.ps' -e 's/ISOLatin1Encoding/MyEncoding/' "$name"_latin2.eps

Save this as eps_lat2; then running the command sh eps_lat2 file.eps creates file_latin2.eps with Latin-2 encoding.

It can easily be adapted to other 8-bit encoding standards by changing the encoding vector and the iconv (or codecs.open) parameter in the script.