I'm trying to enter some UTF-8 characters into a LaTeX file in TextMate (which says its default encoding is UTF-8), but LaTeX doesn't seem to understand them.
Running cat my_file.tex shows the characters properly in Terminal. Running ls -al shows something I've never seen before: an "@" in the file listing:
-rw-r--r--@ 1 me users 2021 Feb 11 18:05 my_file.tex
(And, yes, I'm using \usepackage[utf8]{inputenc} in the LaTeX.)

I've found iconv, but that doesn't seem to be able to tell me what the encoding is -- it'll only convert once I figure it out.
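For the record, the conversion step itself is easy once the encoding is known -- the hard part is the guess. Here's what I'd run if the file turned out to be Latin-1 (the encoding name is just an example, and the sample file is a throwaway):

```shell
# Make a throwaway Latin-1 sample ("été" as ISO-8859-1 bytes).
printf '\351t\351\n' > my_file.tex

# Convert from the guessed source encoding to UTF-8.
# iconv exits nonzero if the bytes don't decode cleanly in that encoding.
iconv -f ISO-8859-1 -t UTF-8 my_file.tex > my_file.utf8.tex
```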
Which LaTeX are you using? When I was using teTeX, I had to manually download the unicode package and add this to my .tex files:
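The preamble lines were something like this (a sketch from memory -- the utf8x option and the ucs package pairing are the usual setup, but treat the exact options as an assumption):

```latex
% teTeX-era setup: the ucs package plus the extended utf8x input encoding.
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
```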
Now that I've switched over to XeTeX from the TeX Live 2008 package (here), it is even simpler:
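XeTeX reads UTF-8 source natively, so no inputenc incantation is needed at all; something like the following suffices (the fontspec package and the font name here are illustrative):

```latex
% XeLaTeX reads UTF-8 directly; fontspec gives access to system fonts.
\usepackage{fontspec}
\setmainfont{Times New Roman}
```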
As for detecting a file's encoding, you could play with file(1), but it is rather limited. As someone else said, it is difficult. I implemented the bash script below; it works for me.
It first tries to iconv from the encoding returned by file --mime-encoding to utf-8.

If that fails, it goes through all encodings and shows the diff between the original and the re-encoded file. It skips encodings that produce a large diff output ("large" as defined by the MAX_DIFF_LINES variable or the second input argument), since those are most likely the wrong encoding.

If "bad things" happen as a result of using this script, don't blame me. There's an rm -f in there, so there be monsters. I tried to prevent adverse effects by using it on files with a random suffix, but I'm not making any promises.

Tested on Darwin 15.6.0.
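The script itself did not survive in this copy of the answer. The sketch below reconstructs the behavior described above -- the function name, the default threshold of 10 diff lines, and the temp-file handling are assumptions, not the author's original code:

```shell
#!/bin/bash
# Sketch of the described script: try file(1)'s guess first, then brute-force
# every encoding iconv knows and show only small diffs.

detect_encoding() {
    file=$1
    MAX_DIFF_LINES=${2:-10}         # second argument overrides the threshold
    tmp=$(mktemp "${file}.XXXXXX")  # work on a file with a random suffix

    # Step 1: convert from file(1)'s guessed encoding straight to UTF-8.
    guess=$(file -b --mime-encoding "$file")
    if iconv -f "$guess" -t utf-8 "$file" > "$tmp" 2>/dev/null; then
        echo "$guess"
        rm -f "$tmp"
        return 0
    fi

    # Step 2: try every encoding iconv knows; show the diff against the
    # original, skipping encodings whose diff exceeds MAX_DIFF_LINES.
    for enc in $(iconv -l | sed -e 's|//||g' -e 's|,||g'); do
        if iconv -f "$enc" -t utf-8 "$file" > "$tmp" 2>/dev/null; then
            n=$(diff "$file" "$tmp" | wc -l)
            if [ "$n" -le "$MAX_DIFF_LINES" ]; then
                echo "=== $enc (diff: $n lines) ==="
                diff "$file" "$tmp"
            fi
        fi
    done
    rm -f "$tmp"
}
```

On a plain ASCII or valid UTF-8 file, step 1 succeeds immediately and the function just prints file(1)'s guess; the brute-force loop only runs when that conversion fails.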
Using the -I option (that's a capital i) on the file command seems to show the file encoding.
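For example (the output line shown is illustrative; on Linux, GNU file spells this option -i or --mime, and file --mime-encoding works with both versions):

```shell
# Create a small UTF-8 sample file to probe (name is illustrative).
printf 'R\303\251sum\303\251\n' > my_file.tex

# On macOS (BSD file): capital -I prints the MIME type and charset, e.g.
#   my_file.tex: text/x-tex; charset=utf-8
file -I my_file.tex 2>/dev/null || true

# Portable across GNU and BSD file(1): print just the encoding.
file --mime-encoding my_file.tex
```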