I'm trying to view a UTF-8 text file/stream in less, and even if I invoke it like this:
cat file | LESSCHARSET=utf-8 less
the non-ASCII UTF-8 characters don't display correctly. Instead, their hex values appear highlighted in brackets, e.g. <F4>.
Reading the same text in vim with UTF-8 encoding poses no problems, so I think something is wrong with the way I'm invoking less.
My locale output is the following:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
My less version is the one installed by Xcode on Mac OS X Leopard:
$ less --version | sed 's/^/ /'
less 394
Copyright (C) 1984-2005 Mark Nudelman
less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Homepage: http://www.greenwoodsoftware.com/less
locale -a | grep US | sed 's/^/ /'
outputs the following:
en_AU.US-ASCII
en_CA.US-ASCII
en_GB.US-ASCII
en_NZ.US-ASCII
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.US-ASCII
en_US.UTF-8
Try the command
file file.txt
If, for example, the output is "ISO-8859 English text", then change the encoding of the file from ISO-8859 to UTF-8 via the command
iconv -f ISO-8859-1 -t UTF-8 -o testfile.txt file.txt
If
less testfile.txt
displays correctly, finish with
mv testfile.txt file.txt
On Mac OS, the charset name has to be uppercase:
Here I found a list of charsets:
and their aliases:
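Going back to the file / iconv / mv steps suggested earlier, here is a minimal sketch of the same idea as a shell function. It assumes the source encoding really is ISO-8859-1 (which is only a guess until the file is inspected), and the helper names is_valid_utf8 and convert_to_utf8 are made up for illustration. It writes through a redirect rather than -o, on the assumption that not every iconv build accepts -o.

```shell
#!/bin/sh
# Sketch: convert a file to UTF-8 in place, but only if it is not
# already valid UTF-8. Helper names are hypothetical.

# Succeeds when the whole file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# convert_to_utf8 FILE [FROM_CHARSET]
# FROM_CHARSET defaults to ISO-8859-1, which is an assumption.
convert_to_utf8() {
    src=$1
    from=${2:-ISO-8859-1}
    tmp="$src.utf8.tmp"

    if is_valid_utf8 "$src"; then
        echo "$src is already valid UTF-8; nothing to do"
        return 0
    fi

    # Convert into a temporary file first, and replace the original
    # only if the conversion succeeded.
    if iconv -f "$from" -t UTF-8 "$src" >"$tmp"; then
        mv "$tmp" "$src"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

After sourcing the function, it could be called as convert_to_utf8 file.txt ISO-8859-1 and the result checked with less file.txt. Keeping a backup copy first is sensible, since a wrong guess about the source charset produces valid but wrong UTF-8.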
What does the locale command output? Is it a UTF-8 locale?
Are you sure your terminal is set to display UTF-8? Does
echo -e '\xe2\x82\xac'
produce the € (euro) sign?
Is the locale that you have set even installed on the system? Is it present in the list that locale -a outputs?
What version of less are you using? (Run less --version to find out.) Really, really old versions did not even support LESSCHARSET. This is unlikely to be the cause, though, because I have a Debian "sarge" system with less version 382, and it does not even need LESSCHARSET if the locale is set correctly.
My guess is that your file isn't UTF-8 but rather ISO-8859. (Is the <F4> character supposed to be a 'ô'?)
Start an xterm with
LANG=en_US.ISO-8859-1 xterm
Then verify the locale (the output of locale should be something like en_US.ISO-8859-1) and use less to view the file. Does it display correctly?
Note that it isn't enough to just use
LESSCHARSET=iso8859
without starting a new terminal. LESSCHARSET tells less to assume that the terminal can interpret ISO-8859, but your terminal probably displays UTF-8, since the euro sign displays correctly. And because a lone \xf4 byte isn't valid UTF-8, the terminal will probably show something like '�'.
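The checks raised in the answers above (locale, terminal, less version, and whether the file is really UTF-8) can be gathered into one small script. This is only a sketch: the function name diagnose is hypothetical, it relies on locale charmap and on iconv rejecting invalid input, and the "possibly ISO-8859-1" hint is a guess rather than a detection.

```shell
#!/bin/sh
# Sketch: run the diagnostic checks from this thread against one file.
# The function name is hypothetical.

diagnose() {
    file=$1

    # 1. Is the current locale a UTF-8 locale?
    charmap=$(locale charmap 2>/dev/null)
    if [ "$charmap" = "UTF-8" ]; then
        echo "locale: UTF-8"
    else
        echo "locale: not UTF-8 (charmap: ${charmap:-unknown})"
    fi

    # 2. Can the terminal render UTF-8? The three bytes below are the
    #    UTF-8 encoding of the euro sign (E2 82 AC, written in octal).
    printf 'terminal test (expect a euro sign): \342\202\254\n'

    # 3. Which less is installed?
    if command -v less >/dev/null 2>&1; then
        less --version | head -n 1
    else
        echo "less: not found"
    fi

    # 4. Does the file decode cleanly as UTF-8?
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        echo "$file: valid UTF-8"
    else
        echo "$file: NOT valid UTF-8 (possibly ISO-8859-1)"
    fi
}
```

After sourcing it, run diagnose file.txt. If step 4 reports that the file is not valid UTF-8, then printf '\364' | iconv -f ISO-8859-1 -t UTF-8 shows how the byte F4 reads when interpreted as ISO-8859-1, which is 'ô'.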