How to know character encoding of file names depen

Posted 2020-07-10 07:26

Question:

I would like to know the character encoding of the file names in a filesystem in order to display them correctly in a GUI.

How should I do this?

I suppose the character encoding differs depending on the file system (FAT, NTFS, ext3, etc.).

Thank you

(I work in C++ but this topic is not language related)

Answer 1:

NTFS is Unicode (UTF-16). exFAT is Unicode as well.

Original FAT and FAT32 use the OEM character set (read more on MSDN).

On Linux and Unix, a filename may contain any bytes except NUL and '/', and the character set is not defined. Consequently, each application decides for itself which one to use. Many applications use UTF-8. See more in this question.
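Since the bytes carry no encoding label, a GUI usually has to guess. The following is a minimal sketch of one common heuristic, not a definitive solution: treat the name as UTF-8 if it is well-formed UTF-8, and otherwise fall back to a legacy encoding. The function names (`is_valid_utf8`, `latin1_to_utf8`, `filename_for_display`) are hypothetical, and the choice of Latin-1 as the fallback is an assumption; a real application would pick the fallback from the user's locale.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. Rejects overlong forms,
// encoded surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (i + len > n) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Smallest code point that legitimately needs `len` bytes.
        static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len]) return false;               // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // out of range
        i += len;
    }
    return true;
}

// Assumed fallback: interpret each byte as Latin-1 and re-encode as UTF-8.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Returns a UTF-8 string suitable for display, guessing the source encoding.
std::string filename_for_display(const std::string& raw) {
    return is_valid_utf8(raw) ? raw : latin1_to_utf8(raw);
}
```

Calling `filename_for_display` on each raw name returned by the directory listing gives a string that a UTF-8 aware GUI toolkit can render. The original bytes should still be kept for opening the file, because the displayed form may not round-trip.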

The Unix approach above is used on most filesystems (mainly because the "charset" concept has more meaning at the OS level than at the storage level). You can check FS capabilities and requirements regarding filename characters here (table 2 column 3).



Answer 2:

On Linux, run the following command: locale | egrep "LANG=" | cut -d . -f 2

On Unix-like systems, the encoding of file names is not set at the filesystem level, but rather in the user environment. For instance, UTF-8 is the default setting in Ubuntu.
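The same information can be queried from C++ without spawning a shell. This is a minimal sketch that assumes a POSIX system (`<langinfo.h>` is not available on Windows); the wrapper name `locale_codeset` is hypothetical.

```cpp
#include <clocale>
#include <langinfo.h>   // POSIX only: nl_langinfo, CODESET
#include <string>

// Returns the character set name of the user's locale, e.g. "UTF-8".
// Side effect: switches the process-wide LC_CTYPE locale to the one
// selected by the environment (LC_ALL, LC_CTYPE, LANG).
std::string locale_codeset() {
    std::setlocale(LC_CTYPE, "");              // adopt the environment's locale
    const char* cs = nl_langinfo(CODESET);     // codeset of the current locale
    return cs ? std::string(cs) : std::string();
}
```

The returned name describes what the user's environment expects, not what is actually stored on disk, so names created under a different locale may still decode incorrectly.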

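On Windows, the default ANSI code page depends on the system locale (CP-1252 on Western European systems; it is similar to, but not identical to, ISO-8859-1/Latin-1), but the filesystem uses Unicode via the UTF-16 encoding. See http://en.wikipedia.org/wiki/Filename.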
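When a name arrives as UTF-16 and the GUI layer expects UTF-8, the conversion is mechanical. The following is an illustrative sketch of that conversion in portable C++; the function name `utf16_to_utf8` is hypothetical, and replacing unpaired surrogates with U+FFFD is a design choice assumed here, not something the filesystem mandates.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (REPLACEMENT CHARACTER).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {                 // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;                                // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                    // unpaired low surrogate
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Because the replacement is lossy for ill-formed names, the original UTF-16 string should be kept for any later file operations.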

If you use Qt, you can build the following with Qt Creator; the result is the name of the current user's locale encoding.

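#include <QTextCodec>
#include <iostream>

// Prints the name of the text codec Qt associates with the user's locale.
// Note: in Qt 6, QTextCodec is provided by the Core5Compat module.
int main(int argc, char *argv[])
{
  Q_UNUSED(argc); Q_UNUSED(argv);
  // codecForLocale() returns the codec matching the current locale settings.
  QTextCodec* tc = QTextCodec::codecForLocale();

  std::cout << "Current locale text codec: " << tc->name().constData() << std::endl;
  return 0;
}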