How to read UTF-8 file data in C++?

Posted 2019-07-11 06:24

Question:

I have a text file called ipa.txt that lists IPA symbols (UTF-8 encoded) with a number assigned to each. I also have a source text file that contains a bunch of words and their corresponding IPA. How do I cross-reference the two so that, for every word, I get a text file named after the word that contains the numbers corresponding to its IPA symbols?

Below is what I've tried. It didn't work: the output was mostly 000000.

int main()
{
    std::unordered_map <wchar_t, int> map;
    std::wifstream file;
    file.open("ipa.txt");
    if (file.is_open()) {
        std::cout << "opened ipa file";
    }

    wchar_t from;
    int to;
    while (file >> from >> to) {
        map.insert(std::make_pair(from, to));
    }

    std::wifstream outfile;
    outfile.open("source.txt");
    if (outfile.is_open()) {
        std::cout << "opened source file";
    }

    std::wstring id;
    std::wstring name;
    while (outfile >> id >> name) {
        std::ofstream outputfile;
        outputfile.open(id + L".txt");
        for (wchar_t c : name)  outputfile << map[c]; 
    }

    system("pause");

    return 0;
}

Answer 1:

I believe the loop variable c has the wrong type in the iteration over name. Since c is used as the key for the map, and name is a std::wstring, you should use:

for (wchar_t c : name)  outputfile << map[c]; 

instead of:

for (char c : name)  outputfile << map[c]; 

Hope this helps,
Stefano
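
To illustrate why the loop variable's type matters, here is a minimal sketch. The function names are made up for illustration. It relies on the fact that operator[] on std::unordered_map inserts a value-initialized entry (0 for int) when the key is missing, so a narrowed key prints 0 instead of failing. Whether this was the only cause of the zeros in the question is not clear from the thread.

```cpp
#include <unordered_map>

// Hypothetical helper: look the symbol up with its full wchar_t value.
int lookup_as_wchar(std::unordered_map<wchar_t, int>& map, wchar_t symbol) {
    return map[symbol];
}

// Hypothetical helper: narrow the symbol to char first, which is what
// `for (char c : name)` does implicitly for every element of a std::wstring.
int lookup_as_char(std::unordered_map<wchar_t, int>& map, wchar_t symbol) {
    const char c = static_cast<char>(symbol);  // high bits are lost here
    return map[c];  // key not found, so a new entry with value 0 is inserted
}
```

With a map containing only the entry for L'\u0259', the first function returns the stored number and the second returns 0.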



Answer 2:

First thought:

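map <- std::unordered_map<std::string, int>    (key: one UTF-8 encoded symbol)
open ipa.txt:
    for each line in file:
        map[line[0]] = line[1]
open source.txt:
    for each line in file:
        create and open line[0].txt:
            for each symbol in line[1]:
                write map[symbol] to line[0].txt

Here line[0] and line[1] are the first and second whitespace-separated fields of a line.

Regarding the actual C++ implementation, AFAIK UTF-8 text can be stored in char and std::string as-is, so reading the files needs nothing special. Note, however, that a single IPA symbol usually takes more than one char in UTF-8, so the map key should be the symbol's whole byte sequence (a std::string) rather than a single char, and the inner loop has to step through line[1] one code point at a time. If you need UTF-8 string literals you must use the u8 prefix: u8"literal". Everything else should be standard file I/O.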
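
One caveat when keeping UTF-8 in std::string: most IPA symbols occupy two or more bytes, so iterating char by char does not yield whole symbols. Below is a minimal sketch of stepping through a string one code point at a time and looking each one up. The names split_utf8 and encode are made up for illustration, the input is assumed to be well-formed UTF-8, and combining diacritics are treated as separate symbols.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: split a UTF-8 string into one substring per code point.
// Assumes well-formed UTF-8; no validation is performed.
std::vector<std::string> split_utf8(const std::string& text) {
    std::vector<std::string> symbols;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;                       // 0xxxxxxx: single byte (ASCII)
        if ((lead & 0xE0) == 0xC0) len = 2;        // 110xxxxx: two-byte sequence
        else if ((lead & 0xF0) == 0xE0) len = 3;   // 1110xxxx: three-byte sequence
        else if ((lead & 0xF8) == 0xF0) len = 4;   // 11110xxx: four-byte sequence
        symbols.push_back(text.substr(i, len));
        i += len;
    }
    return symbols;
}

// Hypothetical helper: concatenate the numbers of every symbol in `ipa`.
// Unknown symbols are written as "?" instead of being silently mapped to 0.
std::string encode(const std::string& ipa,
                   const std::unordered_map<std::string, int>& table) {
    std::string out;
    for (const std::string& symbol : split_utf8(ipa)) {
        const auto it = table.find(symbol);
        out += (it != table.end()) ? std::to_string(it->second) : "?";
    }
    return out;
}
```

Using find rather than operator[] is a deliberate choice here: a missing symbol shows up as "?" in the output instead of adding a zero entry to the table.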

EDIT: Here are some links to the relevant documentation to help you get started:

  • ifstream (for reading from files)
  • ofstream (for writing to files)
  • unordered_map (for mapping 'keys' to 'values')

Outside of that it will probably just take a little Googling. File IO is very common so I'm sure you can find some good examples online. As long as your file format is consistent you shouldn't have too much trouble with the file parsing. Then the rest of it is just storing values in the map and then looking them up when you need them, which is pretty simple.



Tags: c++ utf-8 io