Concatenate Multiple Data Files [closed]

Posted 2019-09-20 12:32

Question:

I have several data files that look like this:

HR0
012312010
001230202

HR1
012031020
012320102
012323222
012321010

HR2
321020202
...

To explain: each field starts with a header line (HRn), followed by a variable number of lines of quaternary numbers (e.g. 321020202), and consecutive fields are separated by a blank line. I want to combine the matching HR fields across all the files, so in a sense I want to zipper these files into one large file. I think sed is the answer, but I don't know where to start.

I'm also thinking of using a shell script rather than Python or a C++ program, because I feel it might be faster both to write and to run. Thoughts?

Answer 1:

This is pretty easy to do in C++, and easier still if you have C++17. You can write a function that reads a file into a multimap<int, int>, something like:

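#include <algorithm>
#include <cstdlib>
#include <istream>
#include <iterator>
#include <map>
#include <string>
#include <utility>
using namespace std;

// Reads "HRn" headers, each followed by numbers, into a multimap keyed by n.
multimap<int, int> read(istream& input) {
    multimap<int, int> output;
    string i;

    while(input >> i) {  // i holds a header such as "HR1"
        const auto key = std::atoi(data(i) + 2);  // skip the "HR" prefix
        // Read numbers (as decimal ints, so leading zeros are dropped) until the next header or EOF makes extraction fail.
        transform(istream_iterator<int>(input), istream_iterator<int>(), inserter(output, end(output)), [key](const auto value){ return make_pair(key, value); });
        input.clear();  // clear the failbit so the next header can be read
    }
    return output;
}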
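
The loop relies on a stream detail that is easy to miss: the istream_iterator<int> range ends at the first token that is not an int, such as the next HRn header, and leaves the stream's failbit set while the header itself stays unread. The clear() call is what lets the next input >> i pick that header up. Here is a minimal, self-contained sketch of just that mechanism; read_blocks is a hypothetical name, and unlike read() it ignores the field number and simply groups the numbers of each block:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper illustrating the stop-at-header trick used by read().
// Returns one vector of numbers per "HRn" header, in file order.
std::vector<std::vector<int>> read_blocks(std::istream& input) {
    std::vector<std::vector<int>> blocks;
    std::string header;
    while (input >> header) {  // e.g. "HR0"
        // Extraction stops (and sets failbit) at the next non-numeric token,
        // leaving that header in the stream for the next loop iteration.
        blocks.emplace_back(std::istream_iterator<int>(input),
                            std::istream_iterator<int>());
        input.clear();  // without this, every later read would fail too
    }
    return blocks;
}
```

Given the input "HR0\n012312010\n001230202\n\nHR1\n012031020\n" (through an istringstream or an ifstream), it returns two blocks, {12312010, 1230202} and {12031020}. The leading zeros are gone at this point; they only come back at output time through setw and setfill.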

Call read() with each file's ifstream and use merge (multimap::merge, new in C++17) to move each result into your accumulating multimap<int, int> output.
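
The accumulation step on its own can be sketched as below, assuming each file has already been parsed into a multimap<int, int> by read(). merge_all is a hypothetical helper. multimap::merge splices the nodes from the source into the target without copying them and, like insert, puts each one after any elements with an equal key, so within a field the values from earlier files stay ahead of the values from later ones:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper: combines already-parsed files (field number -> value)
// into one multimap, keeping file order within each field.
std::multimap<int, int> merge_all(std::vector<std::multimap<int, int>> parts) {
    std::multimap<int, int> output;
    for (auto& part : parts) {
        output.merge(part);  // moves every node out of part, leaving it empty
    }
    return output;
}
```

With real files the loop body would instead be something like std::ifstream in(path); output.merge(read(in)); which relies on merge's overload that accepts an rvalue multimap.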

Then write output to your output file. Assuming it was opened as ofstream filep, you could write it like this:

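// Assumes filep is an open ofstream and <iomanip> is included for setfill/setw.
if(!output.empty()) {  // cbegin(output)->first is undefined on an empty map
    auto key = cbegin(output)->first;
    filep << key << ":\n" << setfill('0');

    for(const auto& it : output) {
        if(it.first != key) {  // a new field starts
            key = it.first;
            filep << key << ":\n";
        }
        // setw(9) restores the leading zeros dropped when parsing as int
        filep << '\t' << setw(9) << it.second << '\n';
    }
}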
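
The loop above prints each field as n: followed by tab-indented values. If you would rather the combined file keep the question's own layout, so that it could be fed back through read(), a writer along these lines should work. write_fields is a hypothetical name, and the sketch assumes every value originally had exactly 9 quaternary digits, which is what setw(9) relies on:

```cpp
#include <cassert>
#include <iomanip>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes fields in the question's input layout, i.e. an
// "HRn" header, one zero-padded value per line, and a blank line between fields.
void write_fields(std::ostream& out, const std::multimap<int, int>& fields) {
    const auto old_fill = out.fill('0');  // remember the caller's fill character
    for (auto it = fields.begin(); it != fields.end();) {
        const auto range = fields.equal_range(it->first);  // all values of this field
        if (it != fields.begin()) {
            out << '\n';  // blank line between fields
        }
        out << "HR" << it->first << '\n';
        for (auto v = range.first; v != range.second; ++v) {
            out << std::setw(9) << v->second << '\n';  // e.g. 1230202 -> 001230202
        }
        it = range.second;  // jump to the first element of the next field
    }
    out.fill(old_fill);  // restore the caller's fill character
}
```

For example, a multimap holding 12312010 and 1230202 under key 0 and 12031020 under key 1 is written as HR0, 012312010, 001230202, a blank line, then HR1, 012031020. The function restores the previous fill character before returning, so later output to filep is unaffected.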

I've written a live example that uses only one file here: http://ideone.com/n47MnS