I have several data files that look like this:
HR0
012312010
001230202
HR1
012031020
012320102
012323222
012321010
HR2
321020202
...
To explain: each field starts with a line naming it (HR"n"), followed by a variable number of lines of quaternary numbers (e.g. 321020202), and there is a blank line between two fields. I want to combine equivalent HR fields across the files — so, in a sense, zipper these files into one large file. I think sed is the answer, but I don't know where to start.
I'm leaning toward a shell script over Python or a C++ program because I feel it would be faster both to write and to run. Thoughts?
This is pretty easy to do in C++, and easier still if you have C++17. You can write a function that reads one stream into a multimap<int, int>, something like:
multimap<int, int> read(istream& input) {
    multimap<int, int> output;
    string i;
    while (input >> i) {
        // i holds a field label like "HR0"; skip the "HR" prefix
        // (std::data on a string is C++17)
        const auto key = std::atoi(data(i) + 2);
        // copy values under this key until the next "HRn" label fails the int read
        transform(istream_iterator<int>(input), istream_iterator<int>(),
                  inserter(output, begin(output)),
                  [key](const auto value) { return make_pair(key, value); });
        input.clear(); // recover the stream so the label itself can be read next
    }
    return output;
}
So you'll call that function with each file's ifstream and use merge to splice each return value into your accumulating multimap<int, int> output.
Then you'll just dump output to your output file. Say it had been opened as ofstream filep; you could dump it like this:
// Assumes output is non-empty: print a "key:" header,
// then each value zero-padded back out to 9 digits
auto key = cbegin(output)->first;
filep << key << ":\n" << setfill('0');
for (const auto& it : output) {
    if (it.first == key) {
        filep << '\t' << setw(9) << it.second << '\n';
    } else {
        // new key: start a new header line
        key = it.first;
        filep << key << ":\n\t" << setw(9) << it.second << '\n';
    }
}
I've written a live example involving only one file here: http://ideone.com/n47MnS