I have large strings that resemble the following...
some_text_token 24.325973 -20.638823 -1.964366 0.753947 -1.290811 -3.547422 0.813014 -3.547227 0.472015 3.723311 -0.719116 3.676793 other_text_token 24.325973 20.638823 -1.964366 0.753947 -1.290811 -3.547422 -1.996611 -2.877422 0.813014 -3.547227 1.632365 2.083673 0.472015 3.723311 -0.719116 3.676793 ...
...from which I'm trying to grab, efficiently and in the interleaved order in which they appear in the string...
- the text tokens
- the float values
- the blank lines
...but I'm having trouble.
I've tried strtod and successfully grabbed the floats from the string, but I can't seem to get a loop using strtod to report back to me the interleaved text tokens and blank lines. I'm not 100% confident strtod is the "right track" given the interleaved tokens and blank lines that I'm also interested in.
The tokens and blank lines are present in the string to give context to the floats so my program knows what the float values occurring after each token are to be used for, but strtod seems more geared, understandably, toward just reporting back floats it encounters in a string without regard for silly things like blank lines or tokens.
I know this isn't very hard conceptually, but being relatively new to C/C++ I'm having trouble judging what language features I should focus on to take best advantage of the efficiency C/C++ can bring to bear on this problem.
Any pointers? I'm very interested in why various approaches function more or less efficiently. Thanks!!!
This is a bit crude and untested, but the general idea is to try parsing each line and see what's there:
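A minimal sketch of that "try it and see what's there" idea, applied to one whitespace-delimited word at a time with strtod. The names item_kind, item, scanner and next_item are made up for this illustration, and it assumes a word counts as a float only when strtod consumes all of it. It makes a single pass over the string and neither copies nor allocates anything, which is where most of the efficiency comes from.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for this sketch; none of these names come from a library. */
enum item_kind { ITEM_END, ITEM_BLANK_LINE, ITEM_FLOAT, ITEM_TOKEN };

struct item {
    enum item_kind kind;
    double value;       /* valid when kind == ITEM_FLOAT */
    const char *text;   /* start of the word when kind == ITEM_TOKEN (not NUL-terminated) */
    size_t len;         /* length of the word in bytes */
};

struct scanner {
    const char *p;          /* current position in the input string */
    int line_has_content;   /* nonzero once a word was seen on the current line */
};

/* Return the next item in input order and advance the scanner past it. */
static struct item next_item(struct scanner *s)
{
    struct item it = { ITEM_END, 0.0, NULL, 0 };
    assert(s != NULL && s->p != NULL);

    for (;;) {
        while (*s->p == ' ' || *s->p == '\t' || *s->p == '\r')
            s->p++;
        if (*s->p == '\0')
            return it;                  /* end of input */
        if (*s->p != '\n')
            break;                      /* start of a word */
        s->p++;                         /* consume the newline */
        if (!s->line_has_content) {
            it.kind = ITEM_BLANK_LINE;  /* the line held nothing but whitespace */
            return it;
        }
        s->line_has_content = 0;        /* ordinary end of a non-blank line */
    }

    s->line_has_content = 1;
    const char *start = s->p;
    size_t len = strcspn(start, " \t\r\n");   /* length of this word */
    char *end;
    double v = strtod(start, &end);

    if (end == start + len) {
        /* strtod consumed the whole word. Caveat: it also accepts
           "inf", "nan" and hex floats, so a token spelled that way
           would be reported as a float. */
        it.kind = ITEM_FLOAT;
        it.value = v;
    } else {
        /* strtod consumed nothing, or only a prefix: treat it as a text token. */
        it.kind = ITEM_TOKEN;
        it.text = start;
        it.len = len;
    }
    s->p = start + len;
    return it;
}
```

To use it, initialise a scanner with the string and 0, then call next_item in a loop until it returns ITEM_END, switching on kind each time. The key detail is the end pointer: when strtod cannot convert anything it leaves end equal to the start of the word, which is how the loop can tell a token from a number instead of silently skipping it.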
Using C, I would do something like this (untested):
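A rough sketch along these lines, reading one line at a time with fgets() and probing it with sscanf(). The function parse_lines, the counts struct and the value of MAX are assumptions made for this illustration, as is the input layout: a line is either empty, or an optional leading token followed by zero or more floats.

```c
#include <assert.h>
#include <stdio.h>

#define MAX 1024   /* assumed maximum line length, including the newline */

/* Hypothetical tallies so the caller can see what was found. */
struct counts { int blanks, tokens, floats; };

/* Read fp line by line and report each item to out in input order. */
static struct counts parse_lines(FILE *fp, FILE *out)
{
    char buf[MAX];
    char tok[MAX];   /* same size as buf, so %s cannot overflow it */
    struct counts c = { 0, 0, 0 };
    assert(fp != NULL && out != NULL);

    /* A line longer than MAX - 1 characters arrives in several pieces;
       this sketch does not detect that case. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        const char *p = buf;
        double v;
        int n = 0;

        if (buf[0] == '\n') {          /* cheapest test first */
            fputs("blank line\n", out);
            c.blanks++;
            continue;
        }

        /* If the line does not start with a number, %lf matches nothing
           and sscanf() returns 0, so read a token instead. %n records
           how many characters were consumed. */
        if (sscanf(p, "%lf", &v) != 1 && sscanf(p, "%s%n", tok, &n) == 1) {
            fprintf(out, "token: %s\n", tok);
            c.tokens++;
            p += n;
        }

        /* Collect the floats that remain on the line; the loop stops at
           the end of the line or at the first word that is not a number. */
        while (sscanf(p, "%lf%n", &v, &n) == 1) {
            fprintf(out, "float: %g\n", v);
            c.floats++;
            p += n;
        }
    }
    return c;
}
```

Calling parse_lines(fp, stdout) prints the items in the order they occur. One caveat on efficiency: some C library implementations make every sscanf() call examine its whole input string, so on very long lines a loop built on strtod() and its end pointer may be noticeably faster than repeated sscanf() calls.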
The check for blank line is first because it's relatively inexpensive. Depending upon your needs, you may have to:
1. adjust MAX,
2. check that buf ends with a newline; if it doesn't, then the line was too long (go to 1 or 3 in that case),
3. use malloc() and realloc() to dynamically allocate the buffer (see this for more),
4. keep in mind that sscanf() returns the number of input items successfully matched and assigned.
I am also assuming that blank lines are really blank (just the newline character by itself). If not, you will need to skip leading white-space. isspace() in ctype.h is useful in that case. fp is a valid FILE * object returned by fopen().
Wow, I don't write many parsers in C any more
This has been tested on the OP's input