Trying to tweak sscanf() to ignore \n and \t [duplicate]

Posted 2019-01-12 11:43

Question:

This question already has an answer here:

  • C: How can I make it so scanf() input has one of two formats?

I'm writing a triangle calculation program and trying to tweak my sscanf() call so that it ignores spaces, newlines (\n) and tabs (\t). How can I do that?

I have:

if(sscanf(str, "{ [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }", &x_1, &y_1, &x_2, &y_2, &x_3, &y_3) == 6)

which works perfectly for inputs like these:

1) {[0;0],[19;10],[0;10]}

2) {     [  0 ; 0],             [12;0],[0    ;10]     }

but it doesn't work for something like this:

1) {
[
0
;
15
]
,   [   112 ;   0   ]   ,[112;15]}

What do I need to fix?

Answer 1:

Note that sscanf() largely ignores white space by default. A white-space character in the format string matches zero or more white-space characters in the input: blanks, tabs and newlines alike. All but three of the conversion specifications skip leading white space anyway (the three exceptions are %c, %[…] (scan sets) and %n), so with all the white space in your format string, neither blanks nor tabs nor newlines should affect the conversion, unless there is white space before the first { (which you could handle by adding a space in the format string before the {). The problem, therefore, is probably not in the sscanf() call but in the characters in the data.
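To make the white-space rule concrete, here is a minimal sketch that runs the question's format, and the same format with one extra leading space, against strings containing newlines and tabs. The helper names parse_strict() and parse_relaxed() are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration. Each space in the format is a
   white-space directive: it skips any run of blanks, tabs and newlines,
   including an empty run. The %lf conversions also skip leading white
   space on their own. */

/* The question's format: nothing precedes '{', so white space at the very
   start of the string makes the literal '{' fail to match (result 0). */
int parse_strict(const char *s, double p[6])
{
    return sscanf(s, "{ [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }",
                  &p[0], &p[1], &p[2], &p[3], &p[4], &p[5]);
}

/* Same format with one extra leading space, which additionally tolerates
   white space before the opening brace. */
int parse_relaxed(const char *s, double p[6])
{
    return sscanf(s, " { [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }",
                  &p[0], &p[1], &p[2], &p[3], &p[4], &p[5]);
}
```

With these, parse_strict() returns 6 for the multiline string "{\n[\n0\n;\n15\n]\n,   [   112 ;   0   ]   ,[112;15]}\n", but returns 0 for "\t{[1;2],[3;4],[5;6]}", which parse_relaxed() accepts.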

If you are reading the data with fgets(), you'll need to accumulate the multiple lines into a single string buffer, because each fgets() call returns at most one line. If you replaced sscanf() with either scanf() or fscanf(), then your format string would read the newlines as needed (but both would leave any newline after the closing brace } to be read by future input operations). When sscanf() processes the data, the string must contain all the requisite characters, nominally up to the closing }, though up to the last digit of the sixth number would be enough for sscanf() to return 6.
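A minimal sketch of both approaches from the paragraph above, assuming each record ends at the first line that contains } and fits in the caller's buffer. read_record() and read_triangle() are hypothetical names, not a library API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name and policy are assumptions): append lines
   read with fgets() to buf until a line containing '}' has been read,
   the buffer is full, or the input ends. Returns the number of characters
   collected; 0 means no more input. A record longer than buf is
   truncated, and the rest of it stays in the stream. */
size_t read_record(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    while (len < size - 1 && fgets(buf + len, (int)(size - len), fp) != NULL)
    {
        const char *chunk = buf + len;   /* the line just appended */
        len += strlen(chunk);
        if (strchr(chunk, '}') != NULL)  /* closing brace: record complete */
            break;
    }
    return len;
}

/* Alternative: let fscanf() read straight from the stream. Each space in
   the format skips newlines as well as blanks and tabs; the leading space
   also skips the newline left behind after the previous '}'. */
int read_triangle(FILE *fp, double p[6])
{
    return fscanf(fp, " { [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }",
                  &p[0], &p[1], &p[2], &p[3], &p[4], &p[5]);
}
```

After read_record() returns a nonzero length, the buffer can be passed to the original sscanf() call unchanged. read_triangle() returns 6 per complete record and EOF once the input is exhausted.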

Given this code, where the third string represents what you say is your third (multiline) input (I'm assuming that the leading 1) and 2) plus space are noise in the question, since the format string does not attempt to parse them):

#include <stdio.h>

static char *data[] =
{
    "{[0;0],[19;10],[0;10]}",
    "{     [  0 ; 0],             [12;0],[0    ;10]     }",
    "{\n[\n0\n;\n15\n]\n,   [   112 ;   0   ]   ,[112;15]}\n",
};
enum { NUM_DATA = sizeof(data) / sizeof(data[0]) };

int main(void)
{
    int i;

    for (i = 0; i < NUM_DATA; i++)
    {
        /* Sentinel values show which variables sscanf() did not assign */
        double x_1 = -9.9, y_1 = -9.9;
        double x_2 = -9.9, y_2 = -9.9;
        double x_3 = -9.9, y_3 = -9.9;
        int rc;

        printf("String: @@%s@@@\n", data[i]);
        rc = sscanf(data[i], "{ [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }",
                    &x_1, &y_1, &x_2, &y_2, &x_3, &y_3);
        printf("rc = %d ", rc);
        printf(" 1 = (%.1f,%.1f)", x_1, y_1);   /* %f, not %lf: valid in C89 */
        printf(" 2 = (%.1f,%.1f)", x_2, y_2);
        printf(" 3 = (%.1f,%.1f)", x_3, y_3);
        putchar('\n');
    }
    return 0;
}

I get the output:

String: @@{[0;0],[19;10],[0;10]}@@@
rc = 6  1 = (0.0,0.0) 2 = (19.0,10.0) 3 = (0.0,10.0)
String: @@{     [  0 ; 0],             [12;0],[0    ;10]     }@@@
rc = 6  1 = (0.0,0.0) 2 = (12.0,0.0) 3 = (0.0,10.0)
String: @@{
[
0
;
15
]
,   [   112 ;   0   ]   ,[112;15]}
@@@
rc = 6  1 = (0.0,15.0) 2 = (112.0,0.0) 3 = (112.0,15.0)

As you can see, all three scan operations are successful. That suggests that what you think you've got as the third (multiline) string isn't what you've actually got.

I recommend doing a byte-by-byte dump of the data that fails, to see where the problem is. Also notice that I capture and print the return value from sscanf(); it tells you how many numbers were converted, and therefore roughly where the faulty character is. And note that sscanf() has no way to tell you that it failed to match the final ] or } in the format string; with the current format, you'll never know whether those were matched.
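A sketch of both diagnostics, assuming the data is an ordinary NUL-terminated string: dump_bytes() prints each byte in hex, and parse_checked() appends %n to the format so that the caller can tell whether the closing } was actually matched. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical diagnostic: print each byte of s as "offset: 0xHH 'c'"
   (the character is shown only when printable), so that stray bytes such
   as '\r' or non-ASCII look-alike characters become visible. */
void dump_bytes(FILE *out, const char *s)
{
    size_t i;
    for (i = 0; s[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char)s[i];
        fprintf(out, "%3lu: 0x%02X", (unsigned long)i, (unsigned)c);
        if (isprint(c))
            fprintf(out, " '%c'", c);
        putc('\n', out);
    }
}

/* Hypothetical checked parse: %n stores the number of characters consumed
   so far, but it is only reached if every earlier directive matched,
   including the final ']' and '}'. %n does not add to the return count,
   so rc == 6 alone cannot tell whether the closing brackets matched. */
int parse_checked(const char *s, double p[6])
{
    int end = -1;
    int rc = sscanf(s, " { [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }%n",
                    &p[0], &p[1], &p[2], &p[3], &p[4], &p[5], &end);
    return rc == 6 && end >= 0;
}
```

For example, parse_checked() returns 0 for "{[0;0],[19;10],[0;10]" even though sscanf() itself returns 6 for that string, and dump_bytes(stdout, "{\r") makes the carriage return show up as 0x0D.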



Answer 2:

As mentioned in the comments, fgets() reads only one line at a time, so it never hands sscanf() the whole multiline record. As an alternative to concatenating several fgets() reads, how about processing the input character by character, discarding white space, and using } as the terminator:

#include <ctype.h>
#include <stdio.h>

/* Read characters from stdin into s, dropping all white space, until a '}'
   has been stored, the buffer is full, or EOF is reached.
   Returns the number of characters stored (0 means no more input). */
size_t get_tuple(char s[], size_t sz)
{
    int c;
    size_t i = 0;
    if (sz == 0) {
        return 0;
    }
    while (i < sz-1) {
        c = getchar();
        if (c == EOF) {
            break;
        }
        if (!isspace(c)) {      /* also skips '\r' from CRLF files */
            s[i] = (char) c;
            ++i;
            if (c == '}') {
                break;          /* end of one triangle */
            }
        }
    }
    s[i] = '\0';                /* i <= sz-1, so always in bounds */
    return i;
}

int main(void)
{
    char s[512];
    double x_1, x_2, x_3, y_1, y_2, y_3;
    size_t len;
    len = get_tuple(s, sizeof s);
    while (len > 0) {
        printf("%s\n", s);
        if (sscanf(s, "{ [ %lf ; %lf ] , [ %lf ; %lf ] , [ %lf ; %lf ] }", &x_1, &y_1, &x_2, &y_2, &x_3, &y_3) == 6) {
            printf("read: x_1=%f, y_1=%f, x_2=%f, y_2=%f, x_3=%f, y_3=%f\n", x_1, y_1, x_2, y_2, x_3, y_3);
        }
        else {
            printf("sscanf failed\n"); /* error */
        }
        len = get_tuple(s, sizeof s);
    }
    return 0;
}

Test

Input file

{[0;0],[19;10],[0;10]}

 {     [  0 ; 0],             [12;0],[0    ;10]     }

 {
[
0
;
15
]
,   [   112 ;   0   ]   ,[112;15]}

Output

$ ./main<test.txt 
{[0;0],[19;10],[0;10]}
read: x_1=0.000000, y_1=0.000000, x_2=19.000000, y_2=10.000000, x_3=0.000000, y_3=10.000000
{[0;0],[12;0],[0;10]}
read: x_1=0.000000, y_1=0.000000, x_2=12.000000, y_2=0.000000, x_3=0.000000, y_3=10.000000
{[0;15],[112;0],[112;15]}
read: x_1=0.000000, y_1=15.000000, x_2=112.000000, y_2=0.000000, x_3=112.000000, y_3=15.000000


Tags: c c89