fscanf() only picking up first line of file

Posted 2019-07-20 02:08

Question:

I have a tab-delimited file that I am trying to convert to another tab-delimited file. I am using C. I am getting stuck on reading the second line of the file: right now I just get tens of thousands of lines repeating the first line.

#include <stdio.h>
#include <string.h>
#define SELLERCODE  A2LQ9QFN82X636

int main ()
{
     typedef char* string;
     FILE* stream;
     FILE* output;
     string asin[200];
     string sku[15];
     string fnsku[15];
     int quality = 0;

     stream = fopen("c:\\out\\a.txt", "r");
     output = fopen("c:\\out\\output.txt", "w");

     if (stream == NULL)
     { 
         perror("open");
         return 0;
      }

     for(;;)
     {
       fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
       printf("%s\t%s\n",  sku, fnsku);
       fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
     }

}

Answer 1:

Prefer fgets() to read the input and parse the lines in your program, using, for example, sscanf() or strtok().

fscanf is notoriously difficult to use.
Your fscanf is not performing any conversions after the first line.
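It reads characters up to a TAB, then skips the TAB (a whitespace character in the format skips any run of whitespace). It then reads more characters up to the next TAB, which it leaves unread. On the 2nd time through the loop, there is no data for sku, because the 1st character is that TAB. So %[^\t] matches nothing, fscanf returns 0, and sku and fnsku keep their old contents.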

Do check the return value though. It helps enormously.

int chk = fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
/* 2 conversions expected: sku and fnsku */
if (chk != 2) {
    /* something went wrong: EOF (chk == EOF) or fewer fields matched */
    break;   /* e.g. leave the read loop */
}


Answer 2:

You are reading with

   fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);

After the first call, the stream is left at the tab that ended the second %[^\t] match. (As your format "%[^\t]\t%[^\t]" suggests, the line presumably continues with another tab-separated field.) The call above does not consume that tab, so on the next iteration it is the first character your format string sees.

The leading %[^\t] needs at least one non-tab character. So fscanf returns immediately with a matching failure on that tab, and sku and fnsku still hold the values from the first read. Every later iteration fails the same way: the program never advances through the file, and the first values are printed over and over.

You need to consume the character that terminated the scanset match. You can either call fgetc(stream) after the fscanf() call, or use the format string "%[^\t]\t%[^\t]%*c". The * in %*c is assignment suppression: one character is read from the input file and then discarded.

Also, check what fscanf() returns. If it does not return 2 (the number of items to be assigned), there is a problem you should handle. This way you can make sure each call read the expected number of fields.

So you can either do:

 while (fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku) == 2)
 {
   fgetc(stream);   /* consume the tab that ended the second field */
   printf("%s\t%s\n",  sku, fnsku);
   fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
 }

Or you can do:

 while (fscanf(stream, "%[^\t]\t%[^\t]%*c", sku, fnsku) == 2)
 {
   printf("%s\t%s\n",  sku, fnsku);
   fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
 }

But I would recommend reading each line with fgets() and then parsing it inside your program with strtok() or some other means.

EDIT1:

Note that if the lines in the original file end with '\n', then reading them as above puts the newline into your buffers, because %[^\t] accepts '\n' like any other non-tab character. You may still want to read the fields directly with fscanf(), where each line has fields separated by '\t' and each entry ends with '\n'. In that case the last field on the line must stop at the newline rather than at a tab. For lines with exactly two fields, that means the format string "%[^\t]\t%[^\n]\n".

It is difficult to answer without knowing the exact format of the file. Does the file contain a single line with tab-separated fields, or multiple lines, each with tab-separated fields? If the latter, it is best to read the whole line at once and then parse it internally.



回答3:

Ok, here is what is actually happening. You are reading the first line, and from then on you aren't reading anything and just reusing those values. You should check the return value of fscanf and exit the loop if it is less than two (which it will be after the first iteration). Your fscanf line should look like this:

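if( fscanf(stream, "%[^\t]\t%[^\n]\n", sku, fnsku) < 2 ) break;

The key is the end of the format. %[^\n] makes the second field stop at the end of the line (assuming each line holds exactly two tab-separated fields), and the trailing \n then eats the newline in the input.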

There are some problems with your fprintf as well. The "%\t" in its format string is not a valid conversion, and the number of conversions does not match the number of arguments. I'll leave that to you.