I have a tab-delimited file that I am trying to convert into another tab-delimited file. I am using C. I am stuck on reading the second line of the file: right now I just get tens of thousands of lines that all repeat the first line.
#include <stdio.h>
#include <string.h>
#define SELLERCODE A2LQ9QFN82X636
int main ()
{
typedef char* string;
FILE* stream;
FILE* output;
string asin[200];
string sku[15];
string fnsku[15];
int quality = 0;
stream = fopen("c:\\out\\a.txt", "r");
output = fopen("c:\\out\\output.txt", "w");
if (stream == NULL)
{
perror("open");
return 0;
}
for(;;)
{
fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
printf("%s\t%s\n", sku, fnsku);
fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
}
Prefer fgets() to read the input, and parse the lines in your program using, for example, sscanf() or strtok(). fscanf is notoriously difficult to use.
Your fscanf is not performing any conversions after the first line.
It reads characters up to a TAB, skips the TAB, and reads more characters up to the next TAB. On the second time through the loop there is no data for sku: the first character is a TAB, so the %[^\t] conversion fails, fscanf returns 0 without consuming anything, and sku and fnsku keep the values from the first line.
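You can see this without a file, because sscanf() interprets the format the same way fscanf() does. In this hypothetical demo (the function name and the input string are made up for illustration) the input starts with a TAB, as it does on the second pass through the loop:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: sscanf() on a string stands in for fscanf() on the
   stream, because both interpret the format the same way. */
int rescan_after_first_line(char sku[32], char fnsku[32])
{
    strcpy(sku, "first-sku");      /* values left over from line 1 */
    strcpy(fnsku, "first-fnsku");
    /* The remaining input starts with the TAB the first call left behind,
       so "%31[^\t]" matches zero characters: a matching failure. The call
       returns 0 and neither buffer is written. */
    return sscanf("\tX002\tB003\n", "%31[^\t]\t%31[^\t]", sku, fnsku);
}
```

After the call, sku is still "first-sku", which is exactly the repeated output the question describes.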
Do check the return value, though. It helps enormously.
int chk = fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
/* 2 conversions expected: sku and fnsku */
if (chk != 2) {
    /* EOF or a matching failure: stop reading or report the error */
}
You are reading with
fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
After the first line is read (which, given the format "%[^\t]\t%[^\t]", presumably ends with a TAB), the TAB that stopped the second scanset is still in the input buffer, because the call above does not read it. In the next iteration fscanf() meets that '\t' at the very beginning, so "%[^\t]" matches nothing and the call returns immediately, leaving the buffers with the previously read values. From then on every iteration fails the same way on the same '\t', so you never progress through the file, and the values from the first read are printed over and over.
You need to consume the character that terminated the scanset match. You can either call fgetc(stream) after the fscanf() call, or use the format string "%[^\t]\t%[^\t]%*c". The * in %*c is assignment suppression: one character is read from the input file and then discarded.
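A small sketch of %*c, using sscanf() and %n (which stores the number of characters consumed so far) to show where scanning stops. The function name scan_and_skip() and the sample input are illustrative assumptions:

```c
#include <stdio.h>

/* "%*c" reads one character (here the TAB that ended the second scanset)
   and stores it nowhere. "%n" records how many characters have been
   consumed at that point. Neither %*c nor %n counts toward the return
   value, so a successful call still returns 2. */
int scan_and_skip(const char *input, char sku[32], char fnsku[32], int *consumed)
{
    *consumed = 0;
    return sscanf(input, "%31[^\t]\t%31[^\t]%*c%n", sku, fnsku, consumed);
}
```

For the input "A1\tB2\tC3\n" the call returns 2 and sets the consumed count to 6: the two fields, both TABs, and nothing from "C3".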
Also check what fscanf() returns. If it does not return 2 (the number of items to be assigned), something went wrong and you should handle it. This way you can make sure each call read the expected number of items.
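So you can either do the following. Both versions loop on the return value of fscanf() instead of on feof(): feof() only becomes true after a read has already failed, so a while (!feof (stream)) loop can process the last record twice. The fprintf() format is also corrected to one conversion per argument.
while (fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku) == 2)
{
    fgetc(stream);  /* discard the TAB that ended the second field */
    printf("%s\t%s\n", sku, fnsku);
    fprintf(output, "%s\t%s\t%s\t%d\n", sku, fnsku, asin, quality);
}
Or you can do:
while (fscanf(stream, "%[^\t]\t%[^\t]%*c", sku, fnsku) == 2)
{
    printf("%s\t%s\n", sku, fnsku);
    fprintf(output, "%s\t%s\t%s\t%d\n", sku, fnsku, asin, quality);
}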
But I would recommend reading each line with fgets() and then parsing it inside your program with strtok() or some other method.
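A sketch of the fgets()/strtok() route, assuming lines of at most 512 bytes with at most 8 fields. split_fields() and convert_with_strtok() are hypothetical helpers, and only the first two fields are written out, as in the question's loop:

```c
#include <stdio.h>
#include <string.h>

enum { MAX_FIELDS = 8 };   /* assumed maximum number of fields per line */

/* Splits line in place on TAB and newline and stores up to max_fields
   token pointers in fields[]. Returns the number of fields found.
   Caveat: strtok() treats consecutive delimiters as one, so an empty
   field ("a\t\tc") is silently skipped; use strchr() if that matters. */
int split_fields(char *line, char *fields[], int max_fields)
{
    int n = 0;
    for (char *tok = strtok(line, "\t\n"); tok != NULL && n < max_fields;
         tok = strtok(NULL, "\t\n")) {
        fields[n++] = tok;
    }
    return n;
}

/* Reads every line, writes its first two fields, returns lines written. */
long convert_with_strtok(FILE *in, FILE *out)
{
    char line[512];            /* assumed maximum line length */
    char *fields[MAX_FIELDS];
    long written = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (split_fields(line, fields, MAX_FIELDS) < 2)
            continue;          /* skip blank or malformed lines */
        fprintf(out, "%s\t%s\n", fields[0], fields[1]);
        written++;
    }
    return written;
}
```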
EDIT1:
Note that if each line of the original file is terminated with '\n', then reading as above puts that newline into your buffers (the last %[^\t] on a line reads straight past it). If you still want to read the fields directly with fscanf(), where each line has tab-separated fields and each entry is terminated by '\n', then for lines with exactly two fields use the format string "%[^\t]\t%[^\n]\n": the second scanset stops at the newline, and the trailing "\n" directive consumes it (along with any other whitespace that follows).
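A runnable sketch of that format inside a complete read loop, assuming exactly two fields per line. read_pairs() and the 64-byte buffers are assumptions; the widths (63) keep fscanf() from overflowing them:

```c
#include <stdio.h>

/* "%63[^\t]" reads the first field up to the TAB,
   "\t"       skips the TAB,
   "%63[^\n]" reads the second field up to (not including) the newline,
   "\n"       skips the newline (in fact any run of whitespace).
   The loop ends as soon as fscanf() does not return 2 (EOF or a bad line). */
long read_pairs(FILE *in, FILE *out)
{
    char sku[64], fnsku[64];   /* assumed buffer sizes */
    long count = 0;

    while (fscanf(in, "%63[^\t]\t%63[^\n]\n", sku, fnsku) == 2) {
        fprintf(out, "%s\t%s\n", sku, fnsku);
        count++;
    }
    return count;
}
```

With the input "a\tb\nc\td\n" the loop copies both lines and then stops, because the third call hits end of file and returns EOF.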
It is difficult to answer without knowing the exact format of the file. Does the file contain a single line with fields separated by tabs, or multiple lines, each with tab-separated fields? If the latter, the best approach is to read a whole line at once and then parse it internally.
OK, here is what is actually happening. You read the first line, and from then on you don't read anything and just reuse those values. You should check the return value of fscanf and exit the loop if it is less than two (which it will be after the first iteration). Your fscanf line should look like this:
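if (fscanf(stream, "%[^\t]\t%[^\n]\n", sku, fnsku) < 2) break;
The key is the newline at the end, which eats the newline in the input (a whitespace directive in fact skips any run of whitespace). This assumes exactly two fields per line; the second scanset is [^\n] rather than [^\t] so that the last field stops at the end of the line instead of running on into the next one.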
There are some problems with your fprintf as well: the format string has six conversion specifications, one of them the invalid %\t, but you pass only four arguments. I'll leave that to you.