How to read correctly certain strings from file in

Posted 2019-08-22 01:05

Question:

I am trying to write a program in C that compares strings. The strings are given in pairs, and the number of pairs is at the top of the file. The file looks like the following:

2
a: 01010100000101011111
   01001010100000001111
   00000000000011110000
b: 00000111110000010001
   10101010100111110001
a: 00000011111111111100
   00111111111111000
b: 00000001111001010101

My problem is reading the strings properly so that I can run comparisons on them, etc.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define NCHAR 32

int main (int argc, char **argv) {
    char *word1 = NULL;
    FILE *fp = NULL;
    for (int i = 0; i<pairs; i++){

        if (i == 0)
        {
            word1 = readWord(fp, &word1);//read a:
            while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);
        }

        word1 = readWord(fp, &word1);//read string
        while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);

        aline = malloc(amaxsize);
        strncpy(aline, word1, amaxsize);

        word1 = readWord(fp, &word1); 
        while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);

        while (strcmp(word1, "b:")!=0){
            aline = concat(aline, word1);

            word1 = readWord(fp, &word1); 
            while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);
        }

        fprintf(fpw, "a: %s\n", aline); //write to the file..
        free (word1);
        word1 = NULL;

        word1 = readWord(fp, &word1); //read string after b:
        while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);
        bline = malloc(bmaxsize);
        strncpy(bline, word1, bmaxsize);

        word1 = readWord(fp, &word1); 
        while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);

        if (i == (pairs-1))
        {

            while (strcmp(word1, "")!=0){
                bline = concat(bline, word1);
                word1 = readWord(fp, &word1);

            }
        }
        else 
        {
            while (strcmp(word1, "a:")!=0){
                bline = concat(bline, word1);
                word1 = readWord(fp, &word1);
                while(strcmp(word1, "") == 0) word1 = readWord(fp, &word1);
            }
        }
        fprintf(fpw, "b: %s\n", bline); //write to the file..
        free (word1);
        word1 = NULL;

        fprintf(fpw,"\n");
    }
}

    char *readWord(FILE *fp, char **buffer)
    {
        int ch, nchar = NCHAR;
        int buflen = 0;
        *buffer = malloc (nchar);

        if(*buffer){
            while ((ch = fgetc(fp)) != '\n' && ch != EOF && ch != '\t' && ch != ' ') 
            {
                if (ch!='\t' && ch!= ' ' && ch != '\n') (*buffer)[buflen++] = ch;

                if (buflen + 1 >= nchar) {  /* realloc */
                    char *tmp = realloc (*buffer, nchar * 2);
                    if (!tmp) {

                        (*buffer)[buflen] = 0;

                        return *buffer;
                    }
                    *buffer = tmp;
                    nchar *= 2;
                }
            }
            (*buffer)[buflen] = 0;           /* nul-terminate */

            if (buflen == 0 && ch == EOF) {  /* return NULL if nothing read */
                free (*buffer);
                *buffer = NULL;
            }
            return *buffer;
        }
        else {
            fprintf (stderr, "Error...\n");
            return NULL;
        }
    }

The readWord function reads one word at a time. What I am trying to do is read the file word by word and concatenate the words to get the full string a, saving it in aline so I can work on it, and then do the same with b. The problem is that the file is not read properly: for example, instead of getting the whole a of the first pair, I get only the first part of it. Any ideas?

Answer 1:

The read you are attempting from the file is non-trivial, but it can be handled fairly simply: set a flag that tells you whether you have already seen an 'a' or 'b', skip all whitespace and ':' characters, store every other character in your buffer (reallocating as needed), and, when the second 'a' or 'b' is found, put that character back in the FILE* stream with ungetc, nul-terminate, and return your buffer.

Sounds easy enough -- right? Well, that's pretty much it. Let's look at what would be needed in your readword() function.
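Before going through the details, the push-back step can be tried in isolation. The following is only a minimal sketch: peek_nonspace() is a hypothetical helper invented for this illustration (it is not part of readword()), and it relies on the standard guarantee that a single character of ungetc push-back is always available.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: return the next non-whitespace character without
 * consuming it, or EOF if none remains. Leading whitespace is consumed. */
int peek_nonspace (FILE *fp)
{
    int c;

    assert (fp != NULL);
    while ((c = fgetc (fp)) != EOF && isspace (c))
        ;                       /* skip whitespace */
    if (c != EOF)
        ungetc (c, fp);         /* put the char back for the next read */
    return c;
}
```

The next fgetc() on the same stream returns the character that was pushed back, which is what lets a reader stop at the 'a' or 'b' that starts the following string without losing it.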

First, since you are allocating for buffer in readword(), there is no need to pass char **buffer as a parameter. You have already declared readword as char *readword(...) so just pass the FILE* pointer as a parameter and return a pointer to your allocated, filled and nul-terminated buffer.

You can handle the reallocation scheme any way you like. You can either start with some reasonable number of characters allocated and then double (or add some multiple to) the current size, or just add a fixed amount each time you run out. The readword() example below simply starts with a 32-char buffer and adds another 32 chars each time reallocation is needed. (If the data size were truly unknown, I would probably start with 32 chars and double each time I ran out; that is completely up to you.)
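For comparison, the doubling alternative could look roughly like the sketch below. The strbuf type and strbuf_push() are hypothetical names made up for this illustration, and the starting size of 32 is just the same assumption as above. This is not the fixed-increment scheme that readword() uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, used only for this illustration. */
typedef struct {
    char  *data;   /* nul-terminated contents, NULL while empty */
    size_t len;    /* characters stored, excluding the terminator */
    size_t cap;    /* bytes currently allocated */
} strbuf;

/* Append one character, doubling the allocation when it is full.
 * Returns 1 on success, 0 if realloc fails (the buffer stays valid). */
int strbuf_push (strbuf *sb, char c)
{
    assert (sb != NULL);
    if (sb->len + 2 > sb->cap) {        /* need room for c and the nul */
        size_t newcap = sb->cap ? sb->cap * 2 : 32;
        void *tmp = realloc (sb->data, newcap);
        if (!tmp)                       /* original block is untouched */
            return 0;
        sb->data = tmp;
        sb->cap = newcap;
    }
    sb->data[sb->len++] = c;            /* index-based, so no stale pointer */
    sb->data[sb->len] = '\0';           /* keep the string terminated */
    return 1;
}
```

Starting from a zero-initialized struct (strbuf sb = {0};), each call either appends the character or reports failure, and the caller frees sb.data when done. Indexing by len instead of keeping a separate write pointer avoids having to reset that pointer after realloc moves the block.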

Using the isspace() function found in ctype.h ensures all whitespace is handled correctly.
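As a small illustration of that filtering step on its own, here is a sketch that applies the same test to a string already in memory. strip_separators() is a hypothetical helper, not part of the program below. Note the cast to unsigned char: it matters when the value comes from a plain char, whereas the int returned by fgetc() can be passed to isspace() directly.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, dropping whitespace and ':'.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Returns the number of characters kept. */
size_t strip_separators (char *dst, const char *src)
{
    size_t n = 0;

    assert (dst != NULL && src != NULL);
    for (; *src; src++) {
        unsigned char uc = (unsigned char)*src;  /* safe isspace argument */
        if (isspace (uc) || uc == ':')           /* same test as readword */
            continue;
        dst[n++] = *src;
    }
    dst[n] = '\0';                               /* nul-terminate */
    return n;
}
```

Spaces, tabs, '\r' and '\n' are all dropped by the single isspace() check, so files with Windows line endings or tab indentation need no special handling.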

The last few issues are simply ensuring you return a nul-terminated string in buffer and making sure you re-initialize your pointer to the end of your buffer in each new block of memory when realloc is called.

Putting it all together, you could do something similar to the following. A simple example program is added after the readword() function to read your example file and output the combined strings read from the file:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define NCHR  32

char *readword (FILE *fp)
{
    int c,                      /* current character */
        firstline = 0;          /* flag for 'a' or 'b' found at 1st char */
    size_t n = 0, nchr = NCHR;  /* chars read, number of chars allocated */
    char *buffer = NULL, *p;    /* buffer to fill, pointer to buffer */

    buffer = malloc (nchr);             /* allocate initial NCHR */
    if (!buffer) {                      /* validate */
        perror ("malloc-buffer");
        return NULL;
    }
    p = buffer;                         /* set pointer to buffer */

    while ((c = fgetc (fp)) != EOF) {   /* read each char */
        if (isspace (c) || c == ':')    /* skip all whitespace and ':' */
            continue;
        if (c == 'a' || c == 'b') {     /* begins with 'a' or 'b' */
            if (firstline) {            /* already had a/b line */
                ungetc (c, fp);         /* put the char back */
                *p = 0;                 /* nul-terminate */
                return buffer;          /* return filled buffer */
            }
            firstline = 1;              /* set firstline flag */
            continue;
        }
        else {
            if (n == nchr - 2) {        /* check if realloc needed */
                void *tmp = realloc (buffer, nchr + NCHR);
                if (!tmp)               /* validate */
                    exit (EXIT_FAILURE);
                buffer = tmp;           /* assign new block to buffer */
                p = buffer + n;         /* set p at buffer end */
                nchr += NCHR;           /* update no. chars allocated */
            }
            *p++ = c;       /* assign the current char and advance p */
            n++;            /* increment your character count */
        }
    }
    *p = 0;         /* nul-terminate */

    return buffer;
}

int main (int argc, char **argv) {

    char buf[NCHR], *word;
    int nwords, toggle = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    if (!fgets (buf, NCHR, fp)) {
        fputs ("error: read of line 1 failed.\n", stderr);
        return 1;
    }
    if (sscanf (buf, "%d", &nwords) != 1) {
        fputs ("error: invalid file format.\n", stderr);
        return 1;
    }
    nwords *= 2;   /* actual number of words is twice the number of pairs */

    while (nwords-- && (word = readword (fp))) {
        printf ("%c: %s\n", toggle ? 'b' : 'a', word);
        free (word);
        if (toggle) {
            putchar ('\n');
            toggle = 0;
        }
        else
            toggle = 1;
    }

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    return 0;
}

(Note: the toggle above is simply a 1 or 0 flag used to output either "a:" or "b:" at the beginning of the appropriate line and to add a '\n' between the pairs of lines read.)

Example Use/Output

$ ./bin/read_multiline_pairs dat/pairsbinline.txt
a: 010101000001010111110100101010000000111100000000000011110000
b: 0000011111000001000110101010100111110001

a: 0000001111111111110000111111111111000
b: 00000001111001010101

Memory Use/Error Check

Always verify your memory use when you dynamically allocate storage and ensure you have freed all the memory you allocate.

$ valgrind ./bin/read_multiline_pairs dat/pairsbinline.txt
==14257== Memcheck, a memory error detector
==14257== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14257== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==14257== Command: ./bin/read_multiline_pairs dat/pairsbinline.txt
==14257==
a: 010101000001010111110100101010000000111100000000000011110000
b: 0000011111000001000110101010100111110001

a: 0000001111111111110000111111111111000
b: 00000001111001010101

==14257==
==14257== HEAP SUMMARY:
==14257==     in use at exit: 0 bytes in 0 blocks
==14257==   total heap usage: 8 allocs, 8 frees, 872 bytes allocated
==14257==
==14257== All heap blocks were freed -- no leaks are possible
==14257==
==14257== For counts of detected and suppressed errors, rerun with: -v
==14257== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Look things over and let me know if you have questions. The largest part of the problem was handling the read and concatenation of all the lines for each pair. The rest of the coding is left to you.
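If it helps as a starting point for the comparison itself, here is one possible sketch. The question does not say how the strings should be compared, so everything here is an assumption: count_differences() is a hypothetical function that counts differing positions, and treating each position past the end of the shorter string as one difference is just one convention you could pick.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison: count the positions at which a and b differ.
 * Assumed convention: every position past the end of the shorter string
 * counts as one difference. */
size_t count_differences (const char *a, const char *b)
{
    size_t diff = 0;

    assert (a != NULL && b != NULL);
    while (*a && *b) {          /* walk the common prefix length */
        if (*a != *b)
            diff++;
        a++;
        b++;
    }
    /* at most one of the two remainders is non-empty */
    return diff + strlen (a) + strlen (b);
}
```

To use it, you would keep the pointer returned by readword() for the "a" string, read the "b" string, call the function on both, and then free both buffers.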