Reading a line from file in C, dynamically

Posted 2019-07-24 16:36

Question:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *input_f;

    input_f = fopen("Input.txt", "r"); //Opens the file in read mode.
    if (input_f != NULL)
    {
        char line[2048];

        while( fgets(line, sizeof line, input_f) != NULL )
        {
            //do something
        }
        fclose(input_f); //Close the input file.
    }
    else
    {
        perror("File couldn't be opened"); //Prints that the file couldn't be opened, and why.
    }
    return 0;
}

Hi. I know I can read line by line with this code in C, but I don't want to limit line size, say like in this code with 2048.

I thought about using malloc, but I don't know the size of the line before I read it, so IMO it cannot be done.

Is there a way to avoid limiting the line size?

This question is just for my curiosity, thank you.

Answer 1:

When you are allocating memory dynamically, you will want to change:

char line[2048];

to

#define MAXL 2048           /* the use of a define will become apparent when you  */
size_t maxl = MAXL;         /* need to check to determine if a realloc is needed  */
char *line = malloc (maxl * sizeof *line);
if (!line) {                /* always check to ensure the allocation succeeded */
    fputs ("error: memory allocation failed\n", stderr);
    return 1;               /* assumes this code sits inside main() */
}

You then read up to (maxl - 1) chars or a newline (if using fgetc, etc.), or read the line and then check whether line[strlen (line) - 1] == '\n' to determine whether you read the entire line (if using fgets; make sure the length is non-zero before indexing). (POSIX defines a line as ending with a newline, although the last line of a real file may lack one.) If you read maxl - 1 characters without reaching a newline (fgetc), or the buffer is full and does not end in a newline (fgets), then it is a short read and more characters remain. Your choice is to realloc (generally doubling the size) and continue reading where you left off. To realloc:

char *tmp = realloc (line, 2 * maxl);
if (tmp) {
    line = tmp;
    maxl *= 2;
}

Note: never reallocate using your original pointer (e.g. line = realloc (line, 2 * maxl)). If realloc fails, it returns NULL and leaves the original block allocated and untouched; assigning that NULL to line overwrites your only pointer to the block, so you leak the memory and lose access to any data that existed in line. Also note that maxl is typically doubled each time you realloc. However, you are free to choose whatever size-increasing scheme you like. (If you are concerned about zeroing all new memory allocated, you can use memset to initialize the newly allocated space to zero. This is useful in some situations where you want to ensure your line is always null-terminated.)

That is the basic dynamic allocation/reallocation scheme. Note you are reading until you read the complete line, so you will need to restructure your loop test. And lastly, since you allocated the memory, you are responsible for freeing the memory when you are done with it. A tool you cannot live without is valgrind (or similar memory checker) to confirm you are not leaking memory.
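The scheme described above can be sketched roughly as follows. This is only one way to put the pieces together, not the answerer's own code: read_line is a hypothetical helper name, MAXL is set deliberately small here so that the reallocation path is actually exercised, and the sketch assumes the input contains no embedded '\0' bytes (strlen would stop early on them).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXL 16   /* initial size; small on purpose so realloc gets used */

/* Hypothetical helper: reads one line of any length from fp.
 * Returns a malloc'd string with the trailing newline removed, or
 * NULL at end-of-file with nothing read or on allocation failure.
 * The caller is responsible for calling free() on the result. */
char *read_line (FILE *fp)
{
    size_t maxl = MAXL, len = 0;
    char *line = malloc (maxl);
    if (!line)
        return NULL;

    /* each fgets call appends at offset len, using the space left */
    while (fgets (line + len, (int)(maxl - len), fp)) {
        len += strlen (line + len);
        if (len > 0 && line[len - 1] == '\n') {   /* complete line */
            line[--len] = '\0';
            return line;
        }
        if (len + 1 == maxl) {          /* full, no newline yet: grow */
            char *tmp = realloc (line, 2 * maxl);
            if (!tmp) {                 /* line is still valid here */
                free (line);
                return NULL;
            }
            line = tmp;
            maxl *= 2;
        }
    }
    if (len == 0) {                     /* EOF before any characters */
        free (line);
        return NULL;
    }
    return line;                        /* last line had no newline */
}
```

A caller would loop with something like `while ((s = read_line (input_f)) != NULL) { /* use s */ free (s); }`. Allocating a fresh buffer per line keeps the sketch short; reusing one buffer across lines would save allocations.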

Tip: if you want to ensure your string is always null-terminated, then after allocating your block of memory, zero (0) all characters. As mentioned earlier, memset is available, but if you choose calloc instead of malloc it will zero the memory for you. However, on realloc the new space is NOT zeroed either way, so calling memset on the added portion is required regardless of which function originally allocated the block.
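As a small illustration of zeroing only the space added by realloc, here is a minimal sketch. grow_zeroed is a hypothetical helper name, and the sketch assumes the current size is non-zero and that doubling it does not overflow size_t.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: doubles the block at *buf from *size bytes to
 * 2 * *size bytes and zeroes only the newly added half.
 * Returns 0 on success; on failure returns -1 and leaves *buf and
 * *size unchanged, so the old data is still reachable. */
int grow_zeroed (char **buf, size_t *size)
{
    size_t newsize = 2 * *size;
    char *tmp = realloc (*buf, newsize);   /* never assign to *buf directly */
    if (!tmp)
        return -1;
    memset (tmp + *size, 0, newsize - *size);   /* old bytes are preserved */
    *buf = tmp;
    *size = newsize;
    return 0;
}
```

Only the tail is cleared because realloc already preserves the existing contents; clearing the whole block would wipe the data read so far.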

Tip 2: look at POSIX getline. getline will handle the allocation/reallocation needed so long as line is initialized to NULL and the accompanying size variable to 0. getline also returns the number of characters actually read (including the newline, if present), dispensing with the need to call strlen after fgets to determine the same.
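For comparison, a minimal getline sketch might look like this. It assumes a POSIX.1-2008 system (getline is not part of standard C); longest_line is a hypothetical helper chosen only to give the loop something to do.

```c
#define _POSIX_C_SOURCE 200809L   /* ask <stdio.h> to declare getline */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in fp,
 * not counting the trailing newline. */
size_t longest_line (FILE *fp)
{
    char *line = NULL;    /* NULL pointer and zero capacity tell    */
    size_t cap = 0;       /* getline to allocate the buffer itself  */
    ssize_t nread;
    size_t longest = 0;

    while ((nread = getline (&line, &cap, fp)) != -1) {
        if (nread > 0 && line[nread - 1] == '\n')
            line[--nread] = '\0';          /* strip the newline */
        if ((size_t)nread > longest)
            longest = (size_t)nread;
    }
    free (line);   /* one free after the loop; getline reuses the buffer */
    return longest;
}
```

getline grows the same buffer across calls, so it is freed once after the loop rather than once per line.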

Let me know if you have additional questions.



Answer 2:

Consider 2 thoughts:

  1. An upper bound of allocated memory is reasonable. The nature of the task should have some idea of a maximum line length, be it 80, 1024 or 1 Mbyte.

  2. With a clever OS, actual usage of allocated memory may not occur until needed. See Why is malloc not "using up" the memory on my computer?

So let code allocate 1 big buffer to limit pathological cases and let the underlying memory management (re-)allocate real memory as needed.

#define N (1000000)
char *buf = malloc(N);
...
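while (fgets(buf, N, stdin) != NULL) {
  size_t len = strlen(buf);
  if (len == N-1 && buf[len-1] != '\n') {  /* buffer full and no newline: line too long */
    fputs("Excessively long line\n", stderr);  /* errno is not set here, so perror would mislead */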
    exit(EXIT_FAILURE);
  }
}
free(buf);