weird malloc behavior in C

Posted 2019-01-15 17:58

Question:

I have the following ANSI C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buffer = 0;
    int length = 0;
    FILE *f = fopen("text.txt", "r");
    if(f) {
        fseek(f, 0, SEEK_END);
        length = ftell(f);
        fseek(f, 0, SEEK_SET);
        buffer = malloc(length);
        fread(buffer, 1, length, f);
        fclose (f);
    }
    printf("File size: %d\nBuffer size: %d\nContent: %s\n=END=", length, strlen(buffer), buffer);
    return 0;
}

For some reason, malloc seems to allocate more memory than needed, and the program prints extra garbage from memory after the file contents. For example:

First run:

File size: 12
Buffer size: 22
Content: 123456789012les=$#▬rW|
=END=

Second run:

File size: 12
Buffer size: 22
Content: 123456789012les↔1↕.'
=END=

Third run:

File size: 12
Buffer size: 22
Content: 123456789012les=▬kπà
=END=

Could someone please help me with this and explain why my program behaves this way? I compile with MinGW TDM-GCC 4.9.2, 32-bit (gcc).

Answer 1:

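You have undefined behavior (and this is a good illustration of why you should be afraid of UB), caused by reading past the end of the buffer: you forgot to add a terminating null byte. fread copies exactly length bytes and does not terminate them, so strlen and printf's %s keep reading whatever follows the allocation until they happen to hit a zero byte. malloc did not allocate more than you asked for; the "Buffer size: 22" is just where that zero byte happened to be.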

Replace the faulty lines:

    // WRONG CODE:
    buffer = malloc(length);
    fread(buffer, 1, length, f);

with

    buffer = malloc(length + 1);      /* one extra byte for the terminator */
    if (!buffer) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    memset(buffer, 0, length + 1);    /* guarantees a terminating null byte */
    if (fread(buffer, 1, length, f) < (size_t)length) {
        perror("fread");
        exit(EXIT_FAILURE);
    }

(You could zero just the ending byte; I prefer to clear with memset the entire buffer)
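If the bytes only need to be printed or copied, another option is to leave the buffer unterminated and pass the length explicitly. The sketch below is an illustration, not part of the original answer: the helper name bytes_to_string is hypothetical, and it assumes C99 or later and a length that fits in an int. It relies on the standard "%.*s" conversion, which reads at most the given number of bytes and so never looks for a terminator.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): copies at most `len`
   bytes of a possibly unterminated buffer into `dst` as a C string.
   Returns what snprintf returns: the number of characters that the
   complete output requires, excluding the terminator. */
int bytes_to_string(char *dst, size_t dst_size, const char *src, size_t len) {
    assert(dst != NULL && src != NULL);
    /* The precision given by ".*" caps how many bytes of src are read,
       so src does not need a terminating null byte.
       Assumption: len fits in an int. */
    return snprintf(dst, dst_size, "%.*s", (int)len, src);
}
```

The same format works directly in the question's printf call: printf("Content: %.*s\n", length, buffer) prints exactly length bytes, whether or not a null byte follows them.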

BTW, ANSI C (C89) is obsolete. Use a C11-compliant compiler (e.g. a recent GCC invoked as gcc -std=c11 -Wall -Wextra -g) and target C11, or at least C99. Also learn to use a debugger (e.g. gdb).

Carefully read the documentation for malloc(3), fread(3), perror(3), etc.
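Putting the pieces of this answer together, the whole read can be wrapped in one function. This is a minimal sketch under stated assumptions, not a definitive implementation: the name read_whole_file and its out_len parameter are hypothetical, C99 or later is assumed, and it keeps the question's fseek/ftell sizing and "r" mode. It terminates the buffer at the count fread actually returned, so a short read still leaves a valid string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): reads a file into a
   freshly allocated, null-terminated buffer. Returns NULL on failure.
   The caller owns the result and must free() it. */
char *read_whole_file(const char *path, size_t *out_len) {
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (!f) return NULL;

    /* Same sizing approach as the question: seek to the end, ask for the
       offset, then rewind. ftell returns long, and -1L on failure. */
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    /* One extra byte for the terminator. */
    char *buffer = malloc((size_t)size + 1);
    if (!buffer) {
        fclose(f);
        return NULL;
    }

    /* fread may deliver fewer bytes than requested (for example after
       newline translation in text mode), so terminate at `got`. */
    size_t got = fread(buffer, 1, (size_t)size, f);
    buffer[got] = '\0';
    fclose(f);

    if (out_len) *out_len = got;
    return buffer;
}
```

With this helper, the question's main reduces to calling read_whole_file("text.txt", &n), checking the result for NULL before printing it, and freeing it afterwards.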



Answer 2:

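The use of fseek(f, 0, SEEK_END); to measure the file is not portable and, per the standard text quoted here, can invoke undefined behavior. First, you're not reading in binary mode, so the offset reported by ftell isn't necessarily the number of bytes that fread will deliver (on Windows, for example, each CR LF pair in the file is read as a single '\n').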

But if you switch to a binary stream, per 7.19.9.2 of the C Standard:

A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

and

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
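One way to sidestep the question of file size entirely is to read until end-of-file and grow the buffer as needed. The following is a sketch of that idea, not something the original answer proposes: the name slurp_stream, the 4096-byte starting capacity, and the doubling strategy are all assumptions, and C99 or later is assumed. It uses only fread, ferror, malloc, and realloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): reads an open stream
   to end-of-file without fseek/ftell. Returns a null-terminated buffer
   that the caller must free(), or NULL on allocation or read failure. */
char *slurp_stream(FILE *f, size_t *out_len) {
    assert(f != NULL);

    size_t cap = 4096;  /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (!buf) return NULL;

    for (;;) {
        /* Always keep one byte spare for the terminator. */
        size_t room = cap - len - 1;
        size_t got = fread(buf + len, 1, room, f);
        len += got;
        if (got < room) break;  /* short read: end-of-file or error */

        /* Buffer is full: double it, guarding against size_t overflow. */
        if (cap > (size_t)-1 / 2) { free(buf); return NULL; }
        char *grown = realloc(buf, cap * 2);
        if (!grown) { free(buf); return NULL; }
        buf = grown;
        cap *= 2;
    }

    if (ferror(f)) { free(buf); return NULL; }

    buf[len] = '\0';
    if (out_len) *out_len = len;
    return buf;
}
```

Because the length comes from what fread actually returned, the result is consistent in both text and binary mode, at the cost of a few reallocations for large files.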



Tags: c malloc