Why, or when, do you need to dynamically allocate

2020-01-27 03:20发布

问题:

Dynamic memory allocation is a very important topic in C programming. However, I've been unable to find a good explanation of what this enables us to do, or why it is required.

Can't we just declare variables and structs and never have to use malloc()?

As a side note, what is the difference between:

ptr_one = (int *)malloc(sizeof(int));

and

int *ptr_one = malloc(sizeof(int));

回答1:

You need to use dynamic memory when:

  • You cannot determine the maximum amount of memory to use at compile time;
  • You want to allocate a very large object;
  • You want to build data structures (containers) without a fixed upper size;

You don't always know how much memory you will need to set aside at compile time. Imagine processing a data file (a time series of temperatures, say), where the number of records in the file isn't fixed. You could have as few as 10 records or as many as 100000. If you want to read all that data into memory to process it, you won't know how much memory to allocate until you read the file. If the file is structured so that the very first value is the number of records, you could do something like this:

size_t recs = 0;
double *temps = NULL;

FILE *fp = fopen ( filename, "r" );
if ( fp )
{
  if ( fscanf( fp, "%zu", &recs ) == 1 )
  {
    temps = malloc( sizeof *temps * recs );
    if ( temps )
    {
      // read contents of file into temps
    }
  }
}

Sometimes you need to allocate a very large object, something like

int ginormous[1000][1000][1000];

Assuming a 4-byte integer, this array will require 4GB. Unfortunately, stack frames (where local variables are kept on most architectures) tend to be much smaller than that, so trying to allocate that much memory may lead to a run-time error (and typically does). The dynamic memory pool (a.k.a. the heap) is typically much larger than the stack, much less any one stack frame. so for something that obnoxious you'd need to write something like

int (*ginormous)[1000][1000] = malloc( sizeof *ginormous * 1000 );

It's still possible for a request like that to fail; if your heap is fragemented enough, you may not have a single contiguous block large enough to hande the request. If necessary, you could do a piecemeal allocation; rows won't necessarily be adjacent in memory, but it's more likely you'll be able to grab all the memory you need:

int ***ginormous = malloc( sizeof *ginormous * 1000 );
if ( ginormous )
{
  for ( size_t i = 0; i < 1000; i++ )
  {
    ginormous[i] = malloc( sizeof *ginormous[i] * 1000 );
    if ( ginormous[i] )
    {
      ginormous[i][j] = malloc ( sizeof *ginormous[i][j] * 1000 );
      if ( ginormous[i][j] )
      {
        // initialize ginormous[i][j][k]
      }
    }
  }
}

And finally, dynamic memory allows you to build containers that can grow and shrink as you add or remove data, such as lists, trees, queues, etc. You could even build your own real "string" data type that can grow as you append characters to it (similar to the string type in C++).



回答2:

Dynamic allocation is required when you don't know the worst case requirements for memory. Then, it is impossible to statically allocate the necessary memory, because you don't know how much you will need.

Even if you know the worst case requirements, it may still be desirable to use dynamic memory allocation. It allows for the system memory to be used more efficiently by multiple processes. All processes could statically commit their worst case memory requirements, but that puts a cap on how many running processes can exist on the system. If it is never the case that all processes use the worst case at the same time, then the system memory constantly runs underutilized, which is a waste of resources.

As to your side question, you should not cast the result of a call to malloc() in C. It can hide the bug of a missing declaration (implicit declarations were permitted prior to C.99), and results in undefined behavior. Always prefer taking the result of malloc() without a cast. malloc() is declared to return void *, and in C, a conversion between void * and another pointer type is always permitted (modulo type qualifiers like const).



回答3:

As a side note, what is the difference between: ptr_one = (int *)malloc(sizeof(int)) and int *ptr_one = malloc(sizeof(int))

See this.

Firstly, I know this is likely a ridiculous question, as dynamic memory allocation is a very important topic in C programming. However, I've been unable to find a good explanation of what this enables us to do, or why it is required.

The memory pool (or more commonly the heap) is very large in comparison to the stack. Consider these two examples for why it's useful to use the memory pool over the stack:

1. What if you defined an array and wanted it to persist amongst multiple stack frames? Sure, you could declare it as a global variable and it will be stored in the global data section of memory, however this will get cluttered as your program gets larger and larger. Alternatively, you could store it on the memory pool.

int *func( int k ) {
  assert( k >= 1 );

  int *ptr_block = malloc( sizeof( int ) * k );

  if ( ptr_block == NULL ) exit( EXIT_FAILURE );

  for ( int i = 0; i < k; i++ ) {
    ptr_block[ i ] = i + 1;
  }

  return ptr_block; // Valid.
}

... this would however not work if you defined your array on the stack. Reason being, once a stack frame is popped all the memory addresses can be used by another stack frame (and hence overwritten), whereas using memory from the memory pool will persist until freed by the user (you, or client).

2. What if you wanted to implement a dynamic array to handle reading an arbitrary large sequence of numbers? You would not be able to do this defining your array on the stack, you would need to use the memory pool. Recall that it's extremely common (and highly recommended unless you explicitly need to copy a struct) to be passing a pointer to a struct, never the struct itself (as they can be rather large). Consider this small implementation of a dynamic array:

struct dyn_array {
  int *arr;
  int len;
  int cap;
};

typedef struct dyn_array *DynArray;

void insert_item( int const item, DynArray dyn_arr ) {
  // Checks pre conditions.
  assert( dyn_arr != NULL );

  // Checks if the capacity is equal to the length. If so, double.
  if ( dyn_arr->cap == dyn_arr->len ) {
    dyn_arr->cap *= 2;

    DynArray new_dyn_arr = malloc( sizeof( int ) * dyn_arr->cap ); // [oo]

    // ... copy, switch pointers and free...
  }

  // ... insert, increase length, etc.
}

... on line [oo] notice that if this were defined on the stack then once this stack frame is popped all the memory addresses for the array would no longer be allocated. Meaning, another stack frame (likely the next one) will be using those memory addresses (or some subset of it).

Remark: From my snippet of code, ptr_block is stored on the stack: hence &ptr_block is a stack address, however the value of ptr_block is somewhere from the memory pool.