How to distinguish between strings in heap or lite

2020-08-09 08:37发布

问题:

I have a use case where I can get pointers of strings allocated either in memory or literals. Now the latter can't be freed so that's a problem if I pass the wrong one. Is there a way to know which one is allocated and which not?

char *b = "dont free me!";
if(!IS_LITERAL(b)) {
    free(b);
}

I imagine something like that.

My example:

Scenario 1: literal

char *b = "dont free me!";
scruct elem* my_element = mylib_create_element(b);
// do smth
int result = mylib_destroy_element(my_element); // free literal, very bad

Scenario 2: in heap

char *b = malloc(sizeof(char)*17); // example
strncpy(b, "you can free me!",17);

scruct elem* my_element = mylib_create_element(b);
// do smth
int result = mylib_destroy_element(my_element); // free heap, nice

How the user calls mylib_create_element(b); is not under my control. If he frees before mylib_destroy_element it can crash. So it has got to be mylib_destroy_element that cleans up.

回答1:

I've had a similar case recently. Here's what I did:

If you're making an API that accepts a string pointer and then uses it to create an object (mylib_create_element), a good idea would be to copy the string to a separate heap buffer and then free it at your discretion. This way, the user is responsible for freeing the string he used in the call to your API, which makes sense. It's his string, after all.

Note that this won't work if your API depends on the user changing the string after creating the object!



回答2:

On most Unixes, there are values 'etext' and 'edata'. If your pointer is between 'etext' and 'edata', then it shall be statically initialized. Those values are not mentioned in any standard, so the usage is non portable and at your own risk.

Example:

#include<stdio.h>
#include<stdlib.h>

extern char edata;
extern char etext;

#define IS_LITERAL(b) ((b) >= &etext && (b) < &edata)

int main() {
    char *p1 = "static";
    char *p2 = malloc(10);
    printf("%d, %d\n", IS_LITERAL(p1), IS_LITERAL(p2));
}


回答3:

You can only ask user to explicitly mark their input as literal or allocated string.

However, as @Mints97 mentions in his answer, basically this approach is architecturally incorrect: you force user of your library for some explicit actions, and if he forgets to, it leads most likely to a memory leak (or even to an application crash). So use it only if:

  1. You want to drastically reduce amount of allocations. In my case it was JSON node names, that never change during lifetime of a program.
  2. You have good control of code of consumers of your library. In my case, libraries are shipped with binaries and tightly bound to them.

Implementation example

#define AAS_DYNAMIC             'D'
#define AAS_STATIC              'S'

#define AAS_STATIC_PREFIX       "S"
#define AAS_CONST_STR(str)      ((AAS_STATIC_PREFIX str) + 1)

char* aas_allocate(size_t count) {
    char* buffer = malloc(count + 2);
    if(buffer == NULL)
        return NULL;

    *buffer = AAS_DYNAMIC;

    return buffer + 1;
}

void aas_free(char* aas) {
    if(aas != NULL) {
        if(*(aas - 1) == AAS_DYNAMIC) {
            free(aas - 1);
        }
    }
}

...

char* s1 = AAS_CONST_STR("test1");
char* s2 = aas_allocate(10);

strcpy(s2, "test2");

aas_free(s1);
aas_free(s2);

Testing performance (note #1)

I benchmarked my libtsjson library with following code (800k iterations):

    node = json_new_node(NULL);
    json_add_integer(node, NODE_NAME("i"), 10);
    json_add_string(node, NODE_NAME("s1"), json_str_create("test1"));
    json_add_string(node, NODE_NAME("s2"), json_str_create("test2"));
    json_node_destroy(node);

My CPU is Intel Core i7 860. If NODE_NAME is just a macro, time per iteration was 479ns If NODE_NAME is a memory allocation, time per iteration was 609ns

Hinting user or compiler (note #2)

  • Add a hint to all such pointers, i.e. Linux static source analyser Sparse may catch such issues

    char __autostring* s1 = aas_copy("test"); /* OK */
    char __autostring* s2 = strdup("test");   /* Should be fail? */
    char* s3 = s1;                            /* Confuses sparse */
    char* s4 = (char*) s1;                    /* Explicit conversion - OK */
    

(not completely sure about outputs of Sparse)

  • Use simple typedef to make compiler raise a warning when you do something wrong:

    #ifdef AAS_STRICT
    typedef struct { char a; } *aas_t;
    #else
    typedef char *aas_t;
    #endif
    

This approach is one more step to a world of a dirty C hacks, i.e. sizeof(*aas_t) is now > 1.

Full source with changes may be found here. If compiled with -DAAS_STRICT it will raise tons of errors: https://ideone.com/xxmSat Even for correct code it can complain about strcpy() (not reproduced on ideone).



回答4:

The simple answer is you cannot do this since C language does not demarcate stack, heap and data section.

If you wanted to have a guess - you could collect address of the first variable on the stack, address of the calling function and address of a byte of memory allocated to heap; and then compare it with your pointer - a very bad practice with no guarantees.

It's best for you to revamp your code such a way that you don't come across this issue.



回答5:

Here is a practical way:

Although the C-language standard does not dictate this, for all identical occurrences of a given literal string in your code, the compiler generates a single copy within the RO-data section of the executable image.

In other words, every occurrence of the literal string "dont free me!" in your code is translated into the same memory address.

So at the point where you want to deallocate that string, you can simply compare its address with the address of the literal string "dont free me!":

if (b != "dont free me!") // address comparison
    free(b);

To emphasize this again, it is not imposed by the C-language standard, but it is practically implemented by any decent compiler of the language.


The above is merely a practical trick referring directly to the question at hand (rather than to the motivation behind this question).

Strictly speaking, if you've reached a point in your implementation where you have to distinguish between a statically allocated string and a dynamically allocated string, then I would tend to guess that your initial design is flawed somewhere along the line.



回答6:

This is exactly why the rule is that only the piece of code or module that created a string may free it. In other words, every string or piece of data is "owned" by the code unit that created it. Only the owner can free it. A function should never free data structures that it received as arguments.



回答7:

You can do the following:

  typedef struct 
{
 int is_literal;
 char * array;
} elem;

Every time you you allocate the elem.array on the heap simply set the is_literal to 0. When you set the array to be literal, set the flag to a non-zero value, e.g.:

elem foo;
foo.array = "literal";
foo.is_literal = 1 ;

or

elem bar;
bar.array = (char*) (malloc(sizeof(char) * 10)) ;
bar.is_literal = 0;

Then at the client side:

if(!bar.is_literal) {
free(bar.array);
}

Simple as that.



回答8:

Back in the early days when a 80386 could have 8 megabytes of RAM maximum, and the ideas of making objects were being explained in every other magazine article, I did not like copying perfectly good literals into string objects (allocating and freeing the internal copy) and I asked Bjarne about that, since a crude string class was one of his examples of C++ fancy-stuff.

He said don't worry about it.

Is this having to do with literals vs other char* pointers? You can always own the memory. Imthink so, from your ideas of looking for different memory segments.

Or is it more generally that ownership may or may not be given, there is no way to tell, and need to store a flag: "hey, this is a heap object, but someone else is still using it and will take care of it later, OK?"

For the tractable case where it is "on the heap" or "not" (literals, globals, stack-based), you could have the free function know. If you supplied a matching set of allocate/maybe-free, it could be written to know what memory is under its control.



标签: c