Caching a const char * as a return type

Posted 2020-07-14 09:30

Question:

I was reading up a bit on my C++ and found this article about RTTI (Runtime Type Identification): http://msdn.microsoft.com/en-us/library/70ky2y6k(VS.80).aspx . Well, that's another subject :) However, I stumbled upon a curious remark about the type_info class, namely its name() method. It says: "The type_info::name member function returns a const char* to a null-terminated string representing the human-readable name of the type. The memory pointed to is cached and should never be directly deallocated."

How can you implement something like this yourself!? I've struggled with this exact problem quite a few times before: I don't want to allocate a new char array that the caller then has to delete, so I've stuck to std::string so far.

So, for the sake of simplicity, let's say I want to make a method that returns "Hello World!", let's call it

const char *getHelloString() const;

Personally, I would make it somehow like this (Pseudo):

const char *getHelloString() const
{
  char *returnVal = new char[13];      // 12 characters + terminating NUL
  strcpy(returnVal, "Hello World!");   // strcpy(destination, source)

  return returnVal;
}

...but this means the caller has to call delete[] on the pointer I return :(

Thx in advance

Answer 1:

How about this:

const char *getHelloString() const
{
    return "HelloWorld!";
}

Returning a string literal directly works because the compiler places the literal in static storage: it exists for the entire duration of the program, so the returned pointer stays valid and must never be freed.
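A minimal sketch of the same idea as a free function (the name getHelloString comes from the question; the trailing member-function const is dropped, and helloLength is a hypothetical caller added for illustration). The caller uses the pointer but never frees it:

```cpp
#include <cstdio>
#include <cstring>

// The literal has static storage duration, so the pointer is valid for the
// whole run of the program. The caller must not delete[] or free() it.
const char* getHelloString()
{
    return "Hello World!";
}

// Hypothetical caller: reads through the pointer and never releases it.
std::size_t helloLength()
{
    const char* s = getHelloString();
    std::printf("%s\n", s);
    return std::strlen(s);
}
```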



Answer 2:

I like all the answers about how the string could be statically allocated, but that's not necessarily true for all implementations, particularly the one whose documentation the original poster linked to. In this case, it appears that the decorated type name is stored statically in order to save space, and the undecorated type name is computed on demand and cached in a linked list.

If you're curious about how the Visual C++ type_info::name() implementation allocates and caches its memory, it's not hard to find out. First, create a tiny test program:

#include <cstdio>
#include <typeinfo>
#include <vector>

int main(int argc, char* argv[]) {
    std::vector<int> v;
    const std::type_info& ti = typeid(v);  // type_info lives in namespace std
    const char* n = ti.name();
    std::printf("%s\n", n);
    return 0;
}

Build it, run it under a debugger (I used WinDbg), and look at the pointer returned by type_info::name(). Does it point to a global structure? If so, WinDbg's ln command will print the name of the closest symbol:

0:000> ?? n
char * 0x00000000`00857290
 "class std::vector<int,class std::allocator<int> >"
0:000> ln 0x00000000`00857290
0:000>

ln didn't print anything, which indicates that the string wasn't in the range of addresses owned by any specific module. It would be in that range if it were in the module's data or read-only data segment. Let's see if it was allocated on the heap by searching all heaps for the address returned by type_info::name():

0:000> !heap -x 0x00000000`00857290
Entry             User              Heap              Segment               Size  PrevSize  Unused    Flags
-------------------------------------------------------------------------------------------------------------
0000000000857280  0000000000857290  0000000000850000  0000000000850000        70        40        3e  busy extra fill 

Yes, it was allocated on the heap. Putting a breakpoint at the start of malloc() and restarting the program confirms it.

Looking at the declaration in <typeinfo> gives a clue about where the heap pointers are getting cached:

struct __type_info_node {
    void *memPtr;
    __type_info_node* next;
};

extern __type_info_node __type_info_root_node;
...
_CRTIMP_PURE const char* __CLR_OR_THIS_CALL name(__type_info_node* __ptype_info_node = &__type_info_root_node) const;

If you find the address of __type_info_root_node and walk down the list in the debugger, you quickly find a node containing the same address that was returned by type_info::name(). The list seems to be related to the caching scheme.

The MSDN page linked in the original question seems to fill in the blanks: the name is stored in its decorated form to save space, and this form is accessible via type_info::raw_name(). When you call type_info::name() for the first time on a given type, it undecorates the name, stores it in a heap-allocated buffer, caches the buffer pointer, and returns it.

The linked list may also be used to deallocate the cached strings during program exit (however, I didn't verify whether that is the case). This would ensure that they don't show up as memory leaks when you run a memory debugging tool.
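The on-demand scheme described above (store a compact raw name statically, build the readable name on the first name() call, cache the heap buffer in a linked list, free the list at exit) can be sketched roughly as follows. This is not the Visual C++ source: FakeTypeInfo, NameNode, NameCache and undecorate are all hypothetical names, and undecorate merely prefixes "class " instead of really undecorating anything. It is also not thread-safe as written.

```cpp
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>

// Hypothetical node, loosely modelled on __type_info_node: one heap buffer per cached name.
struct NameNode {
    char* mem;
    NameNode* next;
};

// Owns the list. Its destructor runs at program exit and frees every cached
// buffer, so memory debugging tools don't report the names as leaks.
struct NameCache {
    NameNode* head = nullptr;
    ~NameCache() {
        while (head) {
            NameNode* n = head;
            head = n->next;
            std::free(n->mem);
            delete n;
        }
    }
};

static NameCache g_nameCache;

// Hypothetical stand-in for undecorating a raw name.
static std::string undecorate(const char* raw) {
    return std::string("class ") + raw;
}

class FakeTypeInfo {
public:
    explicit FakeTypeInfo(const char* raw) : rawName(raw) {}

    const char* raw_name() const { return rawName; }  // the compact, static form

    // First call: build the readable name, copy it to the heap and record the
    // buffer in the list. Later calls return the same pointer. Never free it.
    const char* name() const {
        if (!cached) {
            std::string s = undecorate(rawName);
            char* buf = static_cast<char*>(std::malloc(s.size() + 1));
            if (!buf) throw std::bad_alloc();
            std::memcpy(buf, s.c_str(), s.size() + 1);
            g_nameCache.head = new NameNode{buf, g_nameCache.head};
            cached = buf;
        }
        return cached;
    }

private:
    const char* rawName;
    mutable char* cached = nullptr;
};
```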



Answer 3:

Well gee, if we are talking about just a function that always returns the same value, it's quite simple.

const char * foo()
{
   static const char return_val[] = "HelloWorld!";  // static: outlives the call
   return return_val;
}

The tricky part comes when you start caching computed results: then you have to consider threading, when the cache gets invalidated, and perhaps storing things in thread-local storage. But if it's just one-off output that the caller copies immediately, this should do the trick.
Alternatively, if you don't have a fixed size, you either have to use a static buffer of some arbitrary size (which might eventually turn out to be too small), or turn to a managed class such as std::string.

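#include <string>

void DoCalculation(std::string& out);  // placeholder for whatever computes the result

const char * foo()
{
   static std::string output;
   DoCalculation(output);   // overwrites the previous result
   return output.c_str();   // valid only until the next call to foo()
}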
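The thread-local variant mentioned above could look roughly like this, assuming C++11 thread_local is available. The int parameter and the "result N" text are hypothetical stand-ins for DoCalculation. Each thread gets its own buffer, so concurrent callers don't clobber each other, but a thread's pointer is still invalidated by that same thread's next call.

```cpp
#include <string>

// One buffer per thread; the returned pointer is valid until the calling
// thread calls foo() again (or exits).
const char* foo(int n)
{
    thread_local std::string output;
    output = "result " + std::to_string(n);  // stand-in for DoCalculation(output)
    return output.c_str();
}
```

Callers that need the value for longer should copy it into their own std::string right away.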

Also, the function signature

const char *getHelloString() const;

only makes sense for a member function (a free function can't have the trailing const). In that case you don't need a function-local static at all; you can simply use a member variable.
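A sketch of the member-variable approach, with a hypothetical class name Greeter. The mutable member lets the const getter fill the cache lazily; the pointer stays valid as long as the object lives and the member isn't modified again.

```cpp
#include <string>

class Greeter {
public:
    // Fills the cache on first use and returns a pointer into it afterwards.
    const char* getHelloString() const
    {
        if (cache_.empty())
            cache_ = "Hello World!";  // stand-in for an expensive computation
        return cache_.c_str();
    }

private:
    mutable std::string cache_;  // mutable: may change inside const members
};
```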



Answer 4:

I think that since they know that there are a finite number of these, they just keep them around forever. It might be appropriate for you to do that in some instances, but as a general rule, std::string is going to be better.

They can also check, on each new call, whether they have already built that string and, if so, return the same pointer. Again, depending on what you are doing, this may be useful for you too.
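Looking a string up before creating it is usually called string interning. A minimal sketch, assuming a hypothetical intern() helper (not part of any library): each distinct string is stored once, and every request for it returns the same pointer. Elements of a std::unordered_set are not moved by rehashing, so earlier pointers stay valid. Not thread-safe as written.

```cpp
#include <string>
#include <unordered_set>

// Returns a stable pointer to the single stored copy of s. The table lives
// until program exit, so callers never free the result.
const char* intern(const std::string& s)
{
    static std::unordered_set<std::string> table;
    return table.insert(s).first->c_str();
}
```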



Answer 5:

Be careful when implementing a function that allocates a chunk of memory and then expects the caller to deallocate it, as you do in the OP:

const char *getHelloString() const
{
  char *returnVal = new char[13];      // 12 characters + terminating NUL
  strcpy(returnVal, "Hello World!");   // strcpy(destination, source)

  return returnVal;
}

By doing this you are transferring ownership of the memory to the caller. If you call this code from some other function:

int main()
{
  const char * str = getHelloString();
  delete[] str;   // memory from new[] must be released with delete[]
  return 0;
}

...nothing in the signature tells the caller that ownership of the memory has been transferred (or that delete[] rather than delete is required), which makes bugs and memory leaks more likely.

Also, at least under Windows, if the two functions are in two different modules you could corrupt the heap. In particular, if main() is in hello.exe, compiled with VC9, and getHelloString() is in utility.dll, compiled with VC6, you'll corrupt the heap when you delete the memory. This is because the VC6 and VC9 runtimes each use their own heap, so you are allocating from one heap and deallocating from another.
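A common way around the cross-module problem is to have the module that allocates also provide the function that frees. A sketch with hypothetical names CreateHelloString and FreeHelloString; in a real DLL both would be exported (for example via __declspec(dllexport)), which is omitted here.

```cpp
#include <cstring>

// Both functions live in the same module, so new[] and delete[] always
// operate on the same runtime heap.
char* CreateHelloString()
{
    char* s = new char[13];          // 12 characters + terminating NUL
    std::strcpy(s, "Hello World!");
    return s;
}

void FreeHelloString(char* s)
{
    delete[] s;                      // same module, same heap, matching delete[]
}
```

Callers then never call delete[] themselves; they pass the pointer back to FreeHelloString.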



Answer 6:

Why does the return type need to be const? Don't think of the method as a get method; think of it as a create method. I've seen plenty of APIs that require you to delete whatever a creation function/method returns. Just make sure you note that in the documentation.

/* create a hello string
 * must be released with delete[] after use
 */
char *createHelloString() const
{
  char *returnVal = new char[13];      // 12 characters + terminating NUL
  strcpy(returnVal, "Hello World!");   // strcpy(destination, source)

  return returnVal;
}
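A variation on the create-method idea (a sketch, not what this answer proposed) is to return std::unique_ptr<char[]>, so the "caller owns this" rule is part of the type and delete[] runs automatically:

```cpp
#include <cstring>
#include <memory>

// Ownership is explicit in the return type; no manual delete[] needed.
std::unique_ptr<char[]> createHelloString()
{
    const char text[] = "Hello World!";
    std::unique_ptr<char[]> s(new char[sizeof text]);  // sizeof includes the NUL
    std::memcpy(s.get(), text, sizeof text);
    return s;
}
```

Note that this alone does not solve the cross-module heap problem raised in an earlier answer, because the default deleter is instantiated in the caller's code.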


Answer 7:

What I've often done when I need this sort of functionality is to have a char * pointer in the class - initialized to null - and allocate when required.

viz:

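#include <cstdlib>
#include <cstring>

class CacheNameString
{
    private: 
        char *name;
    public:
        CacheNameString():name(NULL)  { }
        ~CacheNameString() { free(name); }  // release the last cached copy

        // Owns a raw pointer, so copying is disabled (rule of three).
        CacheNameString(const CacheNameString&) = delete;
        CacheNameString& operator=(const CacheNameString&) = delete;

    // The returned pointer is valid until the next make_name() call
    // or until this object is destroyed.
    const char *make_name(const char *v)
    {
        if (name != NULL)
            free(name);

        name = strdup(v);   // POSIX strdup (_strdup on MSVC), allocates with malloc

        return name;
    }

};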


Answer 8:

Something like this would do:

#include <cstring>

const char *myfunction() {
    static char *str = NULL; /* initialized only once */
    delete [] str; /* delete the previously cached version (a no-op for NULL) */
    str = new char[strlen("whatever") + 1]; /* allocate space for the string and its NUL terminator */
    strcpy(str, "whatever");
    return str; /* valid only until the next call to myfunction() */
}

EDIT: It occurred to me that a good replacement for this could be returning a boost::shared_ptr (std::shared_ptr since C++11) instead. That way the caller can hold onto the string as long as they want without having to worry about deleting it explicitly. A fair compromise IMO.
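A sketch of the shared-pointer idea using std::shared_ptr. The value parameter is a hypothetical addition so that the cached result can change between calls. The function keeps a reference to the latest value; a caller still holding an older value keeps it alive after the cache has moved on.

```cpp
#include <memory>
#include <string>

// Rebuilds the cached string only when the requested value changes.
std::shared_ptr<const std::string> myfunction(const std::string& value)
{
    static std::shared_ptr<const std::string> cached;
    if (!cached || *cached != value)
        cached = std::make_shared<std::string>(value);
    return cached;
}
```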



Answer 9:

The advice warning about the lifetime of the returned string is sound. You should always be careful to recognise your responsibilities when it comes to managing the lifetime of returned pointers. The practice is quite safe, however, provided the object pointed to outlasts the caller's use of the pointer. Consider, for instance, the pointer to const char returned by std::string's c_str() member function. It points to memory managed by the string object and remains valid as long as the string object is not destroyed or modified in a way that makes it reallocate its internal buffer.

The std::type_info class is part of the C++ standard library, as its namespace implies. On many implementations the string returned by name() lives in static memory created by the compiler and linker as part of the run-time type identification (RTTI) system. The standard itself only says that name() returns an implementation-defined string, and Visual C++, as another answer shows, builds it on the heap instead. Either way, you should not attempt to delete it.



Answer 10:

I think something like this can only be implemented "cleanly" using objects and the RAII idiom. When the object's destructor is called (the object goes out of scope), we can assume that the const char* pointers it handed out are no longer in use.

example code:

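#include <cstring>
#include <vector>

class ICanReturnConstChars
{
    std::vector<char*> cached_strings;  // std::stack has no push_back/back
    public:
    ICanReturnConstChars() = default;
    // Owns raw buffers, so copying is disabled (rule of three).
    ICanReturnConstChars(const ICanReturnConstChars&) = delete;
    ICanReturnConstChars& operator=(const ICanReturnConstChars&) = delete;

    // `text` is an added parameter standing in for "something".
    const char* yeahGiveItToMe(const char* text){
        char* newmem = new char[strlen(text) + 1];
        strcpy(newmem, text);               // write something to newmem
        cached_strings.push_back(newmem);
        return newmem;                      // valid until this object is destroyed
    }
    ~ICanReturnConstChars(){
        while(!cached_strings.empty()){
            delete [] cached_strings.back();
            cached_strings.pop_back();
        }
    }
};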

The only other possibility I know of is to return a smart pointer.



Answer 11:

It's probably done using a static buffer:

const char* GetHelloString()
{
    static char buffer[256] = { 0 };
    strcpy( buffer, "Hello World!" );
    return buffer;
}

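This buffer is like a global variable that is accessible only from this function: it has static storage duration, so the pointer stays valid after the function returns, but every call writes into the same buffer.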
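The main caveat of a static buffer shows up as soon as its contents vary between calls: a later call overwrites the earlier result. A sketch with a hypothetical ToText function (not from the answer):

```cpp
#include <cstdio>

// Every call reuses the same buffer, so callers must copy the result
// before calling ToText again. Not thread-safe either.
const char* ToText(int n)
{
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", n);
    return buffer;
}
```

After `const char* a = ToText(1); const char* b = ToText(2);`, both a and b point at the same buffer, which now holds "2".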



Answer 12:

You can't rely on garbage collection; this is C++. You simply don't know when it becomes safe to delete[] the string, which means you must keep the memory available until the program terminates. So, if you want to construct and return a const char*, simply new[] it, return it, and accept the unavoidable leak.
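One way to keep that "leak" bounded is to build the string only once and hand out the same pointer forever. A sketch, assuming C++11, where initialization of a function-local static is thread-safe:

```cpp
#include <cstring>

// Built on first use, never freed. The OS reclaims the memory at exit,
// although some leak checkers may still report it.
const char* getHelloString()
{
    static const char* const cached = [] {
        const char text[] = "Hello World!";
        char* p = new char[sizeof text];   // sizeof includes the NUL
        std::memcpy(p, text, sizeof text);
        return p;
    }();
    return cached;
}
```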