In C++, why overload a function on a `const char a

Posted 2019-07-14 20:41

Question:

I recently ran into a fascinating class in the ENTT library. This class is used to calculate hashes for strings like so:

std::uint32_t hashVal = hashed_string::to_value("ABC");

hashed_string hs{"ABC"};
std::uint32_t hashVal2 = hs.value();

While looking at the implementation of this class, I noticed that none of the constructors or hashed_string::to_value member functions take a const char* directly. Instead, they take a simple struct called const_wrapper. Below is a simplified view of the class's implementation to illustrate this:

/*
   A hashed string is a compile-time tool that allows users to use
   human-readable identifiers in the codebase while using their numeric
   counterparts at runtime
*/
class hashed_string
{
private:

    struct const_wrapper
    {
        // non-explicit constructor on purpose
        constexpr const_wrapper(const char *curr) noexcept: str{curr} {}
        const char *str;
    };

    inline static constexpr std::uint32_t calculateHash(const char* curr) noexcept
    {
        // ...
    }

public:

    /*
       Returns directly the numeric representation of a string.
       Forcing template resolution avoids implicit conversions. A
       human-readable identifier can be anything but a plain, old bunch of
       characters.
       Example of use:
       const auto value = hashed_string::to_value("my.png");
    */
    template<std::size_t N>
    inline static constexpr std::uint32_t to_value(const char (&str)[N]) noexcept
    {
        return calculateHash(str);
    }

    /*
       Returns directly the numeric representation of a string.
       wrapper parameter helps achieving the purpose by relying on overloading.
    */
    inline static std::uint32_t to_value(const_wrapper wrapper) noexcept
    {
        return calculateHash(wrapper.str);
    }

    /*
       Constructs a hashed string from an array of const chars.
       Forcing template resolution avoids implicit conversions. A
       human-readable identifier can be anything but a plain, old bunch of
       characters.
       Example of use:
       hashed_string hs{"my.png"};
    */
    template<std::size_t N>
    constexpr hashed_string(const char (&curr)[N]) noexcept
        : str{curr}, hash{calculateHash(curr)}
    {}

    /*
       Explicit constructor on purpose to avoid constructing a hashed
       string directly from a `const char *`.
       wrapper parameter helps achieving the purpose by relying on overloading.
    */
    explicit constexpr hashed_string(const_wrapper wrapper) noexcept
        : str{wrapper.str}, hash{calculateHash(wrapper.str)}
    {}

    //...

private:
    const char *str;
    std::uint32_t hash;
};

Unfortunately I fail to see the purpose of the const_wrapper struct. Does it have something to do with the comment at the top, which states "A hashed string is a compile-time tool..."?

I am also unsure what the comments that appear above the template functions mean, which state "Forcing template resolution avoids implicit conversions." Is anyone able to explain this?

Finally, it is interesting to note how this class is used by another class that maintains a std::unordered_map of the following type: std::unordered_map<hashed_string, Resource>.

This other class offers a member function to add resources to the map using strings as keys. A simplified view of its implementation looks like this:

bool addResource(hashed_string id, Resource res)
{
    // ...
    resourceMap[id] = res;
    // ...
}

My question here is: what is the advantage of using hashed_strings as the keys to our map instead of std::strings? Is it more efficient to work with numeric types like hashed_strings?

Thank you for any information. Studying this class has helped me learn so much.

Answer 1:

The author is trying to help you avoid accidental performance problems that happen when you repeatedly hash strings. Since hashing strings is expensive, you probably want to do it once and cache the result somewhere. If the class had an implicit constructor from const char*, you could hash the same string repeatedly without knowing or intending to do so.

So the library provides implicit construction for string literals, whose hash can be computed at compile time via constexpr, but requires explicit construction for const char* in general, since those hashes generally can't be computed at compile time and you want to avoid computing them repeatedly or accidentally.
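To make the compile-time versus runtime distinction concrete, here is a minimal sketch of a constexpr string hash. The question elides the body of calculateHash, so the function name fnv1a and the choice of 32-bit FNV-1a are assumptions made for illustration only, not a claim about what the library does. It assumes C++14 or later (loops inside constexpr functions).

```cpp
#include <cstdint>

// Hypothetical stand-in for the elided calculateHash(): 32-bit FNV-1a.
constexpr std::uint32_t fnv1a(const char* str) noexcept
{
    std::uint32_t hash = 2166136261u;               // FNV offset basis
    for (; *str != '\0'; ++str) {
        hash ^= static_cast<unsigned char>(*str);   // mix in one byte
        hash *= 16777619u;                          // FNV prime (wraps mod 2^32)
    }
    return hash;
}

// With a literal, the result is usable in a constant expression,
// so the compiler can evaluate the whole loop itself.
constexpr std::uint32_t png_id = fnv1a("my.png");
static_assert(png_id == fnv1a("my.png"), "evaluated at compile time");

// With a pointer that is only known at runtime, the same function
// has to walk the string every time it is called.
inline std::uint32_t hash_at_runtime(const char* s) noexcept
{
    return fnv1a(s);
}
```

The second case is the one the explicit constructor is meant to make visible: each call to hash_at_runtime pays for a full pass over the string.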

Consider:

void consume( hashed_string );

int main()
{
    const char* const s = "abc";
    const auto hs1 = hashed_string{"my.png"}; // Ok - explicit, compile-time hashing
    const auto hs2 = hashed_string{s};        // Ok - explicit, runtime hashing

    consume( hs1 ); // Ok - cached value - no hashing required
    consume( hs2 ); // Ok - cached value - no hashing required

    consume( "my.png" ); // Ok - implicit, compile-time hashing
    consume( s );        // Error! Implicit, runtime hashing disallowed!
                         // Potential hidden inefficiency, so library disallows it.
}

If I remove the last line, you can see how the compiler applies the implicit conversions for you at C++ Insights:

consume(hashed_string(hs1));
consume(hashed_string(hs2));
consume(hashed_string("my.png"));

But it's refusing to do so for the line consume(s) because of the implicit/explicit constructors.
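This also explains why the class needs a wrapper struct rather than a plain const char* overload. The sketch below uses hypothetical names (pointer_wrapper, chosen, pick) to compare the two designs. Binding a literal to const char (&)[N] and decaying it to const char* both count as exact matches, and on a tie the non-template function wins, so a plain pointer overload would capture literals. Requiring a user-defined conversion through the wrapper makes that overload strictly worse for arrays, so the template is selected.

```cpp
#include <cstddef>

// Hypothetical mirror of the library's const_wrapper.
struct pointer_wrapper {
    constexpr pointer_wrapper(const char* p) noexcept : str{p} {}
    const char* str;
};

enum class chosen { array_template, plain_pointer, wrapper };

namespace naive {
    // Array template next to a plain pointer overload.
    template<std::size_t N>
    constexpr chosen pick(const char (&)[N]) noexcept { return chosen::array_template; }
    constexpr chosen pick(const char*) noexcept { return chosen::plain_pointer; }
}

namespace wrapped {
    // Array template next to a wrapper overload, as in hashed_string.
    template<std::size_t N>
    constexpr chosen pick(const char (&)[N]) noexcept { return chosen::array_template; }
    constexpr chosen pick(pointer_wrapper) noexcept { return chosen::wrapper; }
}

constexpr const char* some_pointer = "abc";

// A literal goes to the pointer overload in the naive design...
static_assert(naive::pick("abc") == chosen::plain_pointer, "non-template wins the tie");
// ...but to the array template once the pointer is hidden behind a wrapper.
static_assert(wrapped::pick("abc") == chosen::array_template, "exact match beats conversion");
// A real pointer cannot deduce N, so only the wrapper overload is viable.
static_assert(wrapped::pick(some_pointer) == chosen::wrapper, "pointer goes via wrapper");
```

Marking the wrapper-taking constructor explicit, as hashed_string does, is what then turns the implicit consume(s) call into an error.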

Note, however, this attempt at protecting the user isn't foolproof. If you declare your string as an array rather than as a pointer, you can accidentally re-hash:

const char s[100] = "abc";
consume( s );  // Compiles BUT it's doing implicit, runtime hashing. Doh.

// Decay 's' back to a pointer, and the library's guardrails return
const auto consume_decayed = []( const char* str ) { consume( str ); };
consume_decayed( s ); // Error! Implicit, runtime hashing disallowed!

This case is less common, and such arrays typically get decayed into pointers as they are passed to other functions, which would then behave as above. The library could conceivably enforce compile-time hashing for string literals with if constexpr and the like and forbid it for non-literal arrays like s above. (There's your pull request to give back to the library!) [See comments.]

To answer your final question: the point is faster hash-based containers such as std::unordered_map. The hash is computed once and cached inside the hashed_string, which minimizes the number of times a string has to be hashed. A key lookup in the map then only has to compare the pre-computed hash values of the stored keys and of the lookup key.
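As a rough sketch of that idea, the following shows a key type that carries its hash and a hasher that simply returns it. Everything here is hypothetical (cached_key, make_key, cached_key_hasher, resource_map, add_resource, and the FNV-1a stand-in for the elided calculateHash), and std::string stands in for Resource. Note the assumption that two keys are equal when their hashes are equal, which ignores collisions; a real implementation may choose differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the elided calculateHash(): 32-bit FNV-1a.
constexpr std::uint32_t fnv1a(const char* s) noexcept
{
    std::uint32_t hash = 2166136261u;
    for (; *s != '\0'; ++s) {
        hash ^= static_cast<unsigned char>(*s);
        hash *= 16777619u;
    }
    return hash;
}

// Hypothetical key type: the string is hashed once, when the key is made.
struct cached_key {
    std::uint32_t hash;
};

constexpr cached_key make_key(const char* s) noexcept { return cached_key{fnv1a(s)}; }

// Assumption: equal hashes mean equal keys (collisions are not handled).
constexpr bool operator==(cached_key a, cached_key b) noexcept { return a.hash == b.hash; }

// The hasher does no string work at all; it returns the cached value.
struct cached_key_hasher {
    std::size_t operator()(cached_key k) const noexcept { return k.hash; }
};

using resource_map = std::unordered_map<cached_key, std::string, cached_key_hasher>;

inline void add_resource(resource_map& resources, cached_key id, std::string res)
{
    resources[id] = std::move(res);   // same shape as addResource in the question
}

inline void demo()
{
    resource_map resources;
    add_resource(resources, make_key("my.png"), "pixels");
    assert(resources.size() == 1);
    assert(resources.at(make_key("my.png")) == "pixels");
}
```

With std::string keys, every insertion and lookup would hash the characters again and then compare strings; here both steps reduce to integer operations on a value computed up front.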