I recently ran into a fascinating class in the EnTT library. This class is used to calculate hashes for strings like so:
std::uint32_t hashVal = hashed_string::to_value("ABC");
hashed_string hs{"ABC"};
std::uint32_t hashVal2 = hs.value();
While looking at the implementation of this class I noticed that none of the constructors or hashed_string::to_value member functions take a const char* directly. Instead, they take a simple struct called const_wrapper. Below is a simplified view of the class's implementation to illustrate this:
/*
A hashed string is a compile-time tool that allows users to use
human-readable identifiers in the codebase while using their numeric
counterparts at runtime
*/
class hashed_string
{
private:
    struct const_wrapper
    {
        // non-explicit constructor on purpose
        constexpr const_wrapper(const char *curr) noexcept: str{curr} {}
        const char *str;
    };

    inline static constexpr std::uint32_t calculateHash(const char* curr) noexcept
    {
        // ...
    }

public:
    /*
    Returns directly the numeric representation of a string.
    Forcing template resolution avoids implicit conversions. An
    human-readable identifier can be anything but a plain, old bunch of
    characters.
    Example of use:
    const auto value = hashed_string::to_value("my.png");
    */
    template<std::size_t N>
    inline static constexpr std::uint32_t to_value(const char (&str)[N]) noexcept
    {
        return calculateHash(str);
    }

    /*
    Returns directly the numeric representation of a string.
    wrapper parameter helps achieving the purpose by relying on overloading.
    */
    inline static std::uint32_t to_value(const_wrapper wrapper) noexcept
    {
        return calculateHash(wrapper.str);
    }

    /*
    Constructs a hashed string from an array of const chars.
    Forcing template resolution avoids implicit conversions. An
    human-readable identifier can be anything but a plain, old bunch of
    characters.
    Example of use:
    hashed_string hs{"my.png"};
    */
    template<std::size_t N>
    constexpr hashed_string(const char (&curr)[N]) noexcept
        : str{curr}, hash{calculateHash(curr)}
    {}

    /*
    Explicit constructor on purpose to avoid constructing a hashed
    string directly from a `const char *`.
    wrapper parameter helps achieving the purpose by relying on overloading.
    */
    explicit constexpr hashed_string(const_wrapper wrapper) noexcept
        : str{wrapper.str}, hash{calculateHash(wrapper.str)}
    {}

    //...

private:
    const char *str;
    std::uint32_t hash;
};
Unfortunately I fail to see the purpose of the const_wrapper struct. Does it have something to do with the comment at the top, which states "A hashed string is a compile-time tool..."?
I am also unsure what the comments above the template functions mean, which state "Forcing template resolution avoids implicit conversions." Is anyone able to explain this?
Finally, it is interesting to note how this class is used by another class that maintains a std::unordered_map of the following type: std::unordered_map<hashed_string, Resource>
This other class offers a member function to add resources to the map using strings as keys. A simplified view of its implementation looks like this:
bool addResource(hashed_string id, Resource res)
{
// ...
resourceMap[id] = res;
// ...
}
My question here is: what is the advantage of using hashed_strings as the keys to our map instead of std::strings? Is it more efficient to work with numeric types like hashed_strings?
Thank you for any information. Studying this class has helped me learn so much.
The author is trying to help you avoid accidental performance problems that happen when you repeatedly hash strings. Since hashing strings is expensive, you probably want to do it once and cache the result somewhere. If the class had an implicit constructor from const char*, you could hash the same string repeatedly without knowing or intending to do so.
So the library provides implicit construction for string literals, which can be computed at compile-time via constexpr, but explicit construction for const char* in general, since those can't generally be done at compile-time and you want to avoid doing it repeatedly or accidentally.
Consider:
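A minimal sketch of the situation, assuming a trimmed-down stand-in for hashed_string. The FNV-1a hash is an assumption made for illustration (the question elides the body of calculateHash), and consume and example are hypothetical names. The last line of example is commented out because it does not compile:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Trimmed-down stand-in for the class in the question. The FNV-1a hash is an
// assumption for illustration; the question elides the body of calculateHash.
class hashed_string {
    struct const_wrapper {
        // non-explicit on purpose: a const char* converts to the wrapper
        constexpr const_wrapper(const char* curr) noexcept : str{curr} {}
        const char* str;
    };

    static constexpr std::uint32_t calculateHash(const char* curr) noexcept {
        std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
        while (*curr != '\0') {
            h = (h ^ static_cast<unsigned char>(*curr++)) * 16777619u;
        }
        return h;
    }

public:
    // Implicit: only character arrays (such as string literals) match.
    template <std::size_t N>
    constexpr hashed_string(const char (&curr)[N]) noexcept
        : str{curr}, hash{calculateHash(curr)} {}

    // Explicit: a const char* has to be converted deliberately.
    explicit constexpr hashed_string(const_wrapper wrapper) noexcept
        : str{wrapper.str}, hash{calculateHash(wrapper.str)} {}

    constexpr std::uint32_t value() const noexcept { return hash; }

private:
    const char* str;
    std::uint32_t hash;
};

// Hypothetical consumer that takes the hashed string by value.
constexpr std::uint32_t consume(hashed_string hs) noexcept { return hs.value(); }

inline void example() {
    const char* const s = "abc";
    constexpr hashed_string hs1{"my.png"};  // explicit, hashed at compile time
    const hashed_string hs2{s};             // explicit, generally hashed at run time

    consume(hs1);        // cached value, no hashing
    consume(hs2);        // cached value, no hashing
    consume("my.png");   // implicit conversion from a literal is allowed
    // consume(s);       // error: no implicit conversion from const char*
}

// The same rules, expressed as compile-time checks.
static_assert(std::is_convertible<const char (&)[7], hashed_string>::value,
              "arrays of char convert implicitly");
static_assert(!std::is_convertible<const char*, hashed_string>::value,
              "pointers do not convert implicitly");
static_assert(std::is_constructible<hashed_string, const char*>::value,
              "pointers can still be converted explicitly");
```

The pointer case still works when written out as hashed_string{s}; the point of the design is that the cost has to be visible at the call site.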
If I remove the last line, you can see how the compiler applies the implicit conversions for you at C++ Insights:
But it's refusing to do so for the line consume(s) because of the implicit/explicit constructors.
Note, however, that this attempt at protecting the user isn't foolproof. If you declare your string as an array rather than as a pointer, you can accidentally re-hash:
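A sketch of that loophole, again using a hypothetical trimmed-down hashed_string with an assumed FNV-1a hash. The names consume, rehash_by_accident and after_decay are illustrative only:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical, trimmed-down stand-in for hashed_string (FNV-1a assumed).
class hashed_string {
    struct const_wrapper {
        constexpr const_wrapper(const char* curr) noexcept : str{curr} {}
        const char* str;
    };

    static constexpr std::uint32_t calculateHash(const char* curr) noexcept {
        std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
        while (*curr != '\0') {
            h = (h ^ static_cast<unsigned char>(*curr++)) * 16777619u;
        }
        return h;
    }

public:
    template <std::size_t N>
    constexpr hashed_string(const char (&curr)[N]) noexcept
        : hash{calculateHash(curr)} {}

    explicit constexpr hashed_string(const_wrapper wrapper) noexcept
        : hash{calculateHash(wrapper.str)} {}

    constexpr std::uint32_t value() const noexcept { return hash; }

private:
    std::uint32_t hash;
};

inline std::uint32_t consume(hashed_string hs) noexcept { return hs.value(); }

inline std::uint32_t rehash_by_accident() {
    const char s[100] = "abc";  // an array, not a pointer and not a literal
    return consume(s);          // compiles: implicit conversion, hashed again on each call
}

inline std::uint32_t after_decay(const char* p) {
    // return consume(p);              // would not compile: the array has decayed
    return consume(hashed_string{p});  // the hashing now has to be spelled out
}

// Any array of const char matches the template constructor, literal or not.
static_assert(std::is_convertible<const char (&)[100], hashed_string>::value,
              "a non-literal array still converts implicitly");
static_assert(!std::is_convertible<const char*, hashed_string>::value,
              "once decayed to a pointer, the conversion must be explicit");
```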
This case is less common, and such arrays typically get decayed into pointers as they are passed to other functions, which would then behave as above.
The library could conceivably enforce compile-time hashing for string literals with if constexpr and the like [see comments] and forbid it for non-literal arrays like s above. (There's your pull request to give back to the library!)
To answer your final question: the reason for doing this is faster performance for hash-based containers like std::unordered_map. It minimizes the number of hashes you have to compute by hashing once and caching the result inside the hashed_string. Now, a key lookup in the map just has to compare the pre-computed hash values of the keys and the lookup string.
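As a rough sketch of how such a map could be wired up, assuming the same trimmed-down hashed_string with an FNV-1a hash: hashed_string_hasher, Resource, resource_cache and find are hypothetical names, and the library's own hashing and equality support may look different. Note that equality here compares only the cached hash values, as described above, so two different strings that collide would be treated as the same key.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, trimmed-down stand-in for hashed_string (FNV-1a assumed).
class hashed_string {
    struct const_wrapper {
        constexpr const_wrapper(const char* curr) noexcept : str{curr} {}
        const char* str;
    };

    static constexpr std::uint32_t calculateHash(const char* curr) noexcept {
        std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
        while (*curr != '\0') {
            h = (h ^ static_cast<unsigned char>(*curr++)) * 16777619u;
        }
        return h;
    }

public:
    template <std::size_t N>
    constexpr hashed_string(const char (&curr)[N]) noexcept
        : hash{calculateHash(curr)} {}

    explicit constexpr hashed_string(const_wrapper wrapper) noexcept
        : hash{calculateHash(wrapper.str)} {}

    constexpr std::uint32_t value() const noexcept { return hash; }

    // Keys are equal when their cached hashes are equal (collisions ignored).
    friend constexpr bool operator==(const hashed_string& lhs,
                                     const hashed_string& rhs) noexcept {
        return lhs.hash == rhs.hash;
    }

private:
    std::uint32_t hash;
};

// Hasher that returns the cached value instead of re-hashing any characters.
struct hashed_string_hasher {
    std::size_t operator()(const hashed_string& hs) const noexcept {
        return hs.value();
    }
};

struct Resource {  // hypothetical payload type
    std::string path;
};

class resource_cache {  // hypothetical owner of the map
public:
    // Returns true if the resource was inserted, false if the id already existed.
    bool addResource(hashed_string id, Resource res) {
        return resourceMap.emplace(id, std::move(res)).second;
    }

    // Returns nullptr when no resource is stored under the id.
    const Resource* find(hashed_string id) const {
        const auto it = resourceMap.find(id);
        return it == resourceMap.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<hashed_string, Resource, hashed_string_hasher> resourceMap;
};
```

A call such as cache.addResource("my.png", Resource{"assets/my.png"}) hashes the literal once when the hashed_string is constructed; the map then works with a 32-bit integer for both bucket selection and key comparison, whereas a std::string key would be hashed on each lookup and compared character by character.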