I wasn't able to find this question asked anywhere, and it's an actual problem I'm facing.
I have a file loading utility that returns a std::vector<unsigned char> containing the whole file contents.
However, the processing function requires a contiguous array of char (and that cannot be changed - it's a library function). Since the class that's using the processing function stores a copy of the data anyway, I want to store it as vector<char>. Here's some code that might be a bit more illustrative.
std::vector<unsigned char> LoadFile (std::string const& path);
class Processor {
    std::vector<char> cache;
    void _dataOperation(std::vector<char> const& data);
public:
    void Process() {
        if (cache.empty())
            // here's the problem!
            cache = LoadFile("file.txt");
        _dataOperation(cache);
    }
};
This code doesn't compile, because (obviously) there's no appropriate conversion. We can be sure, however, that the temporary vector will occupy the same amount of memory (IOW sizeof(char) == sizeof(unsigned char)).
The naive solution would be to iterate over the contents of the temporary and cast every character. I know that in the normal case (matching element types), the move assignment operator=(T&&) would be called.
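For reference, the naive element-wise conversion could look roughly like the sketch below. CastEach is a hypothetical name used only for illustration, and the function copies every byte, which is exactly the cost being asked about.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the original code): converts each
// element individually and returns a new vector<char>. This copies.
std::vector<char> CastEach(std::vector<unsigned char> const& src) {
    std::vector<char> dst;
    dst.reserve(src.size());
    std::transform(src.begin(), src.end(), std::back_inserter(dst),
                   [](unsigned char c) { return static_cast<char>(c); });
    return dst;
}
```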
In my situation it's safe to do a reinterpreting conversion, because I am sure I am going to read ASCII characters only. Any other character would be caught in _dataOperation anyway.
So, my question is: how do I properly and safely convert the temporary vector in a way that involves no copying?
If it isn't possible, I would prefer the safe way of copying rather than unsafe noncopying. I could also change LoadFile to return either vector<char> or vector<unsigned char>.
In C++11, [basic.lval]p10 says,
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- ...
- a char or unsigned char type.
(The exact location may differ in other versions of C++, but the meaning is the same.)
That means that you can take a vector<unsigned char> cache and access its contents using the range [reinterpret_cast<char*>(cache.data()), reinterpret_cast<char*>(cache.data()) + cache.size()). (@Kerrek SB mentioned this.)
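As a minimal sketch of that pointer range in use, the helper below reads the bytes of a vector<unsigned char> through a char pointer. BytesToString is a hypothetical name introduced for illustration; building a std::string is just one convenient way to show that the bytes read back unchanged.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: reads the unsigned char elements through a
// pointer to char, which the aliasing rule quoted above allows.
std::string BytesToString(std::vector<unsigned char> const& bytes) {
    char const* first = reinterpret_cast<char const*>(bytes.data());
    char const* last = first + bytes.size();
    return std::string(first, last);  // copies the range [first, last)
}
```

Note that only the element pointer is reinterpreted here; the vector object itself is never cast.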
If you store a vector<unsigned char> in Processor to match the return type of LoadFile, and _dataOperation() actually takes an array of char (meaning a const char* and a size), then you can cast when you're passing the argument to _dataOperation().
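A sketch of that arrangement, under some loud assumptions: LibraryProcess is a made-up stand-in for the real library function (here it just counts 'a' bytes), LoadFile is stubbed out with fixed data so the example is self-contained, and the methods return a count only so the result can be observed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the library function (hypothetical name and behaviour):
// takes a contiguous array of char plus its length, counts 'a' bytes.
std::size_t LibraryProcess(char const* data, std::size_t size) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] == 'a') ++count;
    return count;
}

// Stub replacing the real LoadFile so the sketch needs no file on disk.
std::vector<unsigned char> LoadFile(std::string const& /*path*/) {
    return {'b', 'a', 'n', 'a', 'n', 'a'};
}

class Processor {
    std::vector<unsigned char> cache;  // same type LoadFile returns

    std::size_t _dataOperation(std::vector<unsigned char> const& data) {
        // The cast happens only here, at the library boundary.
        return LibraryProcess(reinterpret_cast<char const*>(data.data()),
                              data.size());
    }

public:
    std::size_t Process() {
        if (cache.empty())
            cache = LoadFile("file.txt");  // move assignment, no copy
        return _dataOperation(cache);
    }
};
```

Because the cache has the same type as the temporary, the assignment moves the buffer instead of copying it.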
However, if _dataOperation() takes a vector<char> specifically and you store a vector<unsigned char> cache, then you cannot pass it reinterpret_cast<vector<char>&>(cache). (I.e. @André Puel is totally wrong. Do not listen to him.) That violates the aliasing rules, and the compiler will attempt to anger your customers at 2am. (And if this version of your compiler doesn't manage it, the next version will keep trying.)
One option is, as you mentioned, to template LoadFile() and have it return (or fill in) a vector of the type you want. Another is to copy the result, for which the concise version is again the reinterpret_cast of the source vector's .data(). [basic.fundamental]p1 mentions that "For character types, all bits of the object representation participate in the value representation.", meaning that you're not going to lose data with that reinterpret_cast. I don't see a firm guarantee that no bit pattern of an unsigned char can cause a trap if reinterpret_cast'ed to char, but I don't know of any modern hardware or compilers that do it.
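The copying option might look like the sketch below. ToCharVector is a hypothetical name; it uses the vector range constructor on the reinterpreted .data() pointer, so it is a copy of the bytes, not a zero-copy conversion.

```cpp
#include <vector>

// Hypothetical helper: copies the bytes of src into a new vector<char>
// by reading them through a char pointer.
std::vector<char> ToCharVector(std::vector<unsigned char> const& src) {
    char const* first = reinterpret_cast<char const*>(src.data());
    return std::vector<char>(first, first + src.size());
}
```

With the declarations from the question, the assignment would then read cache = ToCharVector(LoadFile("file.txt"));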