I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised of the first and last n-characters, and then proceed to correct any bugs and to also make the function as efficient as possible, I came up with the solution below, however the interviewer claimed there was an even faster more optimal way:
Original code:
std::string first_last_n(int n, std::string s)
{
std::string first_n = s.substr(0,n);
std::string last_n = s.substr(s.size()-n-1,n);
return first_n + last_n;
}
My code:
bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
if (s.size() < n)
return false;
r.reserve(2 * n);
r.resize(0);
r.append(s.data(),s.data() + n);
r.append(s.data() + (s.size() - n), s.data() + s.size());
return true;
}
Summary of my changes:
Changed the interface to take a return string as reference (assuming RVO and rvalues are not available yet)
Removed temporary strings being constructed via substr
Passed input string as a const reference inorder to get around temporary instantiation of input
Fixed off-by-1 error in last_n string
Reduced the number of times each character is touched down to once, or twice (in the event of an overlapping scenario)
Placed a check in the event the string s's size is less than n, returning false for failure.
Assuming only native C++ is allowed, is there some other way of doing the above more efficiently or optimally?
Note 1: The original input string instance is not to be modified.
Note 2: All solutions must pass the following test case, otherwise they are not valid.
void test()
{
{
std::string s = "0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "0123456789ABC0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "1234321";
std::string r = first_last_n(5,s);
assert(r == "1234334321");
}
}
If you must pass the tests then you're going to have to write inefficient code, because you must return a copy of a string. This implies you must use dynamic allocation, possibly multiple times because of the copy.
So change the tests and change the signature.
Then call it like so:
This allows you to use dynamic programming, and it eliminates the need of dynamic allocation in the function.
I like my algorithms minimalistic, and I purposely eliminated the check for
n
being smaller or equal to the size of the string. I replace the check with a pre-condition for the function. Preconditions are faster than checks: they have zero overhead.How about removing the middle N-2n characters, where N is the length of the source string?
My only thought is that if this function is only called with C null-terminated strings, you might be requiring the extra construction of a std::string for parameter 's'.
Possibly the 'more' efficient method would be to allow either a std::string or const char *s passed in.
This implementation should be fast:
It passes all three unit tests.
When using GNU libstdc++, the line that declares & initializes
ret
is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls tobegin
andend
ons
are also fast because they will resolve to the const versions ofbegin
andend
,begin() const
andend() const
, so the internal representation ofs
is not "leaked". With libstdc++,std::string::const_iterator
isconst char*
, which is a pointer type and random access iterator. Thus, whenstd::string::append<const char*>(const char*, const char*)
callsstd::distance
to obtain the length of the input range, it is a pointer difference operation. Also,std::string::append<const char*>(const char*, const char*)
results in something like amemmove
. Finally, thereserve
operation ensures that enough memory is available for the return value.EDIT: For the curious, here is the initialization of
ret
in the assembly output of MinGW g++ 4.5.0:It's simply copying the pointer to the global "empty representation".
EDIT2: Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:
Variant 1 (the "c_str variant"):
Variant 2 (the "data string" variant):
Variant 3:
Variant 4 (my original code):
The results for g++ 4.5.0 are:
The results for VC++ 16.00.30319.01 are:
Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.
One thing interesting from the VC++ results is that using
c_str
rather thandata
is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.EDIT3:
Actually, I just thought about another variant:
Variant 5:
It's just like variant 4 except that the begin and end iterators for
s
are saved.When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:
If you don't need to maintain the contents of the original string, then you can copy the last n characters into positions
[n+1, 2n]
of the original string and truncate it at2n
. You will have to be careful to first expand the string and also be careful not to overwrite any characters before writing to them if the string is shorter than2n
.This will halve the number of operations to construct the string, as well as remove the need to create a new string. So its theoretically between 2 and 4 times faster. But of course you have just destroyed the original string, which you'd have to ask the interviewer if it is acceptable.
Memcpy is a cheat?
EDIT So I removed the other one. This is not a cheat.