C++ String Interview Question

I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised of the first and last n-characters, and then proceed to correct any bugs and to also make the function as efficient as possible, I came up with the solution below, however the interviewer claimed there was an even faster more optimal way:

Original code:

std::string first_last_n(int n, std::string s)
{
   std::string first_n = s.substr(0,n);
   std::string last_n = s.substr(s.size()-n-1,n);
   return first_n + last_n;
}

My code:

bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
   if (s.size() < n)
      return false;
   r.reserve(2 * n);
   r.resize(0);
   r.append(s.data(),s.data() + n);
   r.append(s.data() + (s.size() - n), s.data() + s.size());
   return true;
}

Summary of my changes:

Changed the interface to take a return string as reference (assuming RVO and rvalues are not available yet)
Removed temporary strings being constructed via substr
Passed input string as a const reference inorder to get around temporary instantiation of input
Fixed off-by-1 error in last_n string
Reduced the number of times each character is touched down to once, or twice (in the event of an overlapping scenario)
Placed a check in the event the string s's size is less than n, returning false for failure.

Assuming only native C++ is allowed, is there some other way of doing the above more efficiently or optimally?

Note 1: The original input string instance is not to be modified.

Note 2: All solutions must pass the following test case, otherwise they are not valid.

void test()
{
   {
      std::string s = "0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "0123456789ABC0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "1234321";
      std::string r = first_last_n(5,s);
      assert(r == "1234334321");
   }

}

标签： c++ optimization string processing-efficiency

7条回答

欢心

2楼-- · 2019-03-25 02:12

If you must pass the tests then you're going to have to write inefficient code, because you must return a copy of a string. This implies you must use dynamic allocation, possibly multiple times because of the copy.

So change the tests and change the signature.

template<class Out>
Out first_last_n(const std::string::size_type& n, const std::string& s, Out r)
{
    r = copy_n(s.begin(), n, r);
    std::string::const_iterator pos(s.end());
    std::advance(pos, -n);
    return copy_n(pos, n, r);
}

Then call it like so:

std::string s("Hello world!");
char r[5];
r[4] = 0;
first_last_n(2, s, r);

This allows you to use dynamic programming, and it eliminates the need of dynamic allocation in the function.

I like my algorithms minimalistic, and I purposely eliminated the check for n being smaller or equal to the size of the string. I replace the check with a pre-condition for the function. Preconditions are faster than checks: they have zero overhead.

0人赞添加讨论(0) 举报

时光不老，我们不散

3楼-- · 2019-03-25 02:15

How about removing the middle N-2n characters, where N is the length of the source string?

0人赞添加讨论(0) 举报

男人必须洒脱

4楼-- · 2019-03-25 02:27

My only thought is that if this function is only called with C null-terminated strings, you might be requiring the extra construction of a std::string for parameter 's'.

Possibly the 'more' efficient method would be to allow either a std::string or const char *s passed in.

0人赞添加讨论(0) 举报

趁早两清

5楼-- · 2019-03-25 02:29

This implementation should be fast:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

It passes all three unit tests.

When using GNU libstdc++, the line that declares & initializes ret is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls to begin and end on s are also fast because they will resolve to the const versions of begin and end, begin() const and end() const, so the internal representation of s is not "leaked". With libstdc++, std::string::const_iterator is const char*, which is a pointer type and random access iterator. Thus, when std::string::append<const char*>(const char*, const char*) calls std::distance to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append<const char*>(const char*, const char*) results in something like a memmove. Finally, the reserve operation ensures that enough memory is available for the return value.

EDIT: For the curious, here is the initialization of ret in the assembly output of MinGW g++ 4.5.0:

    movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)

It's simply copying the pointer to the global "empty representation".

EDIT2: Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:

Variant 1 (the "c_str variant"):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
   ret.append(s_cStr, s_cStr + n);
   ret.append(s_cStr_end - n, s_cStr_end);
   return ret;
}

Variant 2 (the "data string" variant):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_data = s.data(), *s_data_end = s_data + s_size;
   ret.append(s_data, s_data + n);
   ret.append(s_data_end - n, s_data_end);
   return ret;
}

Variant 3:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret(s);
   std::string::size_type d = s_size - n;
   return ret.replace(n, d, s, d, n);
}

Variant 4 (my original code):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   ret.append(s.begin(), s.begin() + n);
   ret.append(s.end() - n, s.end());
   return ret;
}

The results for g++ 4.5.0 are:

Variant 4 is the fastest
Variant 3 is second (5% slower than variant 4)
Variant 1 is third (2% slower than variant 3)
Variant 2 is fourth (0.2% slower than variant 1)

The results for VC++ 16.00.30319.01 are:

Variant 1 is the fastest
Variant 2 is second (3% slower than variant 1)
Variant 4 is third (4% slower than variant 2)
Variant 3 is fourth (17% slower than variant 4)

Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.

One thing interesting from the VC++ results is that using c_str rather than data is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.

EDIT3:

Actually, I just thought about another variant:

Variant 5:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   std::string::const_iterator s_begin = s.begin(), s_end = s.end();
   ret.append(s_begin, s_begin + n);
   ret.append(s_end - n, s_end);
   return ret;
}

It's just like variant 4 except that the begin and end iterators for s are saved.

When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:

Variant 1 is the fastest
Variant 5 is second (1.6% slower than variant 1)
Variant 2 is third (1.4% slower than variant 5)
Variant 4 is third (4% slower than variant 2)
Variant 3 is fourth (17% slower than variant 4)

0人赞添加讨论(0) 举报

太酷不给撩

6楼-- · 2019-03-25 02:32

If you don't need to maintain the contents of the original string, then you can copy the last n characters into positions [n+1, 2n] of the original string and truncate it at 2n. You will have to be careful to first expand the string and also be careful not to overwrite any characters before writing to them if the string is shorter than 2n.

This will halve the number of operations to construct the string, as well as remove the need to create a new string. So its theoretically between 2 and 4 times faster. But of course you have just destroyed the original string, which you'd have to ask the interviewer if it is acceptable.

0人赞添加讨论(0) 举报

仙女界的扛把子

7楼-- · 2019-03-25 02:35

Memcpy is a cheat?

#include <cstring>
#include <iostream>
#include <string>

std::string first_last_n(int n, const std::string& s)
{
  if (s.size() < n)
      return "";

    char str[n*2];
    memcpy(str, s.data(), n);
    memcpy(str+n, s.data() + s.size()-n, n);

    return (const char *)str;
}

int main()
{
    std::cout << first_last_n(2, "123454321") << std::endl;
}

EDIT So I removed the other one. This is not a cheat.

0人赞添加讨论(0) 举报

1 2 下一页

C++ String Interview Question

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间