C++ string memory management

Posted 2019-02-02 12:47

Last week I wrote a few lines of C# to load a large text file (300,000 lines) into a Dictionary. It took ten minutes to write and it executed in less than a second.

Now I'm converting that piece of code into C++ (because I need it in an old C++ COM object). I've spent two days on it so far. :-( Although the productivity difference is shocking on its own, it's the performance that I need some advice on.

It takes seven seconds to load, and even worse: it takes exactly as long to free all the CStringWs afterwards. This is not acceptable, and I must find a way to improve the performance.

Is there any chance that I can allocate this many strings without seeing this horrible performance degradation?

My guess right now is that I'll have to stuff all the text into a large array and then let my hash table point to the beginning of each string within this array and drop the CStringW stuff.
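A minimal sketch of that idea, under assumptions the question does not state: the text is taken to be "key<TAB>value" records, one per line, already read into one NUL-terminated buffer. The names CStrHash, CStrEq, Index and build_index are made up for illustration. The hash table stores only pointers into the buffer, so there is no per-string allocation and everything is freed in one go with the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hash a NUL-terminated string by its content (FNV-1a), not by pointer value.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::uint64_t h = 1469598103934665603ULL;
        for (; *s; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

// Compare keys by content as well.
struct CStrEq {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

typedef std::unordered_map<const char*, const char*, CStrHash, CStrEq> Index;

// Cuts the buffer into records in place by overwriting separators with NULs.
// Keys and values in the returned map point into `text`, so `text` must
// outlive the map and must not be resized afterwards.
Index build_index(std::vector<char>& text, char sep) {
    assert(!text.empty() && text.back() == '\0');  // caller NUL-terminates
    Index index;
    char* p = text.data();
    while (*p) {
        char* eol = std::strchr(p, '\n');
        if (eol) *eol = '\0';              // end of this record
        char* mid = std::strchr(p, sep);
        if (mid) {
            *mid = '\0';                   // end of the key
            index.emplace(p, mid + 1);     // value starts after the separator
        }
        if (!eol) break;                   // last record had no newline
        p = eol + 1;
    }
    return index;
}
```

A lookup is then index.find("some key"), which compares by content because of the two functors. Lines without a separator are skipped in this sketch.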

But before that, any advice from you C++ experts out there?

EDIT: My answer to myself is given below. I realized that it is the fastest route for me, and also a step in what I consider the right direction: towards more managed code.

10 answers
我欲成王,谁敢阻挡
#2 · 2019-02-02 13:30

It's no wonder that the CLR's memory management is better than the bunch of old and dirty tricks MFC is based on: it is at most half as old as MFC, and it is pool-based. When I had to work on a similar project with string arrays and WinAPI/MFC, I just used std::basic_string instantiated with WinAPI's TCHAR and my own allocator based on Loki::SmallObjAllocator. You can also take a look at boost::pool in this case (if you want it to have an "std feel" or have to use a version of the VC++ compiler older than 7.1).
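To illustrate the "std::basic_string with your own allocator" approach, here is a rough sketch. It is not the Loki-based allocator mentioned above: it swaps in a much simpler bump-pointer arena, uses wchar_t instead of TCHAR so that it compiles outside Windows, and assumes a C++11 compiler. Arena, ArenaAllocator and ArenaWString are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena: carves allocations out of large blocks
// and frees all blocks together when the arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t block_size = 1 << 20)
        : block_size_(block_size), used_(0), cap_(0) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || offset + bytes > cap_) {
            // Current block is full (or there is none yet): start a new one.
            std::size_t cap = bytes > block_size_ ? bytes : block_size_;
            std::unique_ptr<char[]> block(new char[cap]);
            blocks_.push_back(std::move(block));
            cap_ = cap;
            offset = 0;
        }
        used_ = offset + bytes;
        return blocks_.back().get() + offset;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::vector<std::unique_ptr<char[]> > blocks_;
    std::size_t block_size_, used_, cap_;
};

// Minimal C++11 allocator that forwards to an Arena.
template <class T>
struct ArenaAllocator {
    typedef T value_type;
    Arena* arena;

    explicit ArenaAllocator(Arena& a) : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}  // no-op: the arena frees in bulk
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

typedef std::basic_string<wchar_t, std::char_traits<wchar_t>,
                          ArenaAllocator<wchar_t> > ArenaWString;
```

A string is created with ArenaWString s(L"text", ArenaAllocator<wchar_t>(arena)). The trade-off is that memory from a destroyed or reassigned string is not reused until the whole arena goes away, which suits a load-once dictionary but not strings that change a lot. The arena must outlive every string that uses it.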

混吃等死
#3 · 2019-02-02 13:32

You are stepping into the shoes of Raymond Chen. He did the exact same thing, writing a Chinese dictionary in unmanaged C++. Rico Mariani did too, writing it in C#. Mr. Mariani made one version. Mr. Chen wrote 6 versions, trying to match the perf of Mariani's version. He pretty much rewrote significant chunks of the C/C++ runtime library to get there.

Managed code got a lot more respect after that. The GC allocator is impossible to beat. Check this blog post for the links. This blog post might interest you too; it is instructive to see how the STL's value semantics are part of the problem.

Root(大扎)
#4 · 2019-02-02 13:35

Yikes. Get rid of the CStrings...

Try a profiler as well. Are you sure you were not just running debug code?

Use std::string instead (std::wstring is the wide-character counterpart of CStringW).

EDIT:

I just did a simple test of ctor and dtor comparisons.

CStringW seems to take between 2 and 3 times as long as std::string to do a new/delete.

I iterated 1,000,000 times doing a new/delete for each type, with nothing else in the loop and a GetTickCount() call before and after each loop. I consistently get at least twice as long for CStringW.
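For anyone who wants to repeat that kind of measurement, here is a rough portable sketch, not the code actually used for the numbers above. It swaps GetTickCount() for std::chrono::steady_clock and is written as a template so it compiles without ATL/MFC; time_new_delete is a made-up name. On Windows you would instantiate it with CStringW as well.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Runs `iterations` new/delete pairs of T and returns the elapsed time in
// milliseconds. The volatile pointer is meant to discourage the optimiser
// from removing the allocation altogether.
template <class T>
double time_new_delete(int iterations) {
    assert(iterations >= 0);
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        T* volatile sink = new T();
        delete sink;
    }
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_new_delete<std::string>(1000000) and time_new_delete<std::wstring>(1000000) in a release build gives two numbers to compare. A default-constructed string usually allocates no character storage, so this mostly measures the cost of the object itself plus the heap call.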

That doesn't address your entire issue, though, I suspect.

EDIT: I also don't think that using string or CStringW is the real problem. There is something else going on that is causing your issue.

(But for god's sake, use the STL anyway!)

You need to profile it. That is a disaster.

家丑人穷心不美
#5 · 2019-02-02 13:37

If it is a read-only dictionary then the following should work for you.

Use fseek/ftell to find the size of the text file (open it in binary mode, so that the size matches the number of bytes fread returns).

Allocate a chunk of memory of that size + 1 to hold it.

fread the entire text file into your memory chunk, and put a NUL in the extra byte at the end so that the string functions below know where to stop.

Iterate through the chunk:

    push_back into a vector<const char *> the starting address of each line.

    search for the line terminator using strchr.

    when you find it, overwrite it with a NUL, which turns the line into a string.
    the next character is the start of the next line.

Repeat until you do not find a line terminator.

You can now use the vector to get the pointer that will let you access the corresponding value.

When you are finished with your dictionary, deallocate the memory and let the vector die when it goes out of scope.

[EDIT] This can be a little more complicated on DOS/Windows, as the line terminator there is CRLF.

In that case, use strstr to search for "\r\n", and increment by 2 to find the start of the next line.
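The steps above can be sketched roughly as follows. The function names read_file and split_lines are made up, and the sketch assumes the file contains no embedded NUL bytes. It also deviates slightly from the strstr suggestion: it searches for '\n' with strchr and strips a preceding '\r' if there is one, so the same loop copes with both LF and CRLF files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Reads the whole file into `out` and appends a terminating NUL.
// Returns false if the file cannot be opened or read completely.
bool read_file(const char* path, std::vector<char>& out) {
    std::FILE* f = std::fopen(path, "rb");  // binary: ftell matches fread
    if (!f) return false;
    bool ok = false;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        long size = std::ftell(f);
        if (size >= 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            std::size_t n = static_cast<std::size_t>(size);
            out.assign(n + 1, '\0');        // size + 1, last byte stays NUL
            ok = std::fread(out.data(), 1, n, f) == n;
        }
    }
    std::fclose(f);
    return ok;
}

// Splits a NUL-terminated buffer into lines in place. Each line terminator
// is overwritten with a NUL; the returned pointers refer into `buf`.
std::vector<const char*> split_lines(char* buf) {
    assert(buf != nullptr);
    std::vector<const char*> lines;
    char* p = buf;
    while (*p) {
        lines.push_back(p);
        char* eol = std::strchr(p, '\n');
        if (!eol) break;                                  // no terminator on last line
        if (eol > p && eol[-1] == '\r') eol[-1] = '\0';   // CRLF case
        *eol = '\0';
        p = eol + 1;                                      // start of the next line
    }
    return lines;
}
```

Typical use would be to call read_file(path, text) and then split_lines(text.data()). The vector<char> has to stay alive and unchanged for as long as the line pointers are in use, and freeing it releases every string at once.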
