I have a C++ application where the `delete` operator is slow to run. What might cause this, and where should I begin my search for a solution?
Background:
This C++ code is in an ARX file running inside of AutoCAD, which is basically just a DLL.
The specific computer where delete is slow is running AutoCAD 2011 on 64-bit Windows 7. ARX modules for AutoCAD 2011 must be compiled with Visual Studio 2008 Service Pack 1.
The computer with the problem is a customer's computer. It does not have any version of Visual Studio installed on it.
On my development computer, the code does not have any problem in AutoCAD 2011.
To test, I have some code that deletes a linked list. On the computer with the problem, it takes 0.7 seconds to delete the list; on the computers and configurations without the problem, the same code takes 0.02 seconds. The specific times are not important; the large difference between the two numbers is.
I made sure to be running the same version of the code on both computers, so it is not a release versus debug build problem.
How is the retlist being generated? Also, is there a reason you are allocating and deleting the resbufs manually instead of using acutNewRb and acutRelRb?
Also, you have probably checked these already:
- Does the user have the default drawing loaded in both AutoCAD 2009 and 2011? If not, are the drawings the same (apart from the ACAD version), and are they stored locally or on a network drive?
- Does the user have the same LISP/.NET/ObjectARX applications running in both instances?
- Is the AutoCAD 2011 installation local or on the network?
Finally, you might want to add autocad and objectarx tags to the question.
Roughly in the order I'd check them:
- Do individual new's or delete's take unusually long?
- How are the individual delete times distributed?

We run into this all the time. It's nothing wrong with your code; deleting thousands of items can take several seconds (and I've seen it reach minutes, too) even in Release mode.
The answer is to not delete. Get yourself a real memory allocator, and instead of individually allocating each object, create a memory pool (or a custom heap, or whatever you want to call it). Nedmalloc is what we use and recommend, and with it you can create a "nedpool." Basically, a pool is a block of memory from which your objects are allocated. You still allocate memory for each object, but it gets taken from the pool rather than directly from the OS.
When the time to delete comes, you just delete the entire pool rather than deleting the objects one by one. Use a different pool for each batch of objects that will expire at the same time. You do not need to allocate memory for the whole pool upfront, but you can only delete the whole thing at once.
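Setting nedmalloc's actual nedpool API aside, the idea can be sketched with a hand-rolled arena; all names below are illustrative, not nedmalloc's. Objects are carved out of large blocks, nothing is freed individually, and the destructor releases the blocks wholesale:

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Illustrative arena: allocations are carved from large blocks.
// Nothing is freed per-object; the destructor releases every block
// at once, which is the whole point.
class Pool {
public:
    explicit Pool(std::size_t blockSize = 64 * 1024)
        : blockSize_(blockSize), used_(blockSize) {}

    ~Pool() {                        // one cheap pass instead of N deletes
        for (char* b : blocks_) ::operator delete(b);
    }

    void* allocate(std::size_t n) {
        // Round up to keep allocations suitably aligned.
        n = (n + alignof(std::max_align_t) - 1)
            & ~(alignof(std::max_align_t) - 1);
        if (used_ + n > blockSize_) {            // need a fresh block
            blocks_.push_back(static_cast<char*>(::operator new(blockSize_)));
            used_ = 0;
        }
        void* p = blocks_.back() + used_;
        used_ += n;
        return p;
    }

private:
    std::size_t blockSize_;
    std::size_t used_;
    std::vector<char*> blocks_;
};

// Usage: placement-new trivially-destructible nodes into the pool and
// skip the per-node delete entirely.
struct Node { int value; Node* next; };

Node* buildList(Pool& pool, int n) {
    Node* head = nullptr;
    for (int i = 0; i < n; ++i)
        head = new (pool.allocate(sizeof(Node))) Node{i, head};
    return head;
}
```

Note this sketch only works cleanly for objects with trivial destructors; if the nodes own other resources, those still need to be released before the pool goes away.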
Could be due to different cache efficiency between the working/failing system. There may be more memory fragmentation on the failing system which causes the large delete to thrash the cache. On a quiescent system the data may end up more sequential and get more cache hits during the big delete.
Try the Intel Performance Counter Monitor?
If it is acceptable and possible, then try to use a profiler on the customer's computer.
You may try AMD CodeAnalyst or the Intel profiler (though that one is not free).
If that is not possible, then add profiling code to your release builds and gather results from the customer. Even simple profiling code may help you find the real bottleneck.
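Even a scoped timer compiled into the release build can narrow things down; something like the following (the class and label names are mine) wrapped around suspect regions and shipped to the customer would show where the time goes:

```cpp
#include <chrono>
#include <cstdio>

// Scoped timer for release-build profiling: construct one at the top
// of a block, and the elapsed time is printed when the scope exits.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}

    // Milliseconds elapsed since construction.
    double elapsedMs() const {
        return std::chrono::duration<double, std::milli>(
            std::chrono::steady_clock::now() - start_).count();
    }

    ~ScopedTimer() {
        std::fprintf(stderr, "%s: %.3f ms\n", label_, elapsedMs());
    }

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

// Usage:
// {
//     ScopedTimer t("delete retlist");
//     // ... the code you suspect ...
// }   // prints the elapsed time to stderr here
```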
It does not look like the delete itself is the problem, but the problem may be some other part of the code.
E.g., what is the type of head->resval.rstring?