I have a system that loads text files that are zipped into a ".log" file and parses them into informational classes, using multiple threads that each deal with a different file and add the parsed objects to a list. Each file is loaded using TStringList, since that was the fastest method I tested.
The number of text files is variable, but normally I have to deal with somewhere between 5 and 8 files, ranging from 50 MB to 120 MB, in one run.
My problem: the user can load the .log files as many times as they desire, and after some of those runs I receive an EOutOfMemory exception when trying to use TStringList.LoadFromFile. Of course, the first thing that comes to mind to anyone who has ever used a string list is that you should not use it when dealing with big text files, but this exception happens randomly and only after the process has already completed successfully at least once (the objects are destroyed before the start of a new parsing, so the memory is reclaimed correctly apart from some minor leaks).
I tried using TextFile and TStreamReader, but they are not as fast as TStringList, and the duration of the process is the greatest concern with this feature.
I'm using Delphi 10.1 Berlin; the parse process is a simple iteration through the list of variable-length lines and the construction of objects based on the line info.
Essentially, my question is: what is causing this and how can I fix it? I'm open to other ways of loading the file and reading its contents, but they must be as fast as (or faster than) the TStringList method.
Loading thread execute code:
TThreadFactory = class(TThread)
protected
  // Class that holds the list of commands already parsed; it is owned outside of the thread
  _logFile: TLogFile;
  _criticalSection: TCriticalSection;
  _path: string; // path of the extracted .txt file to parse (assignment omitted in this simplified listing)
  _error: string;
  procedure Execute; override;
  destructor Destroy; override;
public
  constructor Create(AFile: TLogFile; ASection: TCriticalSection); overload;
  property Error: string read _error;
end;

implementation

{ TThreadFactory }

constructor TThreadFactory.Create(AFile: TLogFile; ASection: TCriticalSection);
begin
  inherited Create(True);
  _logFile := AFile;
  _criticalSection := ASection;
end;

procedure TThreadFactory.Execute;
var
  tmpLogFile: TStringList;
  tmpConvertedList: TList<TLogCommand>;
  tmpCommand: TLogCommand;
  tmpLine: string;
  i: Integer;
begin
  try
    try
      tmpConvertedList := TList<TLogCommand>.Create;
      if (_path <> '') and (not Terminated) then
      begin
        try
          tmpLogFile := TStringList.Create;
          tmpLogFile.LoadFromFile(_path);
          for tmpLine in tmpLogFile do
          begin
            if Terminated then
              Break;
            if tmpLine <> '' then
            begin
              // the real parsing logic was simplified; it boils down to this
              tmpConvertedList.Add(TLogCommand.Create(tmpLine));
            end;
          end;
        finally
          tmpLogFile.Free;
        end;
      end;
      _criticalSection.Acquire;
      _logFile.AddCommands(tmpConvertedList);
    finally
      _criticalSection.Release;
      FreeAndNil(tmpConvertedList);
    end;
  except
    on e: Exception do
      _error := e.Message;
  end;
end;

end.
Added: Thank you for all your feedback. I will address some issues that were discussed but that I failed to mention in my initial question.
The .log file has multiple .txt files inside, but it can also contain multiple .log files; each file represents a day's worth of logging or a period selected by the user. Since the decompression takes a lot of time, a thread is started every time a .txt is found so that I can start parsing immediately; this has shortened the noticeable waiting time for the user.
The "minor leaks" are not shown by ReportMemoryLeaksOnShutdown, and other methods such as TStreamReader avoid this issue.
The list of commands is held by TLogFile. There is only one instance of this class at any time, and it is destroyed whenever the user wants to load a .log file. All threads add commands to the same object; that is the reason for the critical section.
I can't detail the parse process, since it would disclose some sensitive information, but it is simply gathering information from the string into the TLogCommand.
Since the beginning I was aware of fragmentation, but I never found concrete proof that TStringList causes fragmentation just by loading files multiple times; if this can be confirmed, I would be very glad.
Thank you for your attention. I ended up using an external library that was capable of reading lines and loading files with the same speed as TStringList, without the need to load the whole file into memory.
TStringList is a slow class per se. It has a lot of bells and whistles - extra features and functions that bog it down. Much faster containers would be TList<string> or a plain old dynamic array of string. See the System.IOUtils.TFile.ReadAllLines function; there is a sketch of it below.
Read about heap memory fragmentation, for example http://en.wikipedia.org/Heap_fragmentation - it can happen and break your application even without memory leaks. But since you say there are many small leaks, that is most probably what is happening. You can more or less delay the crash by avoiding reading whole files into memory and operating on smaller chunks instead. But the degradation would still go on, only slower, and in the end your program would crash again.
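A minimal sketch of the TFile.ReadAllLines approach mentioned above; the procedure name and the parsing placeholder are invented for illustration:

uses
  System.SysUtils, System.IOUtils;

procedure ParseLogText(const AFileName: string);
var
  tmpLines: TArray<string>;  // plain dynamic array of string, no TStringList overhead
  tmpLine: string;
begin
  // reads the whole file and splits it into lines in one call
  tmpLines := TFile.ReadAllLines(AFileName);
  for tmpLine in tmpLines do
    if tmpLine <> '' then
    begin
      // placeholder: build your TLogCommand (or whatever parsing you need) here
    end;
end;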
PS. General notes.
I think your team should reconsider how much need for multithreading you really have. Frankly, I see none. You are loading files from an HDD, and probably you write the processed and transformed files to the same (at best to another) HDD. That means your program's speed is limited by disk speed, and that speed is MUCH lower than the speeds of the CPU and RAM. By introducing multithreading you only seem to make your program more complex and fragile: errors are much harder to detect, well-known libraries may suddenly misbehave in MT mode, etc. And you probably get no performance increase, because the bottleneck is disk I/O speed.
If you still want multithreading for the sake of it - then perhaps look into the OmniThreadLibrary. It was designed to simplify developing "data streams" types of MT applications. Read the tutorials and examples.
I definitely suggest you squash all those "minor leaks", and as part of that, fix all compilation warnings. I know it is hard when you are not the only programmer on the project and the others do not care. Still, "minor leaks" means no one on your team knows how the program actually behaves or behaved. And non-deterministic random behavior in a multi-threading environment can easily generate tonnes of random Schrödinger bugs which you would never be able to reproduce and fix.
Your try-finally pattern really is broken. The variable you clean up in the finally block should be assigned right before the try block, not within it! This is the correct way:
So, sometimes, you may also see a variation like this:
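Sketch of that variation:

o := nil;
try
  // ... some code ...
  o := TSomeClass.Create;
  // ... some more code ...
finally
  o.Free;
end;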
This is also correct. The variable is set to nil immediately before the try-block is entered. If object creation fails, then when the finally-block calls the Free method the variable has already been assigned, and TObject.Free (but NOT TObject.Destroy) was designed to be able to work on nil object references. By itself it is just a noisy, overly verbose modification of the first pattern, but it serves as a foundation for a few more derivatives. That pattern may be used when you do not know whether you will create the object or not.
Or when object creation is delayed, because you need to calculate some data for its creation, or because the object is very heavy (for example, globally blocking access to some file), so you strive to keep its lifetime as short as possible.
That code, though, begs the question why the said "...some code" was not moved outside and before the try-block. Usually it can and should be. A rather rare pattern.
One more derivative of that pattern is used when creating several objects:
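Sketch, with three placeholder objects:

o1 := nil;
o2 := nil;
o3 := nil;
try
  o1 := TSomeClass.Create;
  o2 := TSomeClass.Create;
  o3 := TSomeClass.Create;
  // ... work with o1, o2 and o3 ...
finally
  o3.Free;
  o2.Free;
  o1.Free;
end;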
The goal is that if, for example, the creation of o2 fails, then o1 still gets freed, o3 stays nil (it was never created), and the Free calls in the finally-block can cope with both.
That is semi-correct. It is assumed that destroying the objects would never raise its own exceptions. Usually that assumption is correct, but not always. Anyway, this pattern lets you fuse several try-finally blocks into one, which makes the source code shorter (easier to read and reason about) and execution a little bit faster. Usually this is also reasonably safe, but not always.
Now two typical misuses of the pattern:
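The first misuse, sketched:

o := TSomeClass.Create;
// ... some code that may raise an exception ...
try
  // ... work with o ...
finally
  o.Free;
end;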
If the code BETWEEN object creation and the try-block raises some error - then there is no one left to free that object. You have just got a memory leak.
When you read the Delphi sources you can sometimes see a similar pattern there:
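A sketch of that with-based pattern:

with TSomeClass.Create do
try
  // ... work with the anonymous object ...
finally
  Free;
end;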
With all the widespread zeal against any use of the with construct, this pattern at least precludes adding extra code between object creation and the try-guarding. The typical with drawbacks - possible namespace collisions and the inability to pass this anonymous object to other functions as an argument - are included, though.
One more unlucky modification:
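Sketch:

o := nil;
// ... some code ...
try
  o := TSomeClass.Create;
  // ... work with o ...
finally
  o.Free;
end;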
This pattern is technically correct, but rather fragile at that. You do not immediately see the link between the o := nil line and the try-block. When you develop the program further, you may easily forget it and introduce errors: like copy-pasting/moving the try-block into another function and forgetting the nil-initialization, or extending the in-between code and making it use (and thus change) the value of that o. There is one case where I sometimes use it, but it is very rare and comes with risks.
Now, the pattern in question:
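Sketch - this is essentially the shape used in the Execute method above:

try
  o := TSomeClass.Create;  // o was never initialized before the try
  // ... work with o ...
finally
  o.Free;
end;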
This is what you write a lot without thinking about how try-finally works and why it was invented. The problem is simple: when you enter the try-block, your o variable is a container with random garbage. Now, when you try to create the object, you may face some error being raised. What then? Then you go into the finally-block and call (random garbage).Free - and what should that do? It would do random garbage.
So, to repeat all the above:
- Initialize the variable or resource you will clean up immediately before the try keyword. If you guard a file - then open it immediately before try. If you guard against a memory leak - create the object before try. Etc. Do not do your first initialization AFTER the try operator - WITHIN the try-block - it is too late there.
- Do not put any other code between that initializing assignment and the try keyword. Better even, insert an empty line before that assignment. Make it jump into your (or any other reader's) eyes that this variable and this try are mutually dependent and should never be broken apart.