How Can I Efficiently Read The FIrst Few Lines of

2019-01-23 08:50发布

I have a "Find Files" function in my program that will find text files with the .ged suffix that my program reads. I display the found results in an explorer-like window that looks like this:

enter image description here

I use the standard FindFirst / FindNext methods, and this works very quickly. The 584 files shown above are found and displayed within a couple of seconds.

What I'd now like to do is add two columns to the display that shows the "Source" and "Version" that are contained in each of these files. This information is found usually within the first 10 lines of each file, on lines that look like:

1 SOUR FTM
2 VERS Family Tree Maker (20.0.0.368)

Now I have no problem parsing this very quickly myself, and that is not what I'm asking.

What I need help with is simply how to most quickly load the first 10 or so lines from these files so that I can parse them.

I have tried to do a StringList.LoadFromFile, but it takes too much time loading the large files, such at those above 1 MB.

Since I only need the first 10 lines or so, how would I best get them?

I'm using Delphi 2009, and my input files might or might not be Unicode, so this needs to work for any encoding.


Followup: Thanks Antonio,

I ended up doing this which works fine:

var
  CurFileStream: TStream;
  Buffer: TBytes;
  Value: string;
  Encoding: TEncoding;

try
  CurFileStream := TFileStream.Create(folder + FileName, fmOpenRead);
  SetLength(Buffer, 256);
  CurFileStream.Read(Buffer[0], 256);
  TEncoding.GetBufferEncoding(Buffer, Encoding);
  Value := Encoding.GetString(Buffer);
  ...
  (parse through Value to get what I want)
  ...
finally
  CurFileStream.Free;
end;

5条回答
在下西门庆
2楼-- · 2019-01-23 09:00

You can use a TStreamReader to read individual lines from any TStream object, such as a TFileStream. For even faster file I/O, you could use Memory-Mapped Views with TCustomMemoryStream.

查看更多
你好瞎i
3楼-- · 2019-01-23 09:01

Just open the file yourself for block reading (not using TStringList builtin functionality), and read the first block of the file, and then you can for example load that block to a stringlist with strings.SetText() (if you are using block functions) or simply strings.LoadFromStream() if you are loading your blocks using streams.

I would personally just go with FileRead/FileWrite block functions, and load the block into a buffer. You could also use similair winapi functions, but that's just more code for no reason.

OS reads files in blocks, which are at least 512bytes big on almost any platform/filesystem, so you can read 512 bytes first (and hope that you got all 10 lines, which will be true if your lines are generally short enough). This will be (practically) as fast as reading 100 or 200 bytes.

Then if you notice that your strings objects has only less than 10 lines, just read next 512 byte block and try to parse again. (Or just go with 1024, 2048 and so on blocks, on many systems it will probably be as fast as 512 blocks, as filesystem cluster sizes are generally larger than 512 bytes).

PS. Also, using threads or asynchronous functionality in winapi file functions (CreateFile and such), you could load that data from files asynchronously, while the rest of your application works. Specifically, the interface will not freeze during reading of large directories.

This will make the loading of your information appear faster, (since the file list will load directly, and then some milliseconds later the rest of the information will come up), while not actually increasing the real reading speed.

Do this only if you have tried the other methods and you feel like you need the extra boost.

查看更多
唯我独甜
4楼-- · 2019-01-23 09:07

Sometimes oldschool pascal stylee is not that bad. Even though non-oo file access doesn't seem to be very popular anymore, ReadLn(F,xxx) still works pretty ok in situations like yours.

The code below loads information (filename, source and version) into a TDictionary so that you can look it up easily, or you can use a listview in virtual mode, and look stuff up in this list when the ondata even fires.

Warning: code below does not work with unicode.

program Project101;
{$APPTYPE CONSOLE}

uses
  IoUtils, Generics.Collections, SysUtils;

type
  TFileInfo=record
    FileName,
    Source,
    Version:String;
  end;

function LoadFileInfo(var aFileInfo:TFileInfo):Boolean;
var
  F:TextFile;
begin
  Result := False;
  AssignFile(F,aFileInfo.FileName);
  {$I-}
  Reset(F);
  {$I+}
  if IOResult = 0 then
  begin
    ReadLn(F,aFileInfo.Source);
    ReadLn(F,aFileInfo.Version);
    CloseFile(F);
    Exit(True)
  end
  else
    WriteLn('Could not open ', aFileInfo.FileName);
end;

var
  FileInfo:TFileInfo;
  Files:TDictionary<string,TFileInfo>;
  S:String;
begin
  Files := TDictionary<string,TFileInfo>.Create;
  try
    for S in TDirectory.GetFiles('h:\WINDOWS\system32','*.xml') do
    begin
      WriteLn(S);
      FileInfo.FileName := S;
      if LoadFileInfo(FileInfo) then
        Files.Add(S,FileInfo);
    end;

    // showing file information...
    for FileInfo in Files.Values do
      WriteLn(FileInfo.Source, ' ',FileInfo.Version);
  finally
    Files.Free
  end;
  WriteLn;
  WriteLn('Done. Press any key to quit . . .');
  ReadLn;
end.
查看更多
别忘想泡老子
5楼-- · 2019-01-23 09:12

Use TFileStream and with Read method read number of bytes needed. Here is the example of reading bitmap info that is also stored on begining of the file.

http://www.delphidabbler.com/tips/19

查看更多
爷、活的狠高调
6楼-- · 2019-01-23 09:25

Okay, I deleted my first answer. Using Remy's first suggestion above, I tried again with built-in stuff. What I don't like here is that you have to create and free two objects. I think I would make my own class to wrap this up:

var
  fs:TFileStream;
  tr:TTextReader;
  filename:String;
begin
  filename :=  'c:\temp\textFileUtf8.txt';
  fs := TFileStream.Create(filename, fmOpenRead);
  tr := TStreamReader.Create(fs);
  try
      Memo1.Lines.Add( tr.ReadLine );

  finally
    tr.Free;
    fs.Free;
  end;   
end;

If anybody is interested in what I had here before, it had the problem of not working with unicode files.

查看更多
登录 后发表回答