How to do a multithreaded file copy

Posted 2020-05-06 11:42

Question:

I want to copy many files into one, but using multiple threads. Suppose that file A is the file into which the different threads copy data; in this case each thread is meant to copy one file into file A, using this procedure:

procedure ConcatenateFiles(const InFileNames: array of string;
  const OutFileName: string);
var
  i: Integer;
  InStream, OutStream: TFileStream;
begin
  OutStream := TFileStream.Create(OutFileName, fmCreate);
  try
    for i := 0 to High(InFileNames) do
    begin
      InStream := TFileStream.Create(InFileNames[i], fmOpenRead);
      try
        OutStream.CopyFrom(InStream, InStream.Size);
      finally
        InStream.Free;
      end;
    end;
  finally
    OutStream.Free;
  end;
end;

First, is it possible to implement a multithreaded file copy in this case? OutFileName is a global variable, so two threads can't use it at the same time, and that is the error I get. If it is possible, how can I synchronize the threads so that no more than one thread uses OutFileName at any moment? Second, is a multithreaded file copy really efficient? I'm talking about copying speed. Thanks for your replies.

Answer 1:

It's perfectly possible to copy files using multiple threads. You would typically use a single producer thread and multiple consumers to do the work. In your case you are concatenating, so you'd need to work out the start and end point of each source file within the destination, and then have the threads write separate parts of the destination file at those pre-calculated positions. Certainly possible.
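Working out the start and end points amounts to a running sum of the source file sizes. A minimal sketch in Delphi, assuming the sizes have already been obtained; ComputeStartOffsets is a hypothetical name, not something from the answer:

```delphi
// Sketch: compute where each source file starts in the concatenated output.
// Starts[i] is the byte offset of file i; the result is the total output size.
function ComputeStartOffsets(const Sizes: array of Int64;
  out Starts: TArray<Int64>): Int64;
var
  i: Integer;
begin
  SetLength(Starts, Length(Sizes));
  Result := 0;
  for i := 0 to High(Sizes) do
  begin
    Starts[i] := Result;    // file i begins where the previous files end
    Inc(Result, Sizes[i]);  // its last byte is at Starts[i] + Sizes[i] - 1
  end;
end;
```

With sizes of 100, 250 and 50 bytes, the starts come out as 0, 100 and 350, and the total (the size to pre-allocate for the destination) is 400.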

However, it's not a good idea. Multithreading works well when the job is CPU-bound. File copying is disk-bound, and no number of extra threads can help with that. In fact, you will likely make performance worse, because the threads will just get in each other's way while fighting over the shared disk.



Answer 2:

If you want to concatenate multiple input files in parallel into a single destination file, you can do it this way:

  1. Pre-allocate the destination file. Create the file, seek to the intended final size of the concatenated file, and set EOF to allocate the file on the file system. With a TFileStream, this can be accomplished simply by setting the TFileStream.Size property to the intended size. Otherwise, using the Win32 API directly, you would have to use CreateFile(), SetFilePointer(), and SetEndOfFile().

  2. Divide the destination file into logical sections, each with a starting and ending offset within the file, and assign those sections to your threads as needed. Have each thread open its own local handle to the same destination file. That allows each thread to seek and write independently. Make sure each thread stays within its assigned section so it does not corrupt data written by another thread.
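Step 1 done with the raw Win32 calls instead of TFileStream.Size might look like the following minimal sketch. PreallocateFile is a hypothetical name, and the sketch assumes a Unicode Delphi (2009 or later) with the Winapi.Windows and System.SysUtils units. It opens the file exclusively and closes it again, so it has to run before the worker threads open their own handles.

```delphi
uses
  Winapi.Windows, System.SysUtils;

// Sketch of step 1: create (or truncate) AFileName and extend it to ASize
// bytes by moving the file pointer there and marking that point as EOF.
procedure PreallocateFile(const AFileName: string; ASize: Int64);
var
  H: THandle;
  DistHigh: Longint;
  NewLow: DWORD;
begin
  H := CreateFile(PChar(AFileName), GENERIC_WRITE, 0, nil, CREATE_ALWAYS,
    FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // SetFilePointer takes the 64-bit offset as separate low and high halves
    DistHigh := Longint(Int64Rec(ASize).Hi);
    SetLastError(NO_ERROR);
    NewLow := SetFilePointer(H, Longint(Int64Rec(ASize).Lo), @DistHigh, FILE_BEGIN);
    // $FFFFFFFF is also a valid low half, so only GetLastError confirms failure
    if (NewLow = $FFFFFFFF) and (GetLastError <> NO_ERROR) then
      RaiseLastOSError;
    // The current position becomes the new end of file, allocating the space
    Win32Check(SetEndOfFile(H));
  finally
    CloseHandle(H);
  end;
end;
```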

For example:

uses
  Winapi.Windows, System.SysUtils, System.Classes, System.Math;

type
  TFileInfo = record
    InFileName: String;
    OutFileName: String;
    OutFileStart: Int64;
    OutFileSize: Int64;
  end;

  TCopyThread = class(TThread)
  protected
    FFileInfo: TFileInfo;
    // Must override TThread.Execute; otherwise the abstract base method runs
    procedure Execute; override;
  public
    constructor Create(const AFileInfo: TFileInfo);
  end;

constructor TCopyThread.Create(const AFileInfo: TFileInfo);
begin
  // Store the job first, then create the thread unsuspended
  FFileInfo := AFileInfo;
  inherited Create(False);
end;

procedure TCopyThread.Execute;
var
  InStream: TFileStream;
  OutStream: TFileStream;
begin
  InStream := TFileStream.Create(FFileInfo.InFileName, fmOpenRead or fmShareDenyWrite);
  try
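    // Each thread opens its own handle; fmShareDenyNone lets the other
    // threads keep their own write handles open at the same time
    OutStream := TFileStream.Create(FFileInfo.OutFileName, fmOpenWrite or fmShareDenyNone);
    try
      // Write only inside this thread's section of the pre-sized file
      OutStream.Position := FFileInfo.OutFileStart;
      OutStream.CopyFrom(InStream, FFileInfo.OutFileSize);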
    finally
      OutStream.Free;
    end;
  finally
    InStream.Free;
  end;
end;

procedure ConcatenateFiles(const InFileNames: array of string; const OutFileName: string);
var
  i: Integer;
  OutStream: TFileStream;
  FileInfo: array of TFileInfo;
  TotalSize: Int64;
  sr: TSearchRec;
  Threads: array of TCopyThread;
  ThreadHandles: array of THandle;
  NumThreads: Integer;      
begin
  SetLength(FileInfo, Length(InFileNames));
  NumThreads := 0;
  TotalSize := 0;

  for i := 0 to High(InFileNames) do
  begin
    if FindFirst(InFileNames[i], faAnyFile, sr) <> 0 then
      raise Exception.CreateFmt('Cannot retrieve size of file: %s', [InFileNames[i]]);

    if sr.Size > 0 then
    begin
      FileInfo[NumThreads].InFileName := InFileNames[i];
      FileInfo[NumThreads].OutFileName := OutFileName;
      FileInfo[NumThreads].OutFileStart := TotalSize;
      FileInfo[NumThreads].OutFileSize := sr.Size;
      Inc(NumThreads);
      Inc(TotalSize, sr.Size);
    end;

    FindClose(sr); 
  end;

  OutStream := TFileStream.Create(OutFileName, fmCreate);
  try
    OutStream.Size := TotalSize;
  finally
    OutStream.Free;
  end;

  SetLength(Threads, NumThreads);
  SetLength(ThreadHandles, NumThreads);

  for i := 0 to NumThreads-1 do
  begin
    Threads[i] := TCopyThread.Create(FileInfo[i]);
    ThreadHandles[i] := Threads[i].Handle;
  end;

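  // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles
  // per call, and expects a pointer to the handle array, so wait in batches
  i := 0;
  while i < NumThreads do
  begin
    WaitForMultipleObjects(Min(NumThreads-i, MAXIMUM_WAIT_OBJECTS), @ThreadHandles[i], TRUE, INFINITE);
    Inc(i, MAXIMUM_WAIT_OBJECTS);
  end;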

  for i := 0 to NumThreads-1 do
  begin
    Threads[i].Terminate;
    Threads[i].WaitFor;
    Threads[i].Free;
  end;
end;


Answer 3:

As already mentioned, writing to the same file from multiple threads is not a very good idea.

If you do it so that multiple threads share the same file handle, you end up with the big problem of making sure that one thread doesn't move the file position with a Seek while another thread is trying to write its data.
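The usual fix for that Seek race, and one answer to the asker's question about synchronization, is to hold a lock across the seek and the write so the pair becomes one atomic step. A minimal sketch, assuming a single TFileStream shared by all threads and a hypothetical TSharedFileWriter class (not from any of the answers), using TCriticalSection from System.SyncObjs. Note that this serializes the writes: it prevents corruption but does nothing about the disk bottleneck.

```delphi
uses
  System.SysUtils, System.Classes, System.SyncObjs;

type
  // Hypothetical helper: one file stream shared by several threads.
  // The lock makes "set position, then write" atomic, so no other thread
  // can move the position between one thread's seek and its write.
  TSharedFileWriter = class
  private
    FLock: TCriticalSection;
    FStream: TFileStream;
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;
    procedure WriteAt(AOffset: Int64; const ABuffer; ACount: Integer);
  end;

constructor TSharedFileWriter.Create(const AFileName: string);
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  // Assumes the file already exists, e.g. pre-allocated to its final size
  FStream := TFileStream.Create(AFileName, fmOpenWrite or fmShareDenyWrite);
end;

destructor TSharedFileWriter.Destroy;
begin
  FStream.Free;
  FLock.Free;
  inherited;
end;

procedure TSharedFileWriter.WriteAt(AOffset: Int64; const ABuffer; ACount: Integer);
begin
  FLock.Acquire;
  try
    FStream.Position := AOffset;
    FStream.WriteBuffer(ABuffer, ACount);
  finally
    FLock.Release;
  end;
end;
```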

If you do it so that each thread creates its own handle to the file, you run into the problem that, by default, the OS doesn't allow multiple handles with write access to the same file at the same time (unless the file is opened with a share mode that permits it, such as fmShareDenyNone), because concurrent writers can be a recipe for disaster (data corruption).

Even if you somehow get this working, so that each thread writes only in its own section of the file and the threads don't interfere with each other, you will still lose some performance due to hard drive limitations (the HDD head has to be repositioned to the correct place, which means lots of back-and-forth movement).

However, you could use multiple threads to assemble the final file in memory before it is written to your hard drive. This is easy to do, and since memory access is so fast, you practically lose no performance by jumping back and forth. The only problem is that you could quickly run out of memory if you are concatenating several large files.
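A minimal sketch of the in-memory approach, assuming each thread loads one whole input file into its own TBytes buffer; TLoadFileThread and ConcatenateViaMemory are hypothetical names. The reads still compete for the disk, but the output is written strictly sequentially in one pass, and no lock is needed because no two threads touch the same buffer.

```delphi
uses
  System.SysUtils, System.Classes;

type
  // Hypothetical worker: loads one whole input file into its own buffer.
  TLoadFileThread = class(TThread)
  private
    FFileName: string;
    FData: TBytes;
  protected
    procedure Execute; override;
  public
    constructor Create(const AFileName: string);
    property Data: TBytes read FData;
  end;

constructor TLoadFileThread.Create(const AFileName: string);
begin
  FFileName := AFileName;   // set before the thread starts running
  inherited Create(False);
end;

procedure TLoadFileThread.Execute;
var
  InStream: TFileStream;
begin
  InStream := TFileStream.Create(FFileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(FData, InStream.Size);
    if Length(FData) > 0 then
      InStream.ReadBuffer(FData[0], Length(FData));
  finally
    InStream.Free;
  end;
end;

// Loads all inputs in parallel, then writes them out in order with purely
// sequential writes. Needs enough RAM to hold every input at once.
procedure ConcatenateViaMemory(const InFileNames: array of string;
  const OutFileName: string);
var
  Threads: array of TLoadFileThread;
  OutStream: TFileStream;
  Bytes: TBytes;
  ErrorMsg: string;
  i: Integer;
begin
  SetLength(Threads, Length(InFileNames));  // new elements start out nil
  try
    for i := 0 to High(InFileNames) do
      Threads[i] := TLoadFileThread.Create(InFileNames[i]);

    // Wait for every loader and keep the first error, if any
    ErrorMsg := '';
    for i := 0 to High(Threads) do
    begin
      Threads[i].WaitFor;
      if (ErrorMsg = '') and (Threads[i].FatalException is Exception) then
        ErrorMsg := Exception(Threads[i].FatalException).Message;
    end;
    if ErrorMsg <> '' then
      raise Exception.Create(ErrorMsg);

    OutStream := TFileStream.Create(OutFileName, fmCreate);
    try
      for i := 0 to High(Threads) do
      begin
        Bytes := Threads[i].Data;
        if Length(Bytes) > 0 then
          OutStream.WriteBuffer(Bytes[0], Length(Bytes));
      end;
    finally
      OutStream.Free;
    end;
  finally
    // Free also waits for any thread that is still running
    for i := 0 to High(Threads) do
      Threads[i].Free;
  end;
end;
```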

EDIT: By the way, if you are interested, I could share a code example of a two-threaded, double-buffered file copy that I made several years ago. Note that it does not provide any data verification, as it was only written to test a theory, or should I say to break the theory that it isn't possible to copy a file using only Delphi (without using the file copy API from Windows). When copying on the same HDD it is a bit slower than the built-in Windows routine, but when copying from one HDD to another it reaches the same speed as the built-in Windows routines.