Accessing a single file with multiple threads

Posted 2019-01-11 04:28

Question:

I need to access a file concurrently with multiple threads. This needs to be done concurrently, without thread serialisation for performance reasons.

The file in particular has been created with the 'temporary' file attribute, which encourages Windows to keep the file in the system cache. This means that most of the time a file read won't go near the disk, but will read the portion of the file from the system cache.

Being able to concurrently access this file will significantly improve performance of certain algorithms in my code.

So, there are two questions here:

  1. Is it possible for Windows to concurrently access the same file from different threads?
  2. If so, how do you provide this ability? I've tried creating the temp file and opening the file again to provide two file handles, but the second open does not succeed.

Here's the create:

FFileSystem := CreateFile(PChar(FFileName),
                          GENERIC_READ + GENERIC_WRITE,
                          FILE_SHARE_READ + FILE_SHARE_WRITE,
                          nil,
                          CREATE_ALWAYS,
                          FILE_ATTRIBUTE_NORMAL OR
                          FILE_FLAG_RANDOM_ACCESS OR
                          FILE_ATTRIBUTE_TEMPORARY OR
                          FILE_FLAG_DELETE_ON_CLOSE,
                          0);

Here's the second open:

FFileSystem2 := CreateFile(PChar(FFileName),
                          GENERIC_READ,
                          FILE_SHARE_READ,
                          nil,
                          OPEN_EXISTING,
                          FILE_ATTRIBUTE_NORMAL OR
                          FILE_FLAG_RANDOM_ACCESS OR
                          FILE_ATTRIBUTE_TEMPORARY OR
                          FILE_FLAG_DELETE_ON_CLOSE,
                          0);

I've tried various combinations of the flags with no success so far. The second file open always fails, with messages to the effect that the file cannot be accessed because it is in use by another process.

Edit:

OK, some more information (I was hoping to not get lost in the weeds here...)

The process in question is a Win32 server process running on WinXP 64. It maintains large spatial databases and would like to keep as much of the spatial database as possible in memory in an L1/L2 cache structure. L1 already exists. L2 exists as a 'temporary' file that stays in the Windows system cache (it's somewhat of a dirty trick, but gets around Win32 memory limitations somewhat). Win64 means I can have lots of memory used by the system cache, so memory used to hold the L2 cache does not count towards process memory.

Multiple (potentially many) threads want to concurrently access information contained in the L2 cache. Currently, access is serialised, which means one thread gets to read its data while most (or the rest) of the threads are blocked pending completion of that operation.

The L2 cache file does get written to, but I'm happy to globally serialise/interleave read and write type operations as long as I can perform concurrent reads.

I'm aware there are nasty potential thread concurrency issues, and I'm aware there are dozens of ways to skin this cat in other contexts. I have this particular context, and I'm trying to determine if there is a way to permit concurrent thread read access within the file and within the same process.

Another approach I have considered would be to split the L2 cache into multiple temporary files, where each file serialises thread access the way the current single L2 cache file does.

And yes, this somewhat desperate approach is because 64-bit Delphi won't be with us any time soon :-(

Thanks, Raymond.

Answer 1:

Yes, it's possible for a program to open the same file multiple times from different threads. You'll want to avoid reading from the file at the same time you're writing to it, though. You can use TMultiReadExclusiveWriteSynchronizer to control access to the entire file. It's less serialized than, say, a critical section. For more granular control, take a look at LockFileEx to control access to specific regions of the file as you need them. When writing, request an exclusive lock; when reading, a shared lock.
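As a rough illustration of the whole-file approach, here is a minimal sketch that wraps reads and writes in a TMultiReadExclusiveWriteSynchronizer. The unit name L2CacheAccess and the routines ReadBlock and WriteBlock are hypothetical, not part of the original code, and the sketch assumes every reading thread passes in its own file handle, because a seek followed by a read on a shared handle is not atomic. Newer Delphi versions name the units Winapi.Windows and System.SysUtils.

```pascal
unit L2CacheAccess;  // hypothetical unit name

interface

uses
  Windows, SysUtils;

// Returns True only if exactly ACount bytes were read at AOffset.
function ReadBlock(AHandle: THandle; AOffset: Int64; var ABuffer;
  ACount: DWORD): Boolean;
// Raises an OS error if the seek or the write fails.
procedure WriteBlock(AHandle: THandle; AOffset: Int64; const ABuffer;
  ACount: DWORD);

implementation

var
  // One lock for the entire file: many readers or a single writer.
  CacheLock: TMultiReadExclusiveWriteSynchronizer;

function ReadBlock(AHandle: THandle; AOffset: Int64; var ABuffer;
  ACount: DWORD): Boolean;
var
  BytesRead: DWORD;
begin
  Result := False;
  CacheLock.BeginRead;  // shared: other readers are not blocked
  try
    if SetFilePointerEx(AHandle, AOffset, nil, FILE_BEGIN) and
       ReadFile(AHandle, ABuffer, ACount, BytesRead, nil) then
      Result := BytesRead = ACount;
  finally
    CacheLock.EndRead;
  end;
end;

procedure WriteBlock(AHandle: THandle; AOffset: Int64; const ABuffer;
  ACount: DWORD);
var
  Written: DWORD;
begin
  CacheLock.BeginWrite;  // exclusive: waits until all readers have left
  try
    Win32Check(SetFilePointerEx(AHandle, AOffset, nil, FILE_BEGIN));
    Win32Check(WriteFile(AHandle, ABuffer, ACount, Written, nil));
  finally
    CacheLock.EndWrite;
  end;
end;

initialization
  CacheLock := TMultiReadExclusiveWriteSynchronizer.Create;

finalization
  CacheLock.Free;

end.
```

The synchronizer only coordinates threads inside one process; LockFileEx would be the tool if other processes also touch the file.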

As for the code you posted, specifying File_Share_Write in the initial sharing flags means that all subsequent open operations must also share the file for writing. Quoting from the documentation:

If this flag is not specified, but the file or device has been opened for write access or has a file mapping with write access, the function fails.

Your second open request was saying that it did not want anybody else to be allowed to write to the file while that handle remained open. Since there was already another handle open that did allow writing, the second request could not be fulfilled. GetLastError should have returned 32, which is Error_Sharing_Violation, exactly what the documentation says should happen.
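To see the sharing violation for yourself, something like the following console sketch should do. The file name sharetest.tmp is a placeholder, and the delete-on-close flag is left out on purpose so that only the write-sharing conflict is in play.

```pascal
program ShareViolationDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  DemoFile = 'sharetest.tmp';  // hypothetical file name

var
  H1, H2: THandle;
  Err: DWORD;
begin
  // First handle: read/write access, shared for reading and writing.
  H1 := CreateFile(DemoFile, GENERIC_READ or GENERIC_WRITE,
    FILE_SHARE_READ or FILE_SHARE_WRITE, nil, CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY, 0);
  if H1 = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // Second handle refuses to share for writing, although H1 already
    // holds write access, so this open is expected to fail.
    H2 := CreateFile(DemoFile, GENERIC_READ, FILE_SHARE_READ, nil,
      OPEN_EXISTING, FILE_ATTRIBUTE_TEMPORARY, 0);
    if H2 = INVALID_HANDLE_VALUE then
    begin
      Err := GetLastError;  // read it before making any other API call
      if Err = ERROR_SHARING_VIOLATION then
        Writeln('Sharing violation, error code ', Err)
      else
        Writeln('Open failed: ', SysErrorMessage(Err));
    end
    else
    begin
      Writeln('Second open succeeded');
      CloseHandle(H2);
    end;
  finally
    CloseHandle(H1);
    DeleteFile(DemoFile);  // resolves to SysUtils.DeleteFile here
  end;
end.
```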

Specifying File_Flag_Delete_On_Close means all subsequent open requests need to share the file for deletion. The documentation again:

Subsequent open requests for the file fail, unless the FILE_SHARE_DELETE share mode is specified.

Then, since the second open request shares the file for deletion, all other open handles must have also shared it for deletion. The documentation:

If there are existing open handles to a file, the call fails unless they were all opened with the FILE_SHARE_DELETE share mode.

The bottom line is that either everybody shares alike or nobody shares at all.

FFileSystem := CreateFile(PChar(FFileName),
  Generic_Read or Generic_Write,
  File_Share_Read or File_Share_Write or File_Share_Delete,
  nil,
  Create_Always,
  File_Attribute_Normal or File_Flag_Random_Access
    or File_Attribute_Temporary or File_Flag_Delete_On_Close,
  0);

FFileSystem2 := CreateFile(PChar(FFileName),
  Generic_Read,
  File_Share_Read or File_Share_Write or File_Share_Delete,
  nil,
  Open_Existing,
  File_Attribute_Normal or File_Flag_Random_Access
    or File_Attribute_Temporary or File_Flag_Delete_On_Close,
  0);

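In other words, the share mode and the flags are identical in both calls; only the requested access (second parameter) and the creation disposition (fifth parameter) differ.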

These rules apply to two attempts to open on the same thread as well as attempts from different threads.
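Putting the sharing rules to work across threads, here is a minimal sketch in which each reader thread opens its own handle, so no file pointer is shared between threads. TReaderThread, the file name l2cache.tmp and the buffer sizes are all made up for illustration. The readers deliberately leave out File_Flag_Delete_On_Close, on the assumption that only the creating handle should trigger the deletion; they still share the file for deletion.

```pascal
program PerThreadHandles;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

const
  ShareAll = FILE_SHARE_READ or FILE_SHARE_WRITE or FILE_SHARE_DELETE;

type
  // Hypothetical reader: opens a private handle and reads one 4 KB block.
  TReaderThread = class(TThread)
  private
    FFileName: string;
    FOffset: Int64;
  protected
    procedure Execute; override;
  public
    BytesRead: DWORD;
    constructor Create(const AFileName: string; AOffset: Int64);
  end;

constructor TReaderThread.Create(const AFileName: string; AOffset: Int64);
begin
  FFileName := AFileName;
  FOffset := AOffset;
  inherited Create(False);  // fields are set before the thread runs
end;

procedure TReaderThread.Execute;
var
  H: THandle;
  Buf: array[0..4095] of Byte;
begin
  H := CreateFile(PChar(FFileName), GENERIC_READ, ShareAll, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_TEMPORARY or FILE_FLAG_RANDOM_ACCESS, 0);
  if H = INVALID_HANDLE_VALUE then
    Exit;
  try
    if SetFilePointerEx(H, FOffset, nil, FILE_BEGIN) then
      ReadFile(H, Buf, SizeOf(Buf), BytesRead, nil);
  finally
    CloseHandle(H);
  end;
end;

var
  FileName: string;
  Creator: THandle;
  Data: array[0..8191] of Byte;
  Written: DWORD;
  T1, T2: TReaderThread;
begin
  FileName := 'l2cache.tmp';  // hypothetical file name
  Creator := CreateFile(PChar(FileName), GENERIC_READ or GENERIC_WRITE,
    ShareAll, nil, CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY or FILE_FLAG_RANDOM_ACCESS or
    FILE_FLAG_DELETE_ON_CLOSE, 0);
  if Creator = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    FillChar(Data, SizeOf(Data), $AB);
    Win32Check(WriteFile(Creator, Data, SizeOf(Data), Written, nil));
    // Two readers work on different blocks at the same time.
    T1 := TReaderThread.Create(FileName, 0);
    T2 := TReaderThread.Create(FileName, 4096);
    T1.WaitFor;
    T2.WaitFor;
    Writeln('Reader 1: ', T1.BytesRead, ' bytes, reader 2: ',
      T2.BytesRead, ' bytes');
    T1.Free;
    T2.Free;
  finally
    CloseHandle(Creator);  // last handle closed, so the file is deleted
  end;
end.
```

In a real server the reader handles would presumably be opened once per thread and kept for the life of the thread instead of being reopened for every read.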



Answer 2:

Update #2

I wrote some test projects in C to try to figure this out, although Rob Kennedy beat me to the answer while I was away. Both conditions are possible, including cross-process, as he outlines. Here's a link if anyone else would like to see this in action.

SharedFileTests.zip (VS2005 C++ Solution) @ meklarian.com

There are three projects:

InProcessThreadShareTest - Test a creator and client thread.
InProcessThreadShareTest.cpp Snippet @ gist.github

SharedFileHost - Create a host that runs for 1 minute and updates a file.
SharedFileClient - Create a client that runs for 30 seconds and polls a file.
SharedFileHost.cpp and SharedFileClient.cpp Snippet @ gist.github

All of these projects assume the location C:\data\tmp\sharetest.txt is creatable and writable.


Update

Given your scenario, it sounds like you need a very large chunk of memory. Instead of gaming the system cache, you can use AWE to access more than 4 GB of memory, although you will need to map portions at a time. This should cover your L2 scenario, since you wish to ensure that physical memory is used.

Address Windowing Extensions @ MSDN

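Use AllocateUserPhysicalPages to allocate the physical pages and VirtualAlloc (with MEM_RESERVE and MEM_PHYSICAL) to reserve the address window they get mapped into.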

AllocateUserPhysicalPages Function (Windows) @ MSDN
VirtualAlloc Function (Windows) @ MSDN


Initial

Given that you are using the flag FILE_FLAG_DELETE_ON_CLOSE, is there any reason you wouldn't consider using a memory-mapped file instead?

Managing Memory-Mapped files in Win32 @ MSDN

From what I see in your CreateFile statements, it appears that you want to share data across-thread or across-process, with regard only to having the same file present while any sessions are open. A memory mapped file allows you to use the same logical filename in all sessions. Another benefit is that you can map views and lock portions of the mapped file with safety across all sessions. If you have a strict server with N-client scenario, it should be easy to implement. If you have a case where any client may be the opening server, you may wish to consider using some other mechanism to ensure that only one client gets to initiate the serving file first (via a global mutex, perhaps).
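For reference, a bare-bones sketch of the mapping approach might look like this. It uses a pagefile-backed named mapping instead of a disk file; the mapping name, the sizes and the chosen offset are placeholders. Once a view is mapped, threads read through plain pointers, so there is no file pointer to fight over.

```pascal
program MappedCacheDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  MappingName = 'L2CacheDemoMapping';  // hypothetical mapping name
  MappingSize = 1024 * 1024;           // 1 MB backing store
  ViewOffset  = 64 * 1024;             // must be a multiple of the
                                       // allocation granularity
  ViewSize    = 64 * 1024;             // map only a window at a time

var
  Mapping: THandle;
  View: PByte;
begin
  // INVALID_HANDLE_VALUE asks for a mapping backed by the paging file.
  Mapping := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
    0, MappingSize, MappingName);
  if Mapping = 0 then
    RaiseLastOSError;
  try
    // FILE_MAP_WRITE gives read/write access to the view.
    View := MapViewOfFile(Mapping, FILE_MAP_WRITE, 0, ViewOffset, ViewSize);
    if View = nil then
      RaiseLastOSError;
    try
      View^ := 42;  // write through the pointer
      Writeln('First byte of the view: ', View^);
    finally
      UnmapViewOfFile(View);
    end;
  finally
    CloseHandle(Mapping);
  end;
end.
```

In a 32-bit process every mapped view consumes address space, so a large cache would still have to be mapped in windows like the one above.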

CreateMutex @ MSDN
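A sketch of the "who gets to be the server" check with a named mutex could look like this. The mutex name is invented; the idea is only that CreateMutex succeeds for everyone, and ERROR_ALREADY_EXISTS tells the latecomers that somebody else created it first.

```pascal
program FirstInstanceDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  MutexName = 'L2CacheServerElection';  // hypothetical mutex name

var
  Mutex: THandle;
  Err: DWORD;
begin
  Mutex := CreateMutex(nil, False, MutexName);
  Err := GetLastError;  // capture before any other API call
  if Mutex = 0 then
    raise Exception.Create('CreateMutex failed: ' + SysErrorMessage(Err));
  try
    if Err = ERROR_ALREADY_EXISTS then
      Writeln('Another session created the mutex first: act as a client')
    else
      Writeln('This session created the mutex: act as the server');
  finally
    CloseHandle(Mutex);
  end;
end.
```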

If you only need one-way transmission of data, perhaps you could use named pipes instead.
(edit) This is best for 1 server to 1 client.

Named Pipes (Windows) @ MSDN



Answer 3:

You can do it this way.

The first thread, with read/write access, must create the file first:

FileHandle := CreateFile(
  PChar(FileName),
  GENERIC_READ or GENERIC_WRITE,
  FILE_SHARE_READ,
  nil,
  CREATE_ALWAYS,
  FILE_ATTRIBUTE_NORMAL,
  0);

The second thread, with read-only access, then opens the same file:

FileHandle := CreateFile(
  PChar(FileName),
  GENERIC_READ,
  FILE_SHARE_READ or FILE_SHARE_WRITE,
  nil,
  OPEN_EXISTING,
  FILE_ATTRIBUTE_NORMAL,
  0);

I didn't test whether it works with the FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE attributes.



Answer 4:

I need to access a file concurrently with multiple threads. This needs to be done concurrently, without thread serialisation for performance reasons.

Either you don't need to use the same file within different threads, or you do need some kind of serialization.

Otherwise, you're just setting yourself up for heartache down the road.