Speed up NTFS file enumeration (using FSCTL_ENUM_U

Posted 2019-03-31 23:25

Problem:

I'm enumerating the files on an NTFS hard drive partition by reading the NTFS MFT / USN journal with:

// dwFlagsAndAttributes is a DWORD, so pass 0 rather than NULL
HANDLE hDrive = CreateFile(szVolumePath, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
DWORD cb = 0;

MFT_ENUM_DATA med = { 0 };
med.StartFileReferenceNumber = 0;
med.LowUsn = 0;
med.HighUsn = MAXLONGLONG;      // no change in perf if I use med.HighUsn = ujd.NextUsn; where "USN_JOURNAL_DATA ujd" is loaded before

unsigned char pData[sizeof(DWORDLONG) + 0x10000] = { 0 }; // 64 kB

while (DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, &med, sizeof(med), pData, sizeof(pData), &cb, NULL))
{
    med.StartFileReferenceNumber = *((DWORDLONG*) pData);    // the first 8 bytes of pData hold the FRN for the next FSCTL_ENUM_USN_DATA call

    // here we would normally do: PUSN_RECORD pRecord = (PUSN_RECORD) (pData + sizeof(DWORDLONG));
    // followed by a second loop to extract the actual filenames,
    // but I removed this because the real performance bottleneck
    // is DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)
}
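For reference, the second loop mentioned in the comment could look roughly like the sketch below. It is written portably (plain byte offsets instead of a PUSN_RECORD cast) so it can be tried on a synthetic buffer; the names Entry, readAt and parseEnumBuffer are hypothetical, and the offsets assume the USN_RECORD_V2 layout from winioctl.h on a little-endian host. Real Windows code would more likely cast to PUSN_RECORD and use WCHAR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// One file entry extracted from an FSCTL_ENUM_USN_DATA output buffer (hypothetical type).
struct Entry {
    std::uint64_t frn;        // FileReferenceNumber
    std::uint64_t parentFrn;  // ParentFileReferenceNumber
    std::u16string name;      // file name, UTF-16, not null-terminated in the record
};

// Read a little-endian value at a byte offset without alignment assumptions.
template <typename T>
static T readAt(const unsigned char* p, std::size_t off)
{
    T v;
    std::memcpy(&v, p + off, sizeof v);
    return v;
}

// Assumed USN_RECORD_V2 offsets: RecordLength 0, FileReferenceNumber 8,
// ParentFileReferenceNumber 16, FileNameLength 56, FileNameOffset 58, FileName 60.
std::vector<Entry> parseEnumBuffer(const unsigned char* data, std::size_t cb,
                                   std::uint64_t* nextFrn)
{
    const std::size_t kHeader = 60;
    std::vector<Entry> out;
    if (cb < sizeof(std::uint64_t)) return out;

    // The buffer starts with the FRN to pass to the next call.
    if (nextFrn) *nextFrn = readAt<std::uint64_t>(data, 0);

    std::size_t pos = sizeof(std::uint64_t);
    while (pos + kHeader <= cb) {
        const std::uint32_t recLen = readAt<std::uint32_t>(data, pos);
        if (recLen < kHeader || pos + recLen > cb) break;   // malformed or truncated record

        const std::uint16_t nameLen = readAt<std::uint16_t>(data, pos + 56);  // in bytes
        const std::uint16_t nameOff = readAt<std::uint16_t>(data, pos + 58);
        if (static_cast<std::uint32_t>(nameOff) + nameLen > recLen) break;

        Entry e;
        e.frn = readAt<std::uint64_t>(data, pos + 8);
        e.parentFrn = readAt<std::uint64_t>(data, pos + 16);
        e.name.resize(nameLen / 2);
        std::memcpy(&e.name[0], data + pos + nameOff, nameLen);
        out.push_back(std::move(e));

        pos += recLen;   // RecordLength already includes padding to the next record
    }
    return out;
}
```

Inside the while loop above, this would be called as parseEnumBuffer(pData, cb, &next), with next then assigned to med.StartFileReferenceNumber.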

It works, and it is much faster than the usual FindFirstFile enumeration techniques, but it is not optimal yet:

  • On my C:\ volume with 700k files, it takes 21 seconds. (This has to be measured after a reboot; otherwise caching makes the result incorrect.)

  • I have seen another indexing program (not Everything, another one) index C:\ in under 5 seconds (measured after Windows startup), without reading a precomputed database from a .db file or using similar tricks to speed things up. That software does not use FSCTL_ENUM_USN_DATA; it does low-level NTFS parsing instead.
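A minimal sketch of how such a wall-clock measurement can be taken, assuming the enumeration loop is wrapped in a callable. The helper name timeSeconds is hypothetical; it only measures elapsed time and does nothing about the cache, so a cold measurement still requires a reboot as noted above.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double timeSeconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Usage would be along the lines of double t = timeSeconds([&] { /* the DeviceIoControl loop */ });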

What I've tried to improve performance:

  • Opening the volume with another flag, such as FILE_FLAG_SEQUENTIAL_SCAN, FILE_FLAG_RANDOM_ACCESS, or FILE_FLAG_NO_BUFFERING: same result, 21 seconds to read.

  • Reading "Estimate the number of USN records on NTFS volume" and "Why file enumeration using DeviceIoControl is faster in VB.NET than in C++?": I have studied both in depth, but neither answers this question.

  • Testing another compiler, MinGW64 instead of VC++ Express 2013: same performance, no difference.

  • In VC++, I have already switched from Debug to Release. Are there other project properties/options that could speed up the program?

Question:

Is it possible to improve the performance of DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)?

Or is low-level manual parsing of NTFS the only way to improve performance?


Note: According to my tests, the total amount of data read by these DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...) calls for my 700k files is only 84 MB. 21 seconds to read 84 MB is only 4 MB/s (and I do have an SSD!). There is probably some room for performance improvement, don't you think?
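The arithmetic behind that note, as a small sketch. The function names are hypothetical, and it assumes that 84 MB means 84 * 1024 * 1024 bytes and that every call fills the whole 64 kB buffer, which is a best case; if calls return less data, the call count is higher and the time per call lower.

```cpp
#include <cstdint>

// Number of DeviceIoControl calls, assuming each call fills the whole output buffer.
std::uint64_t callsNeeded(std::uint64_t totalBytes, std::uint64_t bufferBytes)
{
    return (totalBytes + bufferBytes - 1) / bufferBytes;   // round up
}

// Average throughput in MB/s.
double throughputMBps(double megabytes, double seconds)
{
    return megabytes / seconds;
}

// Average time spent per call, in milliseconds.
double msPerCall(double seconds, std::uint64_t calls)
{
    return seconds * 1000.0 / static_cast<double>(calls);
}
```

With the numbers above, callsNeeded(84ull * 1024 * 1024, 0x10000) gives 1344 calls, throughputMBps(84.0, 21.0) gives 4 MB/s, and msPerCall(21.0, 1344) gives about 15.6 ms per call under these assumptions.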