I need to create relatively big (1-8 GB) files. What is the fastest way to do so on Windows using C or C++? I need to create them on the fly, and speed is really an issue. The file will be used for storage emulation, i.e. it will be accessed randomly at different offsets, and I need all of the storage to be preallocated but not initialized. Currently we are writing all of the storage with dummy data, and it is taking too long.
Thanks.
Use the Win32 API: CreateFile, SetFilePointerEx, SetEndOfFile, and CloseHandle, in that order.
The trick is in the SetFilePointerEx function. From MSDN:
Note that it is not an error to set the file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function.
Windows Explorer actually does this same thing when copying a file from one location to another. It does this so that the file does not need to be re-allocated piece by piece on a fragmented disk.
Check out memory-mapped files.
They very much match the use case you describe: high performance and random access.
I believe they don't need to be created as large files up front. You just set a large maximum size on the mapping, and the file is expanded as you write to parts you haven't touched before.
Use the "fsutil" command:
E:\VirtualMachines>fsutil file createnew
Usage : fsutil file createnew <filename> <length>
Eg : fsutil file createnew C:\testfile.txt 1000
Regards
P.S. It is for Windows 2000/XP/7.
Well, this solution is not bad, but the thing you are looking for is SetFileValidData.
As MSDN says:
The SetFileValidData function allows you to avoid filling data with zeros when writing nonsequentially to a file.
So this leaves the on-disk data as it is. A file merely extended with SetFilePointerEx and SetEndOfFile must still read back as zeros, so NTFS has to zero-fill the new range before it is used, which is why a big allocation takes some time.
If you're using NTFS then sparse files are the way to go:
A file in which much of the data is zeros is said to contain a sparse data set. Files like these are typically very large—for example, a file containing image data to be processed or a matrix within a high-speed database. The problem with files containing sparse data sets is that the majority of the file does not contain useful data and, because of this, they are an inefficient use of disk space.
The file compression in the NTFS file system is a partial solution to the problem. All data in the file that is not explicitly written is explicitly set to zero. File compression compacts these ranges of zeros. However, a drawback of file compression is that access time may increase due to data compression and decompression.
Support for sparse files is introduced in the NTFS file system as another way to make disk space usage more efficient. When sparse file functionality is enabled, the system does not allocate hard drive space to a file except in regions where it contains nonzero data. When a write operation is attempted where a large amount of the data in the buffer is zeros, the zeros are not written to the file. Instead, the file system creates an internal list containing the locations of the zeros in the file, and this list is consulted during all read operations. When a read operation is performed in areas of the file where zeros were located, the file system returns the appropriate number of zeros in the buffer allocated for the read operation. In this way, maintenance of the sparse file is transparent to all processes that access it, and is more efficient than compression for this particular scenario.
I am aware that your question is tagged with Windows, and Brian R. Bondy gave you the best answer to your question if you know for certain you will not have to port your application to other platforms. However, if you might have to port it, you might want to do something more like what Adrian Cornish proposed in his answer to the question "How to create file of 'x' size?".
FILE *fp=fopen("myfile", "w");
fseek(fp, 1024*1024, SEEK_SET);
fputc('\n', fp);
fclose(fp);
Of course, there is an added twist. The answer proposed by Adrian Cornish makes use of the fseek function, which has the following signature.
int fseek ( FILE * stream, long int offset, int origin );
The problem is that you want to create a very large file with a size beyond the range of a 32-bit long. You need to use a 64-bit equivalent of fseek. Unfortunately, it has different names on different platforms.
The header file LargeFileSupport.h found at http://mosaik-aligner.googlecode.com/svn-history/r2/trunk/src/CommonSource/Utilities/LargeFileSupport.h offers a solution to this problem.
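If you would rather not pull in that header, a minimal sketch of the same idea is to define fseek64 yourself. Assumptions here: _fseeki64 on MSVC and fseeko on POSIX systems (compile with -D_FILE_OFFSET_BITS=64 on 32-bit POSIX targets so off_t is 64-bit):

```c
#if !defined(_MSC_VER)
#  define _POSIX_C_SOURCE 200112L   /* expose fseeko in <stdio.h> */
#endif
#include <stdio.h>

/* Map one fseek64 name onto the platform's 64-bit seek function. */
#ifdef _MSC_VER
#  define fseek64(fp, off, whence) _fseeki64((fp), (off), (whence))
#else
#  define fseek64(fp, off, whence) fseeko((fp), (off), (whence))
#endif

/* Create a file of exactly `size` bytes by seeking to size - 1 and
   writing a single byte. Returns 0 on success, -1 on failure. */
int create_large_file(const char *filename, long long size)
{
    FILE *fp = fopen(filename, "wb");   /* binary mode: no newline translation */
    if (!fp)
        return -1;
    if (fseek64(fp, size - 1, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    fputc('\0', fp);
    fclose(fp);
    return 0;
}
```

Seeking to size - 1 and writing one byte yields a file of exactly the requested size, whereas seeking to size and then writing a byte (as in the snippet above) yields size + 1 bytes.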
This would allow you to write the following function.
#include "LargeFileSupport.h"
/* Include other headers. */
bool createLargeFile(const char * filename, off_type size)
{
    FILE *fp = fopen(filename, "wb"); /* binary mode: no newline translation */
    if (!fp)
    {
        return false;
    }
    fseek64(fp, size, SEEK_SET);
    fputc('\n', fp);
    fclose(fp);
    return true;
}
I thought I would add this just in case the information would be of use to you.