I wanted to know what the best way to write a large fortran array ( 5000 x 5000 real single precision numbers) to a file. I am trying to save the results of a numerical calculation for later use so they do not need to be repeated. From calculation 5000 x 5000 x 4bytes per number number is 100 Mb, is it possible to save this in a form that is only 100Mb? Is there a way to save fortran arrays as a binary file and read it back in for later use?
I've noticed that saving numbers to a text file produces a file much larger than the size of the data type being saved. Is this because the numbers are being saved as characters?
The only way I am familiar with to write to file is
open (unit=41, file='outfile.txt')
do i=1,len
do j=1,len
write(41,*) Array(i,j)
end do
end do
Although I'd imagine there is a better way to do it. If anyone could point me to some resources or examples to approve my ability to write and read larger files efficiently (in terms of memory) that would be great. Thanks!
OPEN the file for reading and writing as "unformatted", and READ and WRITE the data without supplying a format, as demonstrated in the program below.
Write data files in binary, unless you're going to actually be reading the output - and you're not going to be reading a 2.5 million-element array.
The reasons for using binary are threefold, in decreasing importance:
Accuracy concerns may be the most obvious. When you are converting a (binary) floating point number to a string representation of the decimal number, you are inevitably going to truncate at some point. That's ok if you are sure that when you read the text value back into a floating point value, you are certainly going to get the same value; but that is actually a subtle question and requires choosing your format carefully. Using default formatting, various compilers perform this task with varying degrees of quality. This blog post, written from the point of view of a games programmer, does a good job of covering the issues.
Let's consider a little program which, for a variety of formats, writes a single-precision real number out to a string, and then reads it back in again, keeping track of the maximum error it encounters. We'll just go from 0 to 1, in units of machine epsilon. The code follows:
and when we run it, we get:
Note that even using a format with 8 digits after the decimal place - which we might think would be plenty, given that single precision reals are only accurate to 6-7 decimal places - we don't get exact copies back, off by approximately 1e-8. And this compiler's default format does not give us accurate round-trip floating point values; some error is introduced! If you're a video-game programmer, that level of accuracy may well be enough. If you're doing time-dependant simulations of turbulent fluids, however, that might absolutely not be ok, particularly if there's some bias to where the error is introduced, or if the error occurs in what is supposed to be a conserved quantity.
Note that if you try running this code, you'll notice that it takes a surprisingly long time to finish. That's because, maybe surprisingly, performance is another real issue with text output of floating point numbers. Consider the following simple program, which just writes out your example of a 5000 × 5000 real array as text and as unformatted binary:
Here are the timing outputs, for writing to disk or to ramdisk:
Note that when writing to disk, the binary output is 352 times as fast as ASCII, and to ramdisk it's closer to 700 times. There are two reasons for this - one is that you can write out data all at once, rather than having to loop; the other is that generating the string decimal representation of a floating point number is a surprisingly subtle operation which requires a significant amount of computing for each value.
Finally, is data size; the text file in the above example comes out (on my system) to about 4 times the size of the binary file.
Now, there are real problems with binary output. In particular, raw Fortran (or, for that matter, C) binary output is very brittle. If you change platforms, or your data size changes, your output may no longer be any good. Adding new variables to the output will break the file format unless you always add new data at the end of the file, and you have no way of knowing ahead of time what variables are in a binary blob you get from your collaborator (who might be you, three months ago). Most of the downsides of binary output are avoided by using libraries like NetCDF, which write self-describing binary files that are much more "future proof" than raw binary. Better still, since it's a standard, many tools read NetCDF files.
There are many NetCDF tutorials on the internet; ours is here. A simple example using NetCDF gives similar times to raw binary:
but gives you a nice self-describing file:
and file sizes about the same as raw binary:
the code follows: