Reading records from a Fortran binary file in Pyth

2020-08-04 10:16发布

问题:

I need to read a Fortran binary file with python, named merg_2015041312_4km-pixel.Z (from here), that is compressed; the structure of uncompressed file is defined here. The definition says that

Each file contains 2 records: the 1st for the "on the hour" images (":00") and the 2nd for the "on the half hour" images (":30").

and

Each record is a 9896 x 3298 Fortran array of IR brightness temperatures that have been scaled to fit into 1-byte by subtracting "75" from each datum.

GrADS .ctl file description:

    DSET merg_1999042012_4km-pixel
    OPTIONS yrev little_endian template
    UNDEF  330
    TITLE  globally merged IR data
    XDEF 9896 LINEAR   0.0182 0.036378335
    YDEF 3298 LINEAR   -59.982 0.036383683
    ZDEF   01 LEVELS 1
    TDEF 99999  LINEAR   12z04Apr1999 30mn
    VARS 1
    ch4  1  -1,40,1,-1 IR BT  (add  '75' to this value)
    ENDVARS 

and I tried to write some python code:

>>> import struct
>>> file = open("merg_2015041312_4km-pixel", 'rb')
>>> data = struct.unpack('>h', file.read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: unpack requires a string argument of length 2

Unfortunately, I'm not used to binary files...

How can I obtain the second record (half hourly) from this file?

回答1:

From reading the dataset description and testing with a dataset it is clear that the file is a (compressed) Fortran direct-access file with 2 records of IR data of size (9896, 3298) downscaled to fit in 1 byte by subtracting 75 from the values. It is not clear if the resulting byte is unsigned or signed, because I have no experience with GrADS control definitions.

numpy.fromfile is a tool for easily reading binary Fortran direct-access files.

Use either int8 or uint8 as dtype to build your record data type object, check which one makes sense. For example if the data is IR temperatures in Kelvins, using uint8 would result in min 186 and max 330 after scaling (~ -87°C to 56.8°C). Upcast to big enough type for upscaling, used float here, could be int16, int32 etc.

H = 9896
W = 3298
Record = np.dtype(('uint8', H*W))
A = np.fromfile('merg_2015041312_4km-pixel',
                dtype=Record, count=2).astype('float') + 75

The resulting 1d-arrays have to be reshaped to correct dimensions and shape. Data type objects would support sub-arrays, but they are always read in C-contiguous memory layout.

I_on_the_hour = A[0].reshape((H, W), order='F')  # Fortran data order
I_on_the_half_hour = A[1].reshape((H, W), order='F')

Check that the results look sane (in ipython --pylab)

plt.imshow(I_on_the_half_hour)