Reading a binary file in Python

Posted 2019-05-04 23:09

Question:

I have to read a binary file in Python. The file is written by a Fortran 90 program like this:

open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)

I can easily read this binary file with the following IDL code:

n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu  =dblarr(n1,n2)
n   =dblarr(n1)
T   =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1

What I want to do is read this binary file with Python, but I am running into some problems. First of all, here is my attempt to read the file:

import struct

filename = 'name_of_my_file'
with open(filename, mode='rb') as f:
    c = f.read()  # the whole file as a bytes object

I try to read the first two variables:

dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])

But as you can see, I had to add two dummy variables because, somehow, the Fortran program adds the integer 8 in those positions.

The problem now is reading the remaining bytes: I don't get the same result as the IDL program.

Here is my attempt to read the array n:

double = 8
end = 16 + n1*double
nH = struct.unpack('d'*n1, c[16:end])

However, when I print this array I get nonsensical values. I can read the file with the IDL code above, so I know what to expect. So my question is: how can I read this file when I don't know its exact structure? Why is it so simple to read with IDL? I need to read this data set with Python.

Answer 1:

What you're looking for is the struct module.

This module unpacks binary data from byte strings.

You supply a format string and the bytes read from your file, and it returns the decoded values as Python objects.

For example, using your variables:

import struct
with open('name_of_my_file', 'rb') as f:
    content = f.read()  # if this is too much data, you can supply a size to read()
n, T, Teq, cool = struct.unpack("dddd", content[:32])

This makes n, T, Teq, and cool hold the first four doubles in your binary file. Of course, this is just a demonstration. Your example looks like it wants lists of doubles; conveniently, struct.unpack returns a tuple, which should work fine in your case (if not, you can convert it to a list). Keep in mind that the buffer passed to struct.unpack must be exactly the size the format string requires, otherwise you get a struct.error. So either slice your input, or read only the number of bytes you will use, as noted in the code comment above.

For example,

n_content = f.read(8*number_of_ns) #8, because doubles are 8 bytes
n = struct.unpack("d"*number_of_ns,n_content)
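
When slicing by hand, the record markers noticed in the question also have to be skipped. Compilers such as gfortran typically frame each unformatted sequential write with a leading and a trailing byte count, which is where the extra integer 8 comes from. Here is a minimal sketch of reading one record at a time. It assumes 4-byte little-endian markers (this depends on compiler and platform), and read_record is a hypothetical helper name:

```python
import io
import struct

def read_record(f, marker_fmt="<i"):
    """Read one Fortran sequential unformatted record; return its payload bytes.

    Assumes each record is framed by two identical length markers
    (4-byte little-endian ints here; compiler- and platform-dependent).
    """
    marker_size = struct.calcsize(marker_fmt)
    head = f.read(marker_size)
    if len(head) < marker_size:
        raise EOFError("no more records")
    (nbytes,) = struct.unpack(marker_fmt, head)
    payload = f.read(nbytes)
    (tail,) = struct.unpack(marker_fmt, f.read(marker_size))
    if tail != nbytes:
        raise ValueError("record markers disagree; wrong marker size or byte order?")
    return payload

# Synthetic stand-in for the real file: one record with two ints,
# followed by one record with three doubles.
rec1 = struct.pack("<2i", 3, 2)
rec2 = struct.pack("<3d", 0.5, 1.5, 2.5)
fake = io.BytesIO(
    struct.pack("<i", len(rec1)) + rec1 + struct.pack("<i", len(rec1))
    + struct.pack("<i", len(rec2)) + rec2 + struct.pack("<i", len(rec2))
)

n1, n2 = struct.unpack("<2i", read_record(fake))
nH = struct.unpack("<%dd" % n1, read_record(fake))
```

With the real file, the BytesIO object would be replaced by the file opened in 'rb' mode, with one read_record call per Fortran write statement.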


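Answer 2:

Have you tried scipy.io.readsav?

Simply read your file like this (note that readsav expects an IDL save file, i.e. one written with IDL's SAVE procedure):

import scipy.io
mydict = scipy.io.readsav('name_of_file')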


Answer 3:

It looks like you are trying to read the cooling_0000x.out file generated by RAMSES.

Note that the first two integers (n1, n2) give the dimensions of the two-dimensional tables (arrays) that follow in the body of the file. So you need to process those two integers first, before you know how much real*8 data is in the rest of the file.
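
The order described above (dimensions first, then the arrays) can be sketched with NumPy. This is only an illustration on synthetic data, not the actual RAMSES layout. It assumes 4-byte little-endian record markers, 4-byte integers and real*8 values, and both helper names are hypothetical. Two-dimensional tables are reshaped with order='F' because Fortran stores arrays column-major.

```python
import io
import numpy as np

def write_record(buf, arr):
    """Append one record framed by 4-byte length markers (assumed layout)."""
    raw = arr.tobytes(order="F")  # column-major, as Fortran would write it
    marker = np.array([len(raw)], dtype="<i4").tobytes()
    buf.write(marker + raw + marker)

def read_record(buf, dtype):
    """Read one framed record and return it as a 1-D array of `dtype`."""
    nbytes = int(np.frombuffer(buf.read(4), dtype="<i4")[0])
    data = np.frombuffer(buf.read(nbytes), dtype=dtype)
    buf.read(4)  # skip the trailing marker
    return data

# Build a small synthetic "file" in memory.
n1, n2 = 3, 2
cool_in = np.arange(n1 * n2, dtype="<f8").reshape(n1, n2)
buf = io.BytesIO()
write_record(buf, np.array([n1, n2], dtype="<i4"))
write_record(buf, np.linspace(0.0, 1.0, n1).astype("<f8"))  # stands in for nH
write_record(buf, cool_in)
buf.seek(0)

# Read it back: the dimensions first, then arrays whose sizes depend on them.
m1, m2 = (int(v) for v in read_record(buf, "<i4"))
nH = read_record(buf, "<f8")
cool = read_record(buf, "<f8").reshape((m1, m2), order="F")
```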

SciPy should help here; it lets you read binary data of arbitrary dimensions:

http://wiki.scipy.org/Cookbook/InputOutput#head-e35c7736718209eea00ebf37a7e1dfb91df696e1
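
In current SciPy releases, scipy.io.FortranFile takes care of the record markers. Below is a minimal sketch on a synthetic file. It assumes the default 4-byte record header, 4-byte integers and real*8 data in native byte order, and the file name is a placeholder:

```python
import os
import tempfile
import numpy as np
from scipy.io import FortranFile

path = os.path.join(tempfile.mkdtemp(), "demo_unformatted.bin")  # placeholder file

# Write a tiny file with the same layout idea: dimensions, a 1-D array, a 2-D table.
n1, n2 = 3, 2
table_in = np.arange(n1 * n2, dtype=np.float64).reshape(n1, n2)
f = FortranFile(path, "w")
f.write_record(np.array([n1, n2], dtype=np.int32))
f.write_record(np.linspace(0.0, 1.0, n1))
f.write_record(table_in.T)  # transposed so the bytes land in column-major order
f.close()

# Read it back: one read_* call per Fortran write statement.
f = FortranFile(path, "r")
m1, m2 = f.read_ints(dtype=np.int32)
nH = f.read_reals(dtype=np.float64)
table = f.read_reals(dtype=np.float64).reshape((m1, m2), order="F")
f.close()
```

For a file written on a machine with a different byte order or marker size, the header_dtype argument of FortranFile and the dtype arguments of the read calls would need to be adjusted accordingly.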

If you already have this Python code, please let me know, as I was going to write it today (17 Sep 2014).

Rick