Using multiple genfromtxt calls on a single file

Posted 2019-06-24 00:31

Question:

I'm fairly new to Python and am currently having problems reading my input file. Basically, I want my code to take an input file in which the relevant information is contained in blocks of 4 lines. For my specific purpose, I only care about the information in lines 1-3 of each block.

A two-block example of the input I'm dealing with looks like this:

#Header line 1
#Header line 2
'Mn 1',       5130.0059,  -2.765,  5.4052,  2.5,  7.8214,  1.5, 1.310, 2.390, 0.500, 8.530,-5.360,-7.630,
'  LS                                                                       3d6.(5D).4p z6F*'
'  LS                                                                       3d6.(5D).4d e6F'
'K07           A   Kurucz MnI 2007    1 K07       1 K07       1 K07       1 K07       1 K07       1 K07       1 K07       1 K07       1 K07     Mn            '
'Fe 2',       5130.0127,  -5.368,  7.7059,  2.5, 10.1221,  2.5, 1.030, 0.860, 0.940, 8.510,-6.540,-7.900,
'  LS                                                                     3d6.(3F2).4p y4F*'
'  LS                                                                           3d5.4s2 2F2'
'RU                Kurucz FeII 2013   4 K13       5 RU        4 K13       4 K13       4 K13       4 K13       4 K13       4 K13       4 K13     Fe+           '

I would prefer to store the information from each of these three lines in separate arrays. Since the entries are a mix of strings and floats, I'm using numpy.genfromtxt to read the input file, as follows:

import itertools
import numpy as np

with open(input_file) as f_in:
  #Opening file, reading every fourth line starting with line 2.
  data = np.genfromtxt(itertools.islice(f_in,2,None,4),dtype=None,delimiter=",")
  #Storing lower transition designation:
  low = np.genfromtxt(itertools.islice(f_in,3,None,4),dtype=str)
  #Storing upper transition designation:
  up = np.genfromtxt(itertools.islice(f_in,4,None,4),dtype=str)

When I run the code, genfromtxt reads the information from the file correctly the first time. However, for the second and third calls to genfromtxt, I get the following warning:

UserWarning: genfromtxt: Empty input file: "<itertools.islice object at 0x102d7a1b0>"
warnings.warn('genfromtxt: Empty input file: "%s"' % fname)

Although this is only a warning, the arrays returned by the second and third calls to genfromtxt are empty instead of containing the expected strings. If I comment out the second and third calls to genfromtxt, the code behaves as expected.

As far as I understand, the above should work, and I'm at a loss as to why it doesn't. Any ideas?
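For reference, here is a small stdlib-only sketch of which lines the first islice call in the question's code selects. It uses an abbreviated copy of the sample input (lines shortened, an assumption for illustration). The start argument is a 0-based index, so start=2 skips the two header lines and then every fourth line is taken:

```python
import itertools

# islice(iterable, start, stop, step): start is a 0-based index, stop=None means "to the end"
picked = list(itertools.islice(range(10), 2, None, 4))
print(picked)  # [2, 6]

# The same arguments applied to an abbreviated copy of the sample input
sample_lines = [
    "#Header line 1",
    "#Header line 2",
    "'Mn 1', 5130.0059, -2.765",
    "'  LS  3d6.(5D).4p z6F*'",
    "'  LS  3d6.(5D).4d e6F'",
    "'K07  Kurucz MnI 2007'",
    "'Fe 2', 5130.0127, -5.368",
    "'  LS  3d6.(3F2).4p y4F*'",
    "'  LS  3d5.4s2 2F2'",
    "'RU  Kurucz FeII 2013'",
]
data_lines = list(itertools.islice(sample_lines, 2, None, 4))
print(data_lines)  # ["'Mn 1', 5130.0059, -2.765", "'Fe 2', 5130.0127, -5.368"]
```

On a list, each islice call builds a fresh iterator, so it can be repeated; a file object, by contrast, has a single shared read position.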

Answer 1:

After the first genfromtxt call (or really, the islice it consumes), the file iterator has reached the end of the file. That explains the warnings and the empty arrays: the second and third islice calls wrap an iterator that has nothing left to yield.
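A minimal reproduction of this, assuming an io.StringIO with ten numbered lines as a stand-in for the real file (both file objects and StringIO advance a shared position as they are iterated):

```python
import io
import itertools

# Ten numbered lines standing in for a file opened with open()
f_in = io.StringIO("".join(f"line {i}\n" for i in range(10)))

first = list(itertools.islice(f_in, 2, None, 4))   # runs f_in all the way to the end
second = list(itertools.islice(f_in, 3, None, 4))  # nothing left to read

print(first)   # ['line 2\n', 'line 6\n']
print(second)  # []
```

With stop=None, islice keeps pulling from the underlying iterator until it is exhausted, so the first call leaves nothing for the second.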

You'll want to either read the file into memory line by line with f_in.readlines(), as in hpaulj's answer, or add f_in.seek(0) before each subsequent read to reset the file pointer to the beginning of the input. The seek(0) approach is slightly more memory-friendly, since the whole file is never held in memory as a list of lines, which could matter if the files are really huge.

# Note: Untested code follows
import itertools
import numpy as np

with open(input_file) as f_in:
    data = np.genfromtxt(itertools.islice(f_in, 2, None, 4), dtype=None, delimiter=",")

    f_in.seek(0)  # Set the file pointer back to the beginning
    low = np.genfromtxt(itertools.islice(f_in, 3, None, 4), dtype=str)

    f_in.seek(0)  # Set the file pointer back to the beginning
    up = np.genfromtxt(itertools.islice(f_in, 4, None, 4), dtype=str)
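The same rewind pattern can be checked without NumPy. This sketch assumes an io.StringIO holding an abbreviated copy of the sample input in place of the real file, and collects plain lists of strings instead of calling np.genfromtxt; in real code each islice would be passed to genfromtxt as above:

```python
import io
import itertools

# Abbreviated stand-in for the question's input file (lines shortened)
SAMPLE = (
    "#Header line 1\n"
    "#Header line 2\n"
    "'Mn 1', 5130.0059, -2.765\n"
    "'  LS  3d6.(5D).4p z6F*'\n"
    "'  LS  3d6.(5D).4d e6F'\n"
    "'K07  Kurucz MnI 2007'\n"
    "'Fe 2', 5130.0127, -5.368\n"
    "'  LS  3d6.(3F2).4p y4F*'\n"
    "'  LS  3d5.4s2 2F2'\n"
    "'RU  Kurucz FeII 2013'\n"
)

f_in = io.StringIO(SAMPLE)
streams = []
for start in (2, 3, 4):
    f_in.seek(0)  # rewind so each islice starts from the first line again
    # every fourth line from 0-based `start`, trailing newlines stripped
    streams.append([line.rstrip("\n") for line in itertools.islice(f_in, start, None, 4)])

data, low, up = streams
print(low)
```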


Answer 2:

Try this:

import numpy as np

with open(input_file) as f_in:
    # Read the whole file into a list of lines once
    lines = f_in.readlines()

# Every fourth line, starting at 0-based line 2 (the data lines)
data = np.genfromtxt(lines[2::4], dtype=None, delimiter=",")
# Storing lower transition designation:
low = np.genfromtxt(lines[3::4], dtype=str)
# Storing upper transition designation:
up = np.genfromtxt(lines[4::4], dtype=str)

I haven't used islice much, but the itertools functions return iterators, and here every islice pulls from the same underlying file iterator: once one call has run it to the end of the file, the next call has nothing left to read. You have to be careful when calling them repeatedly. You might be able to make islice work with tee or repeat, but the simplest approach, I think, is to get a list of lines and select the relevant ones with ordinary slicing.
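A stdlib-only sketch of the list-based approach, assuming an io.StringIO with an abbreviated copy of the sample input in place of the real file. Unlike an iterator, the list returned by readlines() can be sliced any number of times, and each slice sees the whole file:

```python
import io

# Abbreviated stand-in for the question's input file (lines shortened)
SAMPLE = (
    "#Header line 1\n"
    "#Header line 2\n"
    "'Mn 1', 5130.0059, -2.765\n"
    "'  LS  3d6.(5D).4p z6F*'\n"
    "'  LS  3d6.(5D).4d e6F'\n"
    "'K07  Kurucz MnI 2007'\n"
    "'Fe 2', 5130.0127, -5.368\n"
    "'  LS  3d6.(3F2).4p y4F*'\n"
    "'  LS  3d5.4s2 2F2'\n"
    "'RU  Kurucz FeII 2013'\n"
)

with io.StringIO(SAMPLE) as f_in:
    lines = f_in.readlines()  # one pass over the file; the list is reusable

data = lines[2::4]  # 0-based lines 2, 6, ...
low = lines[3::4]   # 0-based lines 3, 7, ...
up = lines[4::4]    # 0-based lines 4, 8, ...

# Slicing again gives the same result: nothing was consumed
assert lines[2::4] == data
```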

Example with tee:

import itertools

with open('myfile.txt') as f:
    its = itertools.tee(f, 2)
    print(list(itertools.islice(its[0], 0, None, 2)))
    print(list(itertools.islice(its[1], 1, None, 2)))

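Now the file is read only once, but can be iterated through twice. Note that tee keeps every line that one iterator has consumed but the other has not yet seen in an internal buffer, so when the first iterator runs to the end before the second starts, memory use is roughly the same as with readlines().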
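Applied to the question's layout, the tee idea extends to three iterators, one per line of interest. This is a sketch assuming an io.StringIO with an abbreviated copy of the sample input instead of a real file, and plain lists of strings instead of np.genfromtxt:

```python
import io
import itertools

# Abbreviated stand-in for the question's input file (lines shortened)
SAMPLE = (
    "#Header line 1\n"
    "#Header line 2\n"
    "'Mn 1', 5130.0059, -2.765\n"
    "'  LS  3d6.(5D).4p z6F*'\n"
    "'  LS  3d6.(5D).4d e6F'\n"
    "'K07  Kurucz MnI 2007'\n"
    "'Fe 2', 5130.0127, -5.368\n"
    "'  LS  3d6.(3F2).4p y4F*'\n"
    "'  LS  3d5.4s2 2F2'\n"
    "'RU  Kurucz FeII 2013'\n"
)

f = io.StringIO(SAMPLE)
# tee(f, 3) returns three independent iterators over the one underlying file iterator;
# f itself should not be advanced directly after this point
its = itertools.tee(f, 3)
data, low, up = (
    [line.rstrip("\n") for line in itertools.islice(it, start, None, 4)]
    for it, start in zip(its, (2, 3, 4))
)
print(data)
```

Because the first list is built completely before the others start, tee buffers the whole file here, which is the memory trade-off noted above.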