There are several similar questions but none of them answers this simple question directly:
How can i catch a commands output and stream that content into numpy arrays without creating a temporary string object to read from?
So, what I would like to do is this:
import subprocess
import numpy
import StringIO
def parse_header(fileobject):
# this function moves the filepointer and returns a dictionary
d = do_some_parsing(fileobject)
return d
sio = StringIO.StringIO(subprocess.check_output(cmd))
d = parse_header(sio)
# now the file pointer is at the start of data, parse_header takes care of that.
# ALL of the data is now available in the next line of sio
dt = numpy.dtype([(key, 'f8') for key in d.keys()])
# i don't know how do make this work:
data = numpy.fromxxxx(sio , dt)
# if i would do this, I create another copy besides the StringIO object, don't I?
# so this works, but isn't this 'bad' ?
datastring = sio.read()
data = numpy.fromstring(datastring, dtype=dt)
I tried it with StringIO and cStringIO but both are not accepted by numpy.frombuffer and numpy.fromfile.
Using StringIO object I first have to read the stream into a string and then use numpy.fromstring, but I would like to avoid creating the intermediate object (several Gigabytes).
An alternative for me would be if I can stream sys.stdin into numpy arrays, but that does not work with numpy.fromfile either (seek needs to be implemented).
Are there any work-arounds for this? I can't be the first one trying this (unless this is a PEBKAC case?)
Solution: This is the current solution, it's a mix of unutbu's instruction how to use the Popen with PIPE and the hint of eryksun to use bytearray, so I don't know who to accept!? :S
proc = sp.Popen(cmd, stdout = sp.PIPE, shell=True)
d = parse_des_header(proc.stdout)
rec_dtype = np.dtype([(key,'f8') for key in d.keys()])
data = bytearray(proc.stdout.read())
ndata = np.frombuffer(data, dtype = rec_dtype)
I didn't check if the data is really not creating another copy, don't know how. But what I noticed that this works much faster than everything I tried before, so many thanks to both the answers' authors!