I'm making a call to a program from the shell using the subprocess module that outputs a binary file to STDOUT.
I use Popen() to call the program and then I want to pass the stream to a function in a Python package (called "pysam") that unfortunately cannot Python file objects, but can read from STDIN. So what I'd like to do is have the output of the shell command go from STDOUT into STDIN.
How can this be done from within Popen/subprocess module? This is the way I'm calling the shell program:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
This will read "my_cmd"'s STDOUT output and get a stream to it in p. Since my Python module cannot read from "p" directly, I am trying to redirect STDOUT of "my_cmd" back into STDIN using:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True).stdout
I then call my module, which uses "-" as a placeholder for STDIN:
s = pysam.Samfile("-", "rb")
The above call just means read from STDIN (denoted "-") and read it as a binary file ("rb").
When I try this, I just get binary output sent to the screen, and it doesn't look like the Samfile() function can read it. This occurs even if I remove the call to Samfile, so I think it's my call to Popen that is the problem and not downstream steps.
EDIT: In response to answers, I tried:
sys.stdin = subprocess.Popen(tagBam_cmd, stdout=subprocess.PIPE, shell=True).stdout
print "Opening SAM.."
s = pysam.Samfile("-","rb")
print "Done?"
sys.stdin = sys.__stdin__
This seems to hang. I get the output:
Opening SAM..
but it never gets past the Samfile("-", "rb") line. Any idea why?
Any idea how this can be fixed?
EDIT 2: I am adding a link to Pysam documentation in case it helps, I really cannot figure this out. The documentation page is:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html
and the specific note about streams is here:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams
In particular:
"""
Pysam does not support reading and writing from true python file objects, but it does support reading and writing from stdin and stdout. The following example reads from stdin and writes to stdout:
infile = pysam.Samfile( "-", "r" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
It will also work with BAM files. The following script converts a BAM formatted file on stdin to a SAM formatted file on stdout:
infile = pysam.Samfile( "-", "rb" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
Note, only the file open mode needs to changed from r to rb.
"""
So I simply want to take the stream coming from Popen, which reads stdout, and redirect that into stdin, so that I can use Samfile("-", "rb") as the above section states is possible.
thanks.
I'm a little confused that you see binary on stdout if you are using stdout=subprocess.PIPE
, however, the overall problem is that you need to work with sys.stdin
if you want to trick pysam into using it.
For instance:
sys.stdin = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
s = pysam.Samfile("-", "rb")
sys.stdin = sys.__stdin__ # restore original stdin
UPDATE: This assumed that pysam is running in the context of the Python interpreter and thus means the Python interpreter's stdin when "-" is specified. Unfortunately, it doesn't; when "-" is specified it reads directly from file descriptor 0.
In other words, it is not using Python's concept of stdin (sys.stdin) so replacing it has no effect on pysam.Samfile(). It also is not possible to take the output from the Popen call and somehow "push" it on to file descriptor 0; it's readonly and the other end of that is connected to your terminal.
The only real way to get that output onto file descriptor 0 is to just move it to an additional script and connect the two together from the first. That ensures that the output from the Popen in the first script will end up on file descriptor 0 of the second one.
So, in this case, your best option is to split this into two scripts. The first one will invoke my_cmd and take the output of that and use it for the input to a second Popen of another Python script that invokes pysam.Samfile("-", "rb").
In the specific case of dealing with pysam, I was able to work around the issue using a named pipe (http://docs.python.org/library/os.html#os.mkfifo), which is a pipe that can be accessed like a regular file. In general, you want the consumer (reader) of the pipe to listen before you start writing to the pipe, to ensure you don't miss anything. However, pysam.Samfile("-", "rb") will hang as you noted above if nothing is already registered on stdin.
Assuming you're dealing with a prior computation that takes a decent amount of time (e.g. sorting the bam before passing it into pysam), you can start that prior computation and then listen on the stream before anything gets output:
import os
import tempfile
import subprocess
import shutil
import pysam
# Create a named pipe
tmpdir = tempfile.mkdtemp()
samtools_prefix = os.path.join(tmpdir, "namedpipe")
fifo = samtools_prefix + ".bam"
os.mkfifo(fifo)
# The example below sorts the file 'input.bam',
# creates a pysam.Samfile object of the sorted data,
# and prints out the name of each record in sorted order
# Your prior process that spits out data to stdout/a file
# We pass samtools_prefix as the output prefix, knowing that its
# ending file will be named what we called the named pipe
subprocess.Popen(["samtools", "sort", "input.bam", samtools_prefix])
# Read from the named pipe
samfile = pysam.Samfile(fifo, "rb")
# Print out the names of each record
for read in samfile:
print read.qname
# Clean up the named pipe and associated temp directory
shutil.rmtree(tmpdir)
If your system supports it; you could use /dev/fd/#
filenames:
process = subprocess.Popen(args, stdout=subprocess.PIPE)
samfile = pysam.Samfile("/dev/fd/%d" % process.stdout.fileno(), "rb")