I need to do something like this post, but I need to create a subprocess that can be given input and give output many times. The accepted answer of that post has good code...
from subprocess import Popen, PIPE, STDOUT
p = Popen(['grep', 'f'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
grep_stdout = p.communicate(input=b'one\ntwo\nthree\nfour\nfive\nsix\n')[0]
print(grep_stdout.decode())
# four
# five
...that I would like to continue like this:
grep_stdout2 = p.communicate(input=b'spam\neggs\nfrench fries\nbacon\nspam\nspam\n')[0]
print(grep_stdout2.decode())
# french fries
But alas, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/subprocess.py", line 928, in communicate
raise ValueError("Cannot send input after starting communication")
ValueError: Cannot send input after starting communication
The proc.stdin.write() method does not let you collect output, if I understand correctly. What is the simplest way to keep the pipes open for ongoing input/output?
Edit: ====================
It looks like pexpect is a useful library for what I am trying to do, but I am having trouble getting it to work. Here is a more complete explanation of my actual task. I am using hfst to get grammar analyses of individual (Russian) words. The following demonstrates its behavior in a bash shell:
$ hfst-lookup analyser-gt-desc.hfstol
> слово
слово слово+N+Neu+Inan+Sg+Acc 0.000000
слово слово+N+Neu+Inan+Sg+Nom 0.000000
> сработай
сработай сработать+V+Perf+IV+Imp+Sg2 0.000000
сработай сработать+V+Perf+TV+Imp+Sg2 0.000000
>
I want my script to be able to get the analyses of one form at a time. I tried code like this, but it is not working.
import pexpect
analyzer = pexpect.spawnu('hfst-lookup analyser-gt-desc.hfstol')
for newWord in ['слово', 'сработай']:
    print('Trying', newWord, '...')
    analyzer.expect('> ')
    analyzer.sendline(newWord)
    print(analyzer.before)
# Trying слово ...
#
# Trying сработай ...
# слово
# слово слово+N+Neu+Inan+Sg+Acc 0.000000
# слово слово+N+Neu+Inan+Sg+Nom 0.000000
#
#
I have obviously misunderstood what pexpect's before attribute does. How can I get the output for each word, one at a time?
Popen.communicate() is a helper method that does a one-time write of data to stdin and reads everything from stdout and stderr (using threads on Windows and a select-style loop on POSIX systems). It closes stdin when it's done writing data, reads stdout and stderr until those pipes close, and then waits for the child to exit. You can't do a second communicate because the child has already exited by the time it returns.
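To see this concretely, here is a small sketch that runs communicate() once, checks that the child's exit status is already set, and then shows the second call being refused. It assumes a Python one-liner as a portable stand-in for grep f; the helper name communicate_twice is hypothetical.

```python
import subprocess
import sys

# Stand-in for `grep f` (an assumption, chosen so the sketch runs anywhere
# Python does): copy every stdin line that contains "f" to stdout.
GREP_F = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'f' in line:\n"
    "        sys.stdout.write(line)\n"
)


def communicate_twice(first: bytes, second: bytes):
    """Return (first output, exit status after the first call, error from the second call)."""
    p = subprocess.Popen([sys.executable, "-c", GREP_F],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = p.communicate(input=first)
    # communicate() wrote the input, closed stdin, drained stdout and waited
    # for the child, so the exit status is already known at this point.
    status = p.returncode
    try:
        p.communicate(input=second)
        error = None
    except ValueError as exc:
        error = str(exc)
    return out, status, error


out, status, error = communicate_twice(b"one\ntwo\nthree\nfour\nfive\nsix\n",
                                       b"spam\neggs\nfrench fries\n")
print(out.decode(), status, error)
```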
An interactive session with a child process is quite a bit more complicated.
One problem is whether the child process even recognizes that it should be interactive. In the C standard I/O library that most command-line programs use, stdout is line-buffered when it is connected to a terminal (e.g., a Linux console or a "pty" pseudo-terminal), so those programs behave interactively and flush every line. When stdout is a pipe to another program, it is fully buffered and only flushed when the buffer fills or the program exits.
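A Unix-only sketch of the terminal-versus-pipe distinction, using Python itself as the child (an assumption; C programs using stdio make the same check): the child reports whether its stdout is a terminal. Through a pipe it is not; through a pseudo-terminal from the standard pty module it is, which is the mechanism pexpect builds on. The function names are illustrative.

```python
import os
import pty
import subprocess
import sys

# The child reports whether its stdout is attached to a terminal.
CHILD = "import sys; print(sys.stdout.isatty())"


def via_pipe() -> str:
    """Run the child with stdout connected to an ordinary pipe."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    return result.stdout.strip()


def via_pty() -> str:
    """Run the child with stdout connected to the slave end of a pseudo-terminal."""
    master, slave = pty.openpty()
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=slave)
    os.close(slave)  # the parent keeps only the master end
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:  # Linux reports EIO once the child has closed the slave
            break
        if not data:  # other systems may report a plain EOF instead
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    # The terminal line discipline turns "\n" into "\r\n"; strip() removes both.
    return b"".join(chunks).decode().strip()


print("pipe:", via_pipe())
print("pty: ", via_pty())
```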
Another is how you should read and process stdout and stderr without deadlocking. For instance, if you block reading stdout but stderr fills its pipe, the child will halt and you are stuck. You can use threads to pull both into internal buffers.
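A minimal sketch of the thread-per-pipe approach, assuming a hypothetical child that first dumps 200 000 bytes to stderr (more than a typical 64 KiB pipe buffer) and then echoes each stdin line. One daemon thread per pipe copies data into a queue, so the parent can wait on stdout while stderr keeps draining.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical child: floods stderr first, then echoes each stdin line to stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('x' * 200000)\n"
    "sys.stderr.flush()\n"
    "for line in sys.stdin:\n"
    "    print(line.strip(), flush=True)\n"
)


def pump(stream, q):
    """Copy everything read from a pipe into a queue, then put None to mark EOF."""
    for chunk in iter(stream.readline, ""):
        q.put(chunk)
    q.put(None)


def talk(words):
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    out_q, err_q = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=pump, args=(proc.stdout, out_q), daemon=True),
               threading.Thread(target=pump, args=(proc.stderr, err_q), daemon=True)]
    for t in threads:
        t.start()

    replies = []
    for word in words:
        proc.stdin.write(word + "\n")
        proc.stdin.flush()
        # Without the stderr thread this would wait forever: the child would be
        # stuck writing to a full stderr pipe and never reach its stdout loop.
        replies.append(out_q.get(timeout=10).rstrip("\n"))

    proc.stdin.close()
    proc.wait(timeout=10)
    for t in threads:
        t.join()
    proc.stdout.close()
    proc.stderr.close()
    stderr_text = "".join(iter(err_q.get_nowait, None))  # drain up to the EOF marker
    return replies, stderr_text


replies, err = talk(["alpha", "beta"])
print(replies, len(err))
```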
Yet another is how you deal with a child that exits unexpectedly.
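One way to detect that, sketched with a hypothetical child that answers one request and then exits with status 3: a write to a dead child's stdin raises BrokenPipeError, and readline() on its stdout returns an empty string at EOF. Either way the parent can collect the exit status with wait(). The ChildExited exception and ask() helper are illustrative names, not part of subprocess.

```python
import subprocess
import sys

# Hypothetical child that answers one request and then exits with status 3,
# standing in for a program that dies in the middle of a session.
CHILD = (
    "import sys\n"
    "line = sys.stdin.readline()\n"
    "print(line.strip().upper(), flush=True)\n"
    "sys.exit(3)\n"
)


class ChildExited(RuntimeError):
    """Raised when the child goes away instead of answering."""


def ask(proc, request):
    """Send one request line and return one reply line, or raise ChildExited."""
    try:
        proc.stdin.write(request + "\n")
        proc.stdin.flush()
    except BrokenPipeError:  # nobody is reading the child's stdin any more
        raise ChildExited(f"child exited with status {proc.wait()}") from None
    reply = proc.stdout.readline()
    if reply == "":  # EOF on stdout: the child closed it, normally by exiting
        raise ChildExited(f"child exited with status {proc.wait()}")
    return reply.rstrip("\n")


def session():
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    first = ask(proc, "hello")
    try:
        ask(proc, "again")
        error = None
    except ChildExited as exc:
        error = str(exc)
    try:
        proc.stdin.close()
    except BrokenPipeError:  # unsent data may still be buffered for the dead child
        pass
    proc.stdout.close()
    return first, error, proc.returncode


first, error, status = session()
print(first, error, status)
```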
For "unixy" systems like Linux and OS X, the pexpect module is written to handle the complexities of an interactive child process. For Windows, there is no good tool that I know of to do it.
This answer should be attributed to @J.F.Sebastian. Thanks for the comments!
The following code got my expected behavior:
import pexpect
analyzer = pexpect.spawnu('hfst-lookup analyser-gt-desc.hfstol')
analyzer.expect('> ')
for word in ['слово', 'сработай']:
    print('Trying', word, '...')
    analyzer.sendline(word)
    analyzer.expect('> ')
    print(analyzer.before)
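This works because expect('> ') is called once before the loop, so each expect() inside the loop captures exactly the text between sending a word and the next prompt; in the question's version, before still held the previous word's output. The same "read up to the prompt" pattern can be sketched with plain subprocess, assuming a hypothetical Python stand-in for hfst-lookup (the real analyser is not needed). Unlike pexpect's pty, a pipe does not echo the input, so the answer does not start with the word itself, and the stand-in has to flush its own prompt.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for `hfst-lookup`: print a "> " prompt, read one word,
# print two fake analyses and a blank line, and prompt again.
FAKE_LOOKUP = (
    "import sys\n"
    "while True:\n"
    "    sys.stdout.write('> ')\n"
    "    sys.stdout.flush()\n"
    "    word = sys.stdin.readline().strip()\n"
    "    if not word:\n"
    "        break\n"
    "    print(word, word + '+A', '0.000000')\n"
    "    print(word, word + '+B', '0.000000')\n"
    "    print()\n"
)

PROMPT = b"> "


def read_until_prompt(stream) -> bytes:
    """Read until the output ends with PROMPT and return what came before it."""
    buf = bytearray()
    while not buf.endswith(PROMPT):
        byte = stream.read(1)
        if not byte:
            raise EOFError("child closed stdout before printing a prompt")
        buf += byte
    return bytes(buf[:-len(PROMPT)])


def analyse(words):
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    proc = subprocess.Popen([sys.executable, "-c", FAKE_LOOKUP],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE, env=env)
    read_until_prompt(proc.stdout)  # like the first analyzer.expect('> ')
    results = {}
    for word in words:
        proc.stdin.write(word.encode("utf-8") + b"\n")
        proc.stdin.flush()
        # Everything up to the next prompt is this word's answer (like .before).
        text = read_until_prompt(proc.stdout).decode("utf-8")
        results[word] = [line for line in text.splitlines() if line]
    proc.stdin.close()  # the stand-in exits when its stdin reaches EOF
    proc.wait()
    proc.stdout.close()
    return results


for word, lines in analyse(["слово", "сработай"]).items():
    print(word, lines)
```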
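Whenever you want to send input to the process, use proc.stdin.write() followed by proc.stdin.flush(). Whenever you want to get output from the process, read from proc.stdout, e.g. with proc.stdout.readline(); a bare proc.stdout.read() blocks until the child closes its stdout. Both the stdin and stdout arguments to the constructor need to be set to PIPE.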
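A minimal sketch of that write/flush/readline cycle, assuming a hypothetical child that answers every input line with exactly one line and flushes it immediately. That one-reply-per-line assumption is what makes readline() safe here; a real program that buffers its output or replies with a variable number of lines needs one of the approaches discussed in the other answers.

```python
import subprocess
import sys

# Hypothetical child: replies to every input line with one upper-cased line
# and flushes immediately, so nothing waits in its stdout buffer.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n"
)


def converse(requests):
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    replies = []
    for request in requests:
        proc.stdin.write(request + "\n")
        proc.stdin.flush()  # push the request through the pipe now
        replies.append(proc.stdout.readline().rstrip("\n"))  # exactly one reply line
    proc.stdin.close()  # EOF lets the child's loop finish
    proc.wait()
    proc.stdout.close()
    return replies


print(converse(["spam", "eggs", "bacon"]))
```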
HFST has Python bindings: https://pypi.python.org/pypi/hfst
Using those should avoid the whole flushing issue, and will give you a cleaner API to work with than parsing the string output from pexpect.
From the Python REPL, you can get some docs on the bindings with
import hfst
dir(hfst)
help(hfst.HfstTransducer)
or read https://hfst.github.io/python/3.12.2/QuickStart.html
Snatching the relevant parts of the docs:
import hfst

# HfstInputStream takes the path of the transducer file, not the hfst-lookup command line
istr = hfst.HfstInputStream('analyser-gt-desc.hfstol')
transducers = []
while not istr.is_eof():
    transducers.append(istr.read())
istr.close()
print("Read %i transducers in total." % len(transducers))
if len(transducers) == 1:
    out = transducers[0].lookup("слово")
    print("got %s" % (out,))
else:
    pass  # or handle >1 fst in the file, though I'm guessing you don't use that feature