I have a text file with one sentence per line. I would like to lemmatize the words in each line using hunspell (the -s option). Since I want the lemmas of each line separately, it wouldn't make sense to submit the whole text file to hunspell at once. I need to send one line at a time and read hunspell's output for that line before sending the next.
Following the answers to How to process input and output streams in Steel Bank Common Lisp?, I was able to send the whole text file to hunspell line by line, but I was not able to capture hunspell's output for each line. How do I interact with the process, sending a line and reading its output before sending the next one?
My current code to read the whole text file is
(defun parse-spell-sb (file-in)
  (with-open-file (in file-in)
    ;; hunspell reads the whole file from IN, and we read all of its
    ;; output back in one go, so the per-line grouping is lost here.
    (let ((p (sb-ext:run-program "/opt/local/bin/hunspell"
                                 (list "-i" "UTF-8" "-s" "-d" "pt_BR")
                                 :input in :output :stream :wait nil)))
      (when p
        (unwind-protect
             (with-open-stream (o (sb-ext:process-output p))
               (loop
                 :for line := (read-line o nil nil)
                 :while line
                 :collect line))
          (sb-ext:process-close p))))))
Once more, this code gives me hunspell's output for the whole text file. I would like to have hunspell's output for each input line separately.
Any ideas?
I suppose you have a buffering problem with the program you want to run. For example:
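A minimal sketch of that kind of two-way interaction in SBCL, using an illustrative helper I'll call run-interactive (the name and the test strings are made up, not from the question), might look like this:

(defun run-interactive (command &rest args)
  ;; Start COMMAND with both its stdin and stdout connected to Lisp streams.
  (let* ((process (sb-ext:run-program command args
                                      :search t
                                      :input :stream
                                      :output :stream
                                      :wait nil))
         (in (sb-ext:process-input process))
         (out (sb-ext:process-output process)))
    (flet ((send (string)
             ;; Write one line, flush it, then read one line of the reply.
             (write-line string in)
             (finish-output in)
             (read-line out)))
      (unwind-protect
           (list (send "Hello, world!")
                 (send "Good bye."))
        (close in)
        (sb-ext:process-close process)))))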
Now, on my system, this will work with cat, which simply echoes each line straight back. Notice the finish-output: without it, the read will hang. (There's also force-output.) Python in interactive mode will work, too:
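A rough variant of the sketch above for Python, assuming a python executable on the PATH (use python3 if that's what your system has); the helper name and the expression sent are illustrative:

(defun ask-python (expression)
  ;; -i forces interactive behaviour, so results are printed line by line
  ;; even though stdout is a pipe; the banner and prompts go to stderr.
  (let* ((p (sb-ext:run-program "python" '("-i")
                                :search t
                                :input :stream
                                :output :stream
                                :wait nil))
         (in (sb-ext:process-input p))
         (out (sb-ext:process-output p)))
    (unwind-protect
         (progn
           (write-line expression in)
           (finish-output in)   ; flush, or the read below hangs
           (read-line out))
      (close in)
      (sb-ext:process-close p))))

;; (ask-python "1 + 2")   ; expected to return "3"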
But if you try this without the -i option (or similar options like -u), you'll probably be out of luck because of the buffering going on. For example, on my system, reading from tr will hang. Since tr doesn't provide a switch to turn off buffering, we'll wrap the call with a pty wrapper (in this case unbuffer from expect):
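With the run-interactive sketch from above, and assuming expect's unbuffer is installed (-p makes it pass piped input through to the wrapped program), that looks roughly like this:

;; tr block-buffers its output when writing to a pipe, so the first
;; read inside SEND never gets a reply:
(run-interactive "tr" "a-z" "A-Z")                 ; hangs

;; Running tr under a pseudo-terminal defeats that buffering:
(run-interactive "unbuffer" "-p" "tr" "a-z" "A-Z")
;; => ("HELLO, WORLD!" "GOOD BYE.")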
So, long story short: try using finish-output on the stream before reading. If that doesn't work, check for command-line options that prevent buffering. If it still doesn't work, you could try wrapping the program in some kind of pty wrapper.
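Applied back to the question, a rough adaptation of parse-spell-sb along these lines could look as follows. It assumes that hunspell's -s output ends each input line's analyses with an empty line; if hunspell itself buffers its output when writing to a pipe, the unbuffer trick above applies here as well:

(defun parse-spell-sb (file-in)
  (with-open-file (in file-in)
    (let* ((p (sb-ext:run-program "/opt/local/bin/hunspell"
                                  (list "-i" "UTF-8" "-s" "-d" "pt_BR")
                                  :input :stream :output :stream :wait nil))
           (to-hunspell (sb-ext:process-input p))
           (from-hunspell (sb-ext:process-output p)))
      (unwind-protect
           (loop :for line := (read-line in nil nil)
                 :while line
                 :collect (progn
                            (write-line line to-hunspell)
                            (finish-output to-hunspell)   ; flush before reading
                            ;; Collect the stems for this line, stopping at the
                            ;; empty line that (we assume) terminates them.
                            (loop :for out := (read-line from-hunspell nil "")
                                  :until (string= out "")
                                  :collect out)))
        (sb-ext:process-close p)))))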