Text-to-Speech in Emacs

Posted 2019-03-14 06:02

Question:

I'm not blind; I just want a way to have my Windows machine read the contents of a buffer out loud. Here are the basic requirements:

  • Read any English text buffer.
  • Pause the reading at any time and resume at any time (I don't want to wait a few minutes for a big buffer to finish when someone walks into my office).
  • Adjust the read-back speed at play-time.
  • Highlight the text currently being read (optional).

I found a couple of possible solutions:

  • Emacspeak: Designed for the blind. Looks like a stand-alone program, not an Emacs plug-in.
  • festival.el: Requires Festival. I can't find Windows binaries for Festival. Does anyone have them?
  • I could also write my own. Text-to-speech (TTS) libraries are plentiful these days. The interactive pause feature may be the trickiest part, but there must be some libraries that can do it.

Which option is the best plan? I don't want a week-long project here. Compiling Festival on Windows has been a painful experiment. Emacspeak looks like overkill for what I want.

Answer 1:

Festival for Windows is available here. I can't guarantee that festival.el will work with these binaries. I do have experience working with these binaries, though, so if you have problems getting them to work outside of Emacs, I may be able to help.

I don't think you will have control over playback speed with Festival, though I could be mistaken. To keep control over playback, your best bet is to send Festival only small portions of text at a time. Otherwise, there is no real way to stop it before it has read everything it was given.

Basically, I don't think that there is anything out there that would meet your minimum requirements without some work.

Edit: after looking back over your requirements, I'd say the best approach would be to hack festival.el to send a sentence at a time to Festival. Then you can program a keystroke that will kill it, so that it will only finish the current sentence. At the same time, your script could highlight the sentence that is currently being sent to Festival.
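
The sentence-at-a-time idea can be sketched independently of Festival. The following is a minimal Python sketch, not part of festival.el: `split_sentences` and `SentenceReader` are hypothetical names, and the `speak` callable stands in for whatever actually sends text to the speech engine. It returns character offsets for each sentence, which is what a highlighting step would need. The sentence rule is naive and will split after abbreviations such as "e.g.".

```python
import re

# A sentence runs from the first non-space character up to '.', '!' or '?'
# followed by whitespace or end of text (or simply to the end of the text).
_SENTENCE_RE = re.compile(r"\S.*?(?:[.!?]+(?=\s|$)|$)", re.DOTALL)


def split_sentences(text):
    """Return a list of (start, end, sentence) tuples for text."""
    return [(m.start(), m.end(), m.group()) for m in _SENTENCE_RE.finditer(text)]


class SentenceReader:
    """Feeds one sentence at a time so reading can stop between sentences."""

    def __init__(self, text, speak):
        self.sentences = split_sentences(text)
        self.position = 0      # index of the next sentence to speak
        self.paused = False    # set to True to pause, False to resume
        self.speak = speak     # hypothetical callable taking one sentence

    def step(self):
        """Speak the next sentence; return its (start, end) span, or None."""
        if self.paused or self.position >= len(self.sentences):
            return None
        start, end, sentence = self.sentences[self.position]
        self.speak(sentence)
        self.position += 1
        return (start, end)
```

Because the reader remembers its position, pausing only delays the next call to `step`; the sentence already handed to the engine still finishes, as described above.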



Answer 2:

I have a simple solution based on the Python pyttsx module. It starts a Python script as an Emacs subprocess and sends it strings to be read out.

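(defvar tts nil "Text-to-speech process, or nil if none is running.")

(defun tts-up ()
  "Return non-nil if the TTS process is running."
  (and tts (eq (process-status tts) 'run)))

(defun tts-start ()
  "Start the Python TTS process unless it is already running."
  (interactive)
  (unless (tts-up)
    (setq tts
          (start-process "tts-python"
                         "*tts-python*"
                         "python" "speak.py"))))

(defun tts-end ()
  "Kill the TTS process, if any."
  (interactive)
  (when tts
    (delete-process tts)
    (setq tts nil)))

(defun tts-say (text)
  "Send TEXT to the TTS process to be read aloud."
  (interactive "sText to speak: ")
  (tts-start)
  (process-send-string tts (concat text "\n")))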

The Python file speak.py:

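import pyttsx  # Python 2 module

engine = pyttsx.init()

def say(data):
    # Queue the text, then block until it has been spoken.
    engine.say(data)
    engine.runAndWait()

# Read one line per utterance from stdin until Emacs closes the pipe.
while True:
    try:
        say(raw_input())
    except EOFError:
        break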
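
The question also asks for adjustable read-back speed. One possible way to get it with this setup is a small line protocol on top of the loop above. The sketch below is an assumption, not part of the original answer: the "!rate " prefix and the `handle_line` function are made up for illustration, and the engine is passed in as a parameter so the logic does not depend on a particular TTS library. It relies only on the engine offering `say`, `runAndWait`, and `setProperty("rate", ...)`, which pyttsx provides.

```python
RATE_PREFIX = "!rate "  # hypothetical control prefix, not part of pyttsx


def handle_line(engine, line):
    """Treat '!rate N' as a speed change; speak any other non-empty line.

    Returns "rate", "say", or "ignored" to report what was done.
    """
    line = line.rstrip("\r\n")
    if line.startswith(RATE_PREFIX):
        try:
            rate = int(line[len(RATE_PREFIX):])
        except ValueError:
            return "ignored"  # malformed number: leave the rate unchanged
        engine.setProperty("rate", rate)
        return "rate"
    if line.strip():
        engine.say(line)
        engine.runAndWait()  # blocks until this line has been spoken
        return "say"
    return "ignored"
```

From Emacs, `(tts-say "!rate 150")` would then change the speed instead of being read aloud. Since `runAndWait` blocks, a rate change sent this way would only apply to the next line, so sending text in short pieces keeps the control responsive.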