
How to programmatically send a unix socket command

Posted 2019-02-22 02:53

Question:

Presently, the Chromium and Firefox implementations of the Web Speech API Specification do not parse Speech Synthesis Markup Language (SSML) when SSML is set as the text property of a SpeechSynthesisUtterance instance that is passed to window.speechSynthesis.speak(); see "SSML parsing implementation at browsers", "5.2.3 SpeechSynthesisUtterance Attributes", and "How to set options of commands called by browser?".

The Chromium source code that opens the unix socket connection to speech-dispatcher appears to be in /src/chrome/browser/speech/tts_linux.cc:

  {
    // spd_open has memory leaks which are hard to suppress.
    // http://crbug.com/317360
    ANNOTATE_SCOPED_MEMORY_LEAK;
    conn_ = libspeechd_loader_.spd_open(
        "chrome", "extension_api", NULL, SPD_MODE_THREADED);
  }

which appears to be reflected in /run/user/1000/speech-dispatcher/log:

speechd: Updating client specific settings "linux:chrome:extension_api"
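To check programmatically which client name Chromium registers, the log can be scanned for lines like the one above. This is a minimal Node.js sketch under stated assumptions: Node.js 12+ (for String.prototype.matchAll), the log path quoted above, and SSIP client names of the form user:client:component. The helpers clientNamesFromLog, splitClientName, and chromiumClients are hypothetical names.

```javascript
"use strict";
const fs = require("fs");

// Collects the distinct client names from log lines such as
//   speechd: Updating client specific settings "linux:chrome:extension_api"
function clientNamesFromLog(logText) {
  const pattern = /Updating client specific settings "([^"]*)"/g;
  const names = new Set();
  for (const match of logText.matchAll(pattern)) {
    names.add(match[1]);
  }
  return [...names];
}

// Splits a client name of the assumed form user:client:component.
function splitClientName(name) {
  const [user, client, component] = name.split(":");
  return { user, client, component };
}

// Reads the log (path taken from the question) and keeps only Chromium's clients.
function chromiumClients(logPath = "/run/user/1000/speech-dispatcher/log") {
  return clientNamesFromLog(fs.readFileSync(logPath, "utf8"))
    .map(splitClientName)
    .filter((entry) => entry.client === "chrome");
}
```

For the log line above, chromiumClients() would report user "linux", client "chrome", and component "extension_api", matching the "chrome" and "extension_api" arguments passed to spd_open.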

The Chromium source file /src/third_party/speech-dispatcher/libspeechd.h appears to define the SSML_DATA_MODE described in the speech-dispatcher documentation.

The speech-dispatcher documentation states that the user configuration file can be used to set parameters for specific clients:

4.1.6 Parameter Settings Commands

The following parameter setting commands are available. For configuration and history clients there are also functions for setting the value for some other connection and for all connections. They are listed separately below.

C API function: int spd_set_data_mode(SPDConnection *connection, SPDDataMode mode)

Set Speech Dispatcher data mode. Currently, plain text and SSML are supported. SSML is especially useful if you want to use index marks or include changes of voice parameters in the text.

mode is the requested data mode: SPD_DATA_TEXT or SPD_DATA_SSML.
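At the protocol level, selecting SPD_DATA_SSML appears to correspond to the SSIP command SET SELF SSML_MODE on, after which the text of a SPEAK command is read as SSML. The sketch below frames that exchange, assuming my reading of the SSIP documentation: the server answers SPEAK with a 230 reply before accepting data, data lines end in CRLF, a line that starts with "." is escaped by doubling the dot, and a line holding a single "." ends the message. frameSpeakData and ssmlSpeakSequence are hypothetical helper names.

```javascript
"use strict";

// Builds the data block sent after an SSIP "SPEAK" command has been
// acknowledged: CRLF line endings, a doubled dot at the start of any line
// that begins with ".", and a lone "." line as terminator.
function frameSpeakData(text) {
  const lines = text.replace(/\r\n|\r|\n/g, "\n").split("\n");
  const escaped = lines.map((line) => (line.startsWith(".") ? "." + line : line));
  return escaped.join("\r\n") + "\r\n.\r\n";
}

// The writes a client would make, in order, waiting for the server's
// reply after each one (assumed sequence, not taken from libspeechd).
function ssmlSpeakSequence(ssml) {
  return ["SET SELF SSML_MODE on\r\n", "SPEAK\r\n", frameSpeakData(ssml)];
}
```

Keeping the framing in a pure function makes the dot escaping easy to check without a running speech-dispatcher.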

Chromium does not turn SPD_DATA_SSML on when it establishes its SSIP connection to speech-dispatcher, whereas @xmash turns it on explicitly at "How to use Index Marks in "speech-dispatcher"?":

spd_execute_command_wo_mutex( m_connection, "SET SELF SSML_MODE on" );

Nor is it possible to pass options to the default speech synthesis module, such as -m for espeak or -x for spd-say.
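Sending an SSIP command such as "SET SELF SSML_MODE on" over the unix socket can be sketched in Node.js. Assumptions: Node.js's built-in net module, the socket path /run/user/1000/speech-dispatcher/speechd.sock from question 1 below, and SSIP replies made of "NNN-text" continuation lines ending in a final "NNN text" line. formatCommand, parseReply, and sendCommands are hypothetical names. One caveat: SET SELF changes only the connection that sends it, so this opens a new client connection and does not by itself switch SSML on for Chromium's existing connection; whether SSML_MODE also accepts ALL or another client's id should be checked against the SSIP documentation.

```javascript
"use strict";
const net = require("net");

// SSIP commands are single text lines terminated by CRLF.
function formatCommand(command) {
  if (/[\r\n]/.test(command)) {
    throw new Error(`SSIP command must be a single line: ${JSON.stringify(command)}`);
  }
  return `${command}\r\n`;
}

// Looks for one complete reply at the start of buffer: zero or more
// "NNN-text" lines followed by a final "NNN text" line. Returns null if the
// final line has not arrived yet.
function parseReply(buffer) {
  const lines = buffer.split("\r\n");
  // The last element is either "" or an incomplete line, so it is skipped.
  for (let i = 0; i < lines.length - 1; i++) {
    const match = /^(\d{3})([- ])/.exec(lines[i]);
    if (match && match[2] === " ") {
      return {
        code: Number(match[1]),
        lines: lines.slice(0, i + 1),
        rest: lines.slice(i + 1).join("\r\n"),
      };
    }
  }
  return null;
}

// Opens a new SSIP connection, sends each command after the previous reply,
// and resolves with the list of { code, lines } replies.
function sendCommands(socketPath, commands) {
  return new Promise((resolve, reject) => {
    const frames = commands.map(formatCommand); // a throw here rejects the promise
    if (frames.length === 0) {
      resolve([]);
      return;
    }
    const socket = net.createConnection(socketPath);
    const replies = [];
    let buffer = "";
    let next = 0;
    socket.setEncoding("utf8");
    socket.on("connect", () => socket.write(frames[next]));
    socket.on("data", (chunk) => {
      buffer += chunk;
      let reply;
      while ((reply = parseReply(buffer)) !== null) {
        buffer = reply.rest;
        replies.push({ code: reply.code, lines: reply.lines });
        next += 1;
        if (next === frames.length) {
          socket.end();
          resolve(replies);
          return;
        }
        socket.write(frames[next]);
      }
    });
    socket.on("error", reject);
    // Has no effect if the promise has already resolved.
    socket.on("close", () => reject(new Error("connection closed before all replies arrived")));
  });
}
```

For example, sendCommands("/run/user/1000/speech-dispatcher/speechd.sock", ["SET SELF CLIENT_NAME user:demo:main", "SET SELF SSML_MODE on", "QUIT"]) resolves with one reply per command, where "user:demo:main" is a placeholder client name. Commands are sent one at a time so that each reply can be matched to the command that caused it.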

With LogLevel set to 4 or 5, /run/user/1000/speech-dispatcher/log lists the communication between Chromium (client) and speech-dispatcher (server):

speechd:    Module set parameters

This communication can also be viewed in the terminal by attaching strace to the PID stored in /run/user/1000/speech-dispatcher/pid; see "Is there a way to intercept interprocess communication in Unix/Linux?"

$ PID=$(cat /run/user/1000/speech-dispatcher/pid)
$ sudo strace -ewrite -p $PID

write(22, "216 OK OUTPUT MODULE SET\r\n", 26) = 26
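Question 1 below asks how to detect the speech-dispatcher PID programmatically. This minimal Node.js sketch assumes Node.js and the PID file path quoted above; readLivePid and waitForPid are hypothetical helper names. It reads the PID file and sends signal 0, which checks that the process exists without delivering a signal.

```javascript
"use strict";
const fs = require("fs");

// Returns the PID stored in pidFile if that process is alive, else null.
function readLivePid(pidFile = "/run/user/1000/speech-dispatcher/pid") {
  let text;
  try {
    text = fs.readFileSync(pidFile, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return null; // server not spawned yet
    throw err;
  }
  const pid = Number.parseInt(text.trim(), 10);
  if (!Number.isInteger(pid) || pid <= 0) return null;
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is sent
    return pid;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === "EPERM" ? pid : null;
  }
}

// Polls until the PID file names a live process or the timeout expires.
function waitForPid(pidFile, { intervalMs = 250, timeoutMs = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const timer = setInterval(() => {
      let pid;
      try {
        pid = readLivePid(pidFile);
      } catch (err) {
        clearInterval(timer);
        reject(err);
        return;
      }
      if (pid !== null) {
        clearInterval(timer);
        resolve(pid);
      } else if (Date.now() - started > timeoutMs) {
        clearInterval(timer);
        reject(new Error(`no live PID in ${pidFile} after ${timeoutMs} ms`));
      }
    }, intervalMs);
  });
}
```

Polling is used instead of fs.watch because the PID file may not exist until Chromium first spawns speech-dispatcher, and fs.watch throws on a missing path. Knowing the PID is enough for strace, but SSIP commands still have to go through a socket connection.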

After running the following command, there does not appear to be an option in either speechd.conf or espeak.conf to turn SSML parsing on:

$ spd-conf -u

While attempting to parse SSML with JavaScript in SpeechSynthesisSSMLParser, I encountered a bug in Chromium when parsing the <break> element. It is not clear whether spd-say is called, or whether the default output module (e.g., espeak) is run, when the browser executes window.speechSynthesis.speak(); see /src/out/Debug/gen/library_loaders/libspeechd.h.

As a workaround, I created an approach that uses PHP to call espeak via shell_exec(), which returns the expected result:

// JavaScript
async function SSMLStream({ssml = "", options = ""}) {
  const fd = new FormData();
  fd.append("ssml", ssml);
  fd.append("options", options);

  const response = await fetch("speak.php", {method: "POST", body: fd});
  if (!response.ok) {
    throw new Error(`speak.php returned HTTP ${response.status}`);
  }
  return response.arrayBuffer();
}

const ssml = `<speak version="1.0" xml:lang="en-US">
             Here are <say-as interpret-as="characters">SSML</say-as> samples.
             Hello universe, how are you today?
             Try a date: <say-as interpret-as="date" format="dmy" detail="1">10-9-1960</say-as>
             This is a <break time="2500ms" /> 2.5 second pause.
             This is a <break /> sentence break <break />
             <voice name="us-en+f3" rate="x-slow" pitch="0.25">espeak using</voice>
             PHP and <voice name="en-us+f2"> <sub alias="JavaScript">JS</sub></voice>
           </speak>`;

SSMLStream({ssml, options: "-v en-us+f1"})
.then(async (data) => {
  // Decode the WAV bytes returned by espeak and play them.
  const context = new AudioContext();
  const source = context.createBufferSource();
  source.buffer = await context.decodeAudioData(data);
  source.connect(context.destination);
  source.start();
})
.catch(console.error);
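<?php
  // PHP: speak.php
  if (isset($_POST["ssml"])) {
    header("Content-Type: audio/x-wav");
    // Escape every argument so that quotes in the SSML or extra text in
    // "options" cannot inject additional shell commands.
    $options = preg_split('/\s+/', trim($_POST["options"] ?? ""), -1, PREG_SPLIT_NO_EMPTY);
    $args = implode(" ", array_map("escapeshellarg", $options));
    echo shell_exec("espeak -m --stdout " . $args . " " . escapeshellarg($_POST["ssml"]));
  }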
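The PHP handler has to go through a shell. As a hedged alternative, this Node.js sketch passes the same espeak flags (-m, --stdout, plus the options string) to child_process.execFile, which does not spawn a shell, so the SSML needs no quoting. Assumptions: Node.js and espeak are installed and espeak is on PATH; buildEspeakArgs and synthesize are hypothetical names. It still runs as a local process outside the browser, so it is a variant of the workaround rather than a browser-only solution.

```javascript
"use strict";
const { execFile } = require("child_process");

// Splits an option string such as "-v en-us+f1" into separate argv entries.
function buildEspeakArgs(ssml, options = "") {
  const trimmed = options.trim();
  const extra = trimmed === "" ? [] : trimmed.split(/\s+/);
  // -m: interpret SSML markup; --stdout: write the WAV data to stdout.
  return ["-m", "--stdout", ...extra, ssml];
}

// Runs espeak without a shell and resolves with the WAV bytes as a Buffer.
function synthesize(ssml, options = "") {
  return new Promise((resolve, reject) => {
    execFile(
      "espeak",
      buildEspeakArgs(ssml, options),
      { encoding: "buffer", maxBuffer: 64 * 1024 * 1024 },
      (error, stdout) => (error ? reject(error) : resolve(stdout)),
    );
  });
}
```

For example, synthesize(ssml, "-v en-us+f1") resolves with a Buffer holding the same WAV data the PHP script echoes. The larger maxBuffer is there because long utterances can exceed Node's default stdout limit.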

Requirement:

Parse the SSML set as the text property of a SpeechSynthesisUtterance using only default browser capabilities and the existing capabilities of the native program that the speech-dispatcher output module calls to convert text to speech.

Questions:

1) How can I programmatically listen for the PID when Chromium starts speech-dispatcher --spawn-communication-method unix_socket --socket-path /run/user/1000/speech-dispatcher/speechd.sock, and then, acting as the client (Chromium) over the established unix socket connection, call spd_execute_command or spd_execute_command_wo_mutex with "SET SELF SSML_MODE on" as the second parameter, so that SSML parsing is turned on for all calls to window.speechSynthesis.speak() in Chromium?

2) If 1) is not possible, what needs to be changed in the Chromium source code to turn on SSML parsing for the unix socket connection, e.g., in tools/generate_library_loader/generate_library_loader.py?

3) If 1) and 2) are not viable, how can the JavaScript and PHP code above be converted into C++ code in the style used by Chromium, and how can Chromium be built with that patch, in order to expose a speak function that accepts parameters, passes them to a native speech synthesis application that parses SSML, and returns the resulting audio to the JavaScript caller as an ArrayBuffer?

4) If options other than 1), 2), and 3) can meet the requirement, how can the problem be solved programmatically, without having to start a local server manually in a terminal?