This question is essentially about the suitability of Microsoft's Speech API (SAPI) for server workloads and whether it can be used reliably inside w3wp for speech synthesis. We have an asynchronous controller that uses the native System.Speech assembly in .NET 4 (not the Microsoft.Speech one that ships as part of Microsoft Speech Platform - Runtime Version 11) and lame.exe to generate MP3s as follows:
    [CacheFilter]
    public void ListenAsync(string url)
    {
        string fileName = string.Format(@"C:\test\{0}.wav", Guid.NewGuid());
        try
        {
            var t = new System.Threading.Thread(() =>
            {
                using (SpeechSynthesizer ss = new SpeechSynthesizer())
                {
                    ss.SetOutputToWaveFile(fileName, new SpeechAudioFormatInfo(22050, AudioBitsPerSample.Eight, AudioChannel.Mono));
                    ss.Speak("Here is a test sentence...");
                    ss.SetOutputToNull();
                    ss.Dispose();
                }
                var process = new Process() { EnableRaisingEvents = true };
                process.StartInfo.FileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"bin\lame.exe");
                process.StartInfo.Arguments = string.Format("-V2 {0} {1}", fileName, fileName.Replace(".wav", ".mp3"));
                process.StartInfo.UseShellExecute = false;
                process.StartInfo.RedirectStandardOutput = false;
                process.StartInfo.RedirectStandardError = false;
                process.Exited += (sender, e) =>
                {
                    System.IO.File.Delete(fileName);
                    AsyncManager.OutstandingOperations.Decrement();
                };
                AsyncManager.OutstandingOperations.Increment();
                process.Start();
            });
            t.Start();
            t.Join();
        }
        catch { }
        AsyncManager.Parameters["fileName"] = fileName;
    }

    public FileResult ListenCompleted(string fileName)
    {
        return base.File(fileName.Replace(".wav", ".mp3"), "audio/mp3");
    }
The question is: why does SpeechSynthesizer need to run on a separate thread like that in order to return (this is reported elsewhere on SO), and is implementing a STAThreadRouteHandler for this request more efficient or scalable than the approach above?
Second, what are the options for running SpeakAsync in an ASP.NET (MVC or WebForms) context? None of the options I've tried seem to work (see update below).
Any other suggestions for how to improve this pattern (i.e. two dependencies that must execute serially, each of which has async support) are welcome. I don't feel this scheme is sustainable under load, especially considering the known memory leaks in SpeechSynthesizer. I'm considering running this service on a different stack altogether.
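To make the "two serial steps, each with async support" shape concrete, here is a minimal sketch of one way it might be expressed with TaskCompletionSource and ContinueWith/Unwrap on .NET 4. It is untested under w3wp and is not a claim about how SAPI should be hosted. The class SpeechPipeline and the methods SynthesizeToWavAsync, RunEncoderAsync and TextToMp3Async are hypothetical names, and the sketch assumes (based only on what I observed with the Start/Join code above) that Speak works on a plain dedicated thread. It also assumes a reference to System.Speech.dll.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; none of these names come from the controller above.
public static class SpeechPipeline
{
    // Step 1: synthesize on a dedicated thread and expose completion as a Task.
    public static Task<string> SynthesizeToWavAsync(string text, string wavPath)
    {
        var tcs = new TaskCompletionSource<string>();
        var thread = new Thread(() =>
        {
            try
            {
                using (var ss = new SpeechSynthesizer())
                {
                    ss.SetOutputToWaveFile(wavPath,
                        new SpeechAudioFormatInfo(22050, AudioBitsPerSample.Eight, AudioChannel.Mono));
                    ss.Speak(text); // blocks only this dedicated thread
                }
                tcs.TrySetResult(wavPath);
            }
            catch (Exception ex)
            {
                tcs.TrySetException(ex); // surface failures instead of swallowing them
            }
        });
        thread.IsBackground = true;
        thread.Start();
        return tcs.Task;
    }

    // Step 2: start an external process and complete when it exits.
    public static Task<int> RunEncoderAsync(string exePath, string arguments)
    {
        var tcs = new TaskCompletionSource<int>();
        var process = new Process
        {
            EnableRaisingEvents = true,
            StartInfo = new ProcessStartInfo(exePath, arguments) { UseShellExecute = false }
        };
        process.Exited += (sender, e) =>
        {
            tcs.TrySetResult(process.ExitCode);
            process.Dispose();
        };
        try
        {
            process.Start();
        }
        catch (Exception ex)
        {
            process.Dispose();
            tcs.TrySetException(ex);
        }
        return tcs.Task;
    }

    // Chain the two steps: the encoder starts only after synthesis has finished.
    public static Task<string> TextToMp3Async(string text, string wavPath, string lamePath)
    {
        string mp3Path = Path.ChangeExtension(wavPath, ".mp3");
        return SynthesizeToWavAsync(text, wavPath)
            .ContinueWith(t => RunEncoderAsync(lamePath,
                string.Format("-V2 \"{0}\" \"{1}\"", t.Result, mp3Path)))
            .Unwrap()
            .ContinueWith(t =>
            {
                File.Delete(wavPath);
                if (t.Result != 0) // t.Result rethrows if an earlier step faulted
                    throw new InvalidOperationException("Encoder exited with code " + t.Result);
                return mp3Path;
            });
    }
}
```

In the controller, the idea would be to call AsyncManager.OutstandingOperations.Increment() once, start TextToMp3Async, and call Decrement() in a final continuation that runs on both success and failure, so a faulted step cannot leave the request hanging. Unlike the Start/Join version, nothing here blocks the request thread while the synthesizer or lame.exe is working.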
Update:
Neither the Speak nor the SpeakAsync option appears to work under the STAThreadRouteHandler. The former produces:
System.InvalidOperationException: Asynchronous operations are not allowed in this context. Page starting an asynchronous operation has to have the Async attribute set to true and an asynchronous operation can only be started on a page prior to PreRenderComplete event.
   at System.Web.LegacyAspNetSynchronizationContext.OperationStarted()
   at System.ComponentModel.AsyncOperationManager.CreateOperation(Object userSuppliedState)
   at System.Speech.Internal.Synthesis.VoiceSynthesis..ctor(WeakReference speechSynthesizer)
   at System.Speech.Synthesis.SpeechSynthesizer.get_VoiceSynthesizer()
   at System.Speech.Synthesis.SpeechSynthesizer.SetOutputToWaveFile(String path, SpeechAudioFormatInfo formatInfo)
The latter results in:
System.InvalidOperationException: The asynchronous action method 'Listen' cannot be executed synchronously.
   at System.Web.Mvc.Async.AsyncActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
It seems like a custom STA thread pool (with ThreadStatic instances of the COM object) is a better approach: http://marcinbudny.blogspot.ca/2012/04/dealing-with-sta-coms-in-web.html
Update #2: It doesn't seem like System.Speech.Synthesis.SpeechSynthesizer needs STA treatment; it seems to run fine on MTA threads so long as you follow that Start/Join pattern. Here's a new version that is able to use SpeakAsync correctly (the issue there was disposing it prematurely!) and breaks up the WAV generation and the MP3 generation into two separate requests:
    [CacheFilter]
    [ActionName("listen-to-text")]
    public void ListenToTextAsync(string text)
    {
        AsyncManager.OutstandingOperations.Increment();
        var t = new Thread(() =>
        {
            SpeechSynthesizer ss = new SpeechSynthesizer();
            string fileName = string.Format(@"C:\test\{0}.wav", Guid.NewGuid());
            ss.SetOutputToWaveFile(fileName, new SpeechAudioFormatInfo(22050,
                                                                       AudioBitsPerSample.Eight,
                                                                       AudioChannel.Mono));
            ss.SpeakCompleted += (sender, e) =>
            {
                ss.SetOutputToNull();
                ss.Dispose();
                AsyncManager.Parameters["fileName"] = fileName;
                AsyncManager.OutstandingOperations.Decrement();
            };
            CustomPromptBuilder pb = new CustomPromptBuilder(settings.DefaultVoiceName);
            pb.AppendParagraphText(text);
            ss.SpeakAsync(pb);
        });
        t.Start();
        t.Join();
    }

    [CacheFilter]
    public ActionResult ListenToTextCompleted(string fileName)
    {
        return RedirectToAction("mp3", new { fileName = fileName });
    }

    [CacheFilter]
    [ActionName("mp3")]
    public void Mp3Async(string fileName)
    {
        var process = new Process()
        {
            EnableRaisingEvents = true,
            StartInfo = new ProcessStartInfo()
            {
                FileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"bin\lame.exe"),
                Arguments = string.Format("-V2 {0} {1}", fileName, fileName.Replace(".wav", ".mp3")),
                UseShellExecute = false,
                RedirectStandardOutput = false,
                RedirectStandardError = false
            }
        };
        process.Exited += (sender, e) =>
        {
            System.IO.File.Delete(fileName);
            AsyncManager.Parameters["fileName"] = fileName;
            AsyncManager.OutstandingOperations.Decrement();
        };
        AsyncManager.OutstandingOperations.Increment();
        process.Start();
    }

    [CacheFilter]
    public ActionResult Mp3Completed(string fileName)
    {
        return base.File(fileName.Replace(".wav", ".mp3"), "audio/mp3");
    }