Android Text-To-Speech API Sounds Robotic

2020-02-04 02:05发布

问题:

I'm learning android development for the first time and my goal is to create a simple Hello World application that takes in some text, and reads them out loud.

I've based my code off an example I found and here's my code:

class MainFeeds : AppCompatActivity() {


    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main_feeds)



        card.setOnClickListener{
            Toast.makeText(this, "Hello", Toast.LENGTH_LONG).show()
            TTS(this, "Hello this is leo")
        }
    }

}


class TTS(private val activity: Activity,
          private val message: String) : TextToSpeech.OnInitListener {

          private val tts: TextToSpeech = TextToSpeech(activity, this, "com.google.android.tts")

    override fun onInit(i: Int) {
        if (i == TextToSpeech.SUCCESS) {

            val localeUS = Locale.US

            val result: Int
            result = tts.setLanguage(localeUS)

            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Toast.makeText(activity, "This Language is not supported", Toast.LENGTH_SHORT).show()
            } else {
                speakOut(message)
            }

        } else {
            Toast.makeText(activity, "Initilization Failed!", Toast.LENGTH_SHORT).show()
        }
    }

    private fun speakOut(message: String) {
        tts.speak(message, TextToSpeech.QUEUE_FLUSH, null, null)
    }
}

And it works perfectly fine, the issue that I'm running into is that the audio that's coming out of the synthesizer sounds extremely robotic, almost like when I'm using Google Maps and I get disconnected from the internet. Is using the voice Google Assistant leverages some other API that I have to enable?

EDIT: I've tried running the app on my pixel 2xl and it still sounds robotic, as in it's not using the Google Assistant voice.

回答1:

The quality of speech first and foremost comes down to what "speech engine" is being used by the TextToSpeech object that you created:

private val tts: TextToSpeech = TextToSpeech(activity, this)

If you had instead entered:

private val tts: TextToSpeech = TextToSpeech(activity, this, "com.google.android.tts")

...then any device you run that code on will attempt to use the google speech engine... but it will only actually be used if it exists on the device.

Similarly, using "com.samsung.SMT" would attempt to use the Samsung speech engine (which is also high-quality, but usually only installed on Samsung [real] devices).

Whether or not the Google speech engine will be available is not so much dependent on the Android API level of the device (as long as it's recent enough to run the Google engine), but whether or not the actual Google text-to-speech engine is installed on the device at all.

To make sure that the Google engine is installed:

On an Android Studio Emulator:

Create a new emulator and select a system image that has "Google APIs" or "Google Play" in the "target" column.

On a real device:

Go to the Play Store and install the Google speech engine.

TTS on Android (or at least trying to predict its behavior) can be a real beast.

Documentation: Java | Kotlin.



回答2:

I've made a little test program that should answer this question for you.

It shows you a list of all the voices the Google engine has in it, and you click on them and listen to them! Yay!

What it actually does:

  • Initializes a TextToSpeech object using the Google text-to-speech engine if it exists on the device.
  • Lets you choose a specific Voice from a ListView which contains ALL potentially available voices corresponding to the Locale specified in code (in this case, English)... and corresponding to the version of the Google text-to-speech engine you have installed.

This way, you can test all the voices to see if the "Google Assistant" voice you seek is somewhere in there, and if it's not available, you can keep checking as new versions of the Google text-to-speech engine are released. It seems to me that the highest quality voices in this test are both quality:400, and specify that a network connection is required.

NOTES:

  • A voice (especially English) will most likely still "play" even if it is "not installed." This is because when using setVoice(Voice v), the (Google) engine will return a "success" int even if the requested voice is not available(!), as long as it has some other "back-up" voice on hand of the same language. Unfortunately it does all this in the background and still sneakily reports that it's using the same exact voice you requested even if you use getVoice() and compare objects. :(.

  • Generally if a voice says it IS installed, then the voice you are hearing is the voice you requested.

  • For these reasons, you'll want to make sure you're on the internet when you test these voices (so that they auto-install when you request unavailable voices)... and also so that Voices that require a network connection will not "auto-downgrade."

  • You can swipe/refresh the listview of voices in order to check whether voices have been installed yet, or use the system's pull-down menu to watch for downloads... or go into Google's text-to-speech settings in the device system settings.

  • In the list view, the Voice features such as "network required," and "installed," are simply echoes of what the Google engine reports and may not be accurate. :(

  • The maximum possible voice quality specified in the Voice class documentation is 500. In my tests I could only find voices up to quality 400. This may be because I do not have the latest version of Google text-to-speech installed on my test device (and I don't have Play Store access on it in order to update it). If you're using a real device, I suggest installing the latest version of Google TTS using the Google Play Store. You can verify engine version in the Logs. According to Wikipedia, the latest version as of this writing is 3.15.18.200023596. The version on my test device is 3.13.1.

To re-create this test app, make a blank Java project in Android Studio with a minimum API of 21. (getVoices() does not work pre-21).

Manifest:

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package=" [ your.package.name ] "
    android:windowSoftInputMode="stateHidden">

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/AppTheme">
        <activity android:name=".MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

MainActivity:

package [ your package name ];

import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.graphics.Color;
import android.speech.tts.TextToSpeech;
import android.speech.tts.UtteranceProgressListener;
import android.speech.tts.Voice;
import android.support.v4.widget.SwipeRefreshLayout;
import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.view.inputmethod.InputMethodManager;
import android.widget.AdapterView;
import android.widget.EditText;
import android.widget.ListView;
import android.widget.Spinner;
import android.widget.TextView;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;

public class MainActivity extends AppCompatActivity {

    EditText textToSpeak;
    TextView progressView;
    TextToSpeech googleTTS;
    ListView voiceListView;
    SwipeRefreshLayout swipeRefreshLayout;
    Long timeOfSpeakRequest;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        textToSpeak = findViewById(R.id.textToSpeak);
        textToSpeak.setText("Do I sound robotic to you?  1,2,3,4... yabadabadoo.  "
                + "ooo! ahh! la-la-la-la-la!  num-num-dibby-dibby-num-tick-tock...  "
                + "Can I pronounce the word, Antidisestablishmentarianism?  "
                + "Gerp!  My pants are too tight!  "
                + "CODE RED!  CODE RED!  Initiate disassemble!  Ice Cream is cold "
                + "...in my pants.  Exterminate!  exterminate!  Directive 4 is "
                + "classified."
        );
        progressView = findViewById(R.id.progressView);
        voiceListView = findViewById(R.id.voiceListView);
        swipeRefreshLayout = findViewById(R.id.swipeRefresh);


        // Create the TTS and wait until it's initialized to do anything else
        if (isGoogleEngineInstalled()) {
            createGoogleTTS();
        } else {
            Log.i("XXX", "onCreate(): Google not installed -- nothing done.");
        }

    }

    @Override
    protected void onStart() {
        super.onStart();

        swipeRefreshLayout.setOnRefreshListener(new SwipeRefreshLayout.OnRefreshListener() {
            @Override
            public void onRefresh() {
                assignFullSetOfVoicesToVoiceListView();
            }
        });

    }

    // this is where the program really begins (when the TTS is initialized)
    private void onTTSInitialized() {

        setUpWhatHappensWhenAVoiceItemIsClicked();
        setUtteranceProgressListenerOnTheTTS();
        assignFullSetOfVoicesToVoiceListView();

    }

    // FACTORED/EXTRACTED METHODS ----------------------------------------------------------------
    // These are just pulled out to make onCreate() easier to read and the basic sequence
    // of events more obvious.

    private void createGoogleTTS() {

        googleTTS = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status != TextToSpeech.ERROR) {
                    Log.i("XXX", "Google tts initialized");
                    onTTSInitialized();
                } else {
                    Log.i("XXX", "Internal Google engine init error.");
                }
            }
        }, "com.google.android.tts");

    }

    private void setUpWhatHappensWhenAVoiceItemIsClicked() {
        voiceListView.setOnItemClickListener(new AdapterView.OnItemClickListener() {
            @Override
            public void onItemClick(AdapterView<?> parent, View view, int position, long id) {
                Voice desiredVoice = (Voice) parent.getAdapter().getItem(position);
                // if (setting the desired voice is "successful")...
                // in the case of google engine, this does not necessarily mean the voice you
                // want will actually be used. :(
                if (googleTTS.setVoice(desiredVoice) == 0) {
                    Log.i("XXX", "Speech voice set to: " + desiredVoice.toString());
                    // TTS did may "auto-downgrade" voice selection
                    // due to internal reason such as no data
                    // Unfortunately it will not tell you, and there seems to be no
                    // way of checking whether the presently selected voice (getVoice()) "equals"
                    // the desired voice.
                    speak();
                }
            }
        });
    }

    private void setUtteranceProgressListenerOnTheTTS() {

        UtteranceProgressListener blurp = new UtteranceProgressListener() {

            @Override // MIN API 15
            public void onStart(String s) {
                long timeSinceSpeakCall = System.currentTimeMillis() - timeOfSpeakRequest;
                Log.i("XXX", "progress.onStart() callback.  "
                        + timeSinceSpeakCall + " millis since speak() was called.");
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        progressView.setTextColor(Color.GREEN);
                        progressView.setText("PROGRESS: STARTED");
                    }
                });
            }

            @Override // MIN API 15
            public void onDone(String s) {
                long timeSinceSpeakCall = System.currentTimeMillis() - timeOfSpeakRequest;
                Log.i("XXX", "progress.onDone() callback.  "
                        + timeSinceSpeakCall + " millis since speak() was called.");
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        progressView.setTextColor(Color.GREEN);
                        progressView.setText("PROGRESS: DONE");
                    }
                });
            }

            // Getting an error can simply mean that the particular voice is not available
            // to the device yet... and still needs to be downloaded / is still downloading
            @Override // MIN API 15 (depracated at API 21)
            public void onError(String s) {
                long timeSinceSpeakCall = System.currentTimeMillis() - timeOfSpeakRequest;
                Log.i("XXX", "progress.onERROR() callback.  "
                        + timeSinceSpeakCall + " millis since speak() was called.");
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        progressView.setTextColor(Color.RED);
                        progressView.setText("PROGRESS: ERROR");
                    }
                });

            }
        };
        googleTTS.setOnUtteranceProgressListener(blurp);

    }

    // must happens AFTER tts is initialized
    private void assignFullSetOfVoicesToVoiceListView() {

        googleTTS.stop();

        List<Voice> tempVoiceList = new ArrayList<>();

        for (Voice v : googleTTS.getVoices()) {
            if (v.getLocale().getLanguage().contains("en")) { // only English voices
                tempVoiceList.add(v);
            }
        }

        // Sort the list alphabetically by name
        Collections.sort(tempVoiceList, new Comparator<Voice>() {
            @Override
            public int compare(Voice v1, Voice v2) {
                Log.i("XXX", "comparing item");
                return (v2.getName().compareToIgnoreCase(v1.getName()));
            }
        });

        VoiceAdapter tempAdapter = new VoiceAdapter(this, tempVoiceList);

        voiceListView.setAdapter(tempAdapter);
        swipeRefreshLayout.setRefreshing(false);
        progressView.setTextColor(Color.BLACK);
        progressView.setText("PROGRESS: ...");

    }

    private void speak() {
        HashMap<String, String> map = new HashMap<>();
        map.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "merp");
        timeOfSpeakRequest = System.currentTimeMillis();
        googleTTS.speak(textToSpeak.getText().toString(), TextToSpeech.QUEUE_FLUSH, map);
    }

    // Checks if Google Engine is installed
    // ... (and gives more info in Logs).
    // The version number is going to dictate the quality of voices available
    private boolean isGoogleEngineInstalled() {

        final Intent ttsIntent = new Intent();
        ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
        final PackageManager pm = getPackageManager();
        final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);

        boolean googleIsInstalled = false;

        for (int i = 0; i < list.size(); i++) {

            ResolveInfo resolveInfoUnderScrutiny = list.get(i);
            String engineName = resolveInfoUnderScrutiny.activityInfo.applicationInfo.packageName;

            if (engineName.equals("com.google.android.tts")) {
                String version = "null";
                try {
                    version = pm.getPackageInfo(engineName,
                            PackageManager.GET_META_DATA).versionName;
                } catch (Exception e) {
                    Log.i("XXX", "Error getting google engine verion: " + e.toString());
                }
                Log.i("XXX", "Google engine version " + version + " is installed!");
                googleIsInstalled = true;
            } else {
                Log.i("XXX", "Google Engine is not installed!");
            }

        }
        return googleIsInstalled;
    }
}

VoiceAdapter.java:

package [ your package name ];

import android.content.Context;
import android.graphics.Color;
import android.speech.tts.Voice;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.BaseAdapter;
import android.widget.TextView;

import java.util.List;

public class VoiceAdapter extends BaseAdapter {

    private Context mContext;
    private LayoutInflater mInflater;
    private List<Voice> mDataSource;

    public VoiceAdapter(Context context, List<Voice> voicesToDisplay) {
        mContext = context;
        mDataSource = voicesToDisplay;
        mInflater = (LayoutInflater) mContext.getSystemService(Context.LAYOUT_INFLATER_SERVICE);
    }

    @Override
    public int getCount() {
        return mDataSource.size();
    }

    @Override
    public Object getItem(int position) {
        return mDataSource.get(position);
    }

    @Override
    public long getItemId(int position) {
        return position;
    }

    @Override
    public View getView(int position, View convertView, ViewGroup parent) {

        // In a real app this method is not efficient,
        // and "View Holder Pattern" shoudl be used instead.
        View rowView = mInflater.inflate(R.layout.list_item_voice, parent, false);

        if (position%2 == 0) {
            rowView.setBackgroundColor(Color.rgb(245,245,245));
        }

        Voice voiceUnderScrutiny = mDataSource.get(position);

        // example output of Voice.toString() :
        // "Voice[Name: pt-br-x-afs#male_2-local, locale: pt_BR, quality: 400, latency: 200,
        // requiresNetwork: false, features: [networkTimeoutMs, notInstalled, networkRetriesCount]]"

        // Get title element
        TextView voiceTitleTextView =
                (TextView) rowView.findViewById(R.id.voice_title);

        TextView qualityTextView =
                (TextView) rowView.findViewById(R.id.voice_quality);

        TextView networkRequiredTextView =
                (TextView) rowView.findViewById(R.id.voice_network);

        TextView isInstalledTextView =
                (TextView) rowView.findViewById(R.id.voice_installed);

        TextView featuresTextView =
                (TextView) rowView.findViewById(R.id.voice_features);

        voiceTitleTextView.setText("VOICE NAME: " + voiceUnderScrutiny.getName());

        // Voice Quality...
        // ( https://developer.android.com/reference/android/speech/tts/Voice.html )
        // 100 = Very Low, 200 = Low, 300 = Normal, 400 = High, 500 = Very High
        qualityTextView.setText(  "QLTY: " + ((Integer) voiceUnderScrutiny.getQuality()).toString()  );
        if (voiceUnderScrutiny.getQuality() == 500) {
            qualityTextView.setTextColor(Color.GREEN); // set v. high quality to green
        }

        if (!voiceUnderScrutiny.isNetworkConnectionRequired()) {
            networkRequiredTextView.setText("NET_REQ?: NO");
        } else {
            networkRequiredTextView.setText("NET_REQ?: YES");
        }

        if (!voiceUnderScrutiny.getFeatures().contains("notInstalled")) {
            isInstalledTextView.setTextColor(Color.GREEN);
            isInstalledTextView.setText("INSTLLD?: YES");
        } else {
            isInstalledTextView.setTextColor(Color.RED);
            isInstalledTextView.setText("INSTLLD?: NO");
        }

        featuresTextView.setText("FEATURES: " + voiceUnderScrutiny.getFeatures().toString());

        return rowView;
    }
}

activity_main.xml:

<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:focusable="true"
    android:focusableInTouchMode="true"
    tools:context=".MainActivity">

    <EditText
        android:id="@+id/textToSpeak"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:ems="10"
        android:inputType="textPersonName"
        android:text="textToSpeak..."
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <android.support.v4.widget.SwipeRefreshLayout
        android:id="@+id/swipeRefresh"
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:layout_marginBottom="8dp"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.0"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/progressView">

    <ListView
        android:id="@+id/voiceListView"
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:layout_marginBottom="8dp"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp">

    </ListView>

    </android.support.v4.widget.SwipeRefreshLayout>

    <TextView
        android:id="@+id/progressView"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:text="UTTERANCE_PROGRESS:"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/textToSpeak" />

</android.support.constraint.ConstraintLayout>

list_item_voice.xml:

<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"

    android:layout_centerInParent="true"
    android:paddingBottom="8dp"
    android:paddingLeft="16dp"
    android:paddingRight="16dp"
    android:paddingTop="8dp"
    >

    <TextView
        android:id="@+id/voice_title"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:text="NAME:"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />


    <TextView
        android:id="@+id/voice_installed"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="8dp"
        android:fontFamily="monospace"
        android:text="INSTALLED? "
        android:textAlignment="textStart"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.5"
        app:layout_constraintStart_toEndOf="@+id/voice_network"
        app:layout_constraintTop_toBottomOf="@+id/voice_title" />

    <TextView
        android:id="@+id/voice_quality"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:text="QUALITY:"
        app:layout_constraintEnd_toStartOf="@+id/voice_network"
        app:layout_constraintHorizontal_bias="0.5"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/voice_title" />

    <TextView
        android:id="@+id/voice_features"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginBottom="8dp"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:text="FEATURES:"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/voice_quality" />

    <TextView
        android:id="@+id/voice_network"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:text="NET_REQUIRED?"
        app:layout_constraintEnd_toStartOf="@+id/voice_installed"
        app:layout_constraintHorizontal_bias="0.5"
        app:layout_constraintStart_toEndOf="@+id/voice_quality"
        app:layout_constraintTop_toBottomOf="@+id/voice_title" />

</android.support.constraint.ConstraintLayout>