Voice matching in Android [closed]

Posted 2019-01-24 20:55

Question:

Is there any way to do voice matching in Android? Take the scenario below.

  1. User "A" speaks something into the app, and the app records it on the phone.
  2. User "B" speaks something into the app, and the app records it on the phone.
  3. User "C" speaks something into the app, and the app records it on the phone.
  4. After all of these recordings, user "A" comes back and speaks to the app. Since his voice is already recorded, the app identifies him as user "A".

Or something like this:

  1. User "A" speaks the word "House" into the app, and the app records it on the phone.
  2. User "B" speaks the word "House" into the app, and the app records it on the phone.
  3. User "C" speaks the word "House" into the app, and the app records it on the phone.
  4. After all of these recordings, user "A" comes back and speaks the word "House" to the app. Since his voice is already recorded, the app identifies him as user "A".

Is this possible in Android? Which approach would work? I haven't seen any built-in libraries for this, but is there a workaround?

Answer 1:

You may want to check out Recognito, which does text-independent speaker recognition in Java.

It's a FOSS library licensed under Apache 2.0.

https://github.com/amaurycrickx/recognito

Disclaimer: I'm the author :-)

It has a light dependency on Oracle's javax.sound for file handling, but it should be straightforward to remove this dependency from the main Recognito class (there are a few methods to discard: look for "file" in the params and hit delete).

I'm not aware of any other FOSS alternatives that would be Android-compatible without modifications.

There's plenty of Javadoc, and the code should be straightforward.

The one thing you'll wonder about is how to create the double[] with values between -1.0 and 1.0. For a start, you may want to look at the FileHelper class, which does just that with a 16-bit PCM encoded file.
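
For illustration, that conversion can be sketched in plain Java roughly as follows. This is not Recognito's FileHelper code: the class and method names (PcmConverter, pcm16ToDoubles) are made up for this example, and it assumes signed 16-bit little-endian mono PCM with any file header already stripped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Recognito.
public final class PcmConverter {
    private PcmConverter() {}

    /**
     * Converts raw signed 16-bit little-endian mono PCM bytes
     * into doubles in the range [-1.0, 1.0).
     * A trailing odd byte, if any, is ignored.
     */
    public static double[] pcm16ToDoubles(byte[] pcm) {
        int sampleCount = pcm.length / 2;
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            // A short spans -32768..32767, so dividing by 32768 normalizes it.
            samples[i] = buffer.getShort() / 32768.0;
        }
        return samples;
    }
}
```

On Android the bytes would typically come from a recording made with 16-bit PCM encoding; if the byte order or channel count of your source differs, the sketch needs adjusting.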

Please note that a single word won't suffice to extract a good vocal print and to recognize the user afterwards.

For the process, I'd say use a phrase repeated 3 times to build an averaged vocal print. Use the same phrase at recognition time.
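
To make the idea of an averaged vocal print concrete, here is a toy sketch. It is not Recognito's API or algorithm: VoicePrintSketch, enroll and identify are hypothetical names, and it assumes each recording has already been reduced to a fixed-length feature vector by some extraction step that is out of scope here. It averages the vectors from the repeated phrase and later picks the enrolled user whose average is closest by Euclidean distance.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; a real library does far more than this.
public final class VoicePrintSketch {
    private final Map<String, double[]> prints = new LinkedHashMap<>();

    /** Stores the element-wise mean of the user's feature vectors. */
    public void enroll(String user, List<double[]> featureVectors) {
        if (featureVectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one recording");
        }
        int dim = featureVectors.get(0).length;
        double[] mean = new double[dim];
        for (double[] vector : featureVectors) {
            if (vector.length != dim) {
                throw new IllegalArgumentException("feature vectors differ in length");
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += vector[i] / featureVectors.size();
            }
        }
        prints.put(user, mean);
    }

    /** Returns the enrolled user closest to the sample, or null if nobody is enrolled. */
    public String identify(double[] features) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : prints.entrySet()) {
            double d = distance(entry.getValue(), features);
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    private static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

In this sketch, each user would be enrolled once with the three feature vectors from the repeated phrase, and identify would be called with the vector from the new recording. Note that it always returns the nearest enrolled user, so a real app would also want a distance threshold to reject unknown speakers.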

The lib is text-independent, but it will help to use the same phrase if you need to keep the recording short. If you want it truly text-independent (the user says anything and gets recognized), you'll need longer vocal samples.

HTH