Given an audio file, I want to calculate the pace of the speech, i.e. how fast or slow it is.
Currently I am doing the following:
- convert the speech to text to obtain a transcript (using a free tool).
- count the number of words in the transcript.
- calculate the duration of the file.
- finally, pace = (number of words in transcript) / (duration of file).
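For reference, the steps above can be sketched as follows. This is only a minimal illustration: the function names `wav_duration_seconds` and `speech_pace` are made up for this example, the transcript is assumed to be a plain string, and the duration helper assumes an uncompressed PCM WAV file that the standard-library `wave` module can read.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (assumes the wave module can read it)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def speech_pace(transcript: str, duration_seconds: float) -> float:
    """Return pace in words per second: word count divided by duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Splitting on whitespace is a crude word count; punctuation stays attached to words.
    word_count = len(transcript.split())
    return word_count / duration_seconds
```

Multiplying the result by 60 gives words per minute, which is the unit speech rate is usually quoted in.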
However, the accuracy of the pace obtained depends entirely on the transcription, which I think is an unnecessary step.
Is there any Python library, SoX, or FFmpeg approach that will enable me to
- calculate, in a straightforward way, the speed/pace of talk in an audio file
- find the dominant pitches/tones of that audio?
I referred to: http://sox.sourceforge.net/sox.html and https://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/
Your method sounds interesting as a quick first-order approximation, but it is limited by the transcript resolution. You can analyze the audio file directly.
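As one possible way to do that without a transcript, here is a rough sketch that counts bursts of energy in the signal as a stand-in for syllables. This is my own illustration, not an established speech-rate algorithm: the function names, the 20 ms frame size, and the 0.3 threshold ratio are all assumptions that would need tuning on real recordings, and background noise or music will inflate the count.

```python
import numpy as np


def count_energy_bursts(samples, sample_rate, frame_ms=20.0, threshold_ratio=0.3):
    """Count bursts of energy (a crude proxy for syllables) in a mono signal."""
    samples = np.asarray(samples, dtype=float)
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return 0
    # Cut the signal into non-overlapping frames and take the RMS of each one.
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        return 0
    # A frame is "active" when its RMS exceeds a fraction of the loudest frame.
    active = rms > threshold_ratio * peak
    # A burst starts wherever an inactive frame is followed by an active one.
    starts = np.flatnonzero(active[1:] & ~active[:-1])
    return int(len(starts) + (1 if active[0] else 0))


def bursts_per_second(samples, sample_rate, **kwargs):
    """Return energy bursts per second as a rough, transcript-free pace measure."""
    duration = len(samples) / float(sample_rate)
    if duration <= 0:
        return 0.0
    return count_energy_bursts(samples, sample_rate, **kwargs) / duration
```

The number is only meaningful in relative terms, for example when comparing two recordings of the same speaker made under similar conditions.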
I'm not familiar with SoX, but according to its manual the stat effect displays "... time and frequency domain statistical information about the audio".
SoX claims to be the "Swiss Army knife of audio manipulation", and from skimming its docs it seems it might help you find the general tempo.
If you want to run pitch analysis too, you can develop your own algorithm in Python. I recently used librosa and found it very useful and well documented.