I am fairly new to speech processing and am wondering how homophones are detected. I am looking for an API that scores the similarity between two words based on how they are pronounced.
For example, "to" and "two" sound nearly identical, whereas "to" and "from" do not.
You might want to try calculating the edit distance not on the original strings but on their pronunciations, such as those available in the CMU Pronouncing Dictionary at http://www.speech.cs.cmu.edu/cgi-bin/cmudict
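
To make the idea concrete, here is a minimal sketch of edit distance over phoneme sequences. The PRONUNCIATIONS table is a tiny hand-written excerpt in the dictionary's ARPAbet style and is only an assumption for illustration; a real program would load the full dictionary and handle words with several pronunciations. The function names and the normalization to a 0..1 score are my own choices, not part of any existing API.

```python
# Hand-written excerpt in CMUdict-style ARPAbet (assumption for illustration;
# load the full dictionary in practice).
PRONUNCIATIONS = {
    "to": ["T", "UW1"],
    "two": ["T", "UW1"],
    "too": ["T", "UW1"],
    "from": ["F", "R", "AH1", "M"],
}


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of phonemes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonetic_similarity(word1, word2):
    """Score in [0, 1]; 1.0 means the two phoneme sequences are identical."""
    p1 = PRONUNCIATIONS[word1.lower()]
    p2 = PRONUNCIATIONS[word2.lower()]
    longest = max(len(p1), len(p2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(p1, p2) / longest


if __name__ == "__main__":
    print(phonetic_similarity("to", "two"))
    print(phonetic_similarity("to", "from"))
```

With this table, "to" and "two" share the same phoneme sequence, so they score as identical, while "to" and "from" share no phonemes and score much lower. Dividing by the longer sequence length is just one way to normalize; the raw distance may serve you equally well.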
Soundex and Metaphone are algorithms for indexing words by their English pronunciation. You can use Python packages such as Fuzzy, which implements several phonetic indexing algorithms.
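
If you want to see what such an index looks like without installing anything, here is a rough sketch of classic American Soundex written from scratch. It is not the Fuzzy package's API, the function name is my own, and it assumes plain ASCII English words.

```python
# Consonant groups used by classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word, length=4):
    """Return the Soundex key of a word, e.g. a letter plus three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and Y) get no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "0" * length)[:length]


if __name__ == "__main__":
    for w in ("to", "two", "from", "Robert", "Rupert"):
        print(w, soundex(w))
```

Words with the same key are treated as sounding alike, so "to" and "two" land in the same bucket while "from" does not. Keep in mind that Soundex is coarse: "to" and "tea" also get the same key, so it is better suited to grouping candidates than to confirming true homophones.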