.NET Library to Identify Pitches [closed]

Posted 2019-01-21 22:47

Question:

I'd like to write a simple program (preferably in C#) to which I sing a pitch using a mic, and which identifies the musical note that pitch corresponds to.


Thank you very much for your prompt responses. To clarify:

I'd like a (preferably .NET) library that would identify the notes I sing. I'd like such a library to:

  1. Identify a note when I sing (a note from the chromatic scale).
  2. Tell me how far off I am from the closest note.

I intend to use such a library to sing one note at a time.

Answer 1:

The crucial piece of this problem is the Fast Fourier Transform. This algorithm turns a waveform (your sung note) into a frequency distribution. Once you've computed the FFT you identify the fundamental frequency (usually the frequency with the highest amplitude in the FFT, but this depends somewhat on your microphone's frequency response curve and exactly what type of sound your mic is listening to).

Once you've found the fundamental frequency you need to look up that frequency in a list that maps frequencies to notes. Here you'll need to deal with the in-betweens (so if the fundamental frequency of your sung note is 452 Hz, what note does that actually correspond to, A or A#?).
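The lookup can also be done arithmetically rather than with a table. The following is a minimal sketch in Python (not C#; the arithmetic ports directly), assuming equal temperament with A4 = 440 Hz and MIDI-style note numbering. The function name `nearest_note` is made up for illustration. It returns both the closest note and the offset in cents (hundredths of a semitone), which covers the "how far off am I" part of the question.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name with octave, offset in cents) for a frequency.

    Assumes equal temperament; a4_hz is the reference pitch for A4.
    """
    # Fractional distance from A4 in semitones (12 semitones per octave).
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    # MIDI numbering places A4 at note 69 and C4 at note 60.
    exact = 69.0 + semitones_from_a4
    midi = round(exact)
    # Positive cents means sharp, negative means flat.
    cents = 100.0 * (exact - midi)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents

name, cents = nearest_note(452.0)
print(f"{name} {cents:+.1f} cents")
```

For the 452 Hz example, the result is A4 and roughly 47 cents sharp, so the sung note is closer to A than to A#, though only just.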

This guy on CodeProject has an example of FFT in C#. I'm sure there are others out there...



Answer 2:

You're looking for a frequency estimation or pitch-detection algorithm. Most people suggest finding the maximum value of the FFT, but this is overly simplistic and doesn't work as well as you might think. If the fundamental is missing (a timpani, for instance), or one of the harmonics is larger than the fundamental (a trumpet, for instance), it won't detect the correct frequency. Trumpet spectrum:

Trumpet spectrum http://www.eng.cam.ac.uk/DesignOffice/mdp/electric_web/AC/02284.jpg

Also, you're wasting processor cycles calculating the FFT if you're only looking for a specific frequency. You can use things like the Goertzel algorithm to find tones in a specific frequency band more efficiently.
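As a rough illustration of the Goertzel algorithm, here is a minimal Python sketch (the function name `goertzel_power` and its parameters are assumptions for this example, not from any library). It computes the squared magnitude of a single DFT bin with one multiply-add per sample, so testing a handful of candidate notes is cheaper than a full FFT.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel recurrence)."""
    n = len(samples)
    # Nearest DFT bin to the target frequency.
    k = round(n * target_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: one multiply and two adds per sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of bin k from the last two state values.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
```

To pick among a few candidate notes, call it once per candidate frequency and take the largest result. Note that the frequency is snapped to the nearest bin, so the block length limits how closely neighbouring semitones can be told apart.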

You really need to find "the first significant frequency" or "the first frequency with strong harmonic components", which is more ambiguous than just finding the maximum.

Autocorrelation and the harmonic product spectrum are better at finding the true fundamental for real instruments. But if the instrument is inharmonic (most are), the wave shape changes over time, and I suspect these methods won't work as well if you try to measure more than a few cycles at a time. Measuring fewer cycles, in turn, decreases your accuracy.



Answer 3:

You would usually do a Fourier transform on the input, then identify the most prominent frequency. This might not be the whole story though, since any nonsynthetic sound source produces a number of frequencies (they make up what is described as "tone colour"). Anyway, it can be done efficiently; there are real-time autotuners (you didn't believe that pop starlet could really sing, did you?).



Answer 4:

Pretty much every answer says to do an FFT. I've written this program myself, and I found that the FFT was good at roughly identifying the strongest frequency, but that there was some "smearing" out as a result -- it's not always easy to precisely identify tiny variations from the target pitch using an FFT, particularly if the sample is short.

Erik Kallen's approach seems reasonable, but there are other approaches. What I found worked fairly well was using a combination of FFT and a simple "zero crossing" detection algorithm to narrow in upon the exact frequency of the signal.

That is, count the number of times the signal crosses the zero line in a given interval, fit that to the rough frequency "bucket" produced by the FFT, and you can get a quite precise result.
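A minimal sketch of that idea in Python follows. Everything here is an assumption for illustration: the function names, the use of rising crossings only, the linear interpolation between samples, and the rule that the zero-crossing estimate is accepted only if it falls inside the FFT bucket. It also assumes the waveform crosses zero once per cycle, which a voice with strong overtones may not do without low-pass filtering first.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency from rising zero crossings with sub-sample interpolation."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation gives the fractional crossing position.
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return None  # not enough cycles to measure a period
    # Average period (in samples) between the first and last rising crossing.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

def refine_in_bucket(samples, sample_rate, bucket_hz, bin_hz):
    """Use the zero-crossing estimate only if it lies inside the FFT bucket."""
    estimate = zero_crossing_frequency(samples, sample_rate)
    if estimate is not None and abs(estimate - bucket_hz) <= bin_hz / 2.0:
        return estimate
    return bucket_hz  # fall back to the coarse FFT answer
```

Here `bucket_hz` is the centre frequency of the strongest FFT bin and `bin_hz` is the bin spacing (sample rate divided by FFT length).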



Answer 5:

Performing a Fourier transform will give you values for each frequency found in the sample. The more prominent the frequency, the higher the value. If you look for the largest value, you'll find your root frequency but overtones will also be present.

If you're looking for a specific frequency, using the Goertzel algorithm can be very effective.



Answer 6:

I've done pitch detection in the past, and the simple solution of "take an FFT and look at the peak" doesn't work at all for speech. I had fairly good luck using cepstral analysis. A lot of useful papers can be found in Lawrence Rabiner's publications. I recommend starting with "A comparative performance study of several pitch detection algorithms".

Just as a warning, it probably took me around 30-40 hours of work to get to the point where I could send a wav file into my program and have it spit out a sane number. I was also more interested in the fundamental frequency of a speaker's voice. I'm sure dealing with music will add many more wrinkles.



Answer 7:

You'll want to capture your raw input, accumulate some samples, and then do an FFT on them. The FFT will convert your samples from time domain to frequency domain, so what it produces is a bit like a histogram of how much energy the signal contained at various frequencies.

Getting from that to "the" frequency may be a bit difficult though -- a human voice is not going to just contain a single, clean frequency of sound. Instead you'll normally have energy at a pretty fair number of different frequencies. What you'll typically do is start from about the lowest voice range, and work your way up, looking for the first (lowest) frequency at which the energy is significantly higher than the background noise.
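The upward scan could look something like the Python sketch below. It assumes you already have a list of FFT magnitudes where bin `k` corresponds to `k * bin_hz`. The function name, the 80 Hz starting point, the use of the median as a noise-floor estimate, and the factor of 10 are all illustrative assumptions to be tuned for a real microphone.

```python
import statistics

def first_significant_peak(magnitudes, bin_hz, min_hz=80.0, ratio=10.0):
    """Frequency of the lowest local peak that rises well above the noise floor."""
    # Most bins hold only noise, so the median is a crude noise-floor estimate.
    noise_floor = statistics.median(magnitudes)
    threshold = ratio * noise_floor
    # Skip DC and anything below the lowest expected voice pitch.
    start = max(1, int(min_hz / bin_hz))
    for k in range(start, len(magnitudes) - 1):
        is_local_peak = magnitudes[k - 1] < magnitudes[k] >= magnitudes[k + 1]
        if is_local_peak and magnitudes[k] > threshold:
            return k * bin_hz
    return None  # nothing stood out from the background
```

Because the scan stops at the first qualifying peak, a louder overtone higher up does not override a quieter fundamental below it.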



Answer 8:

You have to do an FFT of the sample and then analyze that. The two things that will complicate your analysis are:

  1. Overtones. If you sing/play the A at 440 Hz (A4), you will also get a tone at A5 (880 Hz), one at E6 (1320 Hz), etc. Depending on the relative intensities at these frequencies, the tone could be perceived as an A4, A5 or E6. Determining the tone is not simply a matter of where the most intensity is; the human ear is more complicated than that. You could, however, guess reasonably well that it will be perceived as an A.

  2. Granularity. Your FFT will have a granularity that depends only on the duration of the sample, not on the sampling frequency. The bin spacing is the reciprocal of the duration, so you need a one-second sample to get a granularity of 1 Hz, which is still a little bit coarse. One way to get around this is to take three frequencies around each spike, approximate a second-degree polynomial around them, and then determine the maximum of that polynomial. I have read a paper claiming that using the phase is more accurate than the amplitude for this, but I don't remember where so I can't quote it.
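The three-point polynomial fit mentioned under granularity can be sketched in a few lines of Python. The function name `parabolic_peak` is made up here, and the input is assumed to be a list of FFT magnitudes with bin `k` at `k * bin_hz`. In practice the fit is often applied to log magnitudes, which this sketch leaves to the caller.

```python
def parabolic_peak(magnitudes, k, bin_hz):
    """Refine peak bin k by fitting a parabola through bins k-1, k and k+1."""
    left, mid, right = magnitudes[k - 1], magnitudes[k], magnitudes[k + 1]
    denom = left - 2.0 * mid + right
    if denom == 0.0:
        return k * bin_hz  # flat neighbourhood, nothing to refine
    # Vertex of the parabola, as an offset in bins from k.
    # It lies within [-0.5, 0.5] when k is a true local maximum.
    offset = 0.5 * (left - right) / denom
    return (k + offset) * bin_hz
```

With 10 Hz bins, for instance, a peak whose true position is 443 Hz shows up in bin 44 with its upper neighbour larger than its lower one, and the fit moves the estimate up from 440 Hz accordingly.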



Answer 9:

I'm amazed by all the answers here suggesting the use of FFT, given that FFT isn't generally precise enough for pitch detection. It can be, but only with an impractically large FFT window. For example, in order to determine the fundamental with 1/100th of a semi-tone accuracy (which is about what you need for accurate pitch detection) when the fundamental is around concert A (440 Hz), you need an FFT window with 524,288 elements. 1024 is a much more typical FFT size, and the computation time becomes progressively worse the larger the window.

I have to identify the fundamental pitch of WAV files in my software synthesizer (where a "miss" is immediately audible as an out-of-tune instrument) and I've found that autocorrelation does by far the best job. Basically, I iterate through each note in the 12-tone scale over an 8-octave range, compute the frequency and the wavelength of each note, and then perform an autocorrelation using that wavelength as the lag (an autocorrelation is where you measure the correlation between a set of data and the same set of data offset by some lag amount).

The note with the highest autocorrelation score is thus roughly the fundamental pitch. I then "hone in" on the true fundamental by iterating from one semi-tone down to one semi-tone up by 1/1000ths of a semi-tone, to find the local peak autocorrelation value. This method works very accurately, and more importantly it works for a wide variety of instrument files (strings, guitar, human voices etc.).
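The two-pass search described above might be sketched as follows in Python. This is an interpretation, not the answerer's code: the function names, the 4-octave default range, the 1/100 semi-tone fine step, and the linear interpolation for fractional lags are assumptions. One further addition is labelled in the comments: because a lag of two periods correlates as well as a lag of one, the coarse pass prefers the highest note that scores within 2% of the best, to avoid landing an octave low.

```python
import math

def correlation_at_lag(samples, lag):
    """Normalised correlation between the signal and itself delayed by lag samples."""
    i0 = int(lag)
    frac = lag - i0
    n = len(samples) - i0 - 1
    num = e1 = e2 = 0.0
    for i in range(n):
        # Linear interpolation approximates the sample at a fractional delay.
        d = samples[i + i0] * (1.0 - frac) + samples[i + i0 + 1] * frac
        num += samples[i] * d
        e1 += samples[i] ** 2
        e2 += d * d
    return num / math.sqrt(e1 * e2) if e1 > 0.0 and e2 > 0.0 else 0.0

def midi_to_hz(midi):
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

def detect_pitch(samples, sample_rate, low_midi=36, high_midi=84, step=0.01):
    """Coarse search per semi-tone, then a fine search one semi-tone either side."""
    def score(midi):
        return correlation_at_lag(samples, sample_rate / midi_to_hz(midi))

    coarse = {m: score(m) for m in range(low_midi, high_midi + 1)}
    top = max(coarse.values())
    # Assumption added here: take the highest note close to the top score,
    # since lower octaves (longer lags) score almost as well.
    best = max(m for m, s in coarse.items() if s >= 0.98 * top)
    fine = [best - 1.0 + i * step for i in range(int(round(2.0 / step)) + 1)]
    return midi_to_hz(max(fine, key=score))
```

The cost is one pass over the buffer per candidate lag, which is why the fine step and note range dominate the running time, as the answer goes on to discuss.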

This process is extremely slow, however, especially for long WAV files, so it could not be used as is for a realtime application. However, if you used FFT to get a rough estimate of the fundamental, and then used autocorrelation to zero in on the true value (and you were content with being less accurate than 1/1000th of a semi-tone, which is absurdly over-accurate) you would have a method which was both relatively fast and extremely accurate.



Answer 10:

If you just want the result, i.e. to use the software, there is a program called SingAndSee that does just this. It's about £25.



Answer 11:

Since you're dealing with a monophonic source, most of your pitches detected with an FFT should be harmonically related, but you're not really guaranteed that the fundamental is the strongest pitch. For many instruments and some voice registers in fact, it probably won't be. It should be the lowest of the harmonically related (in integer multiples of the fundamental) pitches detected though.



Answer 12:

Maybe this fully-managed library from codeplex is appropriate for you: Realtime C# Pitch Tracker

The author lists the following advantages of auto-correlation and his algorithm implementation:

  1. Fast. As mentioned above, the algorithm is quite fast. It can easily perform 3000 pitch tests per second.

  2. Accurate. Measured deviation from the actual frequency is less than +-0.02%.

  3. Accurate across a large range of input levels. Because the algorithm uses the ratios of different peaks and not absolute values, it stays accurate over a very wide range of input levels. There is no loss in accuracy across the range from -40dB to 0dB input level.

  4. Accurate across the full frequency range. The accuracy stays high across the full range of detected frequencies, from about 50 Hz to 1.6 kHz. This is due to the interpolation that is applied when calculating samples for the sliding windows.

  5. Accurate with any type of waveform. Unlike a lot of other types of pitch detecting algorithms, this algorithm is essentially unaffected by complex waveforms. This means that it works with male and female voices of any type, as well as other instruments like guitars, etc. The only requirement is that the signal be monophonic, so chords cannot be detected. This pitch detector will work well as a very responsive guitar tuner.

  6. Does not rely on previous results. This algorithm is accurate enough that it does not need to rely on previous results. Each pitch result is a completely new calculated value. Pitch algorithms that track pitch by "locking on" to the pitch suffer from the problem that if they detect the wrong pitch (usually an octave too high or low) they will continue to be wrong for many subsequent tests as well.



Answer 13:

To convert the time-domain signal coming from the microphone into the frequency domain you will need a Discrete Fourier Transform (DFT), computed either directly or with a Fast Fourier Transform (FFT). The FFT will work quicker but the code will be much more complex (a direct DFT can be done in 5-10 lines of code). Once this is complete you have to map the fundamental frequencies to notes. Unfortunately there are several mapping schemes, depending on which tuning system you are using. The most common of these is equal temperament. Frequencies here. The Wikipedia article on equal temperament also gives some background.

When using any Fourier mathematics you need to know how frequencies are handled. Ideally, perform anti-aliasing filtering before the transform, and watch out for the frequency reflection when performing a transform. Due to the Nyquist theorem you will need to sample the microphone content at least twice as quickly as the maximum frequency, i.e. for a max frequency of 10 Hz you must sample at 20 Hz or more.
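To back up the "5-10 lines" remark, here is a direct DFT as a minimal Python sketch (the function name is made up for illustration). It is O(n²), so it is only practical for short buffers. It returns magnitudes for bins 0 to n/2 only, because for real input the upper half of the spectrum mirrors the lower half, which is the reflection mentioned above.

```python
import cmath

def dft_magnitudes(samples):
    """Naive O(n^2) DFT of a real signal; magnitudes for bins 0..n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags
```

Bin `k` corresponds to `k * sample_rate / n` Hz, so the index of the largest magnitude gives a coarse frequency estimate.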



Answer 14:

D3D11 contains an FFT implementation