Audio analysis: frequency vs pitch [closed]

Posted 2019-01-24 16:28

Question:

I'm designing a simple tuner, so my goal is to display a note name (A, B, F#) and the distance in cents between the theoretical note and the actual input.

I'm completely new to audio and signal processing, so I did some research and found the Fast Fourier Transform, which will analyze the bytes and give me the frequency. I also found a couple of Java libraries, such as Apache Commons Math and JTransforms, so I won't have to write the hard code myself.

I believed that was all, since each frequency range can be mapped directly to a note in equal temperament, but then I found a word that was new to me: pitch. It's said to be tightly related to frequency but not exactly the same thing, much more difficult to obtain, and part of psychoacoustics.

So my question is, can somebody clearly outline the differences between pitch and frequency and maybe tell me which a tuner deals with?

Answer 1:

Frequency is simply the number of oscillations that a wave goes through per second. Any wave which is periodic has a frequency. But usually in music, use of the term is limited to talking about sine waves, so if you hear something about a wave of frequency x, it usually means a sine wave with that many oscillations per second.

Any arbitrary wave, whether periodic or not, can be constructed by adding up sine waves of various frequencies in varying amounts (that is with various amplitudes). What the Fourier transform does is tell you which frequencies to use, and with which amplitudes, to create any given wave. The fast Fourier transform (FFT) is a particular algorithm that computes the Fourier transform of a wave, given the data representing the amplitude of the wave as a function of time.
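
To make that concrete, here is a minimal sketch of the discrete Fourier transform written out directly from its definition. It is not the FFT algorithm (it is the slow O(N^2) form) and not the API of Commons Math or JTransforms; the class and method names are made up for illustration. It only shows what "which frequencies, with which amplitudes" means in code.

```java
final class NaiveDft {
    /** Samples of amp * sin(2*pi*freq*t), taken at sampleRate Hz. */
    static double[] sine(double freq, double amp, int n, double sampleRate) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            x[t] = amp * Math.sin(2.0 * Math.PI * freq * t / sampleRate);
        }
        return x;
    }

    /** Magnitude of DFT bin k of the real signal x. */
    static double magnitude(double[] x, int k) {
        double re = 0.0;
        double im = 0.0;
        for (int t = 0; t < x.length; t++) {
            double angle = 2.0 * Math.PI * k * t / x.length;
            re += x[t] * Math.cos(angle);
            im -= x[t] * Math.sin(angle);
        }
        return Math.hypot(re, im);
    }

    /** Magnitudes of bins 0..N/2, i.e. the non-negative frequencies. */
    static double[] spectrum(double[] x) {
        double[] mags = new double[x.length / 2 + 1];
        for (int k = 0; k < mags.length; k++) {
            mags[k] = magnitude(x, k);
        }
        return mags;
    }

    /** Centre frequency in Hz of bin k for an n-point transform. */
    static double binFrequency(int k, int n, double sampleRate) {
        return k * sampleRate / n;
    }
}
```

With 800 samples at 8000 Hz the bins are 10 Hz apart, so a pure 440 Hz sine from `NaiveDft.sine(440.0, 1.0, 800, 8000.0)` shows up as a single peak in bin 44 of `NaiveDft.spectrum(...)`. A real tuner would use a proper FFT library for speed.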

When you hear a musical note played by an instrument, it doesn't consist of just a single frequency. Instead, what you get is a combination of different multiples of a fundamental frequency, in different amounts. For example, a flute playing a particular note might produce a combination of

  • 440 Hz with amplitude 1
  • 1320 Hz with amplitude 1/2
  • 2200 Hz with amplitude 1/3

and so on. On the other hand, a trumpet playing the same note might produce a combination of

  • 440 Hz with amplitude 1
  • 880 Hz with amplitude 1/2
  • 1320 Hz with amplitude 1/4
  • 1760 Hz with amplitude 1/8

and so on. (Those are not the actual relative amplitudes for those instruments; I just made up some example numbers.) So in your tuner application, when you run the FFT on incoming data, you will find multiple peaks in the output at various frequencies, depending on which instrument is being tuned. The point is that the output of the FFT will not just be a number; it won't just tell you "this instrument is playing a note at 440 Hz."

Now we get to pitch, which is a slightly more nebulous concept. The pitch of a note is basically what a person actually hears when exposed to that note. For many instruments, the pitch is correlated to the fundamental frequency being emitted by the instrument. But depending on the relative amplitudes of the higher frequencies, a person might perceive two instruments to have different pitches even if they are actually playing the same note.

Fortunately, if you're just making a simple tuner, you don't have to worry about pitch at all. The point of a tuner is to minimize beats between different instruments, and beats are caused by the actual frequencies, not the perceived pitches. A trumpet and a flute both playing with a fundamental frequency of 440 Hz will not exhibit beats because the differences between all their frequencies are multiples of 440 Hz, even if the untrained ear might think one of them is higher-pitched than the other.



Answer 2:

Pitch is about the periodicity of the signal. It's true that it's based on psychoacoustics, but it is very accurate to say we are detecting the pseudo-periodicities of the signal when we hear a pitch.

The spectrum is the breakdown of the audio signal into a sum of sines and cosines of various frequencies. As David pointed out, usually when people talk about "frequency" in a musical context, they are referring to the frequency of these sine waves that you broke the signal into. So the spectrum is looking at which of these sine components are large, and what frequencies they are at. The spectrum broadly represents the "high frequency" you hear in a hi-hat, and the "low frequency" you hear in the thud of a rock hitting the ground. Strictly speaking, neither of these sounds is periodic, nor do you perceive a pitch, but what you hear is the relative magnitudes of the high frequency and the low frequency parts of the spectrum.

The Fourier Transform (or DFT/FFT) is the mathematical algorithm by which you break down your audio signal into a sum of sines and cosines. So by looking at the magnitudes of the sines and cosines that you get out of the FFT, you get the spectrum. A naive way of guessing the pitch is to look directly at the spectrum of a short piece of audio and assume that the biggest sine component of your signal corresponds to its fundamental periodicity.

I wrote up a very long answer to another post that I think will answer your questions of how to extract pitch: https://stackoverflow.com/a/7211695/94102 I'd strongly suggest reading it. It will give you the tools and understanding you need to make a high quality tuner.



Answer 3:

A musical instrument playing a single note at one pitch can produce many many frequencies of acoustic vibrations during the duration of the note.

This is because musical instruments are not sine wave generators. The complicated (and more interesting sounding) waveforms instead produced can be represented as an additive composite of many many sine and cosine waves of different amplitudes, the "frequencies".

These many spectral frequencies are usually harmonics of the pitch frequency, sometimes exact multiples of the pitch frequency, but sometimes slightly inharmonic for big string instruments, to very inharmonic for some types of percussion instruments, as well as note transients.

When tuning a musical instrument, a musician usually only cares about pitch. They aren't interested in the frequency of all the harmonics (except maybe the 1st), even the loudest ones. These harmonics can be the frequencies which would show up as the highest peaks in an FFT magnitude. For some musical sounds, the pitch frequency might show up as one of the smallest among many frequency peaks, or might not show up at all, which makes frequency picking potentially error prone.

Pitch estimation algorithms, instead, try to pick out a fundamental (pseudo)repetitive period that a human would perceive as the musical pitch, whether or not the reciprocal of that period is among the strongest frequency components in the acoustic spectrum.

An FFT can be used as part of a frequency estimator. Just using the FFT peak magnitude result alone is a very poor frequency estimator without proper sizing, windowing, interpolation, and maybe some sort of decision mechanism. But even a good frequency estimator is not a pitch estimator.
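
As one illustration of the "interpolation" step, here is a rough sketch of parabolic (quadratic) peak interpolation, a common refinement I am swapping in as an example rather than something prescribed above. It assumes you already have an array of FFT magnitudes; the class and method names are hypothetical, and windowing is left out.

```java
final class PeakInterpolation {
    /** Index of the largest magnitude. */
    static int peakBin(double[] mags) {
        int best = 0;
        for (int k = 1; k < mags.length; k++) {
            if (mags[k] > mags[best]) {
                best = k;
            }
        }
        return best;
    }

    /**
     * Fractional bin position of the peak: fits a parabola through the
     * largest bin and its two neighbours and returns the parabola's vertex.
     * Falls back to the integer bin at the array edges or on a flat top.
     */
    static double refinedBin(double[] mags) {
        int k = peakBin(mags);
        if (k == 0 || k == mags.length - 1) {
            return k;
        }
        double a = mags[k - 1];
        double b = mags[k];
        double c = mags[k + 1];
        double denom = a - 2.0 * b + c;
        if (denom == 0.0) {
            return k;
        }
        return k + 0.5 * (a - c) / denom;
    }

    /** Peak frequency in Hz for an fftSize-point transform. */
    static double peakFrequency(double[] mags, int fftSize, double sampleRate) {
        return refinedBin(mags) * sampleRate / fftSize;
    }
}
```

This only sharpens the estimate of the strongest spectral peak. As noted above, that peak may be a harmonic rather than the pitch, so it is still a frequency estimator and not a pitch estimator.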

Pitch estimators can use an FFT as part of their analysis, but often use autocorrelation, cepstrums, vocoders, pattern matching, decision theory, and related algorithms, in addition to or instead of an FFT.
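
A bare-bones autocorrelation estimator, as a sketch only: it looks for the lag at which the signal best matches a shifted copy of itself and reports the reciprocal of that period. The names, the search range parameters, and the lack of normalisation or peak refinement are all my simplifications; practical estimators add those.

```java
final class AutocorrelationPitch {
    /**
     * Sum of harmonics of a fundamental; amps[h] is the amplitude of
     * harmonic h + 1, so amps[0] = 0 removes the fundamental itself.
     */
    static double[] harmonics(double fundamental, double[] amps, int n, double sampleRate) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            for (int h = 0; h < amps.length; h++) {
                x[t] += amps[h] * Math.sin(2.0 * Math.PI * (h + 1) * fundamental * t / sampleRate);
            }
        }
        return x;
    }

    /** Pitch in Hz from the lag with the largest autocorrelation, or NaN. */
    static double estimate(double[] x, double sampleRate, double minFreq, double maxFreq) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxFreq));
        int maxLag = Math.min(x.length - 1, (int) Math.ceil(sampleRate / minFreq));
        int bestLag = -1;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0.0;
            for (int t = 0; t + lag < x.length; t++) {
                sum += x[t] * x[t + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : Double.NaN;
    }
}
```

For a test signal containing only 880, 1320 and 1760 Hz (harmonics 2 to 4 of 440 Hz, with no energy at 440 Hz at all), the waveform still repeats every 1/440 s, so the autocorrelation picks 440 Hz even though no spectral peak sits there.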

Summary: A tuner should deal with pitch, and ignore spectral frequency unless that turns out to be a relevant component of the pitch analysis or pitch estimation.



Answer 4:

The pitch is the standard note you are trying to match. For A this is officially 440 Hz, but more and more musicians and instruments tune it higher, to 441 Hz, 442 Hz and so on. In your program, it is better to let the user set the reference A (for example, anywhere between 440 and 449 Hz in steps of 1 Hz). The A one octave up will then be 880 Hz, 882 Hz, ... depending on the user's choice. You will have to compute the other notes on a logarithmic scale (twelve equal intervals per octave), and the best approach is to show the distance between the frequency heard and the closest note. See this example: http://members.efn.org/~qehn/global/building/cents.htm
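
A small sketch of that log-scale computation, assuming twelve-tone equal temperament so that each semitone multiplies the frequency by 2^(1/12). The class and method names are hypothetical; `referenceA` is the user-chosen A.

```java
final class EqualTemperament {
    /** Frequency of the note n semitones above (or below, if negative) the reference A. */
    static double noteFrequency(double referenceA, int semitonesFromA) {
        return referenceA * Math.pow(2.0, semitonesFromA / 12.0);
    }

    /** Frequency of the equal-tempered note closest to the frequency heard. */
    static double nearestNoteFrequency(double heard, double referenceA) {
        double semitones = 12.0 * Math.log(heard / referenceA) / Math.log(2.0);
        int nearest = (int) Math.round(semitones);
        return noteFrequency(referenceA, nearest);
    }
}
```

For example, `noteFrequency(441.0, 12)` gives the A one octave above a 441 Hz reference, and `noteFrequency(440.0, -9)` gives middle C at roughly 261.6 Hz.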



Answer 5:

As others have said, a musical "pitch" such as the A4 note played by a flute for example, is composed of many audio "frequencies", namely the fundamental A4 tone of 440 Hz, and many overtones (also known as harmonics.)

The overtones are integer multiples of the fundamental tone. In this example the fundamental tone is 440 Hz and the overtones are 880, 1320, 1760 Hz and so on.

You will understand much better the relationship between pitch and frequency by looking at the actual frequency spectra of several musical instruments.

You can see the frequency spectra here: Musical instrument spectrum

When you look at a musical instrument spectrum using the above tool, you are looking at the output of an FFT (a Fast Fourier Transform). The FFT was used to process the digitally recorded sound produced by the musical instrument.

The FFT transforms the audio signal of the musical instrument, from the time/sound_pressure domain, to the frequency/frequency_magnitude domain.

The FFT automatically produces magnitudes for "negative frequencies", in addition to magnitudes for the "normal" positive frequencies. No need to discuss that here, but to see only the "normal" positive frequencies, click the "Un-Fold w" button.

The above tool shows FFT magnitudes in decibels (by default). A decibel is a stretched version of a "normal" linear magnitude. Decibel graphs let you see very large and very small magnitudes on the same graph.

If you want to see only the frequencies with the largest magnitudes, click the "FFT Y-Axis Magnitude" menu, and select "Sqrt(R^2+I^2)" at the top of the menu.

To go back to the decibel graph, select "dB Norm Sqrt(R^2+I^2)" in the same menu.

Click the "Play" button to hear the recorded sound of the selected instrument, playing the selected note.

Click the "Inv-FFT" button to see the time/sound_pressure signal that was recorded for the selected instrument and note.

By the way, Inv-FFT performs an actual inverse FFT. It actually synthesizes the original time/sound_pressure signal from the frequency/frequency_magnitude data.

Click the "FFT" button to again see the spectrum.

Use the zoom-in and zoom-out buttons to select a zoom mode. Then drag a box around the part of the graph you want to zoom-in or zoom-out. Click the zoom button again to return to unzoomed mode.

For your tuner, you'll have to:

  1. process the input signal (the instrument sound) with the FFT.
  2. detect the fundamental peak.
  3. determine how far away the peak is from the desired pitch, A4's 440 Hz for example.
  4. display the difference to the user.

Problems you'll encounter:

  1. background noise in the input signal.
  2. user's instrument is badly out of tune (bad instrument).
  3. user is trying to tune chords instead of single notes (bad user).


Answer 6:

It's important to notice the difference between a 'frequency' of vibration and a musical 'pitch'.

A 'pitch' is not a single vibration, such as a sine wave, but is a composite of multiple sound vibrations occurring at different mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency, 523 Hz would be the 2nd harmonic, 785 Hz would be the 3rd harmonic, 1046 Hz would be the 4th harmonic, etc. The higher harmonics are integer multiples of the fundamental frequency, 261.6 Hz (e.g. 2 x 261.6 = 523, 3 x 261.6 = 785, 4 x 261.6 = 1046).

Below is the image of a Logarithmic DFT for 3 seconds of a guitar solo on a polyphonic MP3 recording. It shows how the harmonics appear for individual notes on a guitar, while playing a solo. For each note on this Logarithmic DFT we can see its multiple harmonics extending vertically, because each harmonic will have the same time-width.

[Image: Logarithmic DFT of the guitar solo]

This Wikipedia article gives a good background into the concept of 'pitch' as it pertains to music, and introduces some concepts about pitch detection.

https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection



Answer 7:

Pitch and frequency measure exactly the same quantity, but on different scales.

Frequency is generally measured in Hertz, which counts the number of times per second that the vibrating object goes through a complete period of its vibration. For example, if the frequency is 440 Hertz, then the object goes through 440 complete periods of its vibration every second.

Pitch is generally measured in octaves, semitones and cents - a cent is 1/100 of a semitone, and a semitone is 1/12 of an octave. It's not normally expressed as a numeric quantity, but instead with letters and symbols. That's because there's no "zero point" as such for pitch.

Because pitch and frequency measure the same thing, you can convert freely between them - rather like converting between temperatures expressed in Fahrenheit and in Celsius. The algorithm is a little complicated though - to work out a pitch, you need to take the DIFFERENCE between the base-2 logarithm of the frequency, and the base-2 logarithm of a frequency corresponding to a known pitch. The most commonly used value for this known pitch is "A above middle C" - it corresponds to a frequency of exactly 440 Hertz.

This conversion is best demonstrated with an example. Suppose I want to find the pitch corresponding to a frequency of 1000 Hertz. The base-2 logarithm of 1000 is 9.9657842847. The base-2 logarithm of 440 is 8.7813597135. The difference is 1.1844245711; which tells me that the pitch corresponding to 1000 Hertz is 1.1844245711 octaves above "A above middle C". Multiply this by 12 to give an answer in semitones - it's 14.21309485 semitones. Now 14 semitones above "A above middle C" is the "B" almost 2 octaves above middle C. The pitch that we're looking for is therefore 21.309485 cents above this "B".
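
The same arithmetic as a short sketch. The class and method names are my own, sharps are used for the black keys, and octave numbers follow the convention in which "A above middle C" is A4 and each octave starts at C; those are assumptions for illustration.

```java
final class PitchFromFrequency {
    // Note names starting at A, because semitone 0 is the reference A.
    private static final String[] NAMES = {
        "A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"
    };

    /** Distance from the reference A in (fractional) semitones. */
    static double semitonesFromA4(double freq, double a4) {
        return 12.0 * Math.log(freq / a4) / Math.log(2.0);
    }

    /** Nearest whole number of semitones from the reference A. */
    static int nearestSemitone(double freq, double a4) {
        return (int) Math.round(semitonesFromA4(freq, a4));
    }

    /** Cents above (positive) or below (negative) the nearest note. */
    static double centsOff(double freq, double a4) {
        return 100.0 * (semitonesFromA4(freq, a4) - nearestSemitone(freq, a4));
    }

    /** Name such as "B5" for a whole number of semitones from A4. */
    static String noteName(int semitonesFromA4) {
        int index = Math.floorMod(semitonesFromA4, 12);
        // Octave numbers change at C, which is 9 semitones below A.
        int octave = 4 + Math.floorDiv(semitonesFromA4 + 9, 12);
        return NAMES[index] + octave;
    }
}
```

For 1000 Hz with A4 = 440 Hz, `nearestSemitone` returns 14, `noteName(14)` returns "B5", and `centsOff` returns about 21.3, matching the worked example.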

The letter names are a bit confusing, because sometimes you go up 2 semitones to get to the next letter (so B is 2 semitones above A), and sometimes just 1 (so C is 1 semitone above B). They also repeat every octave (so 2 semitones above G isn't H, it's A). Musicians find this easy to deal with; the rest of us find it horribly confusing.

Now, when you play a single note on a musical instrument, the sound wave that you get has multiple frequencies, which you can find out with a Fourier analysis. The lowest frequency is called the "fundamental frequency", and the other frequencies are usually integer multiples of this frequency (which are called "harmonics" or "overtones"). So, if you play "A above middle C" on a piano, you'll get a composite sound, made up of a frequency of 440Hz, a frequency of 880Hz, a frequency of 1320Hz and so on - there could be dozens of these individual frequencies that make up your sound, all of them integer multiples of 440Hz. Now most musicians listening to this won't distinguish individual sounds for each frequency, so when a musician uses the word "pitch", they're normally referring to the pitch of the fundamental frequency ("A above middle C"), because that's the only pitch that is actually distinguishable.

If you're building a tuner, this is the definition of "pitch" that you will want to use; that is, your tuner should only display the pitches that a musician who hears the sound can actually distinguish. This means that after you've done a Fourier analysis, you need to remove these higher frequencies, before you calculate the pitches. I think (but I'm not sure about this part) that once you've got your set of frequencies from your Fourier analysis, you'll need to remove any frequencies that

  • are an integer multiple, or very close to an integer multiple, of a lower frequency in the set
  • have a notably lower amplitude than that lower frequency - but I'm not sure how much lower the amplitude has to be, before the higher pitch disappears into inaudibility (and it probably varies from one hearer to another).

To give another example, suppose I have a sound which includes frequencies of 262Hz, 440Hz, 524Hz, 786Hz, 880Hz, 1048Hz and 1320Hz, and the amplitude of each frequency is much greater than the amplitude of the frequency above. I notice that all the frequencies are multiples of 262Hz or 440Hz. So I conclude that this sound has just two "fundamental frequencies", and therefore consists of just two musical notes, or just two pitches (roughly middle C and the A above). The higher pitches are certainly components of the sound, but they are harmonics. The harmonics won't be audible to anyone hearing the sound; and therefore should not be displayed by your tuner.
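
Here is a rough sketch of the first filtering rule only (drop anything that is close to an integer multiple of a lower frequency already kept). It deliberately ignores the amplitude rule, since I'm not sure what threshold to use, and the tolerance parameter and names are assumptions of mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FundamentalFilter {
    /**
     * Returns the frequencies that are not (nearly) an integer multiple,
     * 2x or higher, of a lower frequency that was kept.
     * relTolerance is the allowed mismatch as a fraction of the frequency.
     */
    static List<Double> fundamentals(double[] freqs, double relTolerance) {
        double[] sorted = freqs.clone();
        Arrays.sort(sorted);
        List<Double> kept = new ArrayList<>();
        for (double f : sorted) {
            boolean harmonic = false;
            for (double base : kept) {
                long multiple = Math.round(f / base);
                if (multiple >= 2 && Math.abs(f - multiple * base) <= relTolerance * f) {
                    harmonic = true;
                    break;
                }
            }
            if (!harmonic) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

Calling `fundamentals` on {262, 440, 524, 786, 880, 1048, 1320} with a 1% tolerance leaves just 262 and 440, the two notes from the example. Note that with this tolerance 1320 Hz also happens to fall within 1% of 5 x 262 Hz, which shows how ambiguous the "close to a multiple" test can get.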

To that extent, pitch as perceived by a musician is a psychological effect, which makes it hard to model in an electronic tuner. You may have to do some experimentation, to work out exactly when a higher pitch should be considered a separate note, and when it should be considered a harmonic. Also, many musicians will be able to hear pitches that the Fourier analysis doesn't pick up (summation tones and difference tones) - their hearing really does play tricks.