Data to audio and back. Modulation / demodulation

I have a stream of binary data and want to convert it to raw waveform sound data, which I can send to the speakers.

This is what the old-school modems did in order to transfer binary data over the phone line (producing the typical modemish sound). It is called modulation.

Then I need a reverse process - from the raw waveform samples, I want to obtain the exact binary data. This is called demodulation.

Any bitrate will work for a start.
The sound is played using computer speakers and sampled using a microphone.
Bandwidth would be quite low (low quality microphone).
There is some background noise but not much.

I found one particular way to do this - Frequency shift keying. The problem is I can't find any source code.

Can you please point me to an implementation of FSK in any language?
Or offer any alternative encoding binary<->sound with available source code?

回答1:

The simplest modulation scheme would be amplitude modulation (technically for the digital realm this would be called Amplitude Shift Keying). Take a fixed frequency (let's say 10Khz), your "carrier wave", and use the bits in your binary data to turn it on and off. If your data rate is 10 bits per second you will be toggling the 10KHz signal on and off at that rate. The demodulation would be an (optional) 10KHz filter followed by comparing with a threshold. This is a fairly simple scheme to implement. Generally the higher the signal frequency and your available bandwidth, the faster you can switch that signal on and off.

A very cool/fun application here would be to encode/decode as morse code and see how fast you can go.

FSK, shifting between two frequencies is more efficient in bandwidth and more immune to noise but will make the demodulator more complex as you need to distinguish between the two frequencies.

Advanced modulation scheme such as Phase Shift Keying are good at getting the highest bit rate for a given bandwidth and signal to noise ratio but they are more complicated to implement. Analog phone modems needed to deal with certain bandwidth (e.g. as little as 3Khz) and noise limitations. If you need to get the highest possible bitrate given bandwidth and noise limitations then is the way to go.

For actual code samples of advanced modulation schemes I would investigate application notes from DSP vendors (such as TI and Analog Devices) as those were common applications for DSPs.

Implementing a PI/4 Shift D-QPSK Baseband Modem Using the TMS320C50

QPSK modulation demystified

V.34 Transmitter and Receiver Implementation on the TMS320C50 DSP

Another very simple and not so efficient method is to use DTMF. Those are the tones generated by phone keypads where each symbol is a combination of two frequencies. If you Google you'll find a lot of source code. Depending on your application/requirements this may be a simple solution.

Let's dive in to some simple scheme implementation details, something like the morse code I mentioned earlier. We can use "dot" for 0 and "dash' for 1. An advantage of a morse like scheme is that it also solves the framing problem as you can resynchronize your sampling after every space. For simplicity let's pick the "Carrier Wave" frequency at 11KHz and assume your wave output is 44Khz, 16 bit, mono. We'll also use a square wave which will create harmonics but we don't really care. If 11KHz is beyond your microphone's frequency response then just divide all frequencies by 2 e.g. We'll pick some arbitrary level 10000 and so our "on" waveform looks like this:

{10000, 10000, 0, 0, 10000, 10000, 0, 0, 10000, 0, 0, ...} // 4 samples = 11Khz period

and our "off" waveform is just all zeros. I leave the coding of this part as an excersize to the reader.

And so we have something like:

const int dot_samples = 400; // ~10ms - speed up later
const int space_samples = 400; // ~10ms
const int dash_samples = 800; // ~20ms

void encode( uint8_t* source, int length, int16_t* target ) // assumes enough room in target
{
  for(int i=0; i<length; i++)
  {
    for(int j=0; j<8; j++)
    {
      if((source[i]>>j) & 1) // If data bit is 1 we'll encode a dot
      {
        generate_on(&target, dash_samples); // Generate ON wave for n samples and update target ptr
      }
      else // otherwise a dash
      {
        generate_on(&target, dot_samples); // Generate ON wave for n samples and update target ptr
      }
      generate_off(&target, space_samples); // Generate zeros
    } 
  }
}

The decoder is a bit more complicated but here's an outline:

Optionally band-pass filter the sampled signal around 11Khz. This will improve performance in a noisy enviornment. FIR filters are pretty simple and there are a few online design applets that will generate the filter for you.
Threshold the signal. Every value above 1/2 maximum amplitude is 1 every value below is 0. This assumes you have sampled the entire signal. If this is in real time you either pick a fixed threshold or do some sort of automatic gain control where you track the maximum signal level over some time.
Scan for start of dot or dash. You probably want to see at least a certain number of 1's in your dot period to consider the samples a dot. Then keep scanning to see if this is a dash. Don't expect a perfect signal - you'll see a few 0's in the middle of your 1's and a few 1's in the middle of your 0's. If there's little noise then differentiating the "on" periods from the "off" periods should be fairly easy.
Then reverse the above process. If you see dash push a 1 bit to your buffer, if a dot push a zero.

回答2:

One purpose of modulation/demodulation is to adapt to channel characteristics. For instance, the channel might not be able to pass DC. Another purpose is to overcome a given amount and type of noise in the channel, while still transferring data above some given error rate.

For FSK, you simply want routines than can generate sine waves at two different frequencies on the transmit end, and filter and detect two different frequencies on the receiving end. The length of each segment of sine waves, the separation in frequency, and the amplitude will depend on the data rate and amount of noise you need to overcome.

In the simplest case, zero noise, simply produce N or 2N sine waves within successive fixed time frames. Something like:

x[i] = amplitude * sin( i * 2 * pi * (data[j] ? 1.0 : 2.0) * freq) / sampleRate )

On the receiving end, you can sample the signal at well above twice the max frequency and measure the distance between zero crossings, and see if you find an bunch of short period or long period waveforms. Much fancier methods using digital signal processing filters (IIR, FIR, etc.) and various statistic detectors can be used in the presence of non-zero noise.