Audio Mixing with Java (without Mixer API)

I am attempting to mix several different audio streams and trying to get them to play at the same time instead of one-at-a-time.

The code below plays them one-at-a-time and I cannot figure out a solution that does not use the Java Mixer API. Unfortunately, my audio card does not support synchronization using the Mixer API and I am forced to figure out a way to do it through code.

Please advise.

/////CODE IS BELOW////

class MixerProgram {
public static AudioFormat monoFormat;
private JFileChooser fileChooser = new JFileChooser(); 
private static File[] files;
private int trackCount;   
private FileInputStream[] fileStreams = new FileInputStream[trackCount];
public static AudioInputStream[] audioInputStream;
private Thread trackThread[] = new Thread[trackCount];
private static DataLine.Info sourceDataLineInfo = null; 
private static SourceDataLine[] sourceLine;  

public MixerProgram(String[] s)
{
  trackCount = s.length;
  sourceLine = new SourceDataLine[trackCount];
  audioInputStream = new AudioInputStream[trackCount]; 
  files = new File[s.length];
}

public static void getFiles(String[] s)
{
  files = new File[s.length];
  for(int i=0; i<s.length;i++)
  {
    File f = new File(s[i]);
    if (!f.exists()) 
    System.err.println("Wave file not found: " + filename);
    files[i] = f;
  }
}


public static void loadAudioFiles(String[] s) 
{
  AudioInputStream in = null;
  audioInputStream = new AudioInputStream[s.length];
  sourceLine = new SourceDataLine[s.length];
  for(int i=0;i<s.length;i++){
    try 
    {
      in = AudioSystem.getAudioInputStream(files[i]); 
    } 
    catch(Exception e) 
    {
      System.err.println("Failed to assign audioInputStream");
    }
    monoFormat = in.getFormat();
    AudioFormat decodedFormat = new AudioFormat(
                                              AudioFormat.Encoding.PCM_SIGNED,
                                              monoFormat.getSampleRate(), 16, monoFormat.getChannels(),
                                              monoFormat.getChannels() * 2, monoFormat.getSampleRate(),
                                              false);
  monoFormat = decodedFormat; //give back name
  audioInputStream[i] = AudioSystem.getAudioInputStream(decodedFormat, in);
  sourceDataLineInfo = new DataLine.Info(SourceDataLine.class, monoFormat);
  try 
  {
    sourceLine[i] = (SourceDataLine) AudioSystem.getLine(sourceDataLineInfo); 
    sourceLine[i].open(monoFormat);
  } 
  catch(LineUnavailableException e) 
  {
    System.err.println("Failed to get SourceDataLine" + e);
  }
}               
}

public static void playAudioMix(String[] s)
{
  final int tracks = s.length;
  System.out.println(tracks);
  Runnable playAudioMixRunner = new Runnable()
  {
    int bufferSize = (int) monoFormat.getSampleRate() * monoFormat.getFrameSize();
    byte[] buffer = new byte[bufferSize]; 
    public void run()
    {
      if(tracks==0)
        return;
      for(int i = 0; i < tracks; i++)
      {
        sourceLine[i].start();
      }        
      int bytesRead = 0;
      while(bytesRead != -1)
      {
        for(int i = 0; i < tracks; i++)
        {
          try 
          {
            bytesRead = audioInputStream[i].read(buffer, 0, buffer.length);
          } 
          catch (IOException e) {
          // TODO Auto-generated catch block
            e.printStackTrace();
          }            
          if(bytesRead >= 0)
          {
            int bytesWritten = sourceLine[i].write(buffer, 0, bytesRead);
            System.out.println(bytesWritten);
          }
        }
      }
    }
  };
  Thread playThread = new Thread(playAudioMixRunner);
  playThread.start();
}
}

The problem is that you are not adding the samples together. If we are looking at 4 tracks, 16-bit PCM data, you need to add all the different values together to "mix" them into one final output. So, from a purely-numbers point-of-view, it would look like this:

[Track1]  320  -16  2000   200  400
[Track2]   16    8   123   -87   91
[Track3]  -16  -34  -356  1200  805
[Track4] 1011 1230 -1230  -100   19
[Final!] 1331 1188   537  1213 1315

In your above code, you should only be writing a single byte array. That byte array is the final mix of all tracks added together. The problem is that you are writing a byte array for each different track (so there is no mixdown happening, as you observed).

If you want to guarantee you don't have any "clipping", you should take the average of all tracks (so add all four tracks above and divide by 4). However, there are artifacts from choosing that approach (like if you have silence on three tracks and one loud track, the final output will be much quiter than the volume of the one track that is not silent). There are more complicated algorithms you can use to do the mixing, but by then you are writing your own mixer :P.