I'm currently stumped. I've been looking around and experimenting with audio comparison. I've found quite a bit of material, and a ton of references to different libraries and methods to do it.
As of now I've taken Audacity and exported a 3min wav file called "long.wav" and then split the first 30seconds of that into a file called "short.wav". I figured somewhere along the line I could visually log (log.txt) the data through java for each and should be able to see at least some visual similarities among the values.... here's some code
Main method:
int totalFramesRead = 0;
File fileIn = new File(filePath);
BufferedWriter writer = new BufferedWriter(new FileWriter(outPath));
writer.flush();
writer.write("");
try {
AudioInputStream audioInputStream =
AudioSystem.getAudioInputStream(fileIn);
int bytesPerFrame =
audioInputStream.getFormat().getFrameSize();
if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
// some audio formats may have unspecified frame size
// in that case we may read any amount of bytes
bytesPerFrame = 1;
}
// Set an arbitrary buffer size of 1024 frames.
int numBytes = 1024 * bytesPerFrame;
byte[] audioBytes = new byte[numBytes];
try {
int numBytesRead = 0;
int numFramesRead = 0;
// Try to read numBytes bytes from the file.
while ((numBytesRead =
audioInputStream.read(audioBytes)) != -1) {
// Calculate the number of frames actually read.
numFramesRead = numBytesRead / bytesPerFrame;
totalFramesRead += numFramesRead;
// Here, do something useful with the audio data that's
// now in the audioBytes array...
if(totalFramesRead <= 4096 * 100)
{
Complex[][] results = PerformFFT(audioBytes);
int[][] lines = GetKeyPoints(results);
DumpToFile(lines, writer);
}
}
} catch (Exception ex) {
// Handle the error...
}
audioInputStream.close();
} catch (Exception e) {
// Handle the error...
}
writer.close();
Then PerformFFT:
public static Complex[][] PerformFFT(byte[] data) throws IOException
{
final int totalSize = data.length;
int amountPossible = totalSize/Harvester.CHUNK_SIZE;
//When turning into frequency domain we'll need complex numbers:
Complex[][] results = new Complex[amountPossible][];
//For all the chunks:
for(int times = 0;times < amountPossible; times++) {
Complex[] complex = new Complex[Harvester.CHUNK_SIZE];
for(int i = 0;i < Harvester.CHUNK_SIZE;i++) {
//Put the time domain data into a complex number with imaginary part as 0:
complex[i] = new Complex(data[(times*Harvester.CHUNK_SIZE)+i], 0);
}
//Perform FFT analysis on the chunk:
results[times] = FFT.fft(complex);
}
return results;
}
At this point I've tried logging everywhere: audioBytes before transforms, Complex values, and FFT results.
The problem: No matter what values I log, the log.txt of each wav file is completely different. I'm not understanding it. Given that I took the small.wav from the large.wav (and they have all the same properties) there should be a very heavy similarity among either the raw wav byte[] data... or Complex[][] fft data... or something thus far..
How can I possibly try to compare these files if the data isn't even close to similar at any point of these calculations.
I know I'm missing quite a bit of knowledge with regards to audio analysis, and this is why I come to the board for help! Thanks for any info, help, or fixes you can offer!!
I don't know how are you comparing both audio files, but, seeing some service that offer music recognition (like TrackId or MotoID), these services take a small sample of the music you're hearing (10-20 secs), then process them in their server, i theorize that they have samples that long or less and that they have a database of (or calculate it on the fly) patterns of that samples (in your case Fourier Transforms), in your case, you may need to break your long audio file in chunks of or smaller size than your sample data, in the first case you may find a specific chunk that resembles more the pattern in your sample data, in the second case your smaller chunks may resamble a part of your sample data and you can calculate the probability that the sample data belongs to a respective audio file.
For 16-bit audio, 3e-05 isn't really that different from zero. So a file of zeros is pretty much the same as a file of zeros (maybe missing equality by some tiny rounding errors.)
ADDED: For your comparison, read in and plot, using some Java plotting library, a portion of each of the two waveforms when they get past the portion that's mostly (close to) zero.
I think you are looking at Acoustic Fingerprinting It's hard, and there are libraries to do it. If you want to implement it yourself, this is a whitepaper on the shazam algorithm.
I think for debugging you better try use matlab to plot out. Since matlab is much more powerful in dealing with this problem.
You use "wavread" to the file, and "stft" to get the short time Fourier Transformation which is a complex number Matrix. Then simply abs(Matrix) to get the magnitude of each complex number. Show the image with imshow(abs(Matrix),[]).
I don't know how do you compare the whole file and 30s clip (by looking at the stft image?)
Have you looked at MARF? It is a well-documented Java library used for audio recognition.
It is used to recognize speakers (for transcription or securing software) but the same features should be able to be used to classify audio samples. I'm not familiar with it but it looks like you'd want to use the FeatureExtraction class to extract an array of features from each audio sample and then create a unique id.