Incomplete wav file in R, but extra data in file?

Posted 2019-08-10 19:52

Question:

I am trying to open a sound file in R, but the load.wave() function complains that the file is "incomplete". The sound plays fine in several other audio programs (mplayer, Audacity, Praat, etc.), and the file utility does not report it as any different from other WAV files that load without problems:

RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz

I know load.wave() internally calls a C function to process the data, but I don't know what that function is or what it does, so I can't see why it's complaining. In R, load.wave() makes the call .Call("load_wave_file", where, PACKAGE = "audio"), in which the argument where is the path to the file.

Opening the sound in Audacity and saving it again as a WAV file produces an identical-sounding file that R opens without any problems.

However, the files seem to be considerably different. Using vbindiff, there are differences both in the header:

# Original file
0000 0000: 52 49 46 46 39 AE 02 00  57 41 56 45 66 6D 74 20  RIFF9... WAVEfmt   
0000 0010: 12 00 00 00 01 00 01 00  22 56 00 00 44 AC 00 00  ........ "V..D...  
0000 0020: 02 00 10 00 00 00 64 61  74 61 D4 AD 02 00 F9 FF  ......da ta......  

# Fixed file
0000 0000: 52 49 46 46 F8 AD 02 00  57 41 56 45 66 6D 74 20  RIFF.... WAVEfmt   
0000 0010: 10 00 00 00 01 00 01 00  22 56 00 00 44 AC 00 00  ........ "V..D...  
0000 0020: 02 00 10 00 64 61 74 61  D4 AD 02 00 FA FF F6 FF  ....data ........  

and throughout the file.
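The header rows shown above can be decoded field by field. The following is a minimal Python sketch, not code from the audio package: parse_header is a hypothetical helper, the byte strings are copied from the vbindiff output, and the sketch assumes the fmt chunk is the first chunk after the RIFF header (true for both dumps).

```python
import struct

# First 48 bytes of each file, copied from the vbindiff rows above.
ORIGINAL = bytes.fromhex(
    "52 49 46 46 39 AE 02 00 57 41 56 45 66 6D 74 20 "
    "12 00 00 00 01 00 01 00 22 56 00 00 44 AC 00 00 "
    "02 00 10 00 00 00 64 61 74 61 D4 AD 02 00 F9 FF"
)
FIXED = bytes.fromhex(
    "52 49 46 46 F8 AD 02 00 57 41 56 45 66 6D 74 20 "
    "10 00 00 00 01 00 01 00 22 56 00 00 44 AC 00 00 "
    "02 00 10 00 64 61 74 61 D4 AD 02 00 FA FF F6 FF"
)


def parse_header(buf):
    """Decode the RIFF header, the fmt chunk, and the header of the next chunk.

    Hypothetical helper; assumes "fmt " is the first chunk in the file.
    """
    riff_id, riff_size, form = struct.unpack_from("<4sI4s", buf, 0)
    fmt_id, fmt_size = struct.unpack_from("<4sI", buf, 12)
    # The first 16 bytes of the fmt body are the common PCM fields.
    tag, channels, rate, byte_rate, align, bits = struct.unpack_from(
        "<HHIIHH", buf, 20
    )
    # The next chunk starts right after the fmt body, whatever its size.
    next_offset = 20 + fmt_size
    next_id, next_size = struct.unpack_from("<4sI", buf, next_offset)
    return {
        "riff_size": riff_size,
        "fmt_size": fmt_size,
        "format_tag": tag,
        "channels": channels,
        "sample_rate": rate,
        "byte_rate": byte_rate,
        "block_align": align,
        "bits_per_sample": bits,
        "next_id": next_id,
        "next_size": next_size,
    }


for name, buf in (("original", ORIGINAL), ("fixed", FIXED)):
    print(name, parse_header(buf))
```

Decoded this way, both headers describe the same format (PCM, mono, 22050 Hz, 16 bit) and the same data size of 175572 bytes. They differ in the fmt chunk size (18 bytes in the original, ending in two zero bytes, versus 16 in the re-saved file) and in the RIFF size (175673 versus 175608). The dump alone cannot show whether load.wave() objects to the 18-byte fmt chunk.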

More interestingly, a chunk at the end of the original file has been removed:

# Original file
0002 ADF0: 5E 00 5D 00 5F 00 5F 00  5F 00 5F 00 5E 00 5D 00  ^.]._._. _._.^.].  
0002 AE00: 5B 00 63 75 65 20 1C 00  00 00 01 00 00 00 01 00  [.cue .. ........  
0002 AE10: 00 00 88 58 01 00 64 61  74 61 00 00 00 00 00 00  ...X..da ta......  
0002 AE20: 00 00 88 58 01 00 4C 49  53 54 13 00 00 00 61 64  ...X..LI ST....ad  
0002 AE30: 74 6C 6C 61 62 6C 07 00  00 00 01 00 00 00 52 54  tllabl.. ......RT  
0002 AE40: 00               

# Fixed file
0002 ADF0: 5E 00 5F 00 5F 00 5F 00  5F 00 5D 00 5F 00 59 00  ^._._._. _.]._.Y.  
0002 AE00:                                                                         
0002 AE10:                                                                      
0002 AE20:                                                                      
0002 AE30:                                                                      
0002 AE40:
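The extra bytes at the end of the original file can be read as ordinary RIFF chunks. The sketch below is an illustration under stated assumptions: iter_chunks and decode_tail are hypothetical helpers, and TAIL holds the 63 bytes from offset 0x2AE02 onward, copied from the dump above. The cue-point layout used here (six 4-byte fields per point) is the standard one for the cue chunk.

```python
import struct

# Bytes from offset 0x2AE02 to the end of the original file (from the dump).
TAIL = bytes.fromhex(
    "63 75 65 20 1C 00 00 00 01 00 00 00 01 00 "
    "00 00 88 58 01 00 64 61 74 61 00 00 00 00 00 00 "
    "00 00 88 58 01 00 4C 49 53 54 13 00 00 00 61 64 "
    "74 6C 6C 61 62 6C 07 00 00 00 01 00 00 00 52 54 "
    "00"
)


def iter_chunks(buf, offset=0):
    """Yield (chunk_id, body) pairs from a run of RIFF chunks."""
    while offset + 8 <= len(buf):
        chunk_id, size = struct.unpack_from("<4sI", buf, offset)
        yield chunk_id, buf[offset + 8 : offset + 8 + size]
        # Chunks are word-aligned: an odd-sized body is followed by a pad byte.
        offset += 8 + size + (size & 1)


def decode_tail(buf):
    """Decode cue points and adtl labels (hypothetical helper)."""
    cue_points, labels = [], {}
    for chunk_id, body in iter_chunks(buf):
        if chunk_id == b"cue ":
            (count,) = struct.unpack_from("<I", body, 0)
            for i in range(count):
                # id, position, data chunk id, chunk start, block start, sample offset
                cue_points.append(struct.unpack_from("<II4sIII", body, 4 + 24 * i))
        elif chunk_id == b"LIST" and body[:4] == b"adtl":
            for sub_id, sub_body in iter_chunks(body, 4):
                if sub_id == b"labl":
                    (cue_id,) = struct.unpack_from("<I", sub_body, 0)
                    labels[cue_id] = sub_body[4:].rstrip(b"\x00").decode("ascii")
    return cue_points, labels


points, labels = decode_tail(TAIL)
print(points)
print(labels)
```

Read this way, the tail consists of a cue chunk with one cue point (ID 1, sample offset 88200, referring to the data chunk) and a LIST chunk of type adtl whose labl entry names that cue point "RT". The labl body is 7 bytes long, an odd size, and the dump shows no pad byte after it. A strict RIFF reader might notice that, but this is an observation from the dump, not a confirmed cause of the error.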

1. What is wrong with this file that prevents me from opening it?

2. What is the data at the end of the original file? (See the update below.)

I know there are multiple audio processing programs out there which are rather liberal with the WAV spec, so this type of problem is not uncommon. I just want to figure out what is going on, to maybe implement a fix (which doesn't require me to fire up Audacity) and to prevent it from happening again in the future.
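One possible fix that avoids Audacity is to rewrite the file with only a 16-byte fmt chunk and the data chunk, which is the layout of the re-saved file. The Python sketch below is a workaround under assumptions, not a diagnosis: strip_wav is a hypothetical helper, it only handles plain PCM (format tag 1), and it has not been checked against the audio package's C source to confirm which difference load.wave() actually rejects.

```python
import struct


def strip_wav(src: bytes) -> bytes:
    """Return a canonical PCM WAV: RIFF header, 16-byte fmt chunk, data chunk.

    Hypothetical helper. Extra chunks (cue, LIST, ...) and any fmt extension
    bytes are dropped; the audio samples are copied unchanged.
    """
    if src[:4] != b"RIFF" or src[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    fmt = data = None
    offset = 12
    while offset + 8 <= len(src):
        chunk_id, size = struct.unpack_from("<4sI", src, offset)
        body = src[offset + 8 : offset + 8 + size]
        if chunk_id == b"fmt " and fmt is None:
            fmt = body[:16]  # keep only the common PCM fields
        elif chunk_id == b"data" and data is None:
            data = body
        offset += 8 + size + (size & 1)  # skip the pad byte after odd sizes

    if fmt is None or len(fmt) < 16 or data is None:
        raise ValueError("missing or truncated fmt/data chunk")
    if struct.unpack_from("<H", fmt, 0)[0] != 1:
        raise ValueError("only plain PCM (format tag 1) is handled")

    pad = b"\x00" * (len(data) & 1)
    riff_size = 4 + (8 + len(fmt)) + (8 + len(data) + len(pad))
    return (
        b"RIFF" + struct.pack("<I", riff_size) + b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"data" + struct.pack("<I", len(data)) + data + pad
    )
```

To apply it, read the problem file in binary mode, pass the bytes to strip_wav, and write the result to a new file (for example original.wav to stripped.wav, both placeholder names). To find out which difference matters, the two changes could also be tried separately: drop only the trailing chunks, or shorten only the fmt chunk.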

Update:

This chunk seems to be a "Cue-Points Chunk", as explained here:

The cue-points chunk identifies a series of positions in the waveform data stream.

I guess this means it should be harmless, but is that what's causing the problem?