I found a really useful bit of Perl here that writes the filename of a text file to the first line of that file. I am running this from the terminal in OS X Yosemite:
perl -i -pe 'BEGIN{undef $/;} s/^/\nFilename:$ARGV\n/' `find . -name '*.TXT'`
With some modification I thought it had solved my specific problem. However, the files I'm picking up are UTF-16LE, and I've since discovered that this command writes in UTF-8 and makes a real mess of the output (the text is visibly correct but is not recognised in calculations in Excel, FileMaker, etc.).
After several attempts I need help getting this script to write the filename to the start of the file in UTF-16LE. (Note: I do have a workaround now of batch converting the files to UTF-8 and then running this, but I'd prefer to have the workflow in one step.)
reinierpost was correct - it was more about removing the original Unicode byte order mark (BOM). What worked in the end was:
perl -i -pe 'BEGIN{undef $/;} s/\xFF\xFE/Filename:$ARGV\n/' `find . -name '*.TXT'`
where the UTF-16LE BOM \xFF\xFE is replaced by my new string. For reference, some other BOMs are (there is a small detection sketch after this list):
- ISO-10646-1 > \xFE\xFF
- UTF-16BE > \xFE\xFF
- UTF-8 > \xEF\xBB\xBF
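In case it helps anyone check what they actually have, here is a minimal Perl sketch (the script name and output wording are mine) that reports which of the BOMs above, if any, a file starts with:

#!/usr/bin/perl
# bom_check.pl (name is mine): report which BOM, if any, a file starts with
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
open my $fh, '<:raw', $file or die "cannot open $file: $!\n";
read $fh, my $head, 3;    # the longest BOM (UTF-8) is 3 bytes
close $fh;

if    (substr($head, 0, 3) eq "\xEF\xBB\xBF") { print "UTF-8 BOM\n"    }
elsif (substr($head, 0, 2) eq "\xFF\xFE")     { print "UTF-16LE BOM\n" }
elsif (substr($head, 0, 2) eq "\xFE\xFF")     { print "UTF-16BE BOM\n" }
else                                          { print "no BOM found\n" }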
I was also able to write the new text as UTF-16LE with:
perl -i -pe 'BEGIN{binmode STDIN,":encoding(utf8)";binmode STDOUT,":encoding(utf16)"; undef $/;} s/\xFF\xFE/\xFF\xFE\nFilename:$ARGV\n/' `find . -name '*.TXT'`
However, I now believe my source data is a mixed bag of UTF-8 and UTF-16, as this last version creates a mix of characters between the new header and the data. Thanks to reinierpost for steering me in the right direction. I remain interested to see if others can improve on this.
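For anyone whose sources really are consistently UTF-16LE, here is a hedged sketch of the one-step workflow as a small script rather than a one-liner: it slurps each file as raw bytes, decodes it as UTF-16LE, strips the BOM, prepends the filename, and writes it back as UTF-16LE with the BOM restored. The script name is mine, and it assumes every input file genuinely is UTF-16LE, so it would need adjusting for a mixed bag like mine:

#!/usr/bin/perl
# prepend_filename_utf16le.pl (name is mine): prepend "Filename:<name>" to
# each UTF-16LE file and keep the output in UTF-16LE with a BOM
use strict;
use warnings;
use Encode qw(decode encode);

for my $file (@ARGV) {
    local $/;                                  # slurp the whole file
    open my $in, '<:raw', $file or die "cannot read $file: $!\n";
    my $bytes = <$in>;
    close $in;

    my $text = decode('UTF-16LE', $bytes);     # the BOM decodes to U+FEFF
    $text =~ s/^\x{FEFF}//;                    # drop it; we re-add it on output

    open my $out, '>:raw', $file or die "cannot write $file: $!\n";
    print {$out} "\xFF\xFE", encode('UTF-16LE', "Filename:$file\n$text");
    close $out;
}

It can be run over the same file set, e.g. perl prepend_filename_utf16le.pl `find . -name '*.TXT'`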