Using: Unix 2.6.18-194.el5
I am having an issue where this join statement is omitting values/indexes from the match. I found out the values are between 11-90 (out of about 3.5 Million entries) and I have tried to look for foreign characters but I may be overlooking something (Tried cat -v to see hidden characters).
Here is the join statement I am using (only simplified the output columns for security):
join -t "|" -j 1 -o 1.1 2.1 file1 file2> fileJoined
file1 contents (first 20 values):
1 3 7 11 12 16 17 19 20 21 27 28 31 33 34 37 39 40 41 42
file2 contents (first 50 values so you can see where it would match):
1|US 2|US 3|US 4|US 5|US 6|US 7|US 8|US 9|US 10|US 11|US 12|US 13|US 14|US 15|US 16|US 17|US 18|US 19|US 20|US 21|US 22|US 23|US 24|US 25|US 26|US 27|US 28|US 29|US 30|US 31|US 32|US 33|US 34|US 35|US 36|US 37|US 38|US 39|US 40|US 41|US 42|US 43|US 44|US 45|US 46|US 47|US 48|US 49|US 50|US
From my initial testing it appears that file2 is the culprit. Because when I create a new file with values 1-100 I am able to get the join statement to match completely against file1; however the same file will not match against file2.
Another strange thing is that the file is 3.5 million records long and at value 90 they start matching again. For example, the output of fileJoined looks like this (first 20 values only):
1|1 3|3 7|7 90|90 91|91 92|92 93|93 95|95 96|96 97|97 98|98 99|99 106|106 109|109 111|111 112|112 115|115 116|116 117|117 118|118
Other things I have tried are:
- Using vi to manually enter a new line 11 (still doesnt match on the join statement)
- copying the code into notepad, deleting the lines in vi and then copying them back in (same result, no matching 11-90)
- Removing lines 11-90 to see if the problem then shifts to 90-170 and it does not shift
I think that there may be some hidden values that I am missing, or that the 11 - 90 from file1 is not the same binary equivalent as the 11 - 90 in file2?
I am lost here, any help would be greatly appreciated.