I have this text file (almost 60 MiB):
560000100300100201100001000000000000[...]
560000100400100201100001000000000000[...]
560000100400200201100001000000000000[...]
560000100200100201100001000000000000[...]
I'm writing an app in VB.NET that does some unrelated processing on this file.
But at the end, it's unsorted.
The "keys" are (they sit side by side in each line):
01003, 01004, 01004, 01002
and
001, 001, 002, 001
Every line starts with 56000, then the first key, then the second key, and then the rest of the line.
I tried SORT, which is included with Windows. It does a pretty nice job, but I need my own function in case SORT is not available.
The output should start with the line beginning 5600001002001.
Any ideas? Ask whatever you need to know.
Thank you.
Given the size of the file, you may be better off going 'old school' and using something like DOS SORT to sort the file. I've had to do this for data warehousing, and custom code did not perform as well as a dedicated text file sorter.
From a command window (or from code, e.g. a console application or ShellExecute on a batch file), the following command will sort a file according to its contents:
This way, you sort the file as quickly as possible, then read the contents of your sorted file (MyFile_Sorted.CSV) into your program. It may be two steps, but this is much easier and faster than reading into memory, sorting, then working on the result set. You can read each line knowing it's already sorted, which removes the need to hold 60 MiB in memory.
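If you want to launch the sort from VB.NET rather than a batch file, here is a minimal sketch. It assumes a plain sort.exe call with the /O switch to name the output file, and it assumes a whole-line sort is enough because every line starts with the same 56000 prefix followed by the fixed-width keys. The input name MyFile.CSV and the helper names BuildSortArguments and SortWithWindowsSort are made up for illustration.

```vbnet
Imports System
Imports System.Diagnostics

Module ExternalSort
    ' Builds the sort.exe argument string: quoted input path, then /O and the quoted output path.
    Function BuildSortArguments(inputPath As String, outputPath As String) As String
        Return String.Format("""{0}"" /O ""{1}""", inputPath, outputPath)
    End Function

    ' Runs sort.exe, waits for it to finish, and returns True if it exited with code 0.
    Function SortWithWindowsSort(inputPath As String, outputPath As String) As Boolean
        Dim psi As New ProcessStartInfo("sort.exe", BuildSortArguments(inputPath, outputPath))
        psi.UseShellExecute = False
        psi.CreateNoWindow = True
        Using p As Process = Process.Start(psi)
            p.WaitForExit()
            Return p.ExitCode = 0
        End Using
    End Function

    Sub Main()
        ' Both file names are placeholders.
        If SortWithWindowsSort("MyFile.CSV", "MyFile_Sorted.CSV") Then
            Console.WriteLine("Sorted file written.")
        Else
            Console.WriteLine("sort.exe reported an error.")
        End If
    End Sub
End Module
```

After this returns True you can stream MyFile_Sorted.CSV line by line, for example with a StreamReader, instead of loading it all at once.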
I wanted to comment, but the browser won't let me, so here is an answer on sorting by n characters: see Sort(IComparer) at http://msdn.microsoft.com/en-us/library/0e743hdt.aspx. You write your own compare function, so anything goes.
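A rough sketch of what such a compare function could look like for this file. It uses the generic IComparer(Of String) with List(Of String).Sort rather than the non-generic overload on the linked page, and it assumes the layout described in the question: a 5-character prefix, then a 5-digit key and a 3-digit key. The class name KeyComparer and the two constants are made up, and the sample lines are the truncated ones from the question.

```vbnet
Imports System
Imports System.Collections.Generic

' Compares two lines by the 8 key characters that follow the "56000" prefix.
' Assumes every line is at least 5 characters long.
Public Class KeyComparer
    Implements IComparer(Of String)

    Private Const KeyStart As Integer = 5   ' length of the "56000" prefix
    Private Const KeyLength As Integer = 8  ' 5-digit first key + 3-digit second key

    Public Function Compare(x As String, y As String) As Integer _
        Implements IComparer(Of String).Compare
        ' Ordinal comparison is enough because the keys are fixed-width digit strings.
        Return String.CompareOrdinal(x, KeyStart, y, KeyStart, KeyLength)
    End Function
End Class

Module ComparerDemo
    Sub Main()
        ' Truncated sample lines from the question.
        Dim lines As New List(Of String) From {
            "560000100300100201100001000000000000",
            "560000100400100201100001000000000000",
            "560000100400200201100001000000000000",
            "560000100200100201100001000000000000"
        }
        lines.Sort(New KeyComparer())
        For Each line As String In lines
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

With these four lines, the one starting 5600001002001 is printed first. Note that List.Sort is not a stable sort, so lines with identical keys may change their relative order.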
Don't use the Windows "sort.exe". Use VB.NET instead:
Here's an example program from MSDN that already does most of the work for you:
Here's the documentation for ArrayList.Sort():
http://msdn.microsoft.com/en-us/library/8k6e334t.aspx
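For this particular file, a minimal in-memory version could look like the sketch below. It is not the MSDN sample. It uses a plain string array with Array.Sort instead of an ArrayList, and it assumes that a whole-line ordinal sort gives the wanted order because every line has the same 56000 prefix followed by fixed-width keys. The names FileSorter and SortFile are made up, and the file paths come from the command line.

```vbnet
Imports System
Imports System.IO

Module FileSorter
    ' Reads every line into memory, sorts the lines ordinally, and writes them back out.
    Sub SortFile(inputPath As String, outputPath As String)
        Dim lines As String() = File.ReadAllLines(inputPath)
        Array.Sort(lines, StringComparer.Ordinal)
        File.WriteAllLines(outputPath, lines)
    End Sub

    Sub Main(args As String())
        If args.Length <> 2 Then
            Console.WriteLine("Usage: FileSorter <input file> <output file>")
            Return
        End If
        SortFile(args(0), args(1))
    End Sub
End Module
```

This holds the whole file in memory at once, and .NET strings take roughly two bytes per character, so expect noticeably more than 60 MiB of RAM while it runs.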
Hope that helps!