Sorting text file

Posted 2019-08-11 11:44

I have this text file (almost 60 MiB):

560000100300100201100001000000000000[...]
560000100400100201100001000000000000[...]
560000100400200201100001000000000000[...]
560000100200100201100001000000000000[...]

I'm writing an app in VB.NET that does some unrelated processing on this file.

But at the end, the file is unsorted.

The "keys" (they sit next to each other in every line) are:

01003, 01004, 01004, 01002

and

001, 001, 002, 001

Every line starts with 56000, then the first key, then the second key, and then the rest of the line.

I tried SORT, which is included with Windows. It does a pretty nice job, but I need my own function in case SORT is not available.

The output should begin with the line that starts with 5600001002001.

Any ideas? Ask whatever you need to know.

Thank you.

3 answers
We Are One
#2 · 2019-08-11 12:20

Given the size of the file, you may be better off going 'old school' and using something like DOS SORT to sort the file. I've had to do this for data warehousing, and hand-written sorting code did not perform as well as a dedicated text file sorter.

In a command window (could use a console application, or using ShellExecute on a batch file, or some other way in code), the following command will sort a file according to its contents:

SORT C:\MyFile.CSV /O C:\MyFile_Sorted.CSV

This way, you sort the file as quickly as possible, then read the contents of the sorted file (MyFile_Sorted.CSV) into your program. It is two steps, but it is much easier and faster than reading the file into memory, sorting it, and then working on the result set. You can read each line knowing it is already sorted, which removes the need to hold 60 MiB in memory.
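As a rough sketch of calling SORT from code, the snippet below starts sort.exe with the same arguments as the command above and waits for it to finish. The function name SortFile is made up for illustration, and it assumes that sort.exe is on the PATH and returns exit code 0 on success. If the executable cannot be started, the function returns False so the caller can fall back to another method.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.Diagnostics

Module SortWithSortExe

    ' Hypothetical helper: runs the Windows SORT command and waits for it.
    ' Returns True only if SORT could be started and exited with code 0
    ' (assumption: 0 means success).
    Function SortFile(inputPath As String, outputPath As String) As Boolean
        Dim startInfo As New ProcessStartInfo("sort.exe")
        ' Quote both paths so that spaces in them do not split the arguments.
        startInfo.Arguments = String.Format("""{0}"" /O ""{1}""", inputPath, outputPath)
        startInfo.UseShellExecute = False
        startInfo.CreateNoWindow = True

        Try
            Using sortProcess As Process = Process.Start(startInfo)
                sortProcess.WaitForExit()
                Return sortProcess.ExitCode = 0
            End Using
        Catch ex As Win32Exception
            ' Raised when sort.exe cannot be found or started.
            Return False
        End Try
    End Function

    Sub Main()
        ' Paths taken from the command shown above.
        If SortFile("C:\MyFile.CSV", "C:\MyFile_Sorted.CSV") Then
            Console.WriteLine("Sorted.")
        Else
            Console.WriteLine("SORT is unavailable or failed.")
        End If
    End Sub

End Module
```

The False return value is where a pure VB.NET fallback could be plugged in, which is what the question asks for when SORT is not available.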

家丑人穷心不美
#3 · 2019-08-11 12:33

I wanted to post this as a comment, but the browser won't let me. So, as an answer to sorting on n characters: see Sort(IComparer) at http://msdn.microsoft.com/en-us/library/0e743hdt.aspx. You write your own compare function, so anything goes.
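A custom comparer for this file could look roughly like the sketch below. It uses the generic IComparer(Of String) with List(Of String).Sort, which may not be the exact overload on the linked page. The class name KeyComparer and the offsets are assumptions derived from the question: a 5-character prefix (56000) followed by 8 key characters (5 + 3). The sample lines are the visible prefixes of the truncated lines in the question, and every line is assumed to be at least 5 characters long.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical comparer: orders lines by the 8 key characters that
' follow the 5-character "56000" prefix (offsets assumed from the question).
Class KeyComparer
    Implements IComparer(Of String)

    Private Const KeyStart As Integer = 5
    Private Const KeyLength As Integer = 8

    Public Function Compare(x As String, y As String) As Integer _
        Implements IComparer(Of String).Compare
        ' Ordinal comparison of the key substrings, without allocating them.
        Return String.CompareOrdinal(x, KeyStart, y, KeyStart, KeyLength)
    End Function
End Class

Module ComparerDemo

    Sub Main()
        ' Visible prefixes of the sample lines from the question.
        Dim lines As New List(Of String)()
        lines.Add("560000100300100201100001")
        lines.Add("560000100400100201100001")
        lines.Add("560000100400200201100001")
        lines.Add("560000100200100201100001")

        lines.Sort(New KeyComparer())

        For Each line As String In lines
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

With these four lines, the one starting with 5600001002001 is printed first, as the question requires. Note that List(Of T).Sort is not a stable sort, so lines with identical keys may not keep their original relative order.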

Summer. ? 凉城
#4 · 2019-08-11 12:41

Don't use the Windows "sort.exe". Use VB.Net instead:

  1. Read file into a VB.Net string list, a line at a time
  2. Sort the list
  3. Write back the sorted file

Here's an example program from MSDN that already covers step 1 (reading the file into a list, one line at a time):

Imports System
Imports System.IO
Imports System.Collections

Module Module1

    Sub Main()
        Dim arrText As New ArrayList()

        ' Read the file one line at a time; Using closes the reader
        ' even if an exception is thrown.
        Using objReader As New StreamReader("c:\test.txt")
            Dim sLine As String = objReader.ReadLine()
            Do While sLine IsNot Nothing
                arrText.Add(sLine)
                sLine = objReader.ReadLine()
            Loop
        End Using

        ' Print every line that was read.
        For Each textLine As String In arrText
            Console.WriteLine(textLine)
        Next

        ' Keep the console window open until Enter is pressed.
        Console.ReadLine()
    End Sub

End Module

Here's the documentation for ArrayList.Sort():

http://msdn.microsoft.com/en-us/library/8k6e334t.aspx
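Putting the three steps together, a minimal sketch might look like this. It swaps the ArrayList for a generic List(Of String) and uses File.ReadAllLines, so it is not the MSDN example itself. The output path is hypothetical. It also assumes the whole file fits comfortably in memory, and that a plain ordinal sort of the whole line is enough, which holds if every line starts with the same 56000 prefix followed directly by the fixed-width keys.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SortLines

    Sub Main()
        Dim inputPath As String = "c:\test.txt"          ' path from the example above
        Dim outputPath As String = "c:\test_sorted.txt"  ' hypothetical output path

        ' 1. Read the file into a string list.
        Dim lines As New List(Of String)(File.ReadAllLines(inputPath))

        ' 2. Sort the list. Ordinal comparison orders by character code,
        '    which puts fixed-width numeric keys in numeric order.
        lines.Sort(StringComparer.Ordinal)

        ' 3. Write the sorted lines to a new file.
        File.WriteAllLines(outputPath, lines.ToArray())
    End Sub

End Module
```

Writing to a separate output file keeps the original intact if something goes wrong partway through.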

Hope that helps!
