Processing large text file in C#

Posted 2020-02-06 04:00

Question:

I have 4 GB+ text files (CSV format) and I want to process them using LINQ in C#.

After loading the CSV and converting each row to a class, I run a complex LINQ query.

But the file is 4 GB, and the application's memory usage is about double the size of the file.

How can I process large files like this (run LINQ over them and produce a new result)?

Thanks

Answer 1:

Instead of loading the whole file into memory, you could read and process the file line by line.

using (var streamReader = new StreamReader(fileName))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        // analyze the line here
        // throw it away if it does not match
    }
}
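The same line-by-line idea can be combined with LINQ, because File.ReadLines (available since .NET 4.0) returns a lazy IEnumerable<string>. Here is a minimal sketch, not a definitive implementation: the file layout (a header row followed by name,amount columns), the Sale class, and the CsvStreaming helper are assumptions made up for illustration, and the naive Split(',') does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical row type; the real columns depend on your CSV.
public class Sale
{
    public string Name { get; set; }
    public decimal Amount { get; set; }
}

public static class CsvStreaming
{
    // Lazily yields one Sale per data line; rows are parsed only as they are consumed.
    public static IEnumerable<Sale> ReadSales(string path)
    {
        return File.ReadLines(path)
            .Skip(1)                                  // assumed header row
            .Where(line => line.Length > 0)           // ignore blank lines
            .Select(line => line.Split(','))          // naive split: no quoted fields
            .Select(f => new Sale
            {
                Name = f[0],
                Amount = decimal.Parse(f[1], CultureInfo.InvariantCulture)
            });
    }

    // Aggregates while streaming; memory grows with the number of
    // distinct names, not with the size of the file.
    public static Dictionary<string, decimal> SumByName(string path)
    {
        var totals = new Dictionary<string, decimal>();
        foreach (var sale in ReadSales(path))
        {
            decimal current;
            totals.TryGetValue(sale.Name, out current); // current stays 0 if absent
            totals[sale.Name] = current + sale.Amount;
        }
        return totals;
    }
}
```

Operators such as Where, Select, and Sum stream over the sequence, while OrderBy, GroupBy, and ToList buffer every row, so calling those on the raw sequence would bring the memory problem back.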

[EDIT]

If you need to run complex queries against the data in the file, the right thing to do is to load the data into a database and let the DBMS take care of data retrieval and memory management.



Answer 2:

I think this one is a good way... CSV



Answer 3:

If you are using .NET 4.0, you could use Clay and write a method that returns an IEnumerable line by line, which makes code like the following possible:

from record in GetRecords("myFile.csv",new []{"Foo","Bar"},new[]{","})
where record.Foo == "Baz"
select new {MyRealBar = int.Parse(record.Bar)}

The method that projects the CSV into a sequence of Clay objects could be written like this:

// Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
// plus the Clay library (ClayFactory), assumed here to live in the ClaySharp namespace.
private IEnumerable<dynamic> GetRecords(
    string filePath,
    IEnumerable<string> columnNames,
    string[] delimiter)
{
    // Nothing to yield if the file is missing.
    if (!File.Exists(filePath))
        yield break;

    var columns = columnNames.ToArray();
    var columnLength = columns.Length;
    dynamic New = new ClayFactory();

    using (var streamReader = new StreamReader(filePath))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
        {
            // Build one dynamic record per line; records are produced lazily.
            var record = New.Record();
            var fields = line.Split(delimiter, StringSplitOptions.None);
            if (fields.Length != columnLength)
                throw new InvalidOperationException(
                    "fields count does not match column count");

            for (int i = 0; i < columnLength; i++)
            {
                record[columns[i]] = fields[i];
            }
            yield return record;
        }
    }
}
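
If taking a dependency on Clay is not an option, roughly the same streaming pattern can be written with plain dictionaries. This is only a sketch under assumptions: CsvRecords, Read, and BarsForBaz are hypothetical names, not part of any library. It keeps the simple Split-based parsing, so quoted fields are not handled. Unlike the method above, a missing file throws when enumeration starts instead of yielding an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvRecords
{
    // Hypothetical helper: streams each line as a column-name -> value dictionary.
    public static IEnumerable<Dictionary<string, string>> Read(
        string filePath, string[] columns, string[] delimiter)
    {
        foreach (var line in File.ReadLines(filePath))
        {
            var fields = line.Split(delimiter, StringSplitOptions.None);
            if (fields.Length != columns.Length)
                throw new InvalidOperationException(
                    "fields count does not match column count");

            var record = new Dictionary<string, string>(columns.Length);
            for (int i = 0; i < columns.Length; i++)
                record[columns[i]] = fields[i];

            yield return record;
        }
    }

    // Usage mirroring the Foo/Bar query: filter and project while streaming.
    public static IEnumerable<int> BarsForBaz(string filePath)
    {
        return from record in Read(filePath, new[] { "Foo", "Bar" }, new[] { "," })
               where record["Foo"] == "Baz"
               select int.Parse(record["Bar"]);
    }
}
```

The dictionary version trades the record.Foo syntax for record["Foo"], but it needs no dynamic dispatch and no extra package.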