I have a text file that contains about 100,000 articles. The structure of the file is:
.Document ID 42944-YEAR:5
.Date 03\08\11
.Cat political
Article Content 1
.Document ID 42945-YEAR:5
.Date 03\08\11
.Cat political
Article Content 2
I want to open this file in C# and process it line by line. I tried this code:
    String[] FileLines = File.ReadAllText(TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray());
But it throws:
Exception of type 'System.OutOfMemoryException' was thrown.
How can I open this file and read it line by line?
- File Size: 564 MB (591,886,626 bytes)
- File Encoding: UTF-8
- File contains Unicode characters.
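Your file is too large to be read into memory in one go, which is what File.ReadAllText tries to do. .NET strings are UTF-16, so the decoded text can take up to roughly twice the file's 564 MB, and Split then allocates a second copy of all of it as separate line strings. You should instead read the file line by line. Adapted from MSDN: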
In this way, no more than a single line of the file is in memory at any one time.
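A minimal sketch of such a loop (not the original MSDN sample), assuming you only need to handle each line as it arrives. `CountLines` and `ProcessLine` are hypothetical names used for illustration. `StreamReader.ReadLine` returns null at the end of the file, which is what ends the loop.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineByLineReader
{
    // Reads the file one line at a time; only the current line is held in memory.
    // Returns the number of lines seen so the caller can check progress.
    public static long CountLines(string path)
    {
        long count = 0;
        // The file is UTF-8, so state the encoding explicitly.
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
            {
                ProcessLine(line);
                count++;
            }
        }
        return count;
    }

    // Hypothetical placeholder for whatever per-line work you need.
    private static void ProcessLine(string line)
    {
    }

    public static void Main(string[] args)
    {
        // Pass the path as the first argument (e.g. the value of TB_SourceFile.Text).
        if (args.Length == 0)
        {
            Console.WriteLine("usage: LineByLineReader <path>");
            return;
        }
        Console.WriteLine(CountLines(args[0]));
    }
}
```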
If you are using .NET Framework 4 or later, there is a static method on System.IO.File called ReadLines that returns an IEnumerable<string>. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
MSDN Documentation - File.ReadLines Method (String)
Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
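A short sketch of `File.ReadLines` in use. Unlike `File.ReadAllLines`, it returns a lazily evaluated `IEnumerable<string>`, so lines are read only as the `foreach` or LINQ query pulls them. Counting lines that start with `.Document ID` is just an illustration based on the layout shown in the question; `CountDocuments` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class ReadLinesExample
{
    // Counts article headers without loading the whole file:
    // File.ReadLines yields one line at a time as Count() enumerates it.
    public static int CountDocuments(string path)
    {
        return File.ReadLines(path, Encoding.UTF8)
                   .Count(line => line.StartsWith(".Document ID", StringComparison.Ordinal));
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: ReadLinesExample <path>");
            return;
        }
        // A plain foreach streams the same way if you need per-line processing:
        // foreach (string line in File.ReadLines(args[0])) { ... }
        Console.WriteLine(CountDocuments(args[0]));
    }
}
```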
Something like this:
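A sketch of how that could look for this particular file, assuming (from the sample in the question) that every article starts with a line beginning `.Document ID` and that all lines up to the next such header belong to the same article. `Article` and `ReadArticles` are hypothetical names. Only one article is buffered at a time, so memory use is bounded by the largest single article rather than by the file size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical holder for one article: its header line plus the lines after it.
public sealed class Article
{
    public string Header { get; }
    public List<string> Lines { get; } = new List<string>();
    public Article(string header) { Header = header; }
}

public static class ArticleReader
{
    // Streams the file and yields one article at a time.
    public static IEnumerable<Article> ReadArticles(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            Article current = null;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith(".Document ID", StringComparison.Ordinal))
                {
                    // A new header closes the previous article.
                    if (current != null) yield return current;
                    current = new Article(line);
                }
                else if (current != null)
                {
                    current.Lines.Add(line);
                }
                // Lines before the first header are ignored.
            }
            if (current != null) yield return current;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: ArticleReader <path>");
            return;
        }
        foreach (Article article in ReadArticles(args[0]))
        {
            Console.WriteLine($"{article.Header}: {article.Lines.Count} line(s)");
        }
    }
}
```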
You can open the file and read it as a stream rather than loading everything into memory all at once.
From MSDN:
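A sketch of opening the file explicitly as a stream (not the MSDN sample itself). The `FileStream` constructor lets you choose a buffer size and pass `FileOptions.SequentialScan` as a hint that the file will be read front to back; the 64 KB buffer is an arbitrary assumption, not a tuned value. Wrapping the stream in a `StreamReader` takes care of UTF-8 decoding, including multi-byte characters that straddle buffer boundaries. `LongestLine` is a hypothetical stand-in for real processing.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamExample
{
    // Assumed buffer size; adjust as needed.
    private const int BufferSize = 64 * 1024;

    // Returns the length (in UTF-16 chars) of the longest line.
    public static int LongestLine(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, BufferSize, FileOptions.SequentialScan))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, BufferSize))
        {
            int longest = 0;
            string line;
            // Only the current line and the internal buffers are in memory.
            while ((line = reader.ReadLine()) != null)
            {
                longest = Math.Max(longest, line.Length);
            }
            return longest;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: StreamExample <path>");
            return;
        }
        Console.WriteLine(LongestLine(args[0]));
    }
}
```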