.NET System.OutOfMemoryException on String.Split()

Posted 2020-07-11 06:42

I am using C# to read a ~120 MB plain-text CSV file. Initially I did the parsing by reading it line-by-line, but recently determined that reading the entire file contents into memory first was multiple times faster. The parsing is already quite slow because the CSV has commas embedded inside quotes, which means I have to use a regex split. This is the only one I have found that works reliably:

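// The pattern is split across two verbatim literals so that the line break
// in the source does not become part of the regex.
string[] fields = Regex.Split(line,
    @",(?!(?<=(?:^|,)\s*\x22(?:[^\x22]|\x22\x22|\\\x22)*,)" +
    @"(?:[^\x22]|\x22\x22|\\\x22)*\x22\s*(?:,|$))");
// from http://regexlib.com/REDetails.aspx?regexp_id=621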

In order to do the parsing after reading the entire contents into memory, I do a string split on the newline character to get an array containing each line. However, when I do this on the 120 MB file, I get a System.OutOfMemoryException. Why does it run out of memory so quickly when my computer has 4 GB of RAM? Is there a better way to quickly parse a complicated CSV?

9 answers
一纸荒年 Trace。
Reply #2 · 2020-07-11 06:55

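You're running out of memory on the heap, not the stack: exhausting the stack raises a StackOverflowException, whereas an OutOfMemoryException means an allocation request could not be satisfied.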

You could try re-factoring your app such that you're processing the input in more manageable "chunks" of data rather than processing 120MB at a time.
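As a rough illustration of the "chunks" idea, here is a minimal sketch that hands lines to the caller in fixed-size batches, so only one batch is held in memory at a time. The ChunkedReader class, the ReadInBatches method and the batchSize parameter are hypothetical names made up for this example, not anything from the question.

```csharp
using System.Collections.Generic;
using System.IO;

static class ChunkedReader
{
    // Lazily yields batches of up to batchSize lines from the reader.
    // Assumes batchSize > 0 (hypothetical parameter for this sketch).
    public static IEnumerable<List<string>> ReadInBatches(TextReader reader, int batchSize)
    {
        var batch = new List<string>(batchSize);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize); // start a fresh batch
            }
        }
        if (batch.Count > 0)
            yield return batch; // final, possibly shorter batch
    }
}
```

You would wrap a StreamReader over the file, loop over ReadInBatches(reader, 10000) with foreach, and parse each batch before the next one is read. The batch size of 10000 is an arbitrary starting point to tune, not a measured value.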

孤傲高冷的网名
Reply #3 · 2020-07-11 07:00

If you have the whole file read into a string you should probably use a StringReader.

StringReader reader = new StringReader(fileContents);
string line;
while ((line = reader.ReadLine()) != null) {
    // Process line
}

This should be roughly the same as streaming from a file, with the difference that the contents are already in memory.

Edit after testing

I tried the above with a 140 MB file, where the processing consisted of incrementing a length variable by line.Length. This took around 1.6 seconds on my computer. After this I tried the following:

long length = 0;
// using disposes the reader (and closes the file) once the loop is done
using (StreamReader reader = new StreamReader("D:\\test.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        length += line.Length;
}

The result was around 1 second.

Of course your mileage may vary, especially if you are reading from a network drive or your processing takes long enough for the hard drive to seek somewhere else. It can also differ if you read the file through a FileStream without buffering. StreamReader buffers its reads, which greatly improves read performance.

等我变得足够好
Reply #4 · 2020-07-11 07:01

You can get an OutOfMemoryException for basically any size of allocation. When you allocate a piece of memory you're really asking for a contiguous block of the requested size. If that request cannot be honored, you'll see an OutOfMemoryException.
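To put a number on the size of the contiguous block involved, here is a back-of-the-envelope sketch. It relies on .NET strings being stored as UTF-16 (2 bytes per char) and ignores per-object overhead. MemoryEstimate and StringDataBytes are hypothetical names for this example. Treating the 120 MB file as mostly single-byte characters is an assumption, since the question does not state the encoding.

```csharp
static class MemoryEstimate
{
    // Bytes of character data needed for a string of charCount chars.
    // .NET strings are UTF-16, so each char occupies sizeof(char) == 2 bytes.
    // Object headers and other per-object overhead are ignored.
    public static long StringDataBytes(long charCount)
    {
        return charCount * sizeof(char);
    }
}
```

Under that assumption, the file holds about 120 * 1024 * 1024 = 125,829,120 characters, so the single string containing it needs about 251,658,240 bytes (240 MB) in one contiguous block. Splitting it then copies the text again into one new string per line, plus an array of references to those lines.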

You should also be aware that unless you're running 64-bit Windows, each process gets a 4 GB virtual address space split into 2 GB of kernel space and 2 GB of user space, regardless of how much RAM is installed, so your .NET application cannot access more than 2 GB by default.

When doing string operations in .NET you risk creating a lot of temporary strings due to the fact that .NET strings are immutable. Therefore you may see memory usage rise quite dramatically.
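One way to cut down on temporary strings while parsing is to scan each line once and reuse a single StringBuilder, instead of running a regex split. The following is a minimal sketch of that idea, not a drop-in replacement for the regex in the question. It assumes quotes inside a field are escaped by doubling them (""), that a field never spans several lines, and it strips the surrounding quotes. It does not handle backslash-escaped quotes, which the regex does. CsvLine and SplitFields are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Splits one CSV line on commas that are outside double quotes.
    // Assumption: embedded quotes are written as "" and fields are single-line.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote becomes one literal quote
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Length = 0;      // reuse the same builder for the next field
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());  // last field has no trailing comma
        return fields;
    }
}
```

The only strings allocated per line are the fields themselves. Whether this beats the regex on your data is something to measure rather than assume.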
