.NET RegEx “Memory Leak” investigation

2020-06-02 05:31发布

问题:

I recently looked into some .NET "memory leaks" (i.e. unexpected, lingering GC rooted objects) in a WinForms app. After loading and then closing a huge report, the memory usage did not drop as expected even after a couple of gen2 collections. Assuming that the reporting control was being kept alive by a stray event handler I cracked open WinDbg to see what was happening...

Using WinDbg, the !dumpheap -stat command reported a large amount of memory was consumed by string instances. Further refining this down with the !dumpheap -type System.String command I found the culprit, a 90MB string used for the report, at address 03be7930. The last step was to invoke !gcroot 03be7930 to see which object(s) were keeping it alive.

My expectations were incorrect - it was not an unhooked event handler hanging onto the reporting control (and report string), but instead it was held on by a System.Text.RegularExpressions.RegexInterpreter instance, which itself is a descendant of a System.Text.RegularExpressions.CachedCodeEntry. Now, the caching of Regexs is (somewhat) common knowledge as this helps to reduce the overhead of having to recompile the Regex each time it is used. But what then does this have to do with keeping my string alive?

Based on analysis using Reflector, it turns out that the input string is stored in the RegexInterpreter whenever a Regex method is called. The RegexInterpreter holds onto this string reference until a new string is fed into it by a subsequent Regex method invocation. I'd expect similar behaviour by hanging onto Regex.Match instances and perhaps others. The chain is something like this:

  • Regex.Split, Regex.Match, Regex.Replace, etc
    • Regex.Run
      • RegexScanner.Scan (RegexScanner is the base class, RegexInterpreter is the subclass described above).

The offending Regex is only used for reporting, rarely used, and therefore unlikely to be used again to clear out the existing report string. And even if the Regex was used at a later point, it would probably be processing another large report. This is a relatively significant problem and just plain feels dirty.

All that said, I found a few options on how to resolve, or at least work around, this scenario. I'll let the community respond first and if no takers come forward I will fill in any gaps in a day or two.

回答1:

Are you using instances of Regex or the static Regex methods which take a string pattern? According to this post, Regex instances do not participate in the caching.



回答2:

Try switching to a compiled Regex - instantiation takes longer, but perhaps won't be subject to this odd leak.

See http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions%28v=VS.100%29.aspx for more.

Or, don't hold onto the Regex instance longer than you need to - create a fresh one for each report invocation.