Out of Memory Exception In C# [closed]

2019-07-25 12:47发布

问题:

I'm attempting to construct a suffix trie, and due to strict requirements it must be indexed in memory.

EDIT: The problem is not the tree itself, but actually the way I was reading the file.

回答1:

If you're passing the entire text file as a single string you could easily run into an out of memory exception with your first loop!

// imagine if s.Length was 100k or so
for (int i = 0; i < s.Length; i++)
{
    AddString(s.Substring(i, s.Length-i));
}

When reading the file to construct the trie, you'll need to split each line and probably normalize the characters:

string line;
while (null != (line = reader.ReadLine()))
{
    string[] parts = line.Split(' ', ',', '.', '!', '\t', '?'); // naive
    foreach (string part in parts)
    {
        if (part.Length > 0)
        {
            // make each string uppercase so as to avoid Hello and hello being
            // two trie entries
            trie.AddSuffix(part.ToUpperInvariant());
        }
    }
}

For example (on the output from dir /b c:\windows):

A
 D
  D
   I
    N
     S
  E
   D
 P
  P
   C
    O
     M
      P
       A
        T
   P
    A
     T
      C
       H
...

To appropriately handle larger files, a more compact trie structure would be desirable. I would just have unshared suffixes stored in a separate dictionary:

// If you add a character, but there is no entry in m_children
// just park the tail end of it here
Dictionary<char, string> m_tails;

You would then move the per character logic to your AddString of the SuffixNode:

public void AddString(string s)
{
    if (s.Length == 0) return;

    char c = s[0];
    if (m_children.ContainsKey(c))
    {
        if (s.Length > 1) m_children[c].AddString(s.Substring(1));
    }
    else if (m_tails.ContainsKey(c))
    {
        SuffixNode node = new SuffixNode();
        node.AddString(m_tails[c]);
        if (s.Length > 1) node.AddString(s.Substring(1));

        m_children.Add(c, node);
        m_tails.Remove(c);
    }
    else
    {
        m_tails.Add(c, s.Length > 1 ? s.Substring(1) : "");
    }
}

Now you have a much more compact version of the trie, which will greatly decrease the number of child SuffixNodes created for any given corpus. Returning to the dir /b c:\windows example, we can see a practical reduction in nodes:

A
 P
  P
   COMPAT
   PATCH
  I
 T
  I
   O
    N
     S
...

At this point your trie has a more efficient representation. You're left with determining how to deal with terminal node representations in order to ensure lookups are accurate.