How can I grab comma delimited values that appear

Posted 2020-05-03 10:40

Question:

Here's my code so far:

public void DeserialStream(string filePath)
{
    using (StreamReader sr = new StreamReader(filePath))
    {
        string currentline;
        while ((currentline = sr.ReadLine()) != null)
        {
            // Print every line that mentions "Count", ignoring case.
            if (currentline.IndexOf("Count", StringComparison.CurrentCultureIgnoreCase) >= 0)
            {
                Console.WriteLine(currentline);
            }
        }
    }
}

How can I grab the comma-delimited values that appear after a term I searched for?

For example, suppose I have a CSV file that contains this data:

"Date","dd/mm/yyyy"
"ExpirationDate","dd/mm/yyyy"

"DataType","Count"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

"DataType","Net"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

How would I grab the table of values after Count but before Net?

That is, I only want to parse the data in brackets:

"Date","dd/mm/yyyy"
"ExpirationDate","dd/mm/yyyy"

"DataType","Count"
["Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"]

"DataType","Net"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

I was thinking of using a regular expression. Or is there an easier way that builds on the method above?

Answer 1:

You can use LINQ:

List<string> lines = File.ReadLines(path)
   .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
   .Skip(1) // skip the "Count"-line
   .TakeWhile(l => l.IndexOf("\"Net\"",   StringComparison.InvariantCultureIgnoreCase) == -1)
   .ToList();

Use String.Split to get a string[] for every line. In general, I would use an existing CSV parser, which handles edge cases and bad data, instead of reinventing the wheel.

Edit: If you want to split the fields into a List<string>, you should use a CSV parser as mentioned, because your data already uses a quoting character: commas wrapped in " should not be split on.
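As a rough sketch of the CSV-parser route, one option that ships with .NET is TextFieldParser from the Microsoft.VisualBasic.FileIO namespace. The class name CountTableReader, the method ReadSection and its marker parameters are hypothetical names made up for this illustration; only the SkipWhile/TakeWhile idea comes from the query above. It assumes the section is small enough to join into one string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class CountTableReader
{
    // Hypothetical helper: returns the rows that follow the first line containing
    // startMarker, up to (not including) the first later line containing endMarker.
    public static List<string[]> ReadSection(IEnumerable<string> lines, string startMarker, string endMarker)
    {
        var sectionLines = lines
            .SkipWhile(l => l.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase) == -1)
            .Skip(1) // skip the marker line itself
            .TakeWhile(l => l.IndexOf(endMarker, StringComparison.OrdinalIgnoreCase) == -1);

        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(string.Join("\n", sectionLines))))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // keeps "A(Loc3, Loc4)" as one field

            // EndOfData ignores blank lines, so the empty line before the next table is dropped.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Sample data taken from the question.
        var sample = new[]
        {
            "\"DataType\",\"Count\"",
            "\"Location\",\"Unknown\",\"Variable1\",\"Variable2\",\"Variable3\"",
            "\"A(Loc3, Loc4)\",\"Unknown\",\"5656\",\"787\",\"42\"",
            "",
            "\"DataType\",\"Net\"",
        };

        foreach (var row in ReadSection(sample, "\"Count\"", "\"Net\""))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

For a real file, the same helper could be called as CountTableReader.ReadSection(File.ReadLines(path), "\"Count\"", "\"Net\""). Unlike a flat token list, this keeps one string[] per row.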

However, here is another simple but efficient approach using a StringBuilder:

public static IEnumerable<string> SplitCSV(string csvString)
{
    var sb = new StringBuilder();
    bool quoted = false;

    foreach (char c in csvString)
    {
        if (quoted)
        {
            if (c == '"')
                quoted = false;
            else
                sb.Append(c);
        }
        else
        {
            if (c == '"')
            {
                quoted = true;
            }
            else if (c == ',')
            {
                yield return sb.ToString();
                sb.Length = 0;
            }
            else
            {
                sb.Append(c);
            }
        }
    }

    // ArgumentException takes the message first, then the parameter name.
    if (quoted)
        throw new ArgumentException("Unterminated quotation mark.", nameof(csvString));

    yield return sb.ToString();
}

(Thanks to https://stackoverflow.com/a/4150727/284240)

Now you can use SelectMany in the query above to flatten out all tokens:

List<string> allTokens = File.ReadLines(path)
    .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
    .Skip(1) // skip the "Count"-line
    .TakeWhile(l => l.IndexOf("\"Net\"", StringComparison.InvariantCultureIgnoreCase) == -1)
    .SelectMany(l => SplitCSV(l.Trim()))
    .ToList();

Result:

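Location, Unknown, Variable1, Variable2, Variable3, A(Loc3, Loc4), Unknown, 5656, 787, 42, A(Loc5, Loc6), Unknown, 25, 878, 921, ""

The trailing "" is the empty token produced by the blank line that precedes the "DataType","Net" row.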


Answer 2:

You can use a regex like this:

\"DataType\"\,\"(?:Count|Net)\"((?!\"DataType\").)*

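Applied to the whole file content with RegexOptions.Singleline (so that . also matches newlines), this matches from a DataType line up to, but not including, the next DataType line. Without that option, . stops at the end of the line and only the DataType row itself is matched.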
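A minimal sketch of how that pattern might be applied in C#, assuming the whole file has been read into one string (for example with File.ReadAllText). The class SectionRegexDemo and the method ExtractSection are hypothetical names for this illustration, and the pattern is narrowed to a single section name instead of the Count|Net alternation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionRegexDemo
{
    // Hypothetical helper: returns the DataType row for dataType plus every
    // following character up to the next "DataType" row, or null if not found.
    public static string ExtractSection(string csvText, string dataType)
    {
        string pattern = "\"DataType\",\"" + Regex.Escape(dataType) + "\"((?!\"DataType\").)*";

        // Singleline lets '.' match newlines, so the match can span several rows.
        Match m = Regex.Match(csvText, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Value.Trim() : null;
    }

    public static void Main()
    {
        // Shortened sample based on the data in the question.
        string csv =
            "\"Date\",\"dd/mm/yyyy\"\n" +
            "\n" +
            "\"DataType\",\"Count\"\n" +
            "\"Location\",\"Unknown\",\"Variable1\"\n" +
            "\"A(Loc3, Loc4)\",\"Unknown\",\"5656\"\n" +
            "\n" +
            "\"DataType\",\"Net\"\n" +
            "\"Location\",\"Unknown\",\"Variable1\"\n";

        Console.WriteLine(ExtractSection(csv, "Count"));
    }
}
```

The returned text still starts with the "DataType","Count" row, so that first line has to be skipped, and each remaining row still needs quote-aware splitting.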