Performing a subtotal on filtered data from a stre

Posted 2020-05-06 14:46

Question:

Edited, as the question is still unanswered.

I have output that is filtered on one criterion (the first 3 digits are 210, 310 or 410, giving 3 distinct groups) and written to the console from a StreamReader. I edited the question because the first answer was a literal solution to the specific example I gave; the real strings I'm using are 450 ASCII characters long. I have adjusted the example strings to reflect this, so anything that works on the sample data will work on what I have.

What I really need is something that, depending on the first 3 digits, takes the 3 letters from a known, predesignated location and uses them as a subcategory, then sums all entries at a second fixed location and outputs the totals. For the 210s, the letters are in character slots 14-16 and the amounts in slots 33-37.

example strings:

210!!!!123244AAA75AWEHUIHJUAS!!!11111
210???1223455ABC76554HJHSDFQ????22222
210--32455623ABCFFCDGHDSFAS-----33333
310         1232451    2ABC34       GAERsASDFASDG1234523   44444
310 1234a354GDSAASDR  3 AAA  GF234523653hfdssdgSDASDF      11111
310 12378HJK1234        ABC HJHJK123462 ASDHDFS FA REW     22222
4101111ASDJF     1ABCASF        D1234    ASGF66666
4102222QW12362ER2 ABC 23459876HJKXC          11111
41033333T123 1RWE AAA  ASDFHJKRTR  WQ        22222

At the end of this, my output would be:

210 AAA 11111
210 ABC 55555
310 ABC 66666
310 AAA 11111
410 ABC 77777
410 AAA 22222

The ABC, AAA etc. are always in the same location for the same starting number, but will be different per starting number.

Likewise, the amounts being summed are always in the same place for a given starting number.

I've tried adding some string.Split calls to the existing code (below) but haven't had any luck.

// Read in a file line-by-line, and store in a List.
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader("file.dat"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
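        // Skip lines too short to carry a 3-character prefix; Substring would throw on them.
        if (line.Length < 3)
            continue;
        var beginning = line.Substring(0, 3);
        if (beginning != "210" && beginning != "310" && beginning != "410")
            continue;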
        list.Add(line); // Add to list.
        Console.WriteLine(line); // Write to console.
    }
}

Answer 1:

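(Posting this answer here, as the other question is closed.) Using ReadAllText will be inefficient for large files, because it loads the whole file into memory as a single string; the code below reads one line at a time instead.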

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LinqToTextReader {
    public static IEnumerable<string> AsEnumerable(this TextReader reader) {
        string line;
        while ((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}

class Program {
    static void Main(string[] args) {
        using (StreamReader reader = new StreamReader("file.dat")) {
            // Per prefix: three index pairs (letters, first value, second value),
            // read here as zero-based start and exclusive end positions.
            var locations = new Dictionary<string, int[]>() {
                {"210", new [] {406, 409, 129, 140, 142, 153}},
                {"310", new [] {322, 325, 113, 124, 126, 137}},
                {"410", new [] {478, 481, 113, 124, 126, 137}}
            };

            var query =
                from line in reader.AsEnumerable()
                where line.Length >= 3
                let lineStart = line.Substring(0, 3)
                where lineStart == "210" || lineStart == "310" || lineStart == "410"
                let currentLocations = locations[lineStart]
                select new {
                    prefix = lineStart,
                    // Substring takes (startIndex, length), so the length is end - start.
                    letters = line.Substring(currentLocations[0], currentLocations[1] - currentLocations[0]),
                    value =
                        int.Parse(line.Substring(currentLocations[2], currentLocations[3] - currentLocations[2])) +
                        int.Parse(line.Substring(currentLocations[4], currentLocations[5] - currentLocations[4]))
                };

            //It should be possible to combine the two queries
            // Group by prefix and letters so that each prefix gets its own subtotals.
            var query2 = 
                from item in query
                group item by new { item.prefix, item.letters } into letterGroup
                select new {
                    prefix = letterGroup.Key.prefix,
                    letters = letterGroup.Key.letters,
                    total = letterGroup.Sum(item => item.value)
                };

            foreach (var item in query2) {
                Console.WriteLine("{0} {1} {2}", item.prefix, item.letters, item.total);
            }
        }
    }
}
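
As a rough sketch of the same fixed-position idea applied to the sample strings in the question: the offsets below are zero-based and were counted from those sample lines (for 210, slots 14-16 and 33-37 become indexes 13 and 32), so they are assumptions that the real 450-character records would need to replace. The class name FixedWidthSubtotal, the Layouts table and the Subtotal method are hypothetical names used only for illustration. The sketch assumes every matching line is long enough and that the amount field is numeric; otherwise Substring or int.Parse will throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FixedWidthSubtotal
{
    // Hypothetical layout table: zero-based { lettersStart, amountStart } per prefix,
    // counted from the sample strings in the question (an assumption for real data).
    private static readonly Dictionary<string, int[]> Layouts =
        new Dictionary<string, int[]>
        {
            { "210", new[] { 13, 32 } },
            { "310", new[] { 24, 59 } },
            { "410", new[] { 18, 45 } },
        };

    private const int LettersLength = 3;
    private const int AmountLength = 5;

    // Returns one "prefix letters total" row per (prefix, letters) pair.
    // GroupBy yields groups in the order their keys first appear in the input.
    public static List<string> Subtotal(IEnumerable<string> lines)
    {
        return lines
            .Where(l => l.Length >= 3 && Layouts.ContainsKey(l.Substring(0, 3)))
            .Select(l =>
            {
                var prefix = l.Substring(0, 3);
                var layout = Layouts[prefix];
                return new
                {
                    Prefix = prefix,
                    Letters = l.Substring(layout[0], LettersLength),
                    Amount = int.Parse(l.Substring(layout[1], AmountLength))
                };
            })
            .GroupBy(x => new { x.Prefix, x.Letters })
            .Select(g => $"{g.Key.Prefix} {g.Key.Letters} {g.Sum(x => x.Amount)}")
            .ToList();
    }

    public static void Main()
    {
        // The sample strings from the question.
        var sample = new[]
        {
            "210!!!!123244AAA75AWEHUIHJUAS!!!11111",
            "210???1223455ABC76554HJHSDFQ????22222",
            "210--32455623ABCFFCDGHDSFAS-----33333",
            "310         1232451    2ABC34       GAERsASDFASDG1234523   44444",
            "310 1234a354GDSAASDR  3 AAA  GF234523653hfdssdgSDASDF      11111",
            "310 12378HJK1234        ABC HJHJK123462 ASDHDFS FA REW     22222",
            "4101111ASDJF     1ABCASF        D1234    ASGF66666",
            "4102222QW12362ER2 ABC 23459876HJKXC          11111",
            "41033333T123 1RWE AAA  ASDFHJKRTR  WQ        22222",
        };

        foreach (var row in Subtotal(sample))
        {
            Console.WriteLine(row);
        }
    }
}
```

To run this against a file without loading it all at once, Subtotal(File.ReadLines("file.dat")) should work (with using System.IO added), since File.ReadLines enumerates the file lazily.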


Answer 2:

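using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// The pattern only matches lines where three letters in the range A-C are
// immediately followed by the five-digit amount.
string input = File.ReadAllText("file.dat");
// Multiline makes ^ match at the start of every line, so the prefix must begin the line.
var result = Regex.Matches(input, "^(210|310|410).*?([A-C]{3})([0-9]{5})", RegexOptions.Multiline)
    .Cast<Match>()
    .Select(m => new { 
        P1 = m.Groups[1].Value, 
        P2 = m.Groups[2].Value, 
        P3 = Convert.ToInt32(m.Groups[3].Value)
    })
    .GroupBy(x => new{x.P1,x.P2})
    .Select(x=>String.Format("{0} {1} {2}",x.Key.P1,x.Key.P2,x.Sum(y=>y.P3)))
    .ToList();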
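
The regex approach can also be applied one line at a time, which avoids reading the whole file into a single string. The following is only a sketch under the same assumption the pattern makes, namely that the three letters (A-C) are immediately followed by the five-digit amount. The class name RegexSubtotal, the Subtotal method and the input lines in Main are hypothetical and exist only to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSubtotal
{
    // Same pattern as above, anchored to the start of each line.
    private static readonly Regex Pattern =
        new Regex("^(210|310|410).*?([A-C]{3})([0-9]{5})");

    // Reads the input line by line and keeps a running total per "prefix letters" key.
    public static List<string> Subtotal(TextReader reader)
    {
        var totals = new Dictionary<string, int>();
        var order = new List<string>(); // keys in first-seen order
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var m = Pattern.Match(line);
            if (!m.Success)
            {
                continue;
            }
            var key = m.Groups[1].Value + " " + m.Groups[2].Value;
            if (!totals.ContainsKey(key))
            {
                totals[key] = 0;
                order.Add(key);
            }
            totals[key] += int.Parse(m.Groups[3].Value);
        }
        return order.Select(k => k + " " + totals[k]).ToList();
    }

    public static void Main()
    {
        // Hypothetical input in the shape the pattern expects (letters directly before the amount).
        var text = "210xxABC00010\n210yyABC00005\n310zzAAA00007\n999qqABC99999\n";
        using (var reader = new StringReader(text))
        {
            foreach (var row in Subtotal(reader))
            {
                Console.WriteLine(row);
            }
        }
        // prints:
        // 210 ABC 15
        // 310 AAA 7
    }
}
```

For a real file, passing new StreamReader("file.dat") to Subtotal should behave the same way, since StreamReader is a TextReader.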