Reading a delimited file using LINQ

Posted 2019-07-22 08:41

Question:

The following LINQ query reads a delimited file. Currently, it outputs only the recordId. I want it to output all the fields in the file so I can perform some additional LINQ operations on the data. For example, I want to group by recordId, sort by a date, and Take(x) results.

  1. I want all the fields in the csv to be returned. Do I need to declare a variable for each one and set it from the index value, like I did for FirstName, LastName and recordId? Not a big deal, but is there a better way?

  2. I tried removing the return statement and projecting with new but that didn't work.

Any suggestions?

Thanks!

var recipients = File.ReadAllLines(path)
.Select (record => 
{
string[] tokens = record.Split('|');

string FirstName = tokens[2];
string LastName = tokens[4];
string recordId = tokens[13];

return recordId;
}
)
.GroupBy (recordId => {return recordId; } )
.Dump();

Answer 1:

Change your Select() to project to an anonymous type that holds all the properties you want:

.Select(record =>
{
  string[] tokens = record.Split('|');

  string FirstName = tokens[2];
  string LastName = tokens[4];
  string recordId = tokens[13];

  // Anonymous type: FirstName and LastName keep their variable names as property names.
  return new { RecordId = recordId, FirstName, LastName };
})

You could also write this more succinctly:

File.ReadAllLines(path)
    .Select(record  => record.Split('|'))
    .Select(tokens => new { RecordId = tokens[13], FirstName = tokens[2], LastName = tokens[4] })
    .GroupBy(x => x.RecordId)
    .Dump();
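
To cover the rest of the question (group by recordId, sort by a date, Take(x) per group), the same projection can be extended. This is only a minimal sketch. It assumes the date sits at index 0 in yyyy-MM-dd format, since the question does not say which column holds it. It also uses a hypothetical in-memory array in place of File.ReadAllLines(path), and Console.WriteLine in place of LINQPad's Dump():

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical sample rows standing in for File.ReadAllLines(path).
// Assumed layout: date at index 0, first name at 2, last name at 4, record id at 13.
string[] lines =
{
    "2019-01-05|x|Ann|x|Lee|x|x|x|x|x|x|x|x|A1",
    "2019-03-01|x|Ann|x|Lee|x|x|x|x|x|x|x|x|A1",
    "2019-02-10|x|Ann|x|Lee|x|x|x|x|x|x|x|x|A1",
    "2019-01-20|x|Bob|x|Ray|x|x|x|x|x|x|x|x|B2",
};

const int take = 2; // the "x" in Take(x); the value here is arbitrary

var newestPerRecord = lines
    .Select(line => line.Split('|'))
    .Select(tokens => new
    {
        // Assumed date column and format; adjust to the real file.
        Date = DateTime.ParseExact(tokens[0], "yyyy-MM-dd", CultureInfo.InvariantCulture),
        FirstName = tokens[2],
        LastName = tokens[4],
        RecordId = tokens[13],
    })
    .GroupBy(r => r.RecordId)
    .Select(g => new
    {
        RecordId = g.Key,
        // Sort inside each group, newest first, then keep at most `take` rows.
        Rows = g.OrderByDescending(r => r.Date).Take(take).ToList(),
    })
    .ToList();

foreach (var group in newestPerRecord)
{
    Console.WriteLine(group.RecordId);
    foreach (var row in group.Rows)
        Console.WriteLine($"  {row.Date:yyyy-MM-dd} {row.FirstName} {row.LastName}");
}
```

Sorting inside the group (rather than before GroupBy) keeps the intent explicit: each recordId gets its own ordered, truncated list.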


Answer 2:

Project each line into a custom type in a first query (var query = from l in lines select ...), and afterwards you can group or do anything else in a second query (from a in query .....).



Answer 3:

Note - I'm not actually sure what you are trying to achieve in your "Grouping" - are you really looking to Group by the Id? Or is this an OrderBy too?

I think you can achieve what you are looking for using anonymous types, e.g.:

var recipients = File.ReadAllLines(path)
    .Select(record => record.Split('|'))
    .Select(tokens => new {
         TheDate = DateTime.Parse(tokens[0]), // this might be the wrong index?
         FirstName = tokens[2],
         LastName = tokens[4],
         RecordId = tokens[13],
    })
    .OrderBy(anon => anon.TheDate)
    .Dump();

If you want to use an actual type, then this article from Jon Skeet might help as a background - http://www.developerfusion.com/article/84468/linq-to-log-files/
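
As a rough illustration of the named-type route, here is a sketch using a hypothetical Recipient class. The class name, its Parse helper, and the sample rows are assumptions for illustration and are not taken from the linked article; the column indices follow the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows standing in for File.ReadAllLines(path).
string[] lines =
{
    "a|b|Ann|d|Lee|f|g|h|i|j|k|l|m|A1",
    "a|b|Bob|d|Ray|f|g|h|i|j|k|l|m|B2",
    "a|b|Ann|d|Lee|f|g|h|i|j|k|l|m|A1",
};

// Method-group conversion: each line is passed to Recipient.Parse.
List<Recipient> recipients = lines.Select(Recipient.Parse).ToList();

foreach (var group in recipients.GroupBy(r => r.RecordId))
    Console.WriteLine($"{group.Key}: {group.Count()} row(s)");

// Hypothetical named type replacing the anonymous type.
class Recipient
{
    public string FirstName { get; }
    public string LastName { get; }
    public string RecordId { get; }

    public Recipient(string firstName, string lastName, string recordId)
    {
        FirstName = firstName;
        LastName = lastName;
        RecordId = recordId;
    }

    // Splits one pipe-delimited line; indices 2, 4 and 13 come from the question.
    public static Recipient Parse(string line)
    {
        string[] tokens = line.Split('|');
        if (tokens.Length < 14)
            throw new FormatException("Expected at least 14 fields: " + line);
        return new Recipient(tokens[2], tokens[4], tokens[13]);
    }
}
```

Unlike an anonymous type, a named type can be returned from a method or passed between methods, at the cost of a little more ceremony.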



Answer 4:

BrokenGlass already answered the question, but I just wanted to point out that this syntax looks cleaner (at least to me):

var recipients = File.ReadAllLines(path)
    .Select(record => record.Split('|'))
    .Select(tokens =>
        new {
            FirstName = tokens[2],
            LastName = tokens[4],
            recordId = tokens[13] // index 13, matching the question
        }
    )
    .GroupBy(person => person.recordId)
    .Dump();

There's not really a need for lambdas with statement blocks here.



Answer 5:

I'll add:

var recipients = (from record in File.ReadAllLines(path)
    let tokens = record.Split('|')
    let record2 = new { RecordId = tokens[13], FirstName = tokens[2], LastName = tokens[4] }
    group record2 by record2.RecordId
).Dump();

in LINQ query syntax (rather than method syntax).

I'll add another variant, that uses select ... into instead of let.

var recipients = (from record in File.ReadAllLines(path)
    select record.Split('|') into tokens
    select new { RecordId = tokens[13], FirstName = tokens[2], LastName = tokens[4] } into record2
    group record2 by record2.RecordId
).Dump();

You would need to benchmark them to find the fastest (and it would surely be interesting; I'll try tomorrow morning).



Tags: c# linq .net-4.0