What's faster? Struct array or DataTable

Posted 2019-06-22 08:21

I am using LinqToSQL to process data from SQL Server to dump it into an iSeries server for further processing. More details on that here.

My problem is that it is taking about 1.25 minutes to process those 350 rows of data. I am still trying to decipher the results from the SQL Server Profiler, but there are a TON of queries being run. Here is a bit more detail on what I am doing:

using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
{
    var vehicles = from a in db.EquipmentMainGenerals
                   join b in db.EquipmentMainConditions on a.wdEquipmentMainGeneralOID equals b.wdEquipmentMainGeneralOID
                   where b.Retired == null
                   orderby a.VehicleId
                   select a;

    et = new EquipmentTable[vehicles.Count()];

    foreach (var vehicle in vehicles)
    {
        // Move data to the array

        // Rates
        GetVehcileRates(vehicle.wdEquipmentMainGeneralOID);

        // Build the costs accumulators
        GetPartsAndOilCosts(vehicle.VehicleId);
        GetAccidentAndOutRepairCosts(vehicle.wdEquipmentMainGeneralOID);

        // Last Month's Accumulators
        et[i].lastMonthActualGasOil = GetFuel(vehicle.wdEquipmentMainGeneralOID) + Convert.ToDecimal(oilCost);
        et[i].lastMonthActualParts = Convert.ToDecimal(partsCost);
        et[i].lastMonthActualLabor = GetLabor(vehicle.VehicleId);
        et[i].lastMonthActualOutRepairs = Convert.ToDecimal(outRepairCosts);
        et[i].lastMonthActualAccidentCosts = Convert.ToDecimal(accidentCosts);

        // Move more data to the array

        i++;
    }
}

The Get methods all look similar to:

private void GetPartsAndOilCosts(string vehicleKey)
{
    oilCost = 0;
    partsCost = 0;

    using (CarteGraphDataDataContext db = new CarteGraphDataDataContext())
    {
        try
        {
            var costs = from a in db.WorkOrders
                        join b in db.MaterialLogs on a.WorkOrderId equals b.WorkOrder
                        join c in db.Materials on b.wdMaterialMainGeneralOID equals c.wdMaterialMainGeneralOID
                        where (monthBeginDate.Date <= a.WOClosedDate && a.WOClosedDate <= monthEndDate.Date) && a.EquipmentID == vehicleKey
                        group b by c.Fuel into d
                        select new
                               {
                                   isFuel = d.Key,
                                   totalCost = d.Sum(b => b.Cost)
                               };

            foreach (var cost in costs)
            {
                if (cost.isFuel == 1)
                {
                    oilCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
                }
                else
                {
                    partsCost = (double)cost.totalCost * (1 + OVERHEAD_RATE);
                }
            }
        }
        catch (InvalidOperationException e)
        {
            oilCost = 0;
            partsCost = 0;
        }
    }

    return;
}

My thinking here is cutting down the number of queries to the DB should speed up the processing. If LINQ does a SELECT for every record, maybe I need to load every record into memory first.

I still consider myself a beginner with C# and OOP in general (I do mostly RPG programming on the iSeries). So I am guessing I am doing something stupid. Can you help me fix my stupidity (at least with this problem)?

Update: I thought I would come back and report what I have discovered. It appears the database was poorly designed, and whatever SQL LINQ was generating in the background was highly inefficient. I am not saying LINQ is bad; it was just bad for this database. I converted to a quickly thrown-together .XSD setup and the processing time went from 1.25 minutes to 15 seconds. Once I do a proper redesign, I can only guess I'll shave a few more seconds off of that. Thank you all for your comments. I'll try LINQ again some other day on a better database.

1 answer
萌系小妹纸
#2 · 2019-06-22 08:56

There are a few things that I spot in your code:

  1. You query the database multiple times for each item in the 'var vehicles' query. Consider rewriting that query so that fewer database queries are needed.
  2. When you don't need all the properties of the queried entity, or when you also need sub-entities of that entity, it's better for performance to use an anonymous type in your select. LINQ to SQL will analyze this and retrieve less data from your database. Such a select might look like this: select new { a.VehicleId, a.Name }
  3. The query in GetPartsAndOilCosts can be optimized by putting the calculation cost.totalCost * (1 + OVERHEAD_RATE) in the LINQ query itself. This way the query can be executed completely in the database, which should make it much faster.
  4. You are doing a Count() on the var vehicles query, but you only use it to determine the size of the array. While LINQ to SQL will turn it into a very efficient SELECT COUNT(*) query, it still costs an extra round trip to the database. Besides that, depending on your isolation level, a row could be added between the count and the moment you start iterating the query. In that case your array is too small and an IndexOutOfRangeException will be thrown. You can simply call .ToArray() on the query, or fill a List<EquipmentTable> and call .ToArray() on that. This will normally be fast enough, especially when you only have about 350 items in the collection, and it will certainly be faster than an extra round trip to the database (the count).
  5. As you probably already expect, the number of database queries is the actual problem. Switching between a struct array and a DataTable will not make much difference in performance.
  6. After you have optimized away as many queries as you can, start analyzing the remaining queries (using SQL Profiler) and optimize them using the Index Tuning Wizard (called the Database Engine Tuning Advisor in SQL Server 2005 and later). It will propose some new indexes that could speed things up considerably.
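Points 2 to 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs against in-memory arrays (LINQ to Objects, written with top-level statements) so that it is self-contained, and the sample rows, the Notes column and the OverheadRate value are all made up. With LINQ to SQL the same query shapes would run against db.EquipmentMainGenerals and the work order tables instead.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the EquipmentMainGenerals table.
var equipment = new[]
{
    (VehicleId: "V002", Name: "Dump truck", Notes: "column we do not need"),
    (VehicleId: "V001", Name: "Pickup", Notes: "column we do not need"),
};

// Point 2: project only the columns that are actually used.
// Point 4: materialize once with ToArray() instead of calling Count()
// first and then enumerating the query a second time.
var vehicles = equipment
    .OrderBy(e => e.VehicleId)
    .Select(e => new { e.VehicleId, e.Name })
    .ToArray();

// Point 3: apply the overhead inside the query rather than in the loop.
const decimal OverheadRate = 0.10m; // assumed value, not from the original post

// Hypothetical cost rows; IsFuel mirrors the 0/1 flag used in the question.
var costRows = new[]
{
    (IsFuel: 1, Cost: 100m),
    (IsFuel: 0, Cost: 40m),
    (IsFuel: 0, Cost: 50m),
};

var totals = costRows
    .GroupBy(c => c.IsFuel)
    .Select(g => new
    {
        IsFuel = g.Key,
        TotalCost = g.Sum(c => c.Cost) * (1 + OverheadRate)
    })
    .ToList();

Console.WriteLine($"Vehicles loaded: {vehicles.Length}");
foreach (var t in totals)
{
    Console.WriteLine($"isFuel={t.IsFuel}, totalCost={t.TotalCost}");
}
```

The array length now comes from the materialized result itself, so it cannot disagree with the number of rows that are iterated.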

A little extra explanation for point #1. What you're doing here is a bit like this:

var query = from x in A select something;

foreach (var row in query)
{
    var query2 = from y in B where y.Value == row.Value select something;

    foreach (var row2 in query2)
    {
        // do some computation.
    }
}

What you should try to accomplish is to remove the query2 subquery, because it is executing on each row of the top query. So you could end up with something like this:

var query =
    from x in A
    from y in B
    where x.Value == y.Value
    select something;

foreach (var row in query)
{
}
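Applied to the cost lookups in the question, the idea might look something like the sketch below. It is not the asker's actual schema: the arrays are hypothetical in-memory stand-ins for the tables (so the example runs on its own with LINQ to Objects and top-level statements), and the IsFuel flag is simplified to a bool. The point is the shape: one grouped query for all vehicles, followed by in-memory lookups inside the loop.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the vehicle and cost tables.
var vehicleIds = new[] { "V001", "V002", "V003" };
var costs = new[]
{
    (VehicleId: "V001", IsFuel: true,  Cost: 100m),
    (VehicleId: "V001", IsFuel: false, Cost: 40m),
    (VehicleId: "V002", IsFuel: false, Cost: 25m),
    (VehicleId: "V002", IsFuel: false, Cost: 35m),
};

// One grouped query covers every vehicle, instead of one query per vehicle.
var totalsByVehicle = costs
    .GroupBy(c => c.VehicleId)
    .ToDictionary(
        g => g.Key,
        g => (Oil: g.Where(c => c.IsFuel).Sum(c => c.Cost),
              Parts: g.Where(c => !c.IsFuel).Sum(c => c.Cost)));

foreach (var id in vehicleIds)
{
    // A vehicle without cost rows gets the default tuple (0, 0);
    // no further query is needed inside the loop.
    totalsByVehicle.TryGetValue(id, out var t);
    Console.WriteLine($"{id}: oil={t.Oil}, parts={t.Parts}");
}
```

With around 350 vehicles, holding the grouped totals in a dictionary is cheap compared with several database round trips per vehicle.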

Of course this example is simplistic, and in real life it gets pretty complicated (as you've already noticed), in your case also because you've got several of those 'sub queries'. It can take some time to get this right, especially since you are new to LINQ to SQL (as you said yourself).

If you can't figure it out, you can always ask again here at Stack Overflow, but please remember to strip your problem down to the smallest possible example, because it's no fun to read through someone's mess (we're not getting paid for this) :-) Good luck.
