Convert TSQL to Linq to Entities

2019-07-17 07:06发布

问题:

I have 2 table named FileList and ExtensionsCategory :

FileList
-------------------------------------------------------------
ID    Path         Extension

1     C:\7.pdf        pdf
2     D:\3.png        png
3     C:\1.mp3        mp3
4     D:\32.pdf       pdf

and

ExtensionsCategory 
------------------------------------------------------------
ID     Extension       Category

1        mp3           Multimedia
2        pdf           Document
3        png           Photo

there isn't any relation between tables.

now i want to know each category contain how many files ? i wrote the following TSQL query in SSMS and work fine :

SELECT 
     Category
    ,SUM(FileCount) AS TotalFileCount 
FROM    
    (SELECT 
         COUNT(*) AS FileCount
        ,(SELECT ExtensionsCategories.Category FROM ExtensionsCategories WHERE FileList.Extension = ExtensionsCategories.Extension) AS Category
    FROM FileList GROUP BY Extension) AS T1 GROUP BY Category

i want know is there any improved way to do this without Joins ?

i try to write it using Linq to Entities, this is my code but i can't complete it :

    var a = _dbContext.FileLists
        .Select(fl => new
        {
            Extension = fl.Extension,
            Count = _dbContext.FileLists.Count(),
            Cat =
                _dbContext.ExtensionsCategories.Where(w => w.Extension == fl.Extension).Select(s => s.Category)
        }).GroupBy(g => g.Extension);

回答1:

This should give you correct result:-

var result = from f in _dbContext.FileLists
             group f by f.Extension into g
             let firstCategory = _dbContext.ExtensionsCategories
                                           .FirstOrDefault(x => x.Extension == g.Key)
             select new
                    {
                        Category = firstCategory != null ? firstCategory.Category : "",
                        TotalFileCount = g.Count()
                    };

Or if you need with Method Syntax:-

var result = _dbContext.FileLists.GroupBy(e => e.Extension)
              .Select(x =>
                  {
                      var firstCategory = _dbContext.ExtensionsCategories
                                           .FirstOrDefault(z => z.Extension == x.Key);
                      return new
                        {
                           Category = firstCategory != null ? firstCategory.Category : "",
                           TotalFileCount = x.Count()
                        };
                  });

Fiddle with Linq-to-objects.

Update:

If a category can contain multiple extension then you can use below query:-

var result = extensionCat.GroupBy(x => x.Category)
                      .Select(x =>
                         new
                          {
                              Category = x.Key,
                              TotalFileCount = fileList.Count(f => x
                                                    .Any(z => z.Extension == f.Extension))
                          }
                      );


回答2:

There is another way to get the same results which might be faster.

First GroupBy file extension and Count them inside the GroupBy element selector. There you can also get the category name for the keys which are in this case file extensions:

var result =
    Files
    .GroupBy(x => x.Extension, (ext, fs) => new 
    { 
        Extension = ext, 
        Category = Categories.Single(c => c.Extension == ext).Name, 
        FileCount = fs.Count() 
    });

This will produce the following query:

SELECT [t1].[Extension], (
    SELECT [t2].[Name]
    FROM [Category] AS [t2]
    WHERE [t2].[Extension] = [t1].[Extension]
    ) AS [Category], [t1].[value] AS [FileCount]
FROM (
    SELECT COUNT(*) AS [value], [t0].[Extension]
    FROM [File] AS [t0]
    GROUP BY [t0].[Extension]
    ) AS [t1]

@RahulSingh version on the other hand produces this:

-- Region Parameters
DECLARE @p0 NVarChar(1000) = ''
-- EndRegion
SELECT 
    (CASE 
        WHEN EXISTS(
            SELECT TOP (1) NULL AS [EMPTY]
            FROM [Category] AS [t2]
            WHERE [t2].[Extension] = [t1].[Extension]
            ) THEN (
            SELECT [t4].[Name]
            FROM (
                SELECT TOP (1) [t3].[Name]
                FROM [Category] AS [t3]
                WHERE [t3].[Extension] = [t1].[Extension]
                ) AS [t4]
            )
        ELSE CONVERT(NVarChar(50),@p0)
     END) AS [Category], (
    SELECT COUNT(*)
    FROM [File] AS [t5]
    WHERE (([t1].[Extension] IS NULL) AND ([t5].[Extension] IS NULL)) OR (([t1].[Extension] IS NOT NULL) AND ([t5].[Extension] IS NOT NULL) AND ([t1].[Extension] = [t5].[Extension]))
    ) AS [TotalFileCount]
FROM (
    SELECT [t0].[Extension]
    FROM [File] AS [t0]
    GROUP BY [t0].[Extension]
    ) AS [t1]

I've tested both queries with 1.000.000 rows and the results are:

vs. the ToList version:

In the last rows of the statistics you can see that short version is about 3 times faster.