Better query strategy to sort files by file hash f

2019-06-01 09:46发布

I've wrote this query without much thought but as a beginner I'm almost sure it could be written better.

Here it's:

    SELECT filehash, filename, filesize, group_files
      FROM files
INNER JOIN (  SELECT filehash group_id,
                     COUNT(filehash) group_files
                FROM files
            GROUP BY filehash) groups
        ON files.filehash = groups.group_id
  ORDER BY group_files DESC,
           filesize DESC

Table definition:

CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT,
                    filesize INTEGER,
                    filehash TEXT)

Indexes definition:

CREATE INDEX files_filehash_idx
          ON files(filehash)
CREATE UNIQUE INDEX files_filename_idx
                 ON files(filename)
CREATE INDEX files_filesize_idx
          ON files(filesize)

Query EXPLAIN QUERY PLAN:

selectid    order   from    detail
1           0       0       SCAN TABLE files USING COVERING INDEX files_filehash_idx (~1000000 rows)
0           0       1       SCAN SUBQUERY 1 AS groups (~100 rows)
0           1       0       SEARCH TABLE files USING INDEX files_filehash_idx (filehash=?) (~10 rows)
0           0       0       USE TEMP B-TREE FOR ORDER BY

Could you correct me if I'm wrong? Thank you in advance.

标签： sqlite query-optimization

2条回答

女痞

2楼-- · 2019-06-01 10:02

Nope. Looks spot on to me.

I don't think you need the index on filename for this query. There are plans where the index on file size would help, but MySQL isn't using them for this. You might be better off replacing the two separate indexes with a compound index on (filehash, filesize). Or you might not!

0人赞添加讨论(0) 举报

傲

3楼-- · 2019-06-01 10:05

What do you think about this version?

  select filehash, group_concat(filename), filesize, count(*) as group_files
    from files
group by filehash
order by group_files desc

It seems like that will probably run faster. Does it do what you need?

0人赞添加讨论(0) 举报

Better query strategy to sort files by file hash f

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间