I've wrote this query without much thought but as a beginner I'm almost sure it could be written better.
Here it's:
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN ( SELECT filehash group_id,
COUNT(filehash) group_files
FROM files
GROUP BY filehash) groups
ON files.filehash = groups.group_id
ORDER BY group_files DESC,
filesize DESC
Table definition:
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
filename TEXT,
filesize INTEGER,
filehash TEXT)
Indexes definition:
CREATE INDEX files_filehash_idx
ON files(filehash)
CREATE UNIQUE INDEX files_filename_idx
ON files(filename)
CREATE INDEX files_filesize_idx
ON files(filesize)
Query EXPLAIN QUERY PLAN:
selectid order from detail
1 0 0 SCAN TABLE files USING COVERING INDEX files_filehash_idx (~1000000 rows)
0 0 1 SCAN SUBQUERY 1 AS groups (~100 rows)
0 1 0 SEARCH TABLE files USING INDEX files_filehash_idx (filehash=?) (~10 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
Could you correct me if I'm wrong? Thank you in advance.