I'm trying to filter rows from the MySQL table where all the $_POST
data is stored from an online form. Sometimes the user's internet connection stalls or the browser screws up, and the new page after form submission is not displayed (though the INSERT worked and the table row was created). They then hit refresh, and submit their form twice, creating a duplicate row (except for the timestamp and autoincrement id columns).
I'd like to select unique form submissions. This has to be a really common task, but I can't seem to find something that lets me call with DISTINCT
applying to every column except the timestamp and id in a succinct way (sort of like SELECT id, timestamp, DISTINCT everything_else FROM table;
. At the moment, I can do:
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS (
SELECT DISTINCT everything,except,id,and,timestamp
FROM table1
);
SELECT * FROM table1 LEFT OUTER JOIN temp1
ON table1.everything = temp1.everything
...
;
My table has 20k rows with about 25 columns (classification features for a machine learning exercise). This query takes forever (as I presume it traverses the 20k rows 20K times?) I've never even let it run to completion. What's the standard practice way to do this?
Note: This question suggests add an index to the relevant columns, but there can be max 16 key parts to an index. Should I just choose the most likely unique ones? I can find about 700 duplicates in 2 seconds this way, but I can't be sure of not throwing away a unique row because I also have to ignore some columns when specifying the index.