Is this possible ?
Let's say I have a list of addresses, with a lot of duplicate entries. I need to filter out all the duplicates, because some addresses have slightly different names but the same postalcode and telephone number.
First I do a GROUP BY on postalcode and telephone.
SELECT name, address, postalcode, place, phone
FROM addresses
GROUP BY postalcode, phone
But then I get random names. I would like to get the best name, that is, the name with the most entries per postalcode/phone.
So I thought of the following. Here I use the SUBSTRING_INDEX function to only get the first item in the group_concat (there are no names with the string '~~' in it):
SELECT SUBSTRING_INDEX(
GROUP_CONCAT(DISTINCT name ORDER BY COUNT(name) DESC SEPARATOR '~~')
, '~~', 1),
address,
postalcode,
place,
phone
FROM addresses
GROUP BY postalcode, telephone
but I get an 'invalid use of group function'.
How do I get the GROUP_CONCAT to order by the number of times the name occurs ?
Found a solution myself, with a subquery:
I wonder if it can be done without a subquery.