I desperately need help with a query that's been causing a lot of grief over the past 6 months on a high traffic website. I am a frontend developer with ability to write simple SQL queries and so I am unable to fix this issue on my own. The query is now locking mysql database frequently for lack of memory or CPU or interference from other queries on the VPS. I upgraded the hardware, but that alone didn't resolve the issue. So here's a description of what the query is trying to do:
User accesses a particular URL (say, the_source_url). Application tries to fetch related source_urls that other users (who have also accessed the_source_url) have accessed earlier, sorted by most frequently accessed to least frequently accessed. Basically, application is trying to find users with similar interest and show other pages they have accessed earlier.
This is the "complex query" that I wrote when I was young/stupid and the website had no traffic:
SELECT DISTINCT(SOURCE_URL), COUNT(SOURCE_URL) CATCOUNT
FROM topsources
WHERE SOURCE_URL <> ?
AND USER_ID IN (SELECT DISTINCT(USER_ID)
FROM topsources WHERE SOURCE_URL = ?)
GROUP BY SOURCE_URL ORDER BY CATCOUNT DESC
This is the table structure:
`topsources` (
`USER_ID` varchar(255) NOT NULL,
`DATE_AND_HOUR` varchar(255) NOT NULL,
`UPDATED_TIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`ITEM_ID` int(11) NOT NULL,
`SOURCE_URL` varchar(100) NOT NULL,
`FEED_PAGE_URL` varchar(255) NOT NULL,
`CATEGORY_URL` varchar(100) NOT NULL,
`REFERRER` varchar(2048) DEFAULT NULL,
PRIMARY KEY (`USER_ID`,`DATE_AND_HOUR`(30),`ITEM_ID`),
KEY `USER_ID` (`USER_ID`),
KEY `FEED_PAGE_URL` (`FEED_PAGE_URL`),
KEY `SOURCE_URL` (`SOURCE_URL`),
KEY `CATEGORY_URL` (`CATEGORY_URL`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Please note that I have already tried doing inner join instead of subselect but that doesn't work. i.e., the below query doesn't return the same result as that of the above query.
SELECT DISTINCT(ts.SOURCE_URL), COUNT(ts.SOURCE_URL) CATCOUNT FROM topsources ts INNER JOIN topsources tsi ON ts.USER_ID = tsi.USER_ID AND tsi.SOURCE_URL = ? WHERE ts.SOURCE_URL <> ? AND ts.CATEGORY_URL = ? GROUP BY ts.SOURCE_URL ORDER BY ts.CATCOUNT DESC