I have a Project entity with multiple related tables:
Project >-< ProjectAttribute >-< AttributeType
Project >-< ProjectAttribute >-< AttributeValue
Project >-< ProjectTask >------< Task
Project >-< ProjectTask >------< Employee
...
In total, about 15 tables are involved, including sub-joins.
Now I need to find the best-matching Projects for a given Project by comparing the values and counting the matches. For example, a match on both AttributeType and AttributeValue increases a Project's "best match" indicator by 1.
How can I achieve this?
I think I found out how to query similarities:
SELECT
    sp.*,
    ((
        SELECT COUNT(spa.id)
        FROM project_attribute AS spa
        WHERE spa.project = sp.id
          AND spa.attribute = pa.attribute
          AND spa.attribute_value = pa.attribute_value
    ) * 1)
    +
    ((
        SELECT COUNT(spt.id)
        FROM project_task AS spt
        WHERE spt.project = sp.id
          AND spt.address = pt.address
    ) * .25)
    +
    ((
        SELECT COUNT(spc.id)
        FROM project_campaign AS spc
        WHERE spc.project = sp.id
          AND spc.campaign = pc.campaign
    ) * 2)
    AS similarity
FROM project AS p
LEFT JOIN project_attribute AS pa ON (p.id = pa.project)
LEFT JOIN project_task AS pt ON (p.id = pt.project)
LEFT JOIN project_campaign AS pc ON (p.id = pc.project)
LEFT JOIN project AS sp ON (p.id != sp.id)
WHERE
    p.id = <PROJECT-ID>
GROUP BY sp.id
ORDER BY
    similarity DESC
I also added a multiplier (weight) to each subquery to control how much each kind of match contributes to the similarity.
The performance isn't the best (~200ms for 235 projects), but it fits my current needs.
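One caveat, stated tentatively: the subqueries refer to pa, pt and pc columns that are neither grouped nor aggregated, so depending on the database and its GROUP BY settings the score may be computed from just one arbitrary joined row of the given project (or the query may be rejected). Below is a minimal sketch of a variant that counts matches across all rows of the given project instead. It is not the original setup: it uses Python's sqlite3 with an in-memory database, the table and column names are taken from the query above, and build_demo_db, rank_similar_projects and the sample rows are hypothetical names and data made up for illustration.

```python
import sqlite3

# Assumed weights, mirroring the multipliers 1, .25 and 2 in the query above.
WEIGHTS = {"w_attr": 1.0, "w_task": 0.25, "w_camp": 2.0}

# Each subquery joins the given project's rows (:pid) to the candidate's rows
# (sp.id) and counts the matching pairs, so every row of the given project counts.
SIMILARITY_SQL = """
SELECT sp.id,
       :w_attr * (SELECT COUNT(*)
                  FROM project_attribute AS pa
                  JOIN project_attribute AS spa
                    ON spa.attribute = pa.attribute
                   AND spa.attribute_value = pa.attribute_value
                  WHERE pa.project = :pid AND spa.project = sp.id)
     + :w_task * (SELECT COUNT(*)
                  FROM project_task AS pt
                  JOIN project_task AS spt ON spt.address = pt.address
                  WHERE pt.project = :pid AND spt.project = sp.id)
     + :w_camp * (SELECT COUNT(*)
                  FROM project_campaign AS pc
                  JOIN project_campaign AS spc ON spc.campaign = pc.campaign
                  WHERE pc.project = :pid AND spc.project = sp.id)
       AS similarity
FROM project AS sp
WHERE sp.id <> :pid
ORDER BY similarity DESC, sp.id
"""


def build_demo_db():
    """Create an in-memory database with a tiny, made-up data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project (id INTEGER PRIMARY KEY);
        CREATE TABLE project_attribute (
            id INTEGER PRIMARY KEY, project INTEGER,
            attribute INTEGER, attribute_value INTEGER);
        CREATE TABLE project_task (
            id INTEGER PRIMARY KEY, project INTEGER, address TEXT);
        CREATE TABLE project_campaign (
            id INTEGER PRIMARY KEY, project INTEGER, campaign INTEGER);
    """)
    conn.executemany("INSERT INTO project (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO project_attribute (project, attribute, attribute_value) "
        "VALUES (?, ?, ?)",
        [(1, 10, 100), (1, 11, 101),   # the given project
         (2, 10, 100), (2, 11, 999),   # one matching attribute/value pair
         (3, 10, 100), (3, 11, 101)])  # two matching pairs
    conn.executemany(
        "INSERT INTO project_task (project, address) VALUES (?, ?)",
        [(1, "A"), (2, "A"), (3, "B")])
    conn.executemany(
        "INSERT INTO project_campaign (project, campaign) VALUES (?, ?)",
        [(1, 7), (2, 7)])
    return conn


def rank_similar_projects(conn, project_id):
    """Return (project id, similarity) pairs, best match first."""
    params = dict(WEIGHTS, pid=project_id)
    return conn.execute(SIMILARITY_SQL, params).fetchall()


if __name__ == "__main__":
    print(rank_similar_projects(build_demo_db(), 1))
```

With this demo data, project 2 scores 1 + 0.25 + 2 = 3.25 and project 3 scores 2, so the call returns [(2, 3.25), (3, 2.0)]. Because the given project is no longer joined to all of its child tables at once, no GROUP BY is needed; whether this is faster than the original query on real data is untested.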