Order by “best match” based on relational tables

Posted 2019-09-14 16:00

Question:

I have the following entity:

Project

It has multiple related tables:

Project >-< ProjectAttribute >-< AttributeType
Project >-< ProjectAttribute >-< AttributeValue
Project >-< ProjectTask >------< Task
Project >-< ProjectTask >------< Employee
...

In total, about 15 tables are involved, including nested joins.

Given a single Project, I need to find the best-matching other Projects by comparing values across these tables and counting the matches. For example, each matching pair of AttributeType and AttributeValue increases a Project's "best match" score by 1.

How can I achieve this?

Answer 1:

I think I found out how to query similarities:

SELECT
    sp.*,
    (
        -- shared attribute/value pairs, weight 1
        SELECT COUNT(*)
        FROM project_attribute AS pa
        JOIN project_attribute AS spa
            ON spa.attribute = pa.attribute
            AND spa.attribute_value = pa.attribute_value
        WHERE pa.project = p.id AND spa.project = sp.id
    ) * 1
    +
    (
        -- shared task addresses, weight 0.25
        SELECT COUNT(*)
        FROM project_task AS pt
        JOIN project_task AS spt ON spt.address = pt.address
        WHERE pt.project = p.id AND spt.project = sp.id
    ) * 0.25
    +
    (
        -- shared campaigns, weight 2
        SELECT COUNT(*)
        FROM project_campaign AS pc
        JOIN project_campaign AS spc ON spc.campaign = pc.campaign
        WHERE pc.project = p.id AND spc.project = sp.id
    ) * 2
    AS similarity
-- p is the reference project, sp is every other project (one row each)
FROM project AS p
JOIN project AS sp ON sp.id != p.id
WHERE
    p.id = <PROJECT-ID>
ORDER BY
    similarity DESC

I also added a multiplier to each count to control how much each kind of match contributes to the score.

The performance isn't the best (~200ms for 235 projects), but it fits my current needs.
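
To try the idea on a small data set, here is a minimal sketch in Python using an in-memory SQLite database. The schema is an assumption: it is cut down to two of the tables and only the columns the query touches, and `best_matches` and `build_demo_db` are hypothetical helper names. Each count is computed in a self-contained subquery, so every candidate project yields exactly one row; the weights (1 for attributes, 2 for campaigns) follow the ones used above.

```python
import sqlite3

# Assumed minimal schema: only the columns the similarity query needs.
SCHEMA = """
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project_attribute (
    id INTEGER PRIMARY KEY, project INTEGER,
    attribute INTEGER, attribute_value INTEGER);
CREATE TABLE project_campaign (
    id INTEGER PRIMARY KEY, project INTEGER, campaign INTEGER);
"""

# Weighted count of rows that a candidate project (sp) shares
# with the reference project (p).
QUERY = """
SELECT sp.id, sp.name,
    (SELECT COUNT(*)
       FROM project_attribute AS pa
       JOIN project_attribute AS spa
         ON spa.attribute = pa.attribute
        AND spa.attribute_value = pa.attribute_value
      WHERE pa.project = p.id AND spa.project = sp.id) * 1.0
  + (SELECT COUNT(*)
       FROM project_campaign AS pc
       JOIN project_campaign AS spc ON spc.campaign = pc.campaign
      WHERE pc.project = p.id AND spc.project = sp.id) * 2.0
    AS similarity
FROM project AS p
JOIN project AS sp ON sp.id != p.id
WHERE p.id = ?
ORDER BY similarity DESC, sp.id
"""


def best_matches(conn, project_id):
    """Return (id, name, similarity) for every other project, best first."""
    return conn.execute(QUERY, (project_id,)).fetchall()


def build_demo_db():
    """Create an in-memory database with a few made-up projects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO project (id, name) VALUES (?, ?)",
        [(1, "A"), (2, "B"), (3, "C"), (4, "D")],
    )
    conn.executemany(
        "INSERT INTO project_attribute (project, attribute, attribute_value) "
        "VALUES (?, ?, ?)",
        [
            (1, 10, 100), (1, 11, 101),  # reference project
            (2, 10, 100), (2, 11, 101),  # shares both attribute/value pairs
            (3, 10, 100), (3, 11, 999),  # shares one pair; the other value differs
        ],
    )
    conn.executemany(
        "INSERT INTO project_campaign (project, campaign) VALUES (?, ?)",
        [(1, 7), (3, 7)],  # projects 1 and 3 share a campaign
    )
    return conn


if __name__ == "__main__":
    for row in best_matches(build_demo_db(), 1):
        print(row)
```

With this demo data, project C ranks above project B even though it shares fewer attributes, because the shared campaign carries a weight of 2. Adjusting the multipliers changes the ranking accordingly.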