I have this problem. Given a users
table that consists of users' username in a social network and friends
table that contain a user's name and a user's friendname like below...
username friendname
John Thomas
Chris James
... I'm trying to write an SQL statement that will if a user is in my network. In other words
is that user a friend or friend of friends?
I've been dancing around this problem and could only come up with this query:
SELECT f2.username, f2.friendname
FROM friends f2
WHERE f2.username IN (
SELECT f1.friendname
FROM friends f1
WHERE f1.username = 'Thomas')
AND f2.friendname <> 'user1'
AND f2.friendname = 'user2';
It basically check if a user if is a friend of my friend i.e. just return null if false.
Trying to figure out how I can expand to go through all my network of friend. I mean not just friend of my friend.
SELECT *
FROM (
SELECT username
FROM friends
START WITH
username = 'myname'
CONNECT BY
friendname = PRIOR username
AND level <= 3
)
WHERE username = 'friendname'
AND rownum = 1
Update the level as necessary: you may search for the third layer friends etc.
If the friendship relationship is symmetric, you should make the following query:
WITH q AS
(
SELECT username, friendname
FROM friends
UNION ALL
SELECT friendname, username
FROM friends
),
f AS
(
SELECT friendname, level
FROM q
START WITH
username = 'Thomas'
CONNECT BY NOCYCLE
username = PRIOR friendname
)
SELECT *
FROM f
WHERE friendname = 'Jo'
AND rownum = 1
This query can be made much faster if you denormalize your table: store two records per friendship, like this:
CREATE TABLE dual_friends (orestes NOT NULL, pylades NOT NULL, CONSTRAINT pk_dualfriends_op PRIMARY KEY (orestes, pylades)) ORGANIZATION INDEX
AS
SELECT username, friendname
FROM friends
UNION ALL
SELECT friendname, username
FROM friends
Then you can just replace the CTE
above with the dual_friends
:
WITH f AS
(
SELECT pylades, level
FROM dual_friends
START WITH
orestes = 'Thomas'
CONNECT BY NOCYCLE
orestes = PRIOR pylades
AND level <= 3
)
SELECT *
FROM f
WHERE pylades = 'Jo'
AND rownum = 1
, which will use the index and be much more efficient, especially if you limit the level to some reasonable value.