I am trying to achieve what is shown here: I have 2 CSV files, disease_mstr and Test_mstr. In Test_mstr, I have many test-to-disease-ID records, which means none of the values are unique. The disease ID points to the disease_mstr file. In the disease_mstr file I have only 2 fields, ID and Disease_name (the disease name is unique).
Now I am creating nodes with 3 labels:
1) Tests: unique tests only (345 unique test names in total)
**Properties:**
a) testname
2) Linknode: the entire Test_mstr file, plus the "disease_name" for the corresponding disease ID from the disease_mstr file
**Properties:**
a) tname
b) dname
c) did
3) Disease: pulled from the disease_mstr file
**Properties:**
a) did
b) diseasename
After that I create the relationships:
1) MATCH (t:Tests), (n:Linknode) WHERE t.testname = n.tname CREATE (n)-[r:TEST_2]->(t) RETURN n, r, t
2) MATCH (d:Disease), (l:Linknode) WHERE d.did = l.did MERGE (d)-[r:FOR_DISEASE]->(l) RETURN d, r, l
To get the desired result as shown in the image, I run the following Cypher command:
MATCH (d:Disease)-[r2:FOR_DISEASE]->(l:Linknode)-[r:TEST_2]->(t:Tests) RETURN l,r,t,r2 LIMIT 25
Can someone please help me create the 2 additional relationships that are marked in the image with BLUE and GREEN lines?
Sample files and images can be accessed in my google folder link
Is your goal to link all diseases to tests, so that for any disease you can find out which tests are relevant and, for each test, which diseases it tests for?
If so, you are nearly there.
You don't need the link nodes, except to help you while linking the tests to the diseases. In your current scenario you're treating the link nodes as you would a join table in a relational database. They won't add any value in your graph database. You can create a single relationship between diseases and tests which will do all the work.
Here's a step by step way to load your database. (It probably isn't the most efficient, but it's easy to follow and it works.)
Normalise and load your tests:
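A minimal sketch of this step, assuming Test_mstr.csv sits in Neo4j's import directory and has a header row with a column named `testname` (the file URL and column name are assumptions, so adjust them to your actual file):

```cypher
// Assumed: file URL and the header name "testname" are placeholders.
// DISTINCT collapses repeated test names; MERGE keeps the load idempotent.
LOAD CSV WITH HEADERS FROM 'file:///Test_mstr.csv' AS row
WITH DISTINCT row.testname AS testname
WHERE testname IS NOT NULL
MERGE (:Tests {testname: testname});
```

Using MERGE rather than CREATE means the query can be re-run without producing duplicate Tests nodes.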
Load your diseases (these looked normalised to me):
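As a sketch, assuming disease_mstr.csv has the headers `ID` and `Disease_name` described in the question (the file URL is an assumption):

```cypher
// Assumed: file URL and exact header spellings are placeholders.
// LOAD CSV reads every field as a string, so did is stored as a string here.
LOAD CSV WITH HEADERS FROM 'file:///disease_mstr.csv' AS row
MERGE (d:Disease {did: row.ID})
SET d.diseasename = row.Disease_name;
```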
Load your link nodes:
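A possible version of this step, assuming Test_mstr.csv has columns named `testname` and `disease_ID` (both header names are assumptions). It stores only the two properties needed for linking and leaves out `dname`:

```cypher
// Assumed: file URL and header names are placeholders.
// One Linknode per CSV row; did stays a string so it matches Disease.did
// when that was also loaded through LOAD CSV.
LOAD CSV WITH HEADERS FROM 'file:///Test_mstr.csv' AS row
CREATE (:Linknode {tname: row.testname, did: row.disease_ID});
```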
Now you can create a direct relationship between the diseases and tests with the following query:
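One way such a query could look, with `TESTS_FOR` as a hypothetical relationship type chosen for illustration (pick whatever name and direction suit your model):

```cypher
// For each disease, find its link nodes and take the test name from each.
MATCH (d:Disease)
MATCH (l:Linknode {did: d.did})
WITH d, l.tname AS testname
// Look up the test by name and join it directly to the disease.
MATCH (t:Tests {testname: testname})
// TESTS_FOR is an assumed name; MERGE avoids duplicate relationships.
MERGE (t)-[:TESTS_FOR]->(d);
```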
This last query will find all the link nodes for each disease and extract the test name. It then looks up the test and joins it directly to its corresponding disease.
The link nodes are redundant now, so you can delete them if you wish.
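If you do want to remove them, a short sketch is:

```cypher
// DETACH DELETE removes each Linknode together with any relationships
// still attached to it (for example TEST_2 and FOR_DISEASE).
MATCH (l:Linknode)
DETACH DELETE l;
```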
To create the 'blue lines', which I assume are meant to show where diseases have tests in common, run the query below:
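A sketch of a query along these lines, assuming the diseases and tests are joined by a relationship named `TESTS_FOR` pointing from test to disease, and using `SHARES_TEST` as a hypothetical name for the new link (both names are assumptions):

```cypher
// Find every pair of diseases reachable from the same test.
MATCH (d1:Disease)<-[:TESTS_FOR]-(:Tests)-[:TESTS_FOR]->(d2:Disease)
// Comparing the IDs keeps each pair once and fixes a single direction.
WHERE d1.did < d2.did
// MERGE creates at most one SHARES_TEST link per pair,
// even when the two diseases share several tests.
MERGE (d1)-[:SHARES_TEST]->(d2);
```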
The MATCH clause finds all disease pairs with a common test, the WHERE clause ensures a link is created in only one direction, and the MERGE clause ensures only one link is created per pair.