I am working on a project where I am using Spark for data processing. My data is now processed and I need to load it into Neo4j. After loading it into Neo4j, I will use that to showcase the results.
I wanted the whole implementation to be done in Python, but I couldn't find any library or example on the net. Can you please help with links, libraries, or any example?
My RDD is a PairedRDD. And in every tuple, I have to create a relationship.
PairedRDD

Key     Value
Jack    [a,b,c]

For simplicity, I transformed the RDD (see the flatMap sketch after the table) to

Key     Value
Jack    a
Jack    b
Jack    c
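A minimal PySpark sketch of that flattening step, assuming the paired RDD is named userhobbies as in the code further down; the data and variable names are illustrative:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Paired RDD as described above: key -> list of values
userhobbies = sc.parallelize([("Jack", ["a", "b", "c"])])

# Flatten each (key, [v1, v2, ...]) pair into one (key, value) pair per element
flattened = userhobbies.flatMap(lambda kv: [(kv[0], v) for v in kv[1]])

print(flattened.collect())  # [('Jack', 'a'), ('Jack', 'b'), ('Jack', 'c')]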
Then I have to create relationships between
Jack->a
Jack->b
Jack->c
Based on William's answer, I am able to load a list directly, but this data is throwing the Cypher error below.
I tried it like this:
def writeBatch(b):
    print("writing batch of " + str(len(b)))
    session = driver.session()
    session.run('UNWIND {batch} AS elt MERGE (n:user1 {user: elt[0]})', {'batch': b})
    session.close()

def write2neo(v):
    batch_d.append(v)
    for hobby in v[1]:
        batch_d.append([v[0], hobby])
    global processed
    processed += 1
    if len(batch) >= 500 or processed >= max:
        writeBatch(batch)
        batch[:] = []

max = userhobbies.count()
userhobbies.foreach(write2neo)
b is a list of lists. The unwound elt is a list of two elements, elt[0] and elt[1], as key and value.
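For illustration, one batch passed to writeBatch would be shaped like this (the values are taken from the example above; the variable name is only for the sketch):

# Each inner list pairs a key with a single value,
# matching elt[0] and elt[1] in the UNWIND query.
b = [["Jack", "a"], ["Jack", "b"], ["Jack", "c"]]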
Error
ValueError: Structure signature must be a single byte value
Thanks in advance.
You can do a foreach on your RDD, for example with a per-element write like the sketch below. I would however improve the function to batch the writes, but this simple snippet works for a basic implementation.
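A minimal sketch of such a per-element foreach write, assuming the official neo4j Bolt driver, a flattened (key, value) RDD named flattened as in the earlier sketch, and $-style Cypher parameters; the connection details, Hobby label, and LIKES relationship type are illustrative, while the user1 label and user property come from the question:

from neo4j import GraphDatabase

# Placeholders - adjust URI and credentials for your setup.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH = ("neo4j", "password")

def write_one(kv):
    # kv is a single (key, value) pair, e.g. ("Jack", "a").
    # The driver is created inside the function because driver/session
    # objects cannot be pickled and shipped to Spark executors.
    driver = GraphDatabase.driver(NEO4J_URI, auth=NEO4J_AUTH)
    with driver.session() as session:
        session.run(
            "MERGE (u:user1 {user: $key}) "
            "MERGE (h:Hobby {name: $value}) "
            "MERGE (u)-[:LIKES]->(h)",
            {"key": kv[0], "value": kv[1]},
        )
    driver.close()

flattened.foreach(write_one)

Opening a connection per record is slow, which is why batching the writes, as in the update below, is the better approach.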
UPDATE WITH EXAMPLE OF BATCHING WRITES
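Below is a minimal sketch of one way to batch the writes, using foreachPartition so each Spark partition opens a single connection and flushes UNWIND batches of up to 500 rows; the batch size, the user1 label, and the "writing batch of" print come from the question, while the Hobby label, LIKES relationship type, and connection details are illustrative:

from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH = ("neo4j", "password")
BATCH_SIZE = 500

def writeBatch(session, batch):
    print("writing batch of " + str(len(batch)))
    # elt[0] is the key, elt[1] the value, matching the shape shown earlier.
    session.run(
        "UNWIND $batch AS elt "
        "MERGE (u:user1 {user: elt[0]}) "
        "MERGE (h:Hobby {name: elt[1]}) "
        "MERGE (u)-[:LIKES]->(h)",
        {"batch": batch},
    )

def writePartition(rows):
    # One driver and session per partition, flushing every BATCH_SIZE rows.
    driver = GraphDatabase.driver(NEO4J_URI, auth=NEO4J_AUTH)
    with driver.session() as session:
        batch = []
        for key, value in rows:
            batch.append([key, value])
            if len(batch) >= BATCH_SIZE:
                writeBatch(session, batch)
                batch = []
        if batch:  # flush the remainder
            writeBatch(session, batch)
    driver.close()

flattened.foreachPartition(writePartition)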
Which results in one "writing batch of ..." line being printed for each batch that is written.