I have a DataFrame in Databricks which I want to use to create a graph in Cosmos DB, with one row in the DataFrame corresponding to one vertex in the graph.
When I write to Cosmos DB I can't see any properties on the vertices, just a generated id.
Get data:
data = spark.sql("select * from graph.testgraph")
Configuration:
writeConfig = {
    "Endpoint": "******",
    "Masterkey": "******",
    "Database": "graph",
    "Collection": "TestGraph",
    "Upsert": "true",
    "query_pagesize": "100000",
    "bulkimport": "true",
    "WritingBatchSize": "1000",
    "ConnectionMaxPoolSize": "100",
    "partitionkeydefinition": "/id"
}
Write to Cosmos:
data.write \
    .format("com.microsoft.azure.cosmosdb.spark") \
    .options(**writeConfig) \
    .save()
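For context on the symptom above: Cosmos DB's Gremlin API stores each vertex as a JSON document in which every property sits alongside id and label as an array of { id, _value } objects, so a flat DataFrame row written as-is tends to surface only its generated id. Below is a minimal PySpark sketch of reshaping the question's data before the write; the testVertex label, the make_guid/as_vertex_property helpers and the assumption that every non-id column should become a vertex property are illustrative, not part of the original post.

import uuid
from pyspark.sql import functions as F

# Generate a GUID per property instance, as the Gremlin document format expects.
make_guid = F.udf(lambda: str(uuid.uuid4()), "string").asNondeterministic()

# Hypothetical helper: wrap a plain column in the property shape
#   "name": [ { "id": "<guid>", "_value": "Thomas" } ]
def as_vertex_property(col_name):
    return F.array(
        F.struct(
            make_guid().alias("id"),
            F.col(col_name).cast("string").alias("_value"),
        )
    ).alias(col_name)

# Assumption: the source table has an "id" column and every other column
# should become a vertex property; "testVertex" is a placeholder label.
property_cols = [c for c in data.columns if c != "id"]

vertices = data.select(
    F.col("id").cast("string").alias("id"),
    F.lit("testVertex").alias("label"),
    *[as_vertex_property(c) for c in property_cols]
)

vertices.write \
    .format("com.microsoft.azure.cosmosdb.spark") \
    .mode("append") \
    .options(**writeConfig) \
    .save()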
Below is the working setup to insert records into Cosmos DB. Go to the site below, click the download option and select the uber JAR:
https://search.maven.org/artifact/com.microsoft.azure/azure-cosmosdb-spark_2.3.0_2.11/1.2.2/jar
Then add it as a dependency:
spark-shell --master yarn --executor-cores 5 --executor-memory 10g --num-executors 10 --driver-memory 10g --jars "path/to/jar/dependency/azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar" --packages "com.google.guava:guava:18.0,com.google.code.gson:gson:2.3.1,com.microsoft.azure:azure-documentdb:1.16.1"
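Once the shell (or a Databricks cluster with the same uber JAR attached) is up, the write itself is the same format/options call shown in the question. To check what actually landed, you can read the raw documents back with the same connector; a short sketch reusing the question's writeConfig values (the readConfig name is just illustrative):

readConfig = {
    "Endpoint": writeConfig["Endpoint"],
    "Masterkey": writeConfig["Masterkey"],
    "Database": "graph",
    "Collection": "TestGraph"
}

# Raw document view of the collection: each vertex should show its label
# and property arrays, not just an id.
cosmos_df = spark.read \
    .format("com.microsoft.azure.cosmosdb.spark") \
    .options(**readConfig) \
    .load()

cosmos_df.show(truncate=False)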