I have a CSV file with 2 columns and 20,000 rows that I would like to import into Google Cloud Datastore. I'm new to Google Cloud and NoSQL databases. I have tried using Dataflow, but that requires the name of a JavaScript UDF function. Does anyone have an example of this? I will be querying this data once it's in Datastore. Any advice or guidance on how to do this would be appreciated.
Answer 1:
Using Apache Beam, you can read a CSV file using the TextIO class. See the TextIO documentation.
Pipeline p = Pipeline.create();
p.apply(TextIO.read().from("gs://path/to/file.csv"));
Next, apply a transform that will parse each row in the CSV file and return an Entity object. Depending on how you want to store each row, construct the appropriate Entity object. This page has an example of how to create an Entity object.
.apply(ParDo.of(new DoFn<String, Entity>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    String row = c.element();
    // TODO: parse row (split) and construct Entity object
    Entity entity = ...
    c.output(entity);
  }
}));
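For a two-column CSV, the TODO can be filled in with the DatastoreHelper utilities. A minimal sketch, assuming a kind named "CsvRow" and that the first column is unique enough to serve as the key name (both are assumptions; adjust them to your data):

import com.google.datastore.v1.Entity;
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

String[] columns = row.split(",", 2);  // two columns; assumes no quoted commas
Entity entity = Entity.newBuilder()
    .setKey(makeKey("CsvRow", columns[0]))  // assumed kind and key name
    .putProperties("property1", makeValue(columns[0]).build())
    .putProperties("property2", makeValue(columns[1]).build())
    .build();

Entities written with DatastoreIO must have complete keys, which is why the key name is set explicitly here rather than left for Datastore to allocate.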
Lastly, write the Entity objects to Cloud Datastore. See the DatastoreIO documentation.
.apply(DatastoreIO.v1().write().withProjectId(projectId));
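Putting the three steps together, the whole pipeline might look like this (a sketch: CsvToEntityFn is a hypothetical name for the parsing DoFn above, the GCS path is a placeholder, and projectId is assumed to hold your GCP project ID):

Pipeline p = Pipeline.create();
p.apply(TextIO.read().from("gs://path/to/file.csv"))
 .apply(ParDo.of(new CsvToEntityFn()))  // parse each row into an Entity
 .apply(DatastoreIO.v1().write().withProjectId(projectId));
p.run().waitUntilFinish();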
Answer 2:
This is simple in Python, but can easily be adapted to other languages. Use the split() method to loop through the lines and comma-separated values:
from google.appengine.api import urlfetch
from my.models import MyModel

csv_url = 'http://someplace.com/myFile.csv'
csv_response = urlfetch.fetch(csv_url, allow_truncated=True)

if csv_response.status_code == 200:
    for row in csv_response.content.split('\n'):
        if not row:
            continue  # skip blank lines, e.g. a trailing newline
        row_values = row.split(',')
        # csv values are strings. Cast them if they need to be something else
        new_entry = MyModel(
            property1=row_values[0],
            property2=row_values[1],
        )
        new_entry.put()
else:
    print('cannot load file: {}'.format(csv_url))
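With 20,000 rows, calling put() once per row means 20,000 round trips to Datastore. If MyModel is an ndb model (an assumption here; the older db API has a similar batch put), the writes can be batched, which is much faster. A sketch:

from google.appengine.ext import ndb
from my.models import MyModel

entities = []
for row in csv_response.content.split('\n'):
    if not row:
        continue
    row_values = row.split(',')
    entities.append(MyModel(property1=row_values[0],
                            property2=row_values[1]))

# Write in chunks so no single RPC gets too large.
for i in range(0, len(entities), 500):
    ndb.put_multi(entities[i:i + 500])

put_multi issues one batched RPC per call instead of one per entity.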