Adding a UUID to each row imported from a CSV

Posted 2019-02-20 04:43

Question:

We want to import 100,000 rows from a .csv file into a Cassandra table.

The rows have no unique value, so we want to add a UUID to each imported row. How can we do this automatically while importing the data from the CSV file?

Sample row from the .csv file (the first row holds the column names):

DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879

We want to add a UUID to each row, like this:

UID,DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
c37d661d-7e61-49ea-96a5-68c34e83db3a,2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879

Answer 1:

There's no way to do that directly with CQL's COPY command, but you could preprocess the CSV file outside of Cassandra first.

For example, here's a Python script that reads in.csv, appends a UUID column to each row, and writes the result to out.csv:

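#!/usr/bin/env python3
# Read in.csv and write out.csv with one extra column holding a UUID.

import csv
import uuid

# newline='' lets the csv module handle line endings itself (Python 3).
with open('in.csv', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter=',', quotechar='"')
    writer = csv.writer(fout, delimiter=',', quotechar='"')

    firstrow = True
    for row in reader:
        if firstrow:
            # Header row: name the new column.
            row.append('UUID')
            firstrow = False
        else:
            # Data row: append a random (version 4) UUID as text.
            row.append(str(uuid.uuid4()))
        writer.writerow(row)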
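
The question shows the UUID as the first column rather than the last. A minimal sketch of that variant follows, assuming Python 3; the function name `prepend_uuid_column` and the `column_name` parameter are hypothetical, introduced only for illustration. It works on any text-mode file objects, so the demonstration runs in memory on the sample row from the question.

```python
import csv
import io
import uuid


def prepend_uuid_column(src, dst, column_name="UID"):
    """Copy CSV rows from src to dst, inserting a UUID as the first field.

    src and dst are text-mode file objects; the first row is treated as
    the header. Returns the number of data rows written.
    (Hypothetical helper, not part of the original answer.)
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty input: nothing to write
    writer.writerow([column_name] + header)
    count = 0
    for row in reader:
        # One fresh random (version 4) UUID per data row.
        writer.writerow([str(uuid.uuid4())] + row)
        count += 1
    return count


# In-memory demonstration using the sample row from the question.
sample = (
    "DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,"
    "Distance,RMS,Source,EventID,Version\n"
    "2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,"
    "ak,ak11387003,1410441826879\n"
)
out = io.StringIO()
rows_written = prepend_uuid_column(io.StringIO(sample), out)
print(out.getvalue())
```

For real files, the same function can be called with `open('in.csv', newline='')` and `open('out.csv', 'w', newline='')` in place of the `io.StringIO` objects.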

The resulting file can then be imported with CQL COPY, once you have created a table whose schema includes the new UUID column. If you use this example, read up on Python's uuid functions to choose the one you need (probably uuid1 or uuid4).
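
To make the choice between uuid1 and uuid4 more concrete, here is a small sketch using only Python's standard `uuid` module. A version 1 UUID is built from a timestamp and a node identifier, which is the kind of value Cassandra's `timeuuid` column type is meant for; a version 4 UUID is random and fits a plain `uuid` column. Which one suits your table is an assumption you should check against your own schema.

```python
import uuid

# Version 1: derived from the current time and a node identifier.
time_based = uuid.uuid1()

# Version 4: derived from random bits, carries no timestamp.
random_based = uuid.uuid4()

print(time_based.version, time_based)
print(random_based.version, random_based)

# A version 1 UUID exposes its embedded 60-bit timestamp
# (count of 100-nanosecond intervals since 1582-10-15).
print(time_based.time)
```

To use version 1 UUIDs in the script, replace the `uuid.uuid4()` call with `uuid.uuid1()`.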