We want to import 100 thousand rows from a .csv file into a Cassandra table.
No value is unique across rows, so we want to add a UUID to each imported row. How can we do this automatically while importing the data from the CSV file?
Sample rows from the .csv file (the first row holds the column names):
DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879
We want to add a UUID to each row, like this:
UID,DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,Distance,RMS,Source,EventID,Version
c37d661d-7e61-49ea-96a5-68c34e83db3a,2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,ak,ak11387003,1410441826879
There's no way to do that directly with CQL's COPY command, but you can process the CSV file outside of Cassandra first.
For example, here's a Python script that reads in.csv, appends a UUID column to each row, and writes the result to out.csv:
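```python
#!/usr/bin/env python3
# Read in.csv and write out.csv with one extra column holding a UUID per row.
import csv
import uuid

# newline='' is what the csv module expects for files it reads and writes
with open('in.csv', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter=',', quotechar='"')
    writer = csv.writer(fout, delimiter=',', quotechar='"')
    firstrow = True
    for row in reader:
        if firstrow:
            # header row: add the name of the new column
            row.append('UUID')
            firstrow = False
        else:
            # data row: add a freshly generated random (version 4) UUID
            row.append(str(uuid.uuid4()))
        writer.writerow(row)
```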
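The script above adds the UUID as the last column, while the question's desired output puts it first. A minimal sketch of that variant follows, assuming the header column should be called UID as in the question; the function name `prepend_uuid_column` is hypothetical, and it takes file objects so it can be tried on in-memory data as well as real files.

```python
import csv
import io
import uuid

def prepend_uuid_column(src, dst, header_name='UID'):
    """Copy CSV rows from src to dst, putting a new UUID in front of each row.

    src and dst are text file objects; the first row of src is treated as
    the header and gets header_name instead of a UUID.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow([header_name] + header)
    for row in reader:
        # random (version 4) UUID as the first field of every data row
        writer.writerow([str(uuid.uuid4())] + row)

# Usage with the sample rows from the question (in-memory files for brevity)
sample = (
    "DateTime,Latitude,Longitude,Depth,Magnitude,MagType,NbStations,Gap,"
    "Distance,RMS,Source,EventID,Version\n"
    "2014-09-11T12:36:11.000+00:00,67.689,-162.763,14.6,3.9,ml,,,,0.79,"
    "ak,ak11387003,1410441826879\n"
)
out = io.StringIO()
prepend_uuid_column(io.StringIO(sample), out)
print(out.getvalue())
```

For real files, open them the same way as in the script above (`open('in.csv', newline='')` and `open('out.csv', 'w', newline='')`) and pass the two file objects in. Since COPY lets you list the target columns explicitly, the position of the UUID column in the file mostly matters for readability.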
The resulting file can then be imported using CQL COPY (after you've created your schema accordingly). If you use this example, make sure to read up on Python's uuid functions to choose the one you need (probably uuid1 or uuid4).
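To help with that choice, the short sketch below contrasts the two standard-library functions. One Cassandra-specific point worth checking against your schema: a column declared as `timeuuid` only accepts version 1 (time-based) UUIDs, whereas a plain `uuid` column accepts either kind, so `uuid4` is usually enough when all you need is a unique key.

```python
import uuid

# uuid1(): version-1 UUID built from the current time and a node id
# (normally the host's MAC address), so values from one machine are
# roughly time-ordered but reveal when and where they were made.
time_based = uuid.uuid1()

# uuid4(): version-4 UUID from random bits; no embedded time or host info.
random_based = uuid.uuid4()

print(time_based, time_based.version)      # version is 1
print(random_based, random_based.version)  # version is 4

# A version-1 UUID carries its timestamp (100-ns intervals since
# 1582-10-15), which is what Cassandra's timeuuid type relies on.
print(time_based.time)
```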