I am using the Google Cloud Storage Client Library.
I am trying to open and process a CSV file (that was already uploaded to a bucket) using code like:
import csv
import cloudstorage as gcs

filename = '/<my_bucket>/data.csv'
with gcs.open(filename, 'r') as gcs_file:
    csv_reader = csv.reader(gcs_file, delimiter=',', quotechar='"')
I get the error "argument 1 must be an iterator" in response to the first argument to csv.reader (i.e. the gcs_file). Apparently the gcs_file doesn't support the iterator .next method.
Any ideas on how to proceed? Do I need to wrap the gcs_file and create an iterator on it or is there an easier way?
I think it's better to write your own wrapper/iterator designed for csv.reader. If gcs_file were to support the iterator protocol, it is not clear what next() should return to accommodate every consumer.
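According to the csv.reader documentation, its first argument can be any object that supports the iterator protocol and returns a string each time its next() method is called. The reader treats each returned string as one line of input, so the wrapper can read raw chunks from the underlying file but should hand complete lines to the reader. You can have a wrapper like this (not tested):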
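A minimal sketch of such a wrapper, assuming Python 3 (where csv.reader expects text rather than bytes) and a file object whose read(n) returns bytes. The name iter_lines, the chunk size, and the UTF-8 encoding are illustrative choices, and io.BytesIO stands in for the object returned by gcs.open so the example runs on its own:

```python
import csv
import io


def iter_lines(fileobj, chunk_size=64 * 1024, encoding='utf-8'):
    """Yield complete text lines from fileobj, reading chunk_size bytes at a time.

    Hypothetical helper: it only assumes that fileobj.read(n) returns bytes
    and returns an empty value at end of file.
    """
    pending = b''
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b'\n')
        # The last piece may be an incomplete line; keep it for the next chunk.
        pending = lines.pop()
        for line in lines:
            # Keep the newline so quoted fields spanning lines stay intact.
            yield line.decode(encoding) + '\n'
    if pending:
        # Final line without a trailing newline.
        yield pending.decode(encoding)


# Stand-in for: with gcs.open(filename, 'r') as gcs_file: ...
gcs_file = io.BytesIO(b'name,qty\n"widget, large",3\ngadget,5\n')
rows = list(csv.reader(iter_lines(gcs_file, chunk_size=8),
                       delimiter=',', quotechar='"'))
```

The deliberately tiny chunk_size=8 shows that lines split across chunk boundaries are reassembled before they reach the reader.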
The key is to read a chunk at a time so that when you have a large file, you don't blow up memory or experience timeout from urlfetch.
Or, even simpler, use the built-in iter:
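A sketch of the two-argument form of iter, which calls a function repeatedly until it returns a sentinel value. This assumes the GCS file object provides readline() and returns an empty value at end of file. It is written for Python 3, so each line is decoded before reaching csv.reader, and io.BytesIO stands in for gcs_file:

```python
import csv
import io

# Stand-in for the object returned by gcs.open(filename, 'r').
gcs_file = io.BytesIO(b'a,b\n1,2\n')

# iter(callable, sentinel) calls readline() until it returns b'' (end of file).
lines = (line.decode('utf-8') for line in iter(gcs_file.readline, b''))
rows = list(csv.reader(lines, delimiter=',', quotechar='"'))
```

On Python 2, where readline() returns str, the sentinel would be '' and the decode step would not be needed.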
Try this:
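One simple possibility, sketched under the assumption that the whole file fits comfortably in memory, is to read everything at once and hand the text to csv.reader. The example is written for Python 3 and uses io.BytesIO as a stand-in for gcs_file:

```python
import csv
import io

# Stand-in for the object returned by gcs.open(filename, 'r').
gcs_file = io.BytesIO(b'a,b\n1,2\n')

# Read the entire file, decode it, and wrap it in an iterable text buffer.
text = gcs_file.read().decode('utf-8')
rows = list(csv.reader(io.StringIO(text, newline=''),
                       delimiter=',', quotechar='"'))
```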
This isn't ideal though. I've filed a feature request to have GCS files support iterating.