I have a list of about 1 million addresses and a function that finds their latitudes and longitudes. Because some records are improperly formatted (or fail for some other reason), the function sometimes cannot return the latitude and longitude of an address, which breaks the for loop. So, for each address whose latitude and longitude are successfully retrieved, I want to write that row to the output CSV file. Alternatively, instead of writing line by line, writing in small chunks would also work. For this, I am using `df.to_csv` in append mode (`mode='a'`) as shown below:
```python
for i in range(len(df)):
    place = df['ADDRESS'][i]
    try:
        lat, lon, res = gmaps_geoencoder(place)
    except Exception:
        continue  # skip addresses that could not be geocoded
    # .loc avoids chained assignment (df['Lat'][i] = ...)
    df.loc[i, 'Lat'] = lat
    df.loc[i, 'Lon'] = lon
    df.loc[i, 'Result'] = res
    df.to_csv(output_csv_file,
              index=False,
              header=False,
              mode='a',  # append data to csv file
              chunksize=chunksize)  # size of data to append for each loop
```
But the problem is that it writes the whole dataframe on every append. So, for n rows, it writes the whole dataframe n times, i.e. roughly n² rows in total. How do I fix this?
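For illustration, here is a minimal sketch of the behaviour described above: only rows that geocode successfully are appended, a few at a time, instead of the whole dataframe on every iteration. It assumes a geocoder that returns `(lat, lon, res)` or raises on failure. `fake_geocoder`, `geocode_to_csv` and `chunk_rows` are hypothetical names introduced here, and `fake_geocoder` is only a stand-in for `gmaps_geoencoder`.

```python
import csv
import os
import tempfile

import pandas as pd


def fake_geocoder(address):
    """Hypothetical stand-in for gmaps_geoencoder: fails on blank addresses."""
    if not address.strip():
        raise ValueError("cannot geocode")
    # Dummy coordinates derived from the string length, for illustration only.
    return float(len(address)), -float(len(address)), "OK"


def geocode_to_csv(df, output_csv_file, geocoder, chunk_rows=1000):
    """Append only successfully geocoded rows, chunk_rows at a time (hypothetical helper)."""
    buffer = []
    for place in df["ADDRESS"]:
        try:
            lat, lon, res = geocoder(place)
        except Exception:
            continue  # skip failures instead of reusing values from the previous row
        buffer.append({"ADDRESS": place, "Lat": lat, "Lon": lon, "Result": res})
        if len(buffer) >= chunk_rows:
            # Write only the buffered new rows, never the full dataframe.
            pd.DataFrame(buffer).to_csv(output_csv_file, mode="a", header=False, index=False)
            buffer.clear()
    if buffer:  # flush whatever is left after the loop
        pd.DataFrame(buffer).to_csv(output_csv_file, mode="a", header=False, index=False)


# Usage: the blank address fails and is skipped; the rest are appended in chunks of 2.
df = pd.DataFrame({"ADDRESS": ["1 Main St", "", "22 Oak Ave", "3 Elm Rd"]})
out_path = os.path.join(tempfile.mkdtemp(), "out.csv")
geocode_to_csv(df, out_path, fake_geocoder, chunk_rows=2)
with open(out_path, newline="") as f:
    written = list(csv.reader(f))
```

Buffering a small list and turning it into a fresh `DataFrame` for each write keeps the output file's size proportional to the number of successful rows. A crash partway through loses at most `chunk_rows - 1` unwritten results.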