I want to access the Flickr API with a REST request and download the metadata of approximately 1 million photos (maybe more). I want to store them in a .csv file and then import them into a MySQL database for further processing.
I am wondering what the smartest way to handle such a large amount of data is. What I am not sure about is how to store the records after accessing the website in Python, how to pass them to the .csv file, and from there into the database. That is one big question mark.
What happens now (as far as I understand it; see the code below) is that a dictionary is created for every photo (250 per called URL). This way I would end up with as many dictionaries as there are photos (1 million or more). Is that possible?
All these dictionaries are appended to a list. Can I append that many dictionaries to a list? The only reason I want to append the dictionaries to a list is that it seems much easier to write a list to a .csv file, row by row.
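For the list-to-.csv step I was thinking of something like this (not tested; rows is the list of dictionaries from my code below, and photos.csv is just a file name I made up):

import csv

FIELDS = ["id", "title", "tags", "latitude", "longitude"]

# one .csv row per dictionary; the keys that match FIELDS become the columns
with open("photos.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()    # header line with the column names
    writer.writerows(rows)

Would that still work if rows contains a million entries?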
What you should know is that I am a complete beginner to programming, Python, or anything of the sort. My profession is a completely different one and I have just started to learn. If you need any further explanations, please let me know!
#accessing the website
from urllib.request import urlopen   # added: urlopen lives in urllib.request in Python 3
from bs4 import BeautifulSoup        # added: BeautifulSoup 4 import

rows = []   # renamed from "list" so the built-in list type is not shadowed
url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url)) #soup it up

for data in soup.find_all('photo'):
    # one dictionary per <photo> element; renamed from "dict" to avoid shadowing the built-in
    photo = {
        "id": data.get('id'),
        "title": data.get('title'),
        "tags": data.get('tags'),
        "latitude": data.get('latitude'),
        "longitude": data.get('longitude'),
    }
    print(photo)
    rows.append(photo)
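To get more than the 250 photos of a single call, I guess I would have to repeat the request with the page parameter and read the pages attribute from the <photos> element of the response (both are described in the flickr.photos.search docs). A rough, untested sketch of what I mean, building on the code above:

page = 1
total_pages = 1   # placeholder; replaced by the real value after the first response
while page <= total_pages:
    soup = BeautifulSoup(urlopen(url + "&page=" + str(page)))
    photos_tag = soup.find('photos')   # <photos> carries the paging info
    if photos_tag is not None:
        total_pages = int(photos_tag.get('pages'))
    for data in soup.find_all('photo'):
        rows.append({
            "id": data.get('id'),
            "title": data.get('title'),
            "tags": data.get('tags'),
            "latitude": data.get('latitude'),
            "longitude": data.get('longitude'),
        })
    page += 1

Is it okay to collect everything in rows like this, or should I write each page to the .csv file right away?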
I am working with Python 3.3. The reason why I do not pass the data directly into the database is that I cannot get the Python connector for MySQL to run on my OS X 10.6.
Any help is very much appreciated. Thank you, folks!