I'm writing a simple service to take data from a couple of sources, munge it together, and use the Google API client to send it to a Google Sheet. Easy peasy, it works fine, and the data is not that big.
The issue is that calling .spreadsheets() after building the API service (i.e. build('sheets', 'v4', http=auth).spreadsheets()) causes a memory jump of roughly 30 megabytes (I did some profiling to isolate where the memory was being allocated). When deployed to GAE, these spikes stick around for long stretches of time (sometimes hours at a time), creeping upwards until, after several requests, they trigger GAE's 'Exceeded soft private memory limit' error.
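A per-call measurement like the one described above can be reproduced with Python's tracemalloc module. This is a sketch under stated assumptions: it needs Python 3.4+ (the GAE standard Python 2.7 runtime of the time did not ship tracemalloc), the helper name peak_traced_bytes is hypothetical, and tracemalloc only sees allocations made through Python's own allocators, so its numbers can be lower than the process memory GAE reports.

```python
import tracemalloc


def peak_traced_bytes(fn):
    """Run fn() and return (result, peak bytes traced while it ran).

    Hypothetical helper. Assumes tracemalloc is not already running.
    Only allocations made through Python's allocators are counted;
    memory a C extension gets directly from malloc is not traced.
    """
    tracemalloc.start()
    try:
        result = fn()
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Stop tracing and free the trace data.
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Stand-in workload; in the question this would be something like
    # lambda: build('sheets', 'v4', http=auth).spreadsheets()
    _, peak = peak_traced_bytes(lambda: [str(i) for i in range(100000)])
    print("peak: %.1f MB" % (peak / 1e6))
```

Comparing the peak for build(...) alone against build(...).spreadsheets() would show how much of the spike comes from constructing the resource objects.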
I am using memcache for the discovery document and urlfetch for grabbing data, but those are the only other services I am using.
I have tried manual garbage collection, changing threadsafe in app.yaml, even changing the point at which .spreadsheets() is called, and I can't shake this problem. It's also possible that I am simply misunderstanding something about GAE's architecture, but I know the spike is caused by the call to .spreadsheets(), and I am not storing anything in local caches.
Is there a way to either 1) reduce the size of the memory spike from calling .spreadsheets(), or 2) keep the spikes from staying around in memory (preferably both)? A very simplified gist is below to give an idea of the API calls and request handler; I can give fuller code if needed. I know similar questions have been asked before, but I haven't been able to fix it.
https://gist.github.com/chill17/18f1caa897e6a20201232165aca05239
I ran into this when using the Sheets API on a small processor with only 20 MB of usable RAM. The problem is that the Google API client loads the entire API description (the discovery document) as a string and builds a resource object from it in memory.
If free memory is an issue, you should construct your own HTTP object and make the desired request manually. My Spreadsheet class below is an example of how to create a new spreadsheet this way.
import json
import os

import httplib2
from oauth2client import client, tools
from oauth2client.file import Storage

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'


class Spreadsheet(object):
    def __init__(self, title):
        # Get credentials from a locally stored JSON file;
        # if the file does not exist, run the OAuth flow to create it.
        self.credentials = self.getCredentials()
        # Plain HTTP client used to push/pull data. authorize() wraps it so
        # every request carries the OAuth token; no discovery document is loaded.
        self.service = self.credentials.authorize(httplib2.Http())
        self.headers = {'content-type': 'application/json',
                        'accept-encoding': 'gzip, deflate',
                        'accept': 'application/json',
                        'user-agent': 'google-api-python-client/1.6.2 (gzip)'}
        self.baseUrl = "https://sheets.googleapis.com/v4/spreadsheets"
        self.spreadsheetInfo = self.create(title)
        self.spreadsheetId = self.spreadsheetInfo['spreadsheetId']

    def getCredentials(self):
        """Gets valid user credentials from storage.

        If nothing has been stored, or if the stored credentials are invalid,
        the OAuth2 flow is completed to obtain the new credentials.

        Returns:
            Credentials, the obtained credential.
        """
        home_dir = os.path.expanduser('~')
        credential_dir = os.path.join(home_dir, '.credentials')
        if not os.path.exists(credential_dir):
            os.makedirs(credential_dir)
        credential_path = os.path.join(credential_dir,
                                       'sheets.googleapis.com-python-quickstart.json')
        store = Storage(credential_path)
        credentials = store.get()
        if not credentials or credentials.invalid:
            flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
            flow.user_agent = APPLICATION_NAME
            # run_flow parses its own command-line flags when none are passed.
            credentials = tools.run_flow(flow, store)
            print('Storing credentials to ' + credential_path)
        return credentials

    def create(self, title):
        # Only put the title in the request body; we don't need anything else for now.
        requestBody = {
            "properties": {
                "title": title
            },
        }
        # Serialize with json.dumps: str(requestBody) is a Python repr, not valid JSON.
        response, content = self.service.request(self.baseUrl,
                                                 method="POST",
                                                 headers=self.headers,
                                                 body=json.dumps(requestBody))
        if response.status != 200:
            raise RuntimeError("Sheets API error %s: %s" % (response.status, content))
        return json.loads(content)
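The class above only creates a spreadsheet; writing the munged rows from the question is one more hand-built request, this time against the Sheets v4 values.append endpoint. This is a minimal sketch that assumes Python 3's standard library (urllib) rather than httplib2 and an OAuth access token obtained elsewhere; the function name build_append_request is hypothetical. It only builds the request, so it can be inspected without network access.

```python
import json
import urllib.parse
import urllib.request

SHEETS_BASE_URL = "https://sheets.googleapis.com/v4/spreadsheets"


def build_append_request(spreadsheet_id, cell_range, rows, access_token,
                         value_input_option="RAW"):
    """Build (but do not send) a POST to the Sheets v4 values.append endpoint.

    Hypothetical helper. access_token is assumed to be a valid OAuth 2.0
    access token with the spreadsheets scope, obtained elsewhere.
    """
    # A1 ranges such as "Sheet1!A1:C1" contain characters that are escaped in the path.
    encoded_range = urllib.parse.quote(cell_range, safe="")
    # valueInputOption is a required query parameter (RAW or USER_ENTERED).
    query = urllib.parse.urlencode({"valueInputOption": value_input_option})
    url = "%s/%s/values/%s:append?%s" % (
        SHEETS_BASE_URL, spreadsheet_id, encoded_range, query)
    # The body is a ValueRange; rows is a list of lists, one inner list per row.
    body = json.dumps({"values": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )
```

Sending it would be urllib.request.urlopen(req) followed by json.loads on the response body. On the Python 2.7 GAE runtime, the same URL, headers, and JSON body could instead be passed to the authorized httplib2 object's request() method, as the class above does.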