Periodically delete entries in Objectify

Posted 2019-08-15 06:43

I'm using Google App Engine with Objectify and would like to delete some entries in the db every 5 minutes. What would be the best way to accomplish this? Should I use Google App Engine's ThreadManager or a cron job? Or is there another way?

2 answers
做个烂人
#2 · 2019-08-15 07:29

Sounds like you want, every 5 minutes, to:

  • Write hundreds of thousands of entities
  • Aggregate hundreds of thousands of entities
  • Delete hundreds of thousands of entities

It's possible to do this with map/reduce. However, it will be expensive (hundreds of dollars per day), and you're going to have timing issues - especially when the task queue backs up.

You should strongly consider storing this data outside GAE. Get a Google Compute Engine account and set up a MongoDB or Redis instance there. Or even host it on AWS. GAE is not well suited to this sort of workload, but it's not "all or nothing": you can easily work with services in other parts of the cloud.

不美不萌又怎样
#3 · 2019-08-15 07:30

Cron sounds like a good fit here, but I'm worried about the number of entities that need to be deleted (up to a few hundred thousand every five minutes, according to the comments). Deleting that many entities takes a considerable amount of time, most likely more than the five-minute period, and possibly even more than the 10-minute deadline for front-end cron handlers.

One possible solution is to do the deletion from a backend instance, since backends can run without a request deadline. Cron could be used to kick off a process that queries for the to-be-deleted entities, fetches their keys using a keys-only query, and then deletes the entities in multiple background threads.
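The batching and threading part of that idea could look roughly like the sketch below. It is only an illustration under assumptions: the class name `BatchDeleteSketch`, the batch size of 500, and the thread count are made up for the example, and an in-memory set stands in for the datastore so the code runs anywhere. With Objectify, the keys would presumably come from something like `ofy().load().type(...).filter(...).keys()` and each batch would be removed with `ofy().delete().keys(batch)`; on App Engine the threads would normally have to come from `ThreadManager` rather than a plain executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of Objectify or the App Engine SDK.
class BatchDeleteSketch {

    /** Splits keys into consecutive batches of at most batchSize elements. */
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Submits one delete task per batch to a fixed thread pool, waits for all
     * of them, and returns the number of keys handed to deleteBatch.
     */
    static <K> int deleteInParallel(List<K> keys, int batchSize, int threads,
                                    Consumer<List<K>> deleteBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<K> batch : partition(keys, batchSize)) {
                Callable<Integer> task = () -> {
                    deleteBatch.accept(batch); // assumed: one datastore batch delete
                    return batch.size();
                };
                pending.add(pool.submit(task));
            }
            int deleted = 0;
            for (Future<Integer> f : pending) {
                deleted += f.get(); // blocks until that batch has finished
            }
            return deleted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while deleting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a delete batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the datastore: a set of entity ids.
        Set<Long> store = ConcurrentHashMap.newKeySet();
        List<Long> keys = new ArrayList<>();
        for (long id = 0; id < 1201; id++) {
            store.add(id);
            keys.add(id);
        }
        int deleted = deleteInParallel(keys, 500, 4, store::removeAll);
        System.out.println("deleted " + deleted + ", remaining " + store.size());
    }
}
```

Working on keys only keeps the query cheap, and splitting them into fixed-size batches lets several threads issue deletes at once instead of one long sequential loop.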

Since the process can run indefinitely, once the threads report that the deletion has finished, you can immediately query again and delete the next set of entities. You could use a global in-memory lock in the backend to ensure that subsequent cron requests do not start a separate process, but silently exit if they detect that the process is already running. The cron job is then used only as a keep-alive signal for the deletion process.
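A minimal sketch of such an in-memory lock, assuming a single backend instance (a static flag is not shared across instances). The class and method names (`CleanupGuard`, `runIfIdle`) are hypothetical; the cron handler would call `runIfIdle` with the deletion loop as the task.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard, not part of any App Engine API.
class CleanupGuard {

    // True while a deletion run is in progress on this instance.
    private static final AtomicBoolean RUNNING = new AtomicBoolean(false);

    /**
     * Runs the task only if no other run is in progress.
     * Returns true if this call ran the task, false if it exited silently.
     */
    static boolean runIfIdle(Runnable task) {
        // compareAndSet flips false -> true atomically, so only one caller wins.
        if (!RUNNING.compareAndSet(false, true)) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            RUNNING.set(false); // release even if the task throws
        }
    }

    public static void main(String[] args) {
        boolean outer = runIfIdle(() -> {
            // Simulates a second cron request arriving while the first still runs.
            boolean nested = runIfIdle(() -> { });
            System.out.println("nested started: " + nested); // prints "nested started: false"
        });
        System.out.println("outer started: " + outer); // prints "outer started: true"
    }
}
```

Releasing the flag in `finally` matters here: if the deletion run throws, the next cron request can still restart it, which is what makes cron useful as a keep-alive.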

As a side note, querying and deleting entities this frequently and at this scale might be prohibitively expensive in terms of datastore operation cost.
