How to convert Linux cron jobs to “the Amazon way”

Posted 2020-05-10 21:02

For better or worse, we have migrated our whole LAMP web application from dedicated machines to the cloud (Amazon EC2 machines). It's going great so far, but the way we do crons is sub-optimal. I have an Amazon-specific question about how best to manage cron jobs in the cloud using "the Amazon way".

The problem: We have multiple webservers and need to run crons for batch jobs such as creating RSS feeds, triggering emails, and many other things. However, each cron job must run on only one machine, because the jobs often write to the database and would duplicate their results if run on multiple machines.

So far, we designated one of the webservers as the "master-webserver" and it has a few "special" tasks that the other webservers don't have. The trade-off for cloud computing is reliability - we don't want a "master-webserver" because it's a single point of failure. We want them to all be identical and to be able to upscale and downscale without remembering not to take the master-webserver out of the cluster.

How can we redesign our application to convert Linux cron jobs into transitory work items that don't have a single point of failure?

My ideas so far:

  • Have a machine dedicated to only running crons. This would be a little more manageable but would still be a single-point-of-failure, and would waste some money having an extra instance.
  • Some jobs could conceivably be moved from Linux crons to MySQL Events however I'm not a big fan of this idea as I don't want to put application logic into the database layer.
  • Perhaps we can run all crons on all machines but change our cron scripts so they all start with a bit of logic that implements a locking mechanism, so only one server actually takes action and the others just skip. I'm not a fan of this idea as it sounds potentially buggy, and I would prefer to use an Amazon best practice rather than rolling our own.
  • I'm imagining a situation where jobs are scheduled somewhere, added to a queue and then the webservers could each be a worker, that can say "hey, I'll take this one". Amazon Simple Workflow Service sounds exactly this kind of thing but I don't currently know much about it so any specifics would be helpful. It seems kind of heavy-weight for something as simple as a cron? Is it the right service or is there a more suitable Amazon service?
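
As a rough illustration of the locking idea in the third bullet, here is a minimal sketch in Python, assuming every webserver shares one database. It is not an Amazon-recommended pattern. The table name `cron_claims`, the helper names, and the server names are hypothetical, and an in-memory SQLite database stands in for the shared MySQL instance. Each server tries to insert a row keyed by job name and time slot, and the primary key lets only the first insert succeed.

```python
import sqlite3


def init_schema(conn):
    # One row per (job, time slot); the primary key enforces a single winner.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cron_claims ("
        " job_name TEXT NOT NULL,"
        " slot TEXT NOT NULL,"
        " server TEXT NOT NULL,"
        " PRIMARY KEY (job_name, slot))"
    )
    conn.commit()


def try_claim(conn, job_name, slot, server):
    """Return True if this server won the right to run job_name for slot."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO cron_claims (job_name, slot, server)"
                " VALUES (?, ?, ?)",
                (job_name, slot, server),
            )
        return True
    except sqlite3.IntegrityError:
        # Another server already inserted the row for this slot.
        return False


def run_on_all_servers(conn, servers, job_name, slot):
    """Simulate every webserver's cron firing; return the servers that acted."""
    return [s for s in servers if try_claim(conn, job_name, slot, s)]


# Hypothetical demo: three identical webservers fire the same cron entry.
conn = sqlite3.connect(":memory:")
init_schema(conn)
winners = run_on_all_servers(
    conn, ["web1", "web2", "web3"], "rss_feed", "2020-05-10T21:00"
)
print(winners)  # ['web1']
```

In this sketch the servers stay identical and the database decides the winner, but the database itself is still a shared dependency, and old claim rows would need periodic cleanup.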

Update: Since asking the question I have watched the Amazon Simple Workflow Service webinar on YouTube, and at 34:40 (http://www.youtube.com/watch?v=lBUQiek8Jqk#t=34m40s) I caught a glimpse of a slide mentioning cron jobs as a sample application. In their documentation page, "AWS Flow Framework samples for Amazon SWF", Amazon say they have sample code for crons:

... Cron jobs: In this sample, a long running workflow periodically executes an activity. The ability to continue executions as new executions so that an execution can run for very extended periods of time is demonstrated. ...

I downloaded the AWS SDK for Java (http://aws.amazon.com/sdkforjava/) and, sure enough, buried within a ridiculous number of folder layers there is some Java code (aws-java-sdk-1.3.6/samples/AwsFlowFramework/src/com/amazonaws/services/simpleworkflow/flow/examples/periodicworkflow).

The problem is, if I'm honest, this doesn't really help, as it's not something I can easily digest with my skillset. The same sample is missing from the PHP SDK and there doesn't seem to be a tutorial that walks through the process. So basically, I'm still hunting for advice or tips.

13 answers
我只想做你的唯一
Reply #2 · 2020-05-10 21:23

If you already have a Redis service up, this looks like a good solution:

https://github.com/kvz/cronlock

Read more: http://kvz.io/blog/2012/12/31/lock-your-cronjobs/

放荡不羁爱自由
Reply #3 · 2020-05-10 21:24

I signed up for Amazon Gold support to ask them this question. This was their response:

Tom

I did a quick poll of some of my colleagues and came up empty on the cron, but after sleeping on it I realised the important step may be limited to locking. So I looked for "distributed cron job locking" and found a reference to Zookeeper, an Apache project.

http://zookeeper.apache.org/doc/r3.2.2/recipes.html

http://highscalability.com/blog/2010/3/22/7-secrets-to-successfully-scaling-with-scalr-on-amazon-by-se.html

Also I have seen reference to using memcached or a similar caching mechanism as a way to create locks with a TTL. In this way you set a flag, with a TTL of 300 seconds and no other cron worker will execute the job. The lock will automatically be released after the TTL has expired. This is conceptually very similar to the SQS option we discussed yesterday.
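
The TTL-flag idea in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TTLStore` is a hypothetical in-memory stand-in for a shared cache, modelled on the add-if-absent behaviour of memcached's `add` command, and the key prefix, worker names, and job name are made up. A real deployment would need that add operation to be atomic on the shared cache server.

```python
import time


class TTLStore:
    """In-memory stand-in for a cache with add-if-absent and expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def add(self, key, value, ttl):
        """Store key only if it is absent or expired; return True on success."""
        now = self._clock()
        current = self._items.get(key)
        if current is not None and current[1] > now:
            return False  # flag still set by another worker
        self._items[key] = (value, now + ttl)
        return True


def run_if_lock_acquired(store, job_name, worker, job, ttl=300):
    """Run job() only if this worker sets the flag first; return whether it ran."""
    if not store.add("cronlock:" + job_name, worker, ttl):
        return False
    job()
    return True


# Hypothetical demo with a fake clock so the 300-second TTL can be skipped.
now = [0.0]
store = TTLStore(clock=lambda: now[0])
runs = []
first = run_if_lock_acquired(store, "send_emails", "web1", lambda: runs.append("web1"))
second = run_if_lock_acquired(store, "send_emails", "web2", lambda: runs.append("web2"))
now[0] = 301.0  # the TTL has expired, so the flag is gone
third = run_if_lock_acquired(store, "send_emails", "web2", lambda: runs.append("web2"))
print(first, second, third, runs)  # True False True ['web1', 'web2']
```

One caveat with this approach: if a job runs longer than the TTL, the flag expires while the job is still running and a second worker may start it, so the TTL should comfortably exceed the longest expected run.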

Also see Google's Chubby: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/chubby-osdi06.pdf

Let me know if this helps, and feel free to ask questions, we are very aware that our services can be complex and daunting to both beginners and seasoned developers alike. We are always happy to offer architecture and best practice advice.

Best regards,

Ronan G., Amazon Web Services

可以哭但决不认输i
Reply #4 · 2020-05-10 21:24

If you're willing to use a non-AWS service, then you might check out Microsoft Azure. Azure offers a great job scheduler.

老娘就宠你
Reply #5 · 2020-05-10 21:36

The "Amazon" way is to be distributed, meaning bulky crons should be split into many smaller jobs and handed to the right machines.

Use an SQS queue with its type set to FIFO as the glue, to ensure each job is executed by only one machine. This also tolerates failure, since the queue buffers jobs until a machine spins back up.

FIFO Exactly-Once Processing: A message is delivered once and remains available until a consumer processes and deletes it. Duplicates are not introduced into the queue.
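
One way the FIFO approach might be wired up, as a sketch only: every server's cron builds the same message for the same schedule slot, so the queue's deduplication keeps a single copy and any one worker picks it up. The queue URL, job name, and helper names below are hypothetical. `MessageGroupId` and `MessageDeduplicationId` are the SQS FIFO send parameters, and the resulting dictionary is meant to be passed to something like boto3's `send_message(**params)`, which is not called here. This assumes server clocks are roughly in sync and relies on the 5-minute deduplication interval that SQS documents for FIFO queues.

```python
import hashlib
from datetime import datetime, timezone


def slot_start(now, period_seconds):
    """Round a timezone-aware datetime down to the start of its schedule slot."""
    epoch = int(now.timestamp())
    return epoch - (epoch % period_seconds)


def build_fifo_message(queue_url, job_name, now, period_seconds):
    """Build send parameters that are identical on every server for one slot."""
    slot = slot_start(now, period_seconds)
    # Same job + same slot -> same ID, so duplicate sends collapse into one.
    dedup_id = hashlib.sha256(f"{job_name}:{slot}".encode("utf-8")).hexdigest()
    return {
        "QueueUrl": queue_url,
        "MessageBody": job_name,
        "MessageGroupId": job_name,
        "MessageDeduplicationId": dedup_id,
    }


# Hypothetical queue URL; two servers fire seconds apart, a third one slot later.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/cron-jobs.fifo"
a = build_fifo_message(
    QUEUE_URL, "rss_feed", datetime(2020, 5, 10, 21, 0, 3, tzinfo=timezone.utc), 300
)
b = build_fifo_message(
    QUEUE_URL, "rss_feed", datetime(2020, 5, 10, 21, 0, 41, tzinfo=timezone.utc), 300
)
c = build_fifo_message(
    QUEUE_URL, "rss_feed", datetime(2020, 5, 10, 21, 5, 2, tzinfo=timezone.utc), 300
)
print(a["MessageDeduplicationId"] == b["MessageDeduplicationId"])  # True
print(a["MessageDeduplicationId"] == c["MessageDeduplicationId"])  # False
```

With this shape the cron entries can stay on every webserver: they only enqueue work, and the workers that consume the queue do the actual job.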

Also consider whether you really need to 'batch' these operations. What happens if one night's updates are considerably larger than expected? Even with dynamic resourcing, your processing could be delayed waiting for enough machines to spin up. Instead, store your data in SDB, notify machines of updates via SQS, and create your RSS feed on the fly (with caching).

Batch jobs are from a time when processing resources were limited and 'live' services took precedence. In the cloud, this is not the case.

做自己的国王
Reply #6 · 2020-05-10 21:37

We have one particular server in our web application cluster behind an ELB that is also assigned a specific DNS name, so that we can run the jobs on that one server. This has the added benefit that if a job causes the server to slow down, the ELB removes it from the cluster and returns it once the job is over and the server is healthy again.

Works like a champ.

男人必须洒脱
Reply #7 · 2020-05-10 21:40

On 12/Feb/16 Amazon blogged about Scheduling SSH jobs using AWS Lambda. I think this answers the question.
