How to automate Amazon AWS EC2 for scraping [close

Posted 2019-06-12 03:34

Hi, I'd like to set up multiple Amazon EC2 instances to scrape data from arbitrary sites. The setup I have in mind is one master instance that programmatically launches other instances to do the scraping. Right now I have PHP scripts that scrape the way I want, but how can I set up my master server to...

1) create other EC2 instances

2) communicate between the master server and slave servers

2 answers
Animai°情兽
#2 · 2019-06-12 03:42

You could build this yourself: have your master launch worker instances when needed, send them scrape requests, and terminate them when they are no longer needed, coding all the orchestration yourself and trying to make it highly available. But that's not a good way to do this. Instead, you should take advantage of AWS features.

You could use a combination of SQS and Auto Scaling Groups. Your master instance would add scrape requests to an SQS queue, and an Auto Scaling Group triggered on SQS queue depth would launch new worker instances. This automates launching workers (scrapers) when the workload is high and terminating them when the workload is low. Each worker instance would pull a scrape request from the SQS queue, do the scraping work, and then repeat.
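To make the queue-based flow concrete, here is a minimal sketch in Python (not the PHP the question uses). It assumes a boto3-style SQS client is passed in as `sqs`; the function names, the `{"url": ...}` message format, and the `scrape` callback are all hypothetical choices for illustration, not something the answer specifies.

```python
import json


def enqueue_scrape_requests(sqs, queue_url, urls):
    """Master side: put one message per URL onto the queue."""
    for url in urls:
        # Assumed message format: a small JSON object holding the URL.
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps({"url": url}))


def process_one_batch(sqs, queue_url, scrape):
    """Worker side: receive a batch, scrape each URL, delete what succeeded.

    Returns the number of messages that were scraped and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling, so idle workers do not spin
    )
    done = 0
    for message in response.get("Messages", []):
        request = json.loads(message["Body"])
        try:
            scrape(request["url"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # queue's visibility timeout and can be retried.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        done += 1
    return done
```

On a real worker you would create the client with `boto3.client("sqs")` and call `process_one_batch` in a loop. Deleting a message only after the scrape succeeds means a worker that is terminated mid-job does not lose the request. The scaling policy itself would typically watch the queue's `ApproximateNumberOfMessagesVisible` CloudWatch metric.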

Another way to do this would be to use AWS Lambda. You can trigger Lambda functions from SQS or from SNS. Have your master add scrape requests to an SQS queue or have the master publish requests to an SNS topic, and then drive a web-scraper Lambda function (written in JavaScript) from the SQS queue or SNS topic.
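As a rough illustration of the Lambda route, here is a sketch of a handler for an SQS-triggered function. It is written in Python rather than the JavaScript mentioned above, and it assumes each message body is a JSON object of the form `{"url": ...}`; that format, and the `fetch` and `scrape_records` helpers, are hypothetical. Real scraping logic would replace the plain download.

```python
import json
import urllib.request


def fetch(url, timeout=10):
    """Download a page body as bytes (stand-in for real scraping logic)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def scrape_records(event, fetch_page=fetch):
    """Scrape every URL in an SQS-triggered event; return {url: byte count}."""
    sizes = {}
    for record in event.get("Records", []):
        # SQS event records carry the message text in the "body" field.
        request = json.loads(record["body"])
        sizes[request["url"]] = len(fetch_page(request["url"]))
    return sizes


def handler(event, context):
    """Lambda entry point; an unhandled exception lets the batch be retried."""
    return scrape_records(event)
```

If the function were driven from an SNS topic instead, the message text would sit under `record["Sns"]["Message"]` rather than `record["body"]`. Keeping `fetch_page` as a parameter makes the record handling easy to exercise locally without any network access.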

Personally, I would investigate the Lambda route first.

老娘就宠你
#3 · 2019-06-12 03:59

This page shows you how to write a PHP script to launch an EC2 instance.

http://blogs.aws.amazon.com/php/post/TxMLFLE50WUAMR/Provision-an-Amazon-EC2-Instance-with-PHP

You could then either SSH into the new instance as the article suggests (but with Bash), or have the new instance initiate communication with the master on a preconfigured static address.
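The second option can be sketched as follows, in Python rather than PHP purely for illustration. It assumes a boto3-style EC2 client is passed in as `ec2` and that the AMI runs user-data scripts at first boot (as cloud-init based images such as Amazon Linux do). The helper names, the `MASTER_ADDRESS` variable, and the default instance type are hypothetical.

```python
def build_user_data(master_address):
    """Shell script run at first boot; records where the master can be reached."""
    return (
        "#!/bin/bash\n"
        "echo 'MASTER_ADDRESS={}' >> /etc/environment\n".format(master_address)
    )


def launch_worker(ec2, image_id, master_address, instance_type="t3.micro"):
    """Start one worker instance and return its instance ID."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        # boto3 base64-encodes this string before sending it to EC2.
        UserData=build_user_data(master_address),
    )
    return response["Instances"][0]["InstanceId"]
```

With real credentials configured, `launch_worker(boto3.client("ec2"), "<your AMI ID>", "<master address>")` would start a single worker. The scraper on that instance can then read `MASTER_ADDRESS` and connect back to the master, so the master never needs to SSH in.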
