I'm trying to use the 'request-promise' and 'cheerio' packages in Node.js to scrape links from a website, running the script on an Amazon EC2 instance.
I'm able to run my script on my own desktop (outside the Amazon EC2 instance) and scrape the links without any problems.
However, when I run the same script in my Amazon EC2 instance, I get the following error:
Unhandled rejection RequestError: Error: socket hang up
    at new RequestError (/home/ec2-user/node_modules/request-promise-core/lib/errors.js:14:15)
    at Request.plumbing.callback (/home/ec2-user/node_modules/request-promise-core/lib/plumbing.js:87:29)
    at Request.RP$callback [as _callback] (/home/ec2-user/node_modules/request-promise-core/lib/plumbing.js:46:31)
    at self.callback (/home/ec2-user/node_modules/request/request.js:186:22)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at Request.onRequestError (/home/ec2-user/node_modules/request/request.js:845:8)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at TLSSocket.socketOnEnd (_http_client.js:346:9)
Could this be related to my AWS security group settings? Outbound traffic is currently set to 'All Traffic' with destination 0.0.0.0/0, while the only inbound rule is SSH (TCP, port 20) restricted to my own IP address. Should I add another inbound rule?
Alternatively, could this be related to my instance size? I'm currently using a micro instance on the AWS free tier.
Finally, could the website be blocking requests that come from AWS instances?
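To help narrow this down, here is a minimal sketch of a probe I could run on both my desktop and the EC2 instance. It bypasses 'request-promise' and uses only Node's built-in https module, so it should show whether the hang-up comes from the connection itself rather than from my scraping code. The function names (buildOptions, probe), the 10-second timeout and the User-Agent parameter are my own assumptions for illustration, not part of my actual script, and it assumes a Node version where require('url').URL is available.

```javascript
'use strict';
const https = require('https');
const { URL } = require('url');

// Hypothetical helper: turn a URL string into options for https.request.
// The User-Agent is passed in so different values can be compared.
function buildOptions(targetUrl, userAgent) {
  const u = new URL(targetUrl);
  return {
    method: 'GET',
    hostname: u.hostname,
    port: u.port || 443, // u.port is '' when the default port is used
    path: u.pathname + u.search,
    headers: { 'User-Agent': userAgent, Accept: 'text/html' },
    timeout: 10000, // assumed value: give up after 10 s of socket inactivity
  };
}

// Hypothetical helper: always resolves, either with the HTTP status code
// or with the error details, so nothing is left as an unhandled rejection.
function probe(targetUrl, userAgent) {
  return new Promise((resolve) => {
    const req = https.request(buildOptions(targetUrl, userAgent), (res) => {
      res.resume(); // discard the body; only the status matters here
      resolve({ ok: true, status: res.statusCode });
    });
    // The 'timeout' event does not abort the request by itself.
    req.on('timeout', () => req.destroy(new Error('timed out')));
    // "socket hang up" is reported by Node with err.code === 'ECONNRESET'.
    req.on('error', (err) => {
      resolve({ ok: false, code: err.code, message: err.message });
    });
    req.end();
  });
}
```

Calling something like probe('https://example.com/', 'Mozilla/5.0').then(console.log) on each machine (with the real site in place of the placeholder URL) would print either the status code or the error. If the EC2 instance still reports ECONNRESET for the site I'm scraping but succeeds for other sites, that would point away from my security group and instance size and towards the remote server closing the connection.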