Requirements
I have a web app that allows users to schedule social media tasks, like posting on Facebook or Twitter.
Each user can tell the app to publish to their social media accounts at any time (14:00, 15:11, 17:54...).
Besides this, I need to complete other tasks for each user every day, such as fetching their followers/friends or finding out who unfollowed them on Twitter.
Situation
So far, I have had a separate file for each task (post.php, getFollowers.php, analytics.php...). For example:
post.php
I have created a cron job for this script that checks every minute whether any posts must be published. Suppose the script runs and finds three users who want to tweet at that moment: it will iterate over them with a foreach loop and post to each of their accounts.
...the other scripts do the same: fetch all the users who want to do something, build a queue, and iterate over it.
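The scripts themselves aren't shown above; a sketch of the pattern described, with illustrative function and table names, would be:

```php
<?php
// Sketch of the current post.php pattern: run by cron every minute,
// find every post scheduled for this minute...
$pending = $db->query(
    "SELECT user_id, message FROM posts WHERE publish_at = :now",
    ['now' => date('Y-m-d H:i')]
);

// ...and publish them one by one. Each iteration takes 30-40s, so with
// five users the last ones run well past their scheduled time.
foreach ($pending as $post) {
    publishToTwitter($post['user_id'], $post['message']);
    publishToFacebook($post['user_id'], $post['message']);
}
```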
The problems
- Posting tasks must be completed on time.
- Some long tasks, like fetching followers, must run every day.
(1) Posting on Twitter and Facebook takes 30-40s, so if five users want to post at 14:00, the posts for users 3, 4 and 5 will go out late.
(2) Fetching one user's followers takes 40-60s, so with just 1,000 users the script would take 11-16 hours, which is definitely not scalable. I should be able to get this task done in just 2-3 hours.
Solution?
I think I could solve both problems by splitting the work up by user and running a separate process for each user.
Is this a correct and scalable solution? How would you solve these problems in a scalable way?
Thanks in advance.
Use a managed, distributed scheduled task service, such as AWS Elastic Beanstalk Worker Tier or IronWorker.
With AWS EB, you would include in your project a cron.yaml file containing a config such as the following:
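The original config isn't included above; based on the Elastic Beanstalk worker environment documentation, a cron.yaml that fires every minute would look roughly like this (the task name and /post path are illustrative):

```yaml
version: 1
cron:
 - name: "post"
   url: "/post"
   schedule: "* * * * *"
```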
This will trigger a POST request to http://localhost/post every minute.

I would also suggest that the scheduled task itself not send the posts, but rather trigger multiple other tasks to do so. Using AWS EB, you would do that with the AWS SDK for PHP:
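The original snippet isn't included; a sketch of the enqueueing step using the AWS SDK for PHP might look like this (the queue URL, region, and payload fields are assumptions):

```php
<?php
// Instead of posting directly, the scheduled task enqueues one SQS
// message per pending post; worker-tier instances consume them.
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs = new SqsClient([
    'region'  => 'us-east-1',
    'version' => '2012-11-05',
]);

foreach ($pendingPosts as $post) {
    $sqs->sendMessage([
        'QueueUrl'    => 'https://sqs.us-east-1.amazonaws.com/123456789012/posts',
        'MessageBody' => json_encode([
            'user_id' => $post['user_id'],
            'message' => $post['message'],
        ]),
    ]);
}
```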
This will trigger a POST request to your configured Worker Tier URL (e.g. http://localhost/worker) for every message, with the JSON-encoded data in the request body.

This approach lets you scale with the number of posts that need to be sent simultaneously.
Use a queue and worker system.
The queue, e.g. Amazon SQS:
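The original queue-side snippet isn't included; a sketch using the AWS SDK for PHP, applied here to the daily follower job (the queue URL and message fields are assumptions), would be:

```php
<?php
// Once a day, enqueue one "fetch followers" job per user.
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs = new SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);

foreach ($userIds as $userId) {
    $sqs->sendMessage([
        'QueueUrl'    => 'https://sqs.us-east-1.amazonaws.com/123456789012/followers',
        'MessageBody' => json_encode(['task' => 'getFollowers', 'user_id' => $userId]),
    ]);
}
```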
The worker:
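The original worker snippet isn't included either; a sketch of a worker loop (queue URL and the fetchFollowers helper are hypothetical) would be:

```php
<?php
// Worker loop: long-poll the queue, process one job at a time,
// delete the message on success. Run as many copies as needed.
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs = new SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);
$queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/followers';

while (true) {
    $result = $sqs->receiveMessage([
        'QueueUrl'        => $queueUrl,
        'WaitTimeSeconds' => 20, // long polling: wait up to 20s for a message
    ]);

    foreach ($result->get('Messages') ?? [] as $message) {
        $job = json_decode($message['Body'], true);
        fetchFollowers($job['user_id']); // hypothetical task function

        // Only delete once the job succeeded, so failures get retried
        // after the visibility timeout expires.
        $sqs->deleteMessage([
            'QueueUrl'      => $queueUrl,
            'ReceiptHandle' => $message['ReceiptHandle'],
        ]);
    }
}
```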
The trick is you have one queue, and then as many worker processes/servers as is necessary to keep the queue from growing continuously.