Question:
I want to execute my Scrapy crawler from a cron job.
I created a bash file, getdata.sh, in the directory where the Scrapy project and its spiders are located:
#!/bin/bash
cd /myfolder/crawlers/
scrapy crawl my_spider_name
My crontab looks like this; I want to execute it every 5 minutes:
*/5 * * * * sh /myfolder/crawlers/getdata.sh
But it doesn't work. What's wrong? Where is my error?
When I execute the bash file from the terminal with sh /myfolder/crawlers/getdata.sh, it works fine.
Answer 1:
I solved this problem by including PATH in the bash file:
#!/bin/bash
cd /myfolder/crawlers/
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl my_spider_name
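Before wiring the script into cron, you can sanity-check the amended PATH from the command line. A minimal sketch, assuming (as in the answer) that scrapy was installed somewhere under /usr/local/bin:

```shell
#!/bin/sh
# Sketch: extend PATH the same way getdata.sh does, then report which
# scrapy binary (if any) that PATH resolves to. If the second branch
# prints, the cron job would fail the same way.
PATH=$PATH:/usr/local/bin
export PATH
if command -v scrapy >/dev/null 2>&1; then
    echo "scrapy found at: $(command -v scrapy)"
else
    echo "scrapy not found on PATH"
fi
```

If this prints "scrapy not found on PATH", find the real install location with which scrapy and append that directory instead.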
Answer 2:
Adding the following lines in crontab -e runs my scrapy crawl at 5 AM every day. This is a slightly modified version of crocs' answer:
PATH=/usr/bin
0 5 * * * cd project_folder/project_name/ && scrapy crawl spider_name
Without setting $PATH, cron would give me the error "command not found: scrapy". I guess this is because /usr/bin is where the scripts to run programs are stored in Ubuntu.
Note that the complete path of my scrapy project is /home/user/project_folder/project_name. I ran the env command in cron and noticed that the working directory is /home/user, hence I skipped /home/user in my crontab above.
The cron log can be helpful while debugging
grep CRON /var/log/syslog
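The env trick mentioned above can be reproduced with a hypothetical temporary crontab entry that dumps the environment cron actually gives your jobs, so you can compare it against your interactive shell's:

```shell
# Hypothetical one-off entry: every minute, write cron's environment to a
# file for inspection; remove the entry after checking /tmp/cron_env.txt
* * * * * env > /tmp/cron_env.txt 2>&1
```

Differences in PATH (and HOME, which sets the working directory) between that file and your terminal usually explain "works in the shell, fails in cron" problems like this one.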
Answer 3:
Another option is to forget using a shell script and chain the two commands together directly in the cronjob. Just make sure the PATH variable is set before the first scrapy cronjob in the crontab list. Run:
crontab -e
to edit and have a look. I have several scrapy crawlers which run at various times. Some every 5 mins, others twice a day.
PATH=/usr/local/bin
*/5 * * * * cd /myfolder/crawlers/ && scrapy crawl my_spider_name_1
0 1,13 * * * cd /myfolder/crawlers/ && scrapy crawl my_spider_name_2
All jobs located after the PATH variable will find scrapy. Here the first one runs every 5 minutes and the second twice a day, at 1 AM and 1 PM. I found this easier to manage. If you have other binaries to run, you may need to add their locations to the path as well. (Note that a per-user crontab edited with crontab -e has no user field; that column only exists in the system-wide /etc/crontab.)
Answer 4:
For anyone who used pip3 (or similar) to install Scrapy, here is a simple inline solution:
*/10 * * * * cd ~/project/path && ~/.local/bin/scrapy crawl something >> ~/crawl.log 2>&1
Replace:
- */10 * * * * with your cron pattern
- ~/project/path with the path to your Scrapy project (where your scrapy.cfg is)
- something with the spider name (use scrapy list in your project to find out)
- ~/crawl.log with your log file location (in case you want logging)
Answer 5:
Check where Scrapy is installed using the which scrapy command. In my case, Scrapy is installed in /usr/local/bin.
Open the crontab for editing using crontab -e and add:
*/5 * * * * cd /myfolder/path && /usr/local/bin/scrapy crawl spider_name
It should work; Scrapy will then run every 5 minutes.
Answer 6:
Does your shell script have execute permission?
For example, can you run
/myfolder/crawlers/getdata.sh
without the sh? If you can, then you can drop the sh from the line in cron.
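To illustrate the permission check with a stand-in script (the path and echo line here are hypothetical, not the asker's actual crawler):

```shell
#!/bin/sh
# Demo: create a stand-in for getdata.sh, mark it executable with
# chmod +x, then run it directly without the leading "sh"
cat > /tmp/getdata_demo.sh <<'EOF'
#!/bin/bash
echo "crawl would run here"
EOF
chmod +x /tmp/getdata_demo.sh
/tmp/getdata_demo.sh   # runs because the execute bit is now set
```

If your real script lacks the execute bit, the same chmod +x on its path fixes direct invocation from cron.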
Answer 7:
In my case, scrapy is in .local/bin/scrapy. Give the proper path to scrapy along with the spider name and it works perfectly:
0 0 * * * cd /home/user/scraper/Folder_of_scriper/ && /home/user/.local/bin/scrapy crawl "name" >> /home/user/scrapy.log 2>&1
/home/user/scrapy.log saves the output and errors to scrapy.log, so you can check whether the program worked or not.
Thank you.