How to check the status of URLs from a text file using bash

Posted 2020-06-03 05:02

I have to check the status of 200 HTTP URLs and find out which of them are broken links. The links are in a simple text file (say URL.txt in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie, but I understand the bash shell is very powerful and could help me achieve what I want.

My exact requirement is to read the text file containing the list of URLs, automatically check whether each link is working, and write the results to a new file with the URLs and their corresponding status (working/broken).

5 Answers
在下西门庆
Answer 2 · 2020-06-03 05:41

Here is my full script, which checks the URLs listed in a file passed as an argument, e.g. 'checkurls.sh listofurls.txt'.

What it does:

  • checks each URL using curl and records the returned HTTP status code
  • sends an email notification when a URL returns a code other than 200
  • creates a temporary lock file for failed URLs (file naming could be improved)
  • sends an email notification when a URL becomes available again
  • removes the lock file once the URL is back, to avoid repeated notifications
  • logs events to a file and keeps the log file size in check (i.e. log rotation; uncomment the echo if you also want code-200 events logged)

Code:

#!/bin/bash

EMAIL="your@email.com"
DATENOW=$(date +%Y%m%d-%H%M%S)
LOG_FILE="checkurls.log"
c=0

while read -r url
do
  ((c++))
  LOCK_FILE="checkurls$c.lock"
  urlstatus=$(/usr/bin/curl -H 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code}' "$url")

  if [ "$urlstatus" = "200" ]
  then
    #echo "$DATENOW OK $urlstatus connection->$url" >> "$LOG_FILE"
    # URL is up: remove the lock file (if any) and send a recovery notification
    [ -e "$LOCK_FILE" ] && /bin/rm -f -- "$LOCK_FILE" > /dev/null && /bin/mail -s "NOTIFICATION URL OK: $url" "$EMAIL" <<< 'The URL is back online'
  else
    echo "$DATENOW FAIL $urlstatus connection->$url" >> "$LOG_FILE"
    if [ -e "$LOCK_FILE" ]
    then
      # no action - notification already sent, awaiting URL to be fixed
      :
    else
      /bin/mail -s "NOTIFICATION URL DOWN: $url" "$EMAIL" <<< 'Failed to reach or URL problem'
      /bin/touch "$LOCK_FILE"
    fi
  fi
done < "$1"

# RECREATE LOG FILE IF LARGER THAN ~120 MB (maxsize is in KB, as reported by du -k)
maxsize=120000
size=$(/usr/bin/du -k "$LOG_FILE" 2>/dev/null | /bin/cut -f 1)
if [ -n "$size" ] && [ "$size" -ge "$maxsize" ]; then
     /bin/rm -f -- "$LOG_FILE" > /dev/null
     echo "$DATENOW LOG file [$LOG_FILE] has been recreated" > "$LOG_FILE"
else
     # do nothing
     :
fi

Please note that changing the order of the URLs listed in the text file will affect any existing lock files (remove all .lock files to avoid confusion). This could be improved by using the URL itself as the lock-file name, but certain characters such as : @ / ? & would have to be handled for the operating system; see the sketch below.
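
For illustration, here is a minimal sketch (not part of the script above; the character-substitution rule is just an assumption) of how the URL itself could be turned into a filesystem-safe lock-file name:

#!/bin/bash
# Sketch: build the lock-file name from the URL instead of its position in the list.
# Any character outside A-Z, a-z, 0-9, '.', '_' and '-' is replaced with '_'.
url="$1"
safe_name=$(printf '%s' "$url" | tr -c 'A-Za-z0-9._-' '_')
LOCK_FILE="checkurls_${safe_name}.lock"
echo "$LOCK_FILE"

For example, 'https://kernel.org/index.html' would map to 'checkurls_https___kernel.org_index.html.lock', so reordering the list no longer matters.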

【Aperson】
Answer 3 · 2020-06-03 05:42

If your input file contains one URL per line, you can use a script that reads each line and tries to ping it; if the ping succeeds, the host is considered reachable. Keep in mind that ping only checks host reachability, not HTTP status, and expects a hostname rather than a full URL (see the sketch after the ping options below).

#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while read -r line
do
  if ping -c 1 "$line" &> /dev/null
  then
      echo "$line valid" >> "$OUTPUT"
  else
      echo "$line not valid" >> "$OUTPUT"
  fi
done < "$INPUT"
exit

ping options:

-c count
      Stop after sending count ECHO_REQUEST packets. With the deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.

You can use the following option as well to limit the waiting time:

-W timeout
      Time to wait for a response, in seconds. The option affects only the timeout in the absence of any responses; otherwise ping waits for two RTTs.
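
Since ping expects a hostname rather than a full URL, here is a minimal sketch (my own variation, not part of the answer above) that strips the scheme and path before pinging, and combines -c with -W:

#!/bin/bash
# Sketch: reduce each URL to its host part, then ping it with a 2-second timeout.
INPUT="Urls.txt"
OUTPUT="result.txt"
while read -r line
do
  # strip the scheme (e.g. http:// or https://) and everything after the first /
  host=${line#*://}
  host=${host%%/*}
  if ping -c 1 -W 2 "$host" &> /dev/null
  then
      echo "$line reachable" >> "$OUTPUT"
  else
      echo "$line not reachable" >> "$OUTPUT"
  fi
done < "$INPUT"

Note that reachability of the host still says nothing about the HTTP status of the specific page; for that, the curl-based answers are the better fit.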
放我归山
Answer 4 · 2020-06-03 05:45

I created a file "checkurls.sh" and placed it in my home directory, where the urls.txt file is also located. I made the script executable using

$ chmod +x checkurls.sh

The contents of checkurls.sh are given below:

#!/bin/bash
while read -r url
do
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url")
    echo "$url  $urlstatus" >> urlstatus.txt
done < "$1"

Finally, I executed it from the command line using:

$ ./checkurls.sh urls.txt

Voila! It works.
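
If you want the output file to say working/broken, as the question asks, rather than the raw HTTP code, a small variation could look like this (a sketch; I'm assuming any 2xx or 3xx response counts as working):

#!/bin/bash
# Sketch: classify each URL as working or broken based on its HTTP status code.
while read -r url
do
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url")
    case "$urlstatus" in
        2*|3*) echo "$url  working ($urlstatus)" >> urlstatus.txt ;;
        *)     echo "$url  broken ($urlstatus)"  >> urlstatus.txt ;;
    esac
done < "$1"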

chillily
Answer 5 · 2020-06-03 05:52

How about adding some parallelism to the accepted solution? Let's modify the script, chkurl.sh, to be a little easier to read and to handle just one request at a time:

#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"

And now you check your list using:

cat URL.txt | xargs -P 4 -L1 ./chkurl.sh

This could finish the job up to 4 times faster.
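
Remember to make the helper script executable first, and you can redirect the combined output to a file if you want a report (with -P the lines may not come back in input order):

chmod +x chkurl.sh
cat URL.txt | xargs -P 4 -L1 ./chkurl.sh > urlstatus.txt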

趁早两清
Answer 6 · 2020-06-03 06:02
#!/bin/bash
# Read URLs from file descriptor 4 so that curl cannot consume the URL list from stdin.
while read -ru 4 LINE; do
    # Keep only the first line of curl's response headers (the HTTP status line),
    # or curl's own error message if the request fails.
    read -r REP < <(exec curl -IsS "$LINE" 2>&1)
    echo "$LINE: $REP"
done 4< "$1"

Usage:

bash script.sh urls-list.txt

Sample:

http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html

Output:

http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK

For everything else, read the Bash manual; see man curl, help, and man bash as well.
