Unable to pass wget a variable with quotes inside

Posted 2019-01-29 07:27

Question:

I am trying to script a wget command to download a web page and all its attachments, JPEGs, etc.

When I enter the script by hand, it works, but I need to run this over 35000 times to archive an old web site which is outside of my control (international company politics, but I'm the owner of the data).

My problem has been putting the session parameters into variables.

My script so far is as follows:

cnt=35209
# initialise the headers
general_settings='-4 -P xyz --restrict-file-names=windows -nc --limit-rate=250k'
html_page_specific='--convert-links --html-extension'
proxy='--proxy-user=xxxxxx --proxy-password=yyyyyyy' 
session="--header=\'Host: mywebsite.com:9090\' --header=\'User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0\'"
address=http://mywebsite.com:9090/browse/item-$cnt

echo $general_settings $proxy $session $cookie $address
echo
echo
echo Getting item-$cnt...

#while [ $cnt -gt 0 ]
#do
#  # get the page
  wget --debug $general_settings $html_page_specific $proxy $session $cookie $address

  # now get the attachments, pdf, txt, jpg, gif, sql, etc...
#  wget -A.pdf  $general_settings -r $proxy $session $cookie $address
#  wget -A.txt  $general_settings -r $proxy $session $cookie $address
#  wget -A.jpg  $general_settings -r $proxy $session $cookie $address
#  wget -A.gif  $general_settings -r $proxy $session $cookie $address
#  wget -A.sql  $general_settings -r $proxy $session $cookie $address
#  wget -A.doc  $general_settings -r $proxy $session $cookie $address
#  wget -A.docx $general_settings -r $proxy $session $cookie $address
#  wget -A.xls  $general_settings -r $proxy $session $cookie $address
#  wget -A.xlsm $general_settings -r $proxy $session $cookie $address
#  wget -A.xlsx $general_settings -r $proxy $session $cookie $address
#  wget -A.xml  $general_settings -r $proxy $session $cookie $address
#  wget -A.ppt  $general_settings -r $proxy $session $cookie $address
#  wget -A.pptx $general_settings -r $proxy $session $cookie $address
#  wget -A.png  $general_settings -r $proxy $session $cookie $address
#  wget -A.ps   $general_settings -r $proxy $session $cookie $address
#  wget -A.mdb  $general_settings -r $proxy $session $cookie $address
#  ((cnt=cnt-1))
#
#done

But when I run the script, I get the following output:

Getting item-35209...
Setting --inet4-only (inet4only) to 1
Setting --directory-prefix (dirprefix) to xyz
Setting --restrict-file-names (restrictfilenames) to windows
Setting --no (noclobber) to 1
Setting --limit-rate (limitrate) to 250k
Setting --convert-links (convertlinks) to 1
Setting --html-extension (htmlextension) to 1
Setting --proxy-user (proxyuser) to xxxxx
Setting --proxy-password (proxypassword) to yyyyy
Setting --header (header) to \'Host:
Setting --header (header) to 'Cookie:
DEBUG output created by Wget 1.11.4 Red Hat modified on linux-gnu.

As you can see, the Host and Cookie sections are not being properly formatted, resulting in the wget command failing to log in and extract the data.

I've been reading the bash man pages, googling, and have tried several related suggestions from SO, but I'm still unable to get the command to execute.

Anyone out there going to be nice enough to show me the correct way to quote quotes in variables?

Thanks,

Answer 1:

Quotes inside of quoted strings or variables are ordinary characters, not quoting characters. There's no way to change that. Use an array instead:

A=(a b 'c d' 'e f')
cmd "${A[@]}"

This calls cmd with four arguments: a, b, c d, and e f.

(You could achieve a similar effect with eval, but that's a lot more error-prone. In your case, using arrays is much more convenient.)
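
Applied to the session headers from the question, the array approach might look like the sketch below. It assumes bash (arrays are not available in plain sh), and show_args is a hypothetical helper, not part of wget, that prints each argument it receives on its own line. It stands in for wget so the result can be inspected without touching the network.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every received argument in brackets, one per line.
show_args() {
  printf '[%s]\n' "$@"
}

# Each array element becomes exactly one argument, embedded spaces included.
session=(
  '--header=Host: mywebsite.com:9090'
  '--header=User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0'
)

# The quoted "${session[@]}" expansion keeps each element intact.
show_args "${session[@]}"
```

With the real command, the same expansion would be placed on the wget line, for example wget "${session[@]}" "$address". The other option groups in the script could be converted to arrays in the same way.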



Answer 2:

session="--header=Host: mywebsite.com:9090 --header=User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0"

Use this.
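
Whether this string reaches wget as intended depends on how $session is expanded. The check below is a small sketch: count_args is a hypothetical helper (not a wget feature) that reports how many arguments it was given, and the User-Agent value is shortened for brevity.

```shell
# Hypothetical helper: report how many arguments were received.
count_args() {
  echo "$#"
}

session="--header=Host: mywebsite.com:9090 --header=User-Agent: Mozilla/5.0"

# Unquoted: the shell splits the value at every space.
count_args $session      # prints 4
# Quoted: the whole value is passed as a single argument.
count_args "$session"    # prints 1
```

Neither expansion yields exactly two --header arguments. The unquoted case matches the debug output in the question, where the header value is cut off after "Host:".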