Curl Complex With Bash

2019-08-29 06:08发布

问题:

Small Note: I removed the http:// from infront each link, because stackoverflow isn't allowing me to post it in original way. I wrote a script which access to a webpage, to catch a URL and download it. One of the urls makes curl stop working and the whole URLS in the list to the same. The script works as following:-

PAGE=$(curl -sL pageurl)
FILE_URL=$(echo $PAGE | sed -e 's/^.*<a href=\"\(.*\)\">\(.*\) alt="File" \/><\/a>.*$/\1/')

The FILE_URL VALUE is

URL/files/PartOne - Booke (Coll).pdf
webprod25.megashares.com/index.php?d01=3109985&lccdl=9e8e091ef33dd103&d01go=1&fln=/adobe reader exe.rar

AND SO One for others

When curl tried to catch this url it shows the following error using the debug mode of bash

++ curl -sOL 'webprod37.megashares.com/index.php?d01=3109985&lccdl=9e8e091ef33dd103&d01go=1&fln=/adobe' reader exe.rar fileshare273.depositfiles.com/auth-13023763920cd7ec18a0fdbfa8b62d35-188.165.197.50-43792102-7713641/FS273-7/PageMaker.rar -sOLJg fileshare601.depositfiles.com/auth-1302376689013d421df6c01e7f64c8d2-188.165.197.50-43801594-82379659/FS601-2/Adobe_Flash_Player_v10.3.180.65.2.rar -sOLJg 'webprod37.megashares.com/index.php?d01=de48789&lccdl=9e8e091ef33dd103&d01go=1&fln=/KAZAMIZA.COM.Adobe.Flash' Player-10.3.180.65.Beta-2.JUDGMENT DAY.rar bellatrix.oron.com/spzsttzwytpflwd76j3ne2moukomuhcdxg6llddfztqa2ztd7cplwwp457h3mxuacq3pbxzs/An-Beat - Mentally Insine '(Original' 'Mix).mp3'
curl: option -: is unknown

curl: try 'curl --help' or 'curl --manual' for more information

The quote marks the curl put it itself, I tried to do some workarounds like escaping url but it not works.

回答1:

The basic problem seems to be that you are using $() expansion for something that looks to me like a multi line value. You should try iterating over each line.

The other problem looks like one of improper quoting of URLs containing spaces. There's a lone dash (-) in "An-Beat - Mentally Insine"

Oh, one more problem: The sed part to catch the href="..." contents only works if there's exactly one href on the line. If there are two or more, your \(.*\) will match everything else up to the last href. You should use something like href="\([^"]*\)", matching "any number of non-doublequotes followed by a doublequote".



回答2:

Quote your variables as in:

pageurl='the url'
PAGE=$(curl -sL "$pageurl")
FILE_URL=$(echo "$PAGE" | sed -e 's/^.*<a href=\"\(.*\)\">\(.*\) alt="File" \/><\/a>.*$/\1/')

Otherwise, shell expansion will occur. The error "option -: is unknown" comes from the final part:

An-Beat - Mentally Insine

Because you didn't apply quotes to it, it got parsed as arguments, which you can clearly see in the syntax-highlighted code.



标签: linux bash curl