Script that uses parameters and reads results

I am trying to write a script that takes in a URL with certain parameters, reads from the resulting web page a list of new URLs, and downloads them locally. I am very new to programming and have never used Python 3, so I am a little lost.

Here is example code to explain further:

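from urllib.parse import urlencode
from urllib.request import urlopen, urlretrieve

param1 = "..."  # placeholder value
param2 = "..."  # placeholder value
param3 = "..."  # placeholder value

# Substitute the parameter values into the query string (Python 3)
query = urlencode({"target": param1, "query": param2, "other": param3})
requestURL = "http://examplewebpage.com/live2/?" + query

html_content = urlopen(requestURL).read()

# I don't know where to go from here.
# Something that can find when a URL appears on the page and append it to a list,
# then download everything from that list.

# This can download something from a link:
# urlretrieve(url, newfilelocation)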

The output from the request URL is a very long page that can be in XML or JSON and contains a lot of information I don't need, so some form of searching is needed to find the URLs to download later. The URLs found on the page lead directly to the needed files (they end in .jpg, .cat, etc.).

Please let me know if you need any other information! My apologies if this is confusing.

Also, ideally I would have the downloaded files all go to a new folder (sub-dir) created for them with the filename as the current date and time, but I think I can figure this part out myself.

Tags: python url download

2 Answers

成全新的幸福

Reply #2 · 2019-08-14 13:55

I would recommend checking out BeautifulSoup for parsing the returned page. With it, you can loop through the links, extract each link address fairly easily, and append it to a list.
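
To make the "loop through the links" idea concrete, here is a minimal sketch. It deliberately uses the standard library's html.parser rather than BeautifulSoup, so it runs without installing anything; with BeautifulSoup the equivalent loop would be over soup.find_all("a", href=True). The names LinkCollector and extract_links and the sample markup are made up for illustration, and it assumes the links appear as href or src attributes, which may not hold for your page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href/src attribute of every tag that has one."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url=""):
    """Return every link found in html_text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical sample markup, only to show the call.
sample = (
    '<a href="http://examplewebpage.com/files/a.jpg">a</a> '
    '<a href="/files/b.cat">b</a>'
)
print(extract_links(sample, "http://examplewebpage.com/live2/"))
```

The returned list can then be filtered by file extension before downloading.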


冷血范

Reply #3 · 2019-08-14 13:56

It looks like you are trying to build something similar to a web crawler, unless you want to render the content. You should explore the source code of Scrapy; it will help you understand how others have written similar logic. I would suggest using the requests library instead of urllib, since it is easier to use. Python's standard library has built-in HTML, JSON, and XML parsers.

If the page type is unknown, inspect the Content-Type header to find out what kind of content you are downloading. There are alternative strategies as well; Scrapy should give you more ideas.
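
As a rough sketch of the Content-Type idea, the function below picks a parser based on the header value and collects URLs that end in the wanted extensions. Everything here is an assumption for illustration: the function names, the sample bodies, and the guess that the file URLs appear as plain string values (JSON) or as element text and attribute values (XML). With requests, the header value would come from response.headers["Content-Type"].

```python
import json
import re
import xml.etree.ElementTree as ET

# Loose pattern: anything starting with http(s):// up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def iter_strings_json(node):
    """Yield every string value found anywhere in decoded JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings_json(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings_json(item)


def iter_strings_xml(root):
    """Yield every element text and attribute value in an XML tree."""
    for elem in root.iter():
        if elem.text:
            yield elem.text
        yield from elem.attrib.values()


def find_file_urls(body, content_type, extensions=(".jpg", ".cat")):
    """Return URLs in body that end with one of the given extensions."""
    if "json" in content_type:
        strings = iter_strings_json(json.loads(body))
    elif "xml" in content_type:
        strings = iter_strings_xml(ET.fromstring(body))
    else:
        strings = [body]  # unknown type: fall back to scanning the raw text
    urls = []
    for text in strings:
        for url in URL_PATTERN.findall(text):
            if url.lower().endswith(extensions) and url not in urls:
                urls.append(url)
    return urls


# Hypothetical response bodies, only to show the call.
json_body = '{"items": [{"image": "http://examplewebpage.com/f/1.jpg", "id": 7}]}'
xml_body = '<r><file url="http://examplewebpage.com/f/2.cat"/></r>'
print(find_file_urls(json_body, "application/json"))
print(find_file_urls(xml_body, "text/xml; charset=utf-8"))
```

Each URL in the result can then be passed to urllib.request.urlretrieve(url, local_path) to save the file.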

Hope this helps.
