How to enable 'wget' to download the whole

I have a site which I want to download using Unix wget. If you look at the source code and content of the page, it contains a section called SUMMARY. However, after issuing a wget command like this:

wget   -O downdloadedtext.txt  http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik

The content of downdloadedtext.txt is incomplete and differs from the source code of that site; for example, it doesn't contain the SUMMARY section. Is there a correct way to obtain the full content?

I ask because I want to automate the download for different values in that HTML.

Tags: javascript html linux cgi wget

3 answers

狗以群分

Reply #2 · 2019-04-28 12:14

You can use the -p (--page-prerequisites) flag to tell wget to retrieve linked resources. From man wget:

This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

You might also look at the --follow-tags option, which lets you limit that process:

Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.


Rolldiameter

Reply #3 · 2019-04-28 12:21

You need to put the link inside quotes:

 wget -O downdloadedtext.txt  'http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik'

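This is because & has a special meaning in the shell: it ends the current command and runs it in the background. Without quotes, wget receives only the part of the URL up to the first &, and the rest (c=gene, a=fiche, l=2610008E11Rik) is run as separate commands.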
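To see the split without downloading anything, the sketch below replaces wget with a hypothetical function, show_url, that only prints the argument it receives. It assumes sh or bash; zsh stops earlier with a "no matches found" error, because it also treats the unquoted ? as a filename wildcard.

```shell
#!/bin/sh
# show_url is a hypothetical stand-in for wget: it only prints its first
# argument, so nothing is fetched from the network.
show_url() { printf '%s\n' "$1"; }

# Unquoted: each & ends a command and sends it to the background, so
# show_url only receives the text up to the first &. The remaining pieces
# (c=gene, a=fiche, l=2610008E11Rik) are parsed as variable assignments.
unquoted=$(show_url http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik)

# Quoted: the whole URL reaches the command as one argument.
quoted=$(show_url 'http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The first line of output is the URL cut off after db=mouse, which is the page wget fetched in the question; the second line is the complete URL.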


甜甜的少女心

Reply #4 · 2019-04-28 12:29

The & character has special meaning in shells. Quote the URI so you actually request the URI you want to request.
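Since the asker wants to automate the download for different values, here is a minimal sketch that keeps the URL intact while varying it. It assumes the value that changes is the l= parameter (the gene name), that the IDs come one per line on standard input and need no URL encoding, and that each page is saved as ID.html; the names base, build_url and download_all are hypothetical.

```shell
#!/bin/sh
# Fixed part of the CGI URL from the question.
base='http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi'

# Print the full URL for one gene ID (assumed to need no URL encoding).
# The printf format string is single-quoted, so its & characters are
# literal text rather than shell operators.
build_url() {
    printf '%s?db=mouse&c=gene&a=fiche&l=%s\n' "$base" "$1"
}

# Read gene IDs from standard input, one per line, and save each page
# as <ID>.html. "$url" is double-quoted so it reaches wget as one argument.
download_all() {
    while IFS= read -r gene; do
        [ -n "$gene" ] || continue    # skip blank lines
        url=$(build_url "$gene")
        wget -O "$gene.html" "$url"
    done
}
```

Usage, for example: `printf '%s\n' 2610008E11Rik | download_all`, or `download_all < genes.txt` with a (hypothetical) file of IDs. Building the URL in one function keeps the quoting in a single place, so adding a new ID never reintroduces the unquoted-& problem.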

