I have a site which I want to download using Unix wget.
If you look at the source code and content of the page, it contains a section called SUMMARY.
However, after issuing a wget command like this:
wget -O downdloadedtext.txt http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik
The content of downdloadedtext.txt is incomplete and differs from the source code of that site. For example, it doesn't contain the SUMMARY section. Is there a correct way to obtain the full content?
The reason I ask is that I want to automate the download for different values in that HTML.
You can use the -p (--page-requisites) flag to tell wget to retrieve linked resources; see man wget for the details.
You might also look at the --follow-tags option, which lets you limit that process.
You need to put the link inside quotes.
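This is because & has a special meaning in the shell: everything before the first & is run as a background command and the rest is parsed as separate commands, so wget only requests the URL up to db=mouse.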
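As a rough sketch of the quoting fix, the snippet below stores the URL from the question in single quotes and passes it on in double quotes. The show_args function is a hypothetical helper added only to show that the quoted URL arrives as one argument; the actual wget call is left commented out so the snippet runs without network access.

```shell
# show_args is a hypothetical helper: it prints each argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Single quotes keep the shell from interpreting & and ? inside the URL.
url='http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik'

# Quoted expansion: the whole URL is passed as a single argument.
show_args "$url"

# The download itself would then be:
# wget -O downdloadedtext.txt "$url"
```

Without the quotes, the shell would most likely treat c=gene, a=fiche and l=2610008E11Rik as variable assignments, which is why the original command produced no obvious error.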
The & character has special meaning in shells. Quote the URI so you actually request the URI you want to request.
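For the automation mentioned in the question, a minimal sketch could build one quoted URI per value of the l= parameter. It assumes that only that parameter changes between downloads; build_url is a hypothetical helper, and PLACEHOLDER_GENE is a stand-in rather than a real identifier. The loop only prints what it would fetch, and the wget call is commented out.

```shell
# Base URL from the question, without the value of the l= parameter.
base='http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l='

# build_url is a hypothetical helper: it appends one gene name to the base URL.
build_url() { printf '%s%s\n' "$base" "$1"; }

# PLACEHOLDER_GENE is a stand-in, not a real gene identifier.
for gene in 2610008E11Rik PLACEHOLDER_GENE; do
    url=$(build_url "$gene")
    # Dry run: report what would be fetched and where it would be saved.
    printf 'would fetch %s -> %s\n' "$url" "$gene.txt"
    # Uncomment to download for real; the quotes keep & from splitting the command.
    # wget -O "$gene.txt" "$url"
done
```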