Creating a static copy of a web page on the UNIX command line

Published 2019-04-15 01:56

Question:

I need to create a static copy of a web page (all media resources, like CSS, images and JS included) in a shell script. This copy should be openable offline in any browser.

Some browsers have similar functionality (Save As... Web Page, complete), which creates a folder for the page and rewrites external resources as relative, static resources inside that folder.

What's a way to accomplish and automate this on the Linux command line for a given URL?

Answer 1:

You can use wget like this:

wget --recursive --convert-links --domains=example.org http://www.example.org

This command will recursively download every page reachable by hyperlinks from the page at www.example.org, without following links outside the example.org domain.

Check the wget manual page for more options to control recursion.
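
If you only need a static snapshot of a single page rather than a whole site, wget's --page-requisites option downloads the CSS, images and JS that page needs. Below is a minimal sketch of a shell script around it; the script name save-page.sh and the output directory snapshot/ are placeholder choices, and --span-hosts is added on the assumption that some assets live on other domains.

#!/bin/sh
# Minimal sketch: save one page plus its CSS, images and JS for offline viewing.
# Usage: ./save-page.sh http://www.example.org/some/page.html
set -eu
url="${1:?usage: $0 URL}"

# --page-requisites   fetch images, stylesheets and scripts the page needs
# --convert-links     rewrite references so they point at the local copies
# --adjust-extension  save files with .html/.css suffixes so browsers open them
# --span-hosts        also fetch requisites hosted on other domains
# --no-parent         never ascend above the given URL
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts --no-parent --directory-prefix=snapshot "$url"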



Answer 2:

You want the tool wget. To mirror a site, do:

$ wget -mk http://www.example.com/

Options:

-m --mirror

Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.

-k --convert-links

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
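
As a concrete usage sketch: adding -p (--page-requisites), an extra flag not mentioned above, makes each mirrored page also pull in its images and style sheets, and the copy lands in a directory named after the host, so it can be opened directly in a browser. xdg-open is a Linux convenience; on macOS use open instead.

$ wget -mkp http://www.example.com/
$ xdg-open www.example.com/index.html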