I need to create a static copy of a web page (with all media resources, like CSS, images and JS, included) in a shell script. This copy should be openable offline in any browser.
Some browsers have similar functionality (Save As... Web Page, complete), which creates a folder from a page and rewrites external resources as relative static resources in that folder.
What's a way to accomplish and automate this on the Linux command line for a given URL?
You can use wget like this:
wget --recursive --convert-links --domains=example.org http://www.example.org
This command will recursively download every page reachable by hyperlinks from the page at www.example.org, without following links outside the example.org domain.
Check the wget manual page for more options for controlling recursion.
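Since the question asks for this to be automated in a shell script, a minimal sketch wrapping the command above could look like the following; the script name, the argument handling and the default domain are assumptions on my part, not part of wget itself:

#!/bin/sh
# save-site.sh -- hypothetical wrapper around the wget command above
# Usage: ./save-site.sh http://www.example.org example.org
URL="$1"                      # page to start from
DOMAIN="${2:-example.org}"    # domain to stay within (default is an assumption)

# Recurse through the site, rewrite links for offline viewing,
# and do not follow links outside the given domain.
wget --recursive --convert-links --domains="$DOMAIN" "$URL"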
You want the tool wget. To mirror a site, do:
$ wget -mk http://www.example.com/
Options:
-m, --mirror
Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
-k, --convert-links
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
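For the specific case in the question (a single page with all of its media, viewable offline), a variant along these lines may be closer to what's needed. The -p (--page-requisites), -E (--adjust-extension) and -P (--directory-prefix) options are documented wget flags but are not part of the excerpt quoted above, and the script name and output directory are assumptions:

#!/bin/sh
# save-page.sh -- hypothetical single-page variant (name is an assumption)
# Usage: ./save-page.sh http://www.example.com/some/page.html
URL="$1"

# -p  download the CSS, images, JS and other resources the page needs
# -k  convert links so the copy works offline
# -E  give downloaded HTML/CSS files a matching extension where needed
# -P  put everything under the given directory (directory name is an assumption)
wget -p -k -E -P saved-page "$URL"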