Automated link-checker for system testing [closed]

Posted 2019-01-30 07:59

Question:

I often have to work with fragile legacy websites that break in unexpected ways when their logic or configuration is updated.

I don't have the time or the knowledge of the system needed to create a Selenium script. Besides, I don't want to check a specific use case; I want to verify every link and page on the site.

I would like to create an automated system test that will spider through a site and check for broken links and crashes. Ideally, there would be a tool that I could use to achieve this. It should have as many as possible of the following features, in descending order of priority:

  • Triggered via script
  • Does not require human interaction
  • Follows all links, including anchor tags and links to CSS and JS files
  • Produces a log of all found 404s, 500s etc.
  • Can be deployed locally to check sites on intranets
  • Supports cookie/form-based authentication
  • Free/Open source

There are many partial solutions out there, like FitNesse, Firefox's LinkChecker and the W3C link checker, but none of them do everything I need.

I would like to use this test with projects using a range of technologies and platforms, so the more portable the solution the better.

I realise this is no substitute for proper system testing, but it would be very useful if I had a convenient and automatable way of verifying that no part of the site was obviously broken.

Answer 1:

I use Xenu's Link Sleuth for this sort of thing. It quickly checks any site for dead links and similar problems. Just point it at a URI and it will spider every link on that site.

Description from the site:

Xenu's Link Sleuth (TM) checks Web sites for broken links. Link verification is done on "normal" links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and java applets. It displays a continuously updated list of URLs which you can sort by different criteria. A report can be produced at any time.

It meets all your requirements except being scriptable, as it's a Windows app that has to be started manually.



Answer 2:

We use and really like Linkchecker:

http://wummel.github.io/linkchecker/

It's open-source, Python, command-line, internally deployable, and outputs to a variety of formats. The developer has been very helpful when we've contacted him with issues.

We have a Ruby script that queries our database of internal websites, kicks off LinkChecker with appropriate parameters for each site, and parses the XML that LinkChecker gives us to create a custom error report for each site in our CMS.
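A minimal Python sketch of a driver along those lines (the answer's own script is Ruby and isn't shown). `SITES` is a hypothetical stand-in for the database of internal websites. `--output=xml` is LinkChecker's XML output option, but the element names the parser looks for (`urldata`, `url`, and `valid` with a `result` attribute) are an assumption to check against what your LinkChecker version actually emits. The exit-status handling follows my reading of the LinkChecker man page: 0 for no problems, 1 for invalid links (or warnings, if enabled), 2 for a program error.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical stand-in for the "database of internal websites".
SITES = ["http://intranet.example/", "http://wiki.intranet.example/"]


def build_command(url: str) -> List[str]:
    """Command line asking LinkChecker to write its XML report to stdout."""
    return ["linkchecker", "--output=xml", url]


def parse_report(xml_text: str) -> List[Tuple[str, str]]:
    """Return (url, result) pairs for every link marked invalid.

    ASSUMPTION: each checked link is a <urldata> element containing <url>
    and <valid result="...">0 or 1</valid>; verify against real output.
    """
    root = ET.fromstring(xml_text)
    broken = []
    for item in root.iter("urldata"):
        valid = item.find("valid")
        if valid is not None and (valid.text or "").strip() == "0":
            url = item.findtext("url", default="?")
            broken.append((url, valid.get("result", "")))
    return broken


def check_site(url: str) -> List[Tuple[str, str]]:
    """Run LinkChecker on one site and return its broken links."""
    proc = subprocess.run(build_command(url), capture_output=True, text=True)
    # Exit status 2 means LinkChecker itself failed; 1 means problems were
    # found and the report should be parsed; 0 means everything passed.
    if proc.returncode == 2:
        raise RuntimeError(f"linkchecker failed on {url}: {proc.stderr.strip()}")
    return parse_report(proc.stdout)
```

Calling `check_site(site)` for each entry in `SITES` and writing the returned pairs into your own format gives a per-site error report similar to the one described above.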



Answer 3:

What part of your list does the W3C link checker not meet? That would be the one I would use.

Alternatively, twill (Python-based) is an interesting little language for this kind of thing. It has a link-checker module, but I don't think it works recursively, so it's not so good for spidering. You could modify it if you're comfortable with that, though. And I could be wrong; there might be a recursive option. Worth checking out, anyway.



Answer 4:

You might want to try wget for this. It can spider a site, including the "page requisites" (the images, stylesheets and other files needed to display each page), and can be configured to log errors. I don't know if it will give you enough information, but it's Free and available on Windows (via Cygwin) as well as Unix.
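A rough sketch of driving wget from Python and pulling the failures out of its log. The flags (`--spider`, `--recursive`, `--level=inf`, `--page-requisites`, `--output-file`) are standard wget options; the log format the parser expects (a `--<timestamp>--  <url>` line per request, followed by `awaiting response... <code>`) is an assumption based on wget's default verbose output and may differ between versions or with `--no-verbose`.

```python
import re
from typing import Dict, List, Optional


def wget_spider_command(start_url: str, log_file: str) -> List[str]:
    """wget options for a recursive crawl that saves nothing and logs to a file."""
    return [
        "wget",
        "--spider",           # check that pages exist without keeping them
        "--recursive",        # follow links found in HTML pages
        "--level=inf",        # no depth limit
        "--page-requisites",  # also request images, stylesheets, scripts
        f"--output-file={log_file}",
        start_url,
    ]


# ASSUMPTION: wget's default (verbose) log layout.
REQUEST_RE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")
STATUS_RE = re.compile(r"awaiting response\.\.\. (\d{3})")


def failed_urls(log_text: str) -> Dict[str, int]:
    """Map each requested URL to its status code when that code is >= 400."""
    failures: Dict[str, int] = {}
    current: Optional[str] = None
    for line in log_text.splitlines():
        request = REQUEST_RE.match(line)
        if request:
            current = request.group(1)  # remember which URL this response is for
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            code = int(status.group(1))
            if code >= 400:
                failures[current] = code
    return failures
```

The command list can be passed to `subprocess.run`, and the log file read back into `failed_urls`. For sites behind a cookie-based login, wget also has `--load-cookies`, `--save-cookies` and `--keep-session-cookies`.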



Answer 5:

InSite is a commercial program that seems to do what you want (haven't used it).

If I were in your shoes, I'd probably write this sort of spider myself...
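For a sense of how small a hand-written spider can be, here is a minimal breadth-first crawler using only the Python standard library. It is a sketch under stated assumptions: it stays on the starting host and does not check external links, it only parses responses whose Content-Type contains "html", and it treats any status of 400 or above (or a connection failure, recorded as 0) as broken. The `fetch` parameter is a hypothetical hook, so a fetcher that logs in and carries cookies could be swapped in.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# A fetcher maps a URL to (status, content type, body).
Fetch = Callable[[str], Tuple[int, str, str]]


class LinkCollector(HTMLParser):
    """Collects href/src targets: anchors, stylesheets, scripts, images."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def http_fetch(url: str) -> Tuple[int, str, str]:
    """Default fetcher using urllib; HTTP errors become status codes."""
    try:
        with urlopen(url, timeout=10) as resp:
            ctype = resp.headers.get("Content-Type", "")
            return resp.status, ctype, resp.read().decode("utf-8", "replace")
    except HTTPError as err:  # must come before URLError (its base class)
        return err.code, "", ""
    except URLError:
        return 0, "", ""  # 0 = could not connect at all


def crawl(start: str, fetch: Fetch = http_fetch) -> Dict[str, int]:
    """Breadth-first crawl of one host; returns {url: status} for failures."""
    host = urlparse(start).netloc
    seen = {start}
    queue = deque([start])
    failures: Dict[str, int] = {}
    while queue:
        url = queue.popleft()
        status, ctype, body = fetch(url)
        if status == 0 or status >= 400:
            failures[url] = status
            continue
        if "html" not in ctype:
            continue  # only HTML pages are parsed for further links
        parser = LinkCollector()
        parser.feed(body)
        for link in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            target, _ = urldefrag(urljoin(url, link))
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return failures
```

Running `crawl("http://intranet.example/")` from a scheduled job and failing when the returned dict is non-empty covers the "triggered via script, no human interaction, log of 404s/500s" part of the wish list. Cookie-based sessions could be added with a fetcher built on `urllib.request.build_opener(HTTPCookieProcessor())`.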



Answer 6:

I think Checkbot will do everything on your list, although I'm not sure it supports form-based authentication; it will handle cookies if you can get a session going on the site. I've used it before as a step in a build process to check that nothing on a site was broken. There's example output on the website.



Answer 7:

I have always liked linklint for checking links on a site. However, I don't think it meets all your criteria, particularly where links may depend on JavaScript. I also think it will miss images referenced from inside CSS.
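To illustrate the CSS gap: catching those images means fetching each stylesheet and pulling out its `url(...)` references, which an HTML-only link checker never sees. A small Python sketch of that extraction step (not linklint's code). It assumes a simple regex is good enough, so it handles quoted and unquoted `url(...)` but ignores `@import "file.css"` written without `url(...)` and skips inline `data:` URIs.

```python
import re
from typing import List
from urllib.parse import urljoin

# url(...) with an optional matching pair of single or double quotes.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""", re.IGNORECASE)


def css_references(css_text: str, css_url: str) -> List[str]:
    """Absolute URLs referenced via url(...) in a stylesheet.

    Relative paths in CSS resolve against the stylesheet's own URL,
    not against the page that includes the stylesheet.
    """
    refs = []
    for match in CSS_URL_RE.finditer(css_text):
        target = match.group(2).strip()
        if not target.startswith("data:"):  # inline data needs no request
            refs.append(urljoin(css_url, target))
    return refs
```

Each returned URL can then be requested like any other link and its status recorded.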

But for spidering all anchors, it works great.



Answer 8:

Try SortSite. It's not free, but seems to do everything you need and more.

Alternatively, PowerMapper from the same company has a similar-but-different approach. The latter will give you less information about detailed optimisation of your pages, but will still identify any broken links, etc.

Disclaimer: I have a financial interest in the company that makes these products.



Answer 9:

Try http://www.thelinkchecker.com. It is an online application that checks the number of outgoing links, page rank and anchors. I think this is the solution you need.