I'm trying to write my first Perl program. If you think that Perl is a bad language for the task at hand, tell me what language would solve it better.
The program tests connectivity between a given machine and a remote Apache server. First the program requests the directory listing from the Apache server, then it parses the list and downloads all files one by one. Should there be a problem with a file (the connection resets before reaching the specified Content-Length), this should be logged and the next file should be retrieved. There is no need to save the files or even check their integrity; I only need to log the time it takes to complete and all cases where the connection resets.
To retrieve the list of links from the Apache-generated directory index I plan to use a regexp similar to
/href=\"([^\"]+)\"/
The regexp is not debugged yet, admittedly.
What is the "reference" way to make an HTTP request from Perl? I googled and found examples using many different libraries, some of them commercial. I need something that can detect disconnections (timeout or TCP reset) and handle them.
Another question: how do I store everything captured by my regexp in a global search as a list of strings with minimal coding effort?
As a more general answer, Perl is a perfectly fine language for doing HTTP requests, as are a host of other languages. If you're familiar with Perl, don't even hesitate; there are many excellent libraries available to do what you need.
As for the parsing markup with regular expressions part of your question, DON'T!
http://htmlparsing.icenine.ca explains some of the reasons why you shouldn't do this. Although what you're seemingly attempting to parse seems simple, use a proper parser.
Page linked above no longer exists...
http://www.cwhitener.com/htmlparsing
As far as the whole problem description goes, I would use WWW::Mechanize. Mechanize is a subclass of LWP::UserAgent that adds stateful behavior and HTML parsing. With mech, you can just do $mech->get($url_of_index_page), and then use $mech->find_all_links(criteria) to select the links to follow.
You have many questions in one. The answer to the question in the title of your post is to use LWP::Simple.
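The WWW::Mechanize approach suggested above could look roughly like the following. This is only a sketch under several assumptions: the helper names is_truncated and check_index are made up for illustration, the 30-second timeout is arbitrary, the link filter assumes the usual Apache index layout (plain file names, "?C=..." sorting links, directories ending in "/"), and comparing the received byte count with the Content-Length header is just one plausible way to spot a transfer that was cut short.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use WWW::Mechanize;

# Hypothetical helper: true when fewer bytes arrived than Content-Length promised.
sub is_truncated {
    my ($expected, $received) = @_;
    return defined $expected && $received < $expected;
}

# Hypothetical helper: fetch the index page, then every plain file it links to.
sub check_index {
    my ($index_url) = @_;

    # autocheck => 0 makes failed requests return a response instead of dying.
    my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 30);

    $mech->get($index_url);
    unless ($mech->success) {
        warn "Cannot fetch index: ", $mech->response->status_line, "\n";
        return;
    }

    # Assumed filter: keep links without '?' or '/', i.e. files in this directory.
    my @links = $mech->find_all_links(url_regex => qr{^[^?/]+$});

    for my $link (@links) {
        my $url      = $link->url_abs;
        my $start    = time;
        my $res      = $mech->get($url);
        my $elapsed  = time - $start;
        my $expected = $res->header('Content-Length');
        my $received = length $res->content;

        if (!$res->is_success) {
            printf "FAIL  %s (%s) after %.2fs\n", $url, $res->status_line, $elapsed;
        }
        elsif (is_truncated($expected, $received)) {
            printf "SHORT %s (%d of %d bytes) after %.2fs\n",
                $url, $received, $expected, $elapsed;
        }
        else {
            printf "OK    %s (%d bytes) in %.2fs\n", $url, $received, $elapsed;
        }
    }
    return;
}

# Run only when an index URL is given on the command line.
check_index($ARGV[0]) if @ARGV;
```

Run it as, say, `perl check.pl http://example.com/files/` (the script name and URL are placeholders). Timeouts and resets during a request normally surface as an unsuccessful response, so they end up in the FAIL branch.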
Most of your other questions are answered in perlfaq9 with appropriate pointers to further information.
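On the question of collecting everything a global match captures: in list context, a match with the /g modifier returns all captured groups, so a plain array assignment is enough. The snippet below is a small illustration using a made-up fragment of index HTML; a real HTML parser remains the more robust choice, as noted above.

```perl
use strict;
use warnings;

# Made-up sample resembling part of an Apache directory index.
my $html = <<'HTML';
<a href="?C=N;O=D">Name</a>
<a href="/parent/">Parent Directory</a>
<a href="a.bin">a.bin</a>
<a href="b.bin">b.bin</a>
HTML

# List context + /g: every capture from every match lands in the array.
my @hrefs = $html =~ /href="([^"]+)"/g;

# Assumed filter: drop sorting links and directories, keep plain file names.
my @files = grep { !m{[?/]} } @hrefs;

print "$_\n" for @files;
```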