I would like to do the following: log into a website, click a couple of specific links, then click a download link. I'd like to run this as either a scheduled task on Windows or a cron job on Linux. I'm not picky about the language I use, but I'd like this to run without putting a browser window up on the screen, if possible.
You can use Watir with Ruby, or WatiN with Mono.
Here is a list of headless browsers that I know about:
Headless browsers that provide JavaScript support via an emulated DOM generally have issues with sites that use more advanced or obscure browser features, or that have functionality with visual dependencies (e.g. CSS positioning and so forth). So while the pure JavaScript support in these browsers is generally complete, the actual supported browser functionality should be considered partial only.
(Note: Original version of this post only mentioned HtmlUnit, hence the comments. If you know of other headless browser implementations and have edit rights, feel free to edit this post and add them.)
If the links are known (e.g., you don't have to search the page for them), then you can probably use wget. I believe it will do the state management across multiple fetches. If you are a little more enterprising, then I would delve into the new goodies in Python 3.0. They redid the interface to their HTTP stack and, IMHO, it has a very nice interface that lends itself well to this type of scripting.
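As a rough sketch of the Python 3 route (the login URL, form field names, and download path below are placeholders for whatever the site actually uses):

    import urllib.request
    import urllib.parse
    from http.cookiejar import CookieJar

    # Build an opener that keeps the session cookie across requests
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))

    # Log in (form fields and URLs are made up for illustration)
    credentials = urllib.parse.urlencode(
        {"username": "username", "password": "password"}).encode()
    opener.open("http://example.com/login", credentials)

    # Fetch the known download link and save it to disk
    with opener.open("http://example.com/reports/latest.zip") as resp:
        with open("latest.zip", "wb") as out:
            out.write(resp.read())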
libCURL could be used to create something like this.
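For example, from Python the pycurl bindings expose libcurl's cookie handling directly; a minimal sketch, assuming a form-based login at example.com (URLs and field names are assumptions):

    import pycurl
    from io import BytesIO

    body = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, "http://example.com/login")
    c.setopt(c.POSTFIELDS, "username=username&password=password")
    c.setopt(c.COOKIEJAR, "cookies.txt")   # persist the session cookie
    c.setopt(c.COOKIEFILE, "cookies.txt")  # send it back on later requests
    c.setopt(c.FOLLOWLOCATION, True)
    c.setopt(c.WRITEDATA, body)
    c.perform()
    c.close()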
Can you not just use a download manager?
There are better ones, but FlashGet has browser integration and supports authentication. You can log in, click a bunch of links, queue them up, and schedule the download.
You could write something that, say, acts as a proxy which catches specific links and queues them for later download, or a JavaScript bookmarklet that rewrites links to point at "http://localhost:1234/download_queuer?url=" + $link.href and have that queue the downloads - but you'd be reinventing the download-manager wheel, and with authentication it can be more complicated. Or, if you want the "login, click links" bit to be automated as well, look into screen-scraping: load the page via an HTTP library, find the download links, and download them.
Slightly simplified example, using Python:
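The sketch below assumes a login form at example.com/login and uses the requests and BeautifulSoup libraries; the URLs and field names are placeholders:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    BASE = "http://example.com"

    session = requests.Session()
    # Authenticate; the login path and form field names are placeholders
    session.post(BASE + "/login",
                 data={"username": "username", "password": "password"})

    # Load the page and download everything it links to
    soup = BeautifulSoup(session.get(BASE).text, "html.parser")
    for a in soup.find_all("a", href=True):
        url = urljoin(BASE, a["href"])
        filename = url.rstrip("/").split("/")[-1] or "index.html"
        with open(filename, "wb") as out:
            out.write(session.get(url).content)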
That would download every link on example.com, after authenticating with the username/password of "username" and "password". You could, of course, find more specific links using BeautifulSoup's HTML selectors (for example, all links with the class "download", or URLs that start with http://cdn.example.com). You could do the same in pretty much any language.
I once did that using the Internet Explorer ActiveX control (WebBrowser, MSHTML). You can instantiate it without making it visible.
This can be done with any language which supports COM (Delphi, VB6, VB.net, C#, C++, ...)
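As one illustration, Python can drive the same control through the pywin32 COM bindings; the page URL and element ids below are assumptions:

    import time
    import win32com.client

    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = False  # no browser window on screen

    ie.Navigate("http://example.com/login")   # placeholder URL
    while ie.Busy or ie.ReadyState != 4:      # 4 == READYSTATE_COMPLETE
        time.sleep(0.5)

    doc = ie.Document
    doc.getElementById("username").value = "username"  # element ids are assumptions
    doc.getElementById("password").value = "password"
    doc.getElementById("login").click()

    ie.Quit()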
Of course this is a quick-and-dirty solution and might not be appropriate in your situation.