I'd like to download web pages while supplying the URLs from stdin. Essentially, one process continuously produces URLs to stdout/a file, and I want to pipe them to wget or curl. (Think of it as a simple web crawler if you want.)
This seems to work fine:
tail 1.log | wget -i - -O - -q
But when I use 'tail -f', it doesn't work anymore (buffering, or wget is waiting for EOF?):
tail -f 1.log | wget -i - -O - -q
Could anybody provide a solution using wget, curl or any other standard Unix tool? Ideally I don't want to restart wget in a loop, just keep it running and downloading URLs as they come.
You can do this with cURL, but your input needs to be properly formatted. Example alfa.txt:
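Something along these lines, using curl's config-file (-K) syntax, where each url entry is paired with an output entry (the addresses and filenames below are just placeholders):
url = "http://example.com/"
output = "example.htm"
url = "http://example.com/page2"
output = "page2.htm"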
Alternate example:
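For instance, a single entry with an explicit output file (again, placeholder names):
url = "http://example.com/questions/1234"
output = "question.htm"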
Example command:
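Assuming the file above is called alfa.txt, either of these should work (-K/--config tells curl to read its options from a file, and '-' means stdin):
curl -K alfa.txt
cat alfa.txt | curl -K -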
Use xargs, which converts stdin to arguments. E.g.:
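A minimal sketch (the -n 1 is my addition, so that wget is invoked as soon as each URL arrives rather than xargs batching them up):
tail -f 1.log | xargs -n 1 wget -O - -q
Note that this does start a new wget process per URL, so it doesn't avoid the restart-in-a-loop behaviour the question mentions.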
Try piping the tail -f through:
python -c $'import pycurl;c=pycurl.Curl()\nwhile True: c.setopt(pycurl.URL,raw_input().strip()),c.perform()'
This gets curl (well, you probably meant the command-line curl and I'm calling it as a library from a Python one-liner, but it's still curl) to fetch each URL immediately, while still taking advantage of keeping the socket to the server open if you're requesting multiple URLs from the same server in sequence. It's not completely robust though: if one of your URLs is duff, the whole command will fail (you might want to make it a proper Python script and add try/except to handle this), and there's also the small detail that it will throw EOFError on EOF (but I'm assuming that's not important if you're using tail -f).
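For what it's worth, a rough sketch of what that "proper Python script" might look like (same pycurl approach as the one-liner; the readline() loop and the exact error handling are my own choices):
import sys
import pycurl

c = pycurl.Curl()
while True:
    line = sys.stdin.readline()   # readline() hands over each URL as it arrives
    if not line:                  # empty string means stdin was closed
        break
    url = line.strip()
    if not url:
        continue
    c.setopt(pycurl.URL, url)
    try:
        c.perform()               # response body goes to stdout, as with the one-liner
    except pycurl.error as e:
        sys.stderr.write("failed to fetch %s: %s\n" % (url, e))
c.close()
This keeps reusing the same Curl handle, so connections to a repeated host can stay open, and a bad URL only logs an error instead of killing the whole pipeline.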