Is it safe to pipe the output of several parallel processes to one file using >>?

Posted 2020-02-18 04:24

I'm scraping data from the web, and I have several processes of my scraper running in parallel.

I want the output of each of these processes to end up in the same file. As long as lines of text remain intact and don't get mixed up with each other, the order of the lines does not matter. In UNIX, can I just pipe the output of each process to the same file using the >> operator?

9 answers
兄弟一词,经得起流年.
#2 · 2020-02-18 05:11

Briefly, no. >> gives no guarantee that output from multiple concurrent processes will not be interleaved.

何必那么认真
#3 · 2020-02-18 05:12

No. It is not guaranteed that lines will remain intact. They can become intermingled.

Searching further based on liori's answer, I found this (from the POSIX specification of write()):

Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.

So lines longer than {PIPE_BUF} bytes are not guaranteed to remain intact.
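POSIX requires {PIPE_BUF} to be at least 512 bytes; on Linux it is typically 4096. As a sketch (assuming a POSIX getconf is available), you can query the value for a given filesystem like this:

```shell
# Query the PIPE_BUF limit for the current directory's filesystem.
# POSIX requires this to be at least 512; Linux usually reports 4096.
pipe_buf=$(getconf PIPE_BUF .)
echo "PIPE_BUF = $pipe_buf"
```

Keeping each write (i.e., each line, including its newline) under this size is what keeps concurrent pipe writes from interleaving.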

趁早两清
#4 · 2020-02-18 05:15

As mentioned above, it's quite a hack, but it works pretty well =)

( ping stackoverflow.com & ping stackexchange.com & ping fogcreek.com ) | cat

The same thing with '>>':

( ping stackoverflow.com & ping stackexchange.com & ping fogcreek.com ) >> log

And with exec on the last one, you save one process:

( ping stackoverflow.com & ping stackexchange.com & exec ping fogcreek.com ) | cat
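To see why merging through one pipe works, here is a minimal sketch (assuming a POSIX shell; the writer commands and line counts are just for illustration): several background writers emit short lines into a single pipe, and cat appends the merged stream to a file. Because each line is far below {PIPE_BUF}, every line should arrive intact.

```shell
# Three parallel writers share one pipe; cat merges the stream into a file.
# Each echo is a single short write (well under PIPE_BUF), so lines stay whole.
out=$(mktemp)
( for i in $(seq 1 200); do echo "A line $i"; done &
  for i in $(seq 1 200); do echo "B line $i"; done &
  exec sh -c 'for i in $(seq 1 200); do echo "C line $i"; done'
) | cat >> "$out"

total=$(wc -l < "$out" | tr -d ' ')              # 3 writers x 200 lines
intact=$(grep -cE '^[ABC] line [0-9]+$' "$out")  # lines that survived unmixed
echo "total=$total intact=$intact"
rm -f "$out"
```

cat exits only once all writers have closed their end of the pipe, and it is the sole process writing to the log file, so no interleaving can happen on the output side.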