Question:
When I try to sort a file and save the sorted output back into the same file, like this:
sort file1 > file1;
the contents of file1 are erased altogether, whereas when I try to do the same with the 'tee' command, like this:
sort file1 | tee file1;
it works fine [ed: "works fine" only for small files with lucky timing; it will lose data on large files or with unhelpful process scheduling], i.e. it overwrites file1 with the sorted output and also shows it on standard output.
Can someone explain why the first case is not working?
Answer 1:
It doesn't work because '>' redirection implies truncation, and to avoid keeping the whole output of sort in memory before redirecting it to the file, bash truncates the file and sets up the redirection before running sort. Thus, file1 is truncated before sort has a chance to read it.
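To see the ordering described above for yourself, here is a minimal sketch. The scratch directory, the sample data and the variable names are assumptions made up for the demo; only the redirection behaviour comes from the answer.

```shell
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/file1"

# The shell opens file1 for writing (truncating it) first and only then
# starts sort, which therefore reads an already-empty file.
sort "$workdir/file1" > "$workdir/file1"
size_after_sort=$(wc -c < "$workdir/file1" | tr -d ' ')

# The truncation does not depend on the command at all: `true` writes
# nothing, yet the redirection itself still empties the file.
printf 'pear\napple\nfig\n' > "$workdir/file2"
true > "$workdir/file2"
size_after_true=$(wc -c < "$workdir/file2" | tr -d ' ')

echo "file1: $size_after_sort bytes, file2: $size_after_true bytes"
rm -rf "$workdir"
```

Both sizes should come out as zero, assuming the shell's noclobber option is not set.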
Answer 2:
As other people explained, the problem is that the I/O redirection is done before the sort command is executed, so the file is truncated before sort gets a chance to read it. If you think for a bit, the reason why is obvious - the shell handles the I/O redirection, and must do that before running the command.
The sort command has 'always' (since at least Version 7 UNIX) supported a -o option to make it safe to output to one of the input files:
sort -o file1 file1 file2 file3
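A small sketch of that command in action, with made-up file contents in a scratch directory (the data and variable names are assumptions for illustration):

```shell
workdir=$(mktemp -d)
printf 'delta\nalpha\n' > "$workdir/file1"
printf 'charlie\n' > "$workdir/file2"
printf 'bravo\n' > "$workdir/file3"

# sort reads every input completely before it writes the -o target,
# so naming file1 as both an input and the output is safe.
sort -o "$workdir/file1" "$workdir/file1" "$workdir/file2" "$workdir/file3"

merged=$(cat "$workdir/file1")
printf '%s\n' "$merged"
rm -rf "$workdir"
```

After the run, file1 holds the sorted lines of all three inputs, while file2 and file3 are left unchanged.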
The trick with tee depends on timing and luck (and probably a small data file). If you had a megabyte or larger file, I expect it would be clobbered, at least in part, by the tee command. That is, if the file is large enough, the tee command would open the file for output and truncate it before sort finished reading it.
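The race is hard to hit on demand with a tiny file, but it can be forced. The sketch below adds an artificial one-second delay in front of sort (the delay, the sample data and the variable names are assumptions for the demo, not part of the original command) so that tee reliably wins:

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/file1"

# Both sides of a pipeline are started at once. With sort held back for
# a second, tee opens and truncates file1 before sort has read anything.
( sleep 1; sort "$workdir/file1" ) | tee "$workdir/file1"

size_after_tee=$(wc -c < "$workdir/file1" | tr -d ' ')
echo "file1 is now $size_after_tee bytes"
rm -rf "$workdir"
```

Without the delay the same thing can happen whenever the scheduler happens to run tee first, which is why the pipeline cannot be relied on.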
Answer 3:
It's unwise to depend on either of these commands to work the way you expect.
The way to modify a file in place is to write the modified version to a new file, then rename the new file to the original name:
sort file1 > file1.tmp && mv file1.tmp file1
This avoids the problem of reading the file after it's been partially modified, which is likely to mess up the results. It also makes it possible to deal gracefully with errors; if the file is N bytes long, and you only have N/2 bytes of space available on the file system, you can detect the failure creating the temporary file and not do the rename.
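The error handling described above could be wrapped up roughly like this. It is only a sketch: sort_in_place is a hypothetical helper name, not a standard utility, and the sample data is made up.

```shell
# sort_in_place is a hypothetical helper, not a standard command.
sort_in_place() {
    target=$1
    # Create the temporary file next to the target so that mv is a rename
    # within one file system rather than a copy across file systems.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if sort "$target" > "$tmp"; then
        mv "$tmp" "$target"
    else
        # sort failed (for example, the disk filled up): keep the original.
        rm -f "$tmp"
        return 1
    fi
}

workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/file1"
sort_in_place "$workdir/file1"
sorted=$(cat "$workdir/file1")
printf '%s\n' "$sorted"
rm -rf "$workdir"
```

One caveat with this approach: the renamed file carries the temporary file's permissions and ownership, not those of the original, so a real script may need to restore them.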
Or you can rename the original file, then read it and write to a new file with the same name:
mv file1 file1.bak && sort file1.bak > file1
Some commands have options to modify files in place (for example, perl and sed both have -i options; note that the syntax of sed's -i option can vary). But these options work by creating temporary files; it's just done internally.
Answer 4:
Bash opens a new, empty file when it processes the redirection, and only then calls sort.
In the second case, tee may happen to open the file after sort has already read the contents, but that ordering is not guaranteed.
Answer 5:
The shell processes redirections before it runs the command. So in the first case, > file1 takes effect first and empties the file.
Answer 6:
The first command (sort file1 > file1) doesn't work because, when a redirection operator (> or >>) is used, the shell creates or truncates the file before the sort command is even invoked, since redirections are set up before the command runs.
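The second command (sort file1 | tee file1) appears to work because sort usually reads all the lines from the file before tee truncates it, and then writes the sorted data to standard output. That ordering depends on timing and is not guaranteed.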
So with any similar command, avoid using a redirection operator when reading from and writing to the same file; use a relevant in-place editor instead (e.g. ex, ed, sed), for example:
ex '+%!sort' -cwq file1
or use other utilities such as sponge.
Luckily, sort has the -o parameter, which writes the results to a file (as suggested by @Jonathan), so the solution is straightforward: sort -o file1 file1.
Answer 7:
You can use this method:
sort file1 -o file1
This will sort the file and store the result back in the original file. You can also use this command to remove duplicate lines:
sort -u file1 -o file1
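A quick check of the de-duplicating variant on made-up data (the scratch directory, file contents and variable names are assumptions). The sketch puts the options before the file name, which should behave the same as the form above but is the more portable ordering, since some sort implementations expect options ahead of operands:

```shell
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\n' > "$workdir/file1"

# -u keeps one copy of each distinct line; -o writes the result back to
# the input file once all of it has been read.
sort -u -o "$workdir/file1" "$workdir/file1"

deduped=$(cat "$workdir/file1")
printf '%s\n' "$deduped"
rm -rf "$workdir"
```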