I have a bunch of files that contain many blank lines, and I want to remove any repeated blank lines to make reading the files easier. I wrote the following script:
#!/bin/bash
for file in * ; do cat "$file" | sed 's/^ \+//' | cat -s > "$file" ; done
However, this had very unreliable results: most files ended up completely empty, and only a few came out as intended. What's more, the files that did work seemed to change randomly every time I retried; a different set of files would be edited correctly on each run. What's going on?
Note: This is more of a theoretical question, because I realize I could use a workaround like:
#!/bin/bash
for file in * ; do
    cat "$file" | sed 's/^ \+//' | cat -s > "$file"-tmp
    rm "$file"
    mv "$file"-tmp "$file"
done
But that seems unnecessarily convoluted. So why is the "direct" method so unreliable?
The unpredictability happens because there is a race condition between two stages in the pipeline: cat "$file" and cat -s > "$file". The first tries to open the file and read from it, while the other truncates it. The shell starts all stages of a pipeline concurrently, so whether the first cat manages to read the contents before the redirection on the last stage empties the file varies from run to run; that is why a few files survive and most end up empty.
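One way to watch the race happen (demo.txt and the loop below are made up for illustration) is to run the same construct repeatedly on a small test file and check how many bytes survive each time:

# Recreate a small test file, run the question's construct on it, and count
# the surviving bytes. The two cats start concurrently, so the outcome
# depends on which one touches demo.txt first.
for i in {1..20}; do
    printf 'one\n\n\n\ntwo\n' > demo.txt
    cat demo.txt | cat -s > demo.txt
    wc -c < demo.txt    # usually 0; occasionally the squeezed text survives
done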
If you have GNU sed, you can simply do
sed -i 'expression' *
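For the concrete task in the question, one possible spelling of 'expression' is to do it in two passes, mirroring what the original pipeline does. This is only a sketch, assuming GNU sed and a directory that contains just regular files, as in the question:

# First pass: strip leading spaces, as the original sed stage did.
sed -i 's/^ \+//' ./*
# Second pass: squeeze each run of blank lines down to a single blank line,
# roughly what cat -s does. The ./* guards against names starting with a dash.
sed -i '/^$/N;/\n$/D' ./*

Because -i writes its output to a temporary file and renames it over the original, there is no longer a reader and a writer fighting over the same file.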
You cannot read from a file while you are writing to it at the same time. The > redirection clears the file before anything has been read from it, so there is nothing left to read.
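You can see this without any pipeline at all; the redirection alone is enough to destroy the data (demo.txt is a throwaway name used for illustration):

# The shell opens demo.txt with truncation to set up the '>' redirection
# before sed starts, so sed reads an already-empty file.
printf 'some text\n' > demo.txt
sed 's/^ \+//' demo.txt > demo.txt
cat demo.txt    # prints nothing: the contents were gone before sed ever ran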
You can use sed -i -e '/^$/d' to remove the empty lines (if your sed supports -i), which creates the temporary file under the hood.
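If you want to convince yourself that -i really does go through a temporary file, a small experiment (assuming GNU sed and a scratch file) is to watch the file's inode number change:

# GNU sed -i writes its output to a temporary file in the same directory and
# then renames it over the original, so the inode changes even though the
# file name stays the same.
printf 'a\n\n\nb\n' > demo.txt
ls -i demo.txt        # note the inode number
sed -i -e '/^$/d' demo.txt
ls -i demo.txt        # a different inode: the temp file was renamed into place
cat demo.txt          # the blank lines are gone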