I'm looking for a script to search a file (or list of files) for a pattern and, if found, replace that pattern with a given value.
Thoughts?
Actually, Ruby does have an in-place editing feature. Like Perl, you can say
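The command itself is missing from this copy of the post. Judging only from the description that follows, it was probably something along the lines of ruby -pi.bak -e "gsub(/oldtext/, 'newtext')" *.txt, where the pattern and replacement are placeholders. The sketch below runs that same invocation from Ruby so it can be tried safely in a scratch directory; run_inplace_oneliner is a hypothetical helper name, not part of Ruby.

```ruby
require 'rbconfig'
require 'tmpdir'

# Assumed shell form: ruby -pi.bak -e "gsub(/oldtext/, 'newtext')" *.txt
#   -p      loop over every input line and print $_ after the code has run
#   -i.bak  edit the files in place, keeping a copy with ".bak" appended
#   -e      the code to run for each line

# Hypothetical helper: run the same one-liner against a list of files.
def run_inplace_oneliner(files, code = "gsub(/oldtext/, 'newtext')")
  return false if files.empty? # with no files, ruby -p would wait on stdin
  system(RbConfig.ruby, '-pi.bak', '-e', code, *files)
end

# Demo in a throwaway directory so no real files are touched.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'foobar.txt')
  File.write(path, "some oldtext here\n")
  run_inplace_oneliner(Dir.glob(File.join(dir, '*.txt')))
  puts File.read(path)             # the edited content
  puts File.exist?("#{path}.bak")  # whether a backup copy was written
end
```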
This will apply the code in double-quotes to all files in the current directory whose names end with ".txt". Backup copies of edited files will be created with a ".bak" extension ("foobar.txt.bak" I think).
NOTE: this does not appear to work for multiline searches. For those, you have to do it the other less pretty way, with a wrapper script around the regex.
Another approach is to use inplace editing inside Ruby (not from the command line):
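The snippet for this answer is also missing here. Ruby exposes its in-place mode to scripts through ARGF.inplace_mode=, but the sketch below deliberately does not rely on it and instead emulates the same effect with plain File calls, which is easier to reason about. The helper name inplace_edit, its arguments, and the file names are assumptions made for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not the original author's code): rewrite `file` line by
# line using the block's return value. `bak` is the backup suffix; pass '' to
# skip the backup. Returning nil from the block drops the line.
def inplace_edit(file, bak = '.bak')
  lines = File.readlines(file)
  FileUtils.cp(file, file + bak) unless bak.empty?
  File.open(file, 'w') do |out|
    lines.each do |line|
      result = yield(line)
      out.print(result) if result
    end
  end
end

# Demo on a scratch file.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.txt')
  File.write(path, "search1 and search2\nsomething to drop\n")
  inplace_edit(path, '.bak') do |line|
    next nil if line.match?(/something/)
    line.gsub(/search1/, 'replace1').gsub(/search2/, 'replace2')
  end
  puts File.read(path)
end
```

Note that this rewrites the file in place, so it shares the truncation window discussed further down the thread.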
If you don't want to create a backup then change '.bak' to ''.
There isn't really a way to edit files in-place. What you usually do when you can get away with it (i.e. if the files are not too big) is read the file into memory (File.read), perform your substitutions on the read string (String#gsub), and then write the changed string back to the file (File.open, File#write).
If the files are big enough for that to be infeasible, you need to read the file in chunks (if the pattern you want to replace won't span multiple lines, then one chunk usually means one line; you can use File.foreach to read a file line by line), perform the substitution on each chunk, and append it to a temporary file. When you're done iterating over the source file, you close it and use FileUtils.mv to overwrite it with the temporary file.
Keep in mind that, when you do this, the filesystem could be out of space and you may create a zero-length file. This is catastrophic if you're doing something like writing out /etc/passwd files as part of system configuration management.
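For reference, the two approaches described above (slurp and rewrite, or stream line by line through a temporary file) might look roughly like this. It is a minimal sketch, not a hardened implementation: the helper names and the ".tmp" suffix are assumptions, and neither helper preserves ownership or permissions. The first helper truncates the target before the new content is written, which is exactly the full-disk hazard just mentioned.

```ruby
require 'fileutils'

# Hypothetical helper: read everything, substitute, write it back.
# Simple, but the file is truncated before the new content is written.
def replace_in_memory(path, pattern, replacement)
  text = File.read(path)
  File.open(path, 'w') { |f| f.write(text.gsub(pattern, replacement)) }
end

# Hypothetical helper: stream line by line into a sibling temporary file,
# then move it over the original. Only suitable when the pattern cannot
# span multiple lines.
def replace_line_by_line(path, pattern, replacement)
  tmp_path = "#{path}.tmp" # assumed name; an existing file there is overwritten
  File.open(tmp_path, 'w') do |out|
    File.foreach(path) { |line| out.write(line.gsub(pattern, replacement)) }
  end
  FileUtils.mv(tmp_path, path)
end
```

Usage would be along the lines of replace_line_by_line('notes.txt', /foo/, 'bar'), with the file name being a placeholder.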
[ EDIT: note that in-place file editing like in the accepted answer will always truncate the file and write out the new file sequentially. There will always be a race condition where concurrent readers will see a truncated or partially-truncated file, which can be catastrophic. For that reason, I think the accepted answer should most likely not be the accepted answer. ]
You need to use an algorithm that:
1. Reads the old file and writes out to the new file. (You need to be careful about slurping entire files into memory.)
2. Explicitly closes the new temporary file, which is where you may throw an exception because the file buffers cannot be written to disk because there is no space. (Catch this and clean up the temporary file if you like, but you need to rethrow something or fail fairly hard at this point.)
3. Fixes the file permissions and modes on the new file.
4. Renames the new file and drops it into place.
With ext3 filesystems you are guaranteed that the metadata write to move the file into place will not get rearranged by the filesystem and written before the data buffers for the new file are written, so this should either succeed or fail. The ext4 filesystem has also been patched to support this kind of behavior. If you are very paranoid you should call the fdatasync() system call as a step 3.5 before moving the file into place.
Regardless of language, this is best practice. In languages where calling close() does not throw an exception (Perl or C) you must explicitly check the return of close() and throw an exception if it fails.
The suggestion above to simply slurp the file into memory, manipulate it and write it out to the file will be guaranteed to produce zero-length files on a full filesystem. You need to always use FileUtils.mv to move a fully-written temporary file into place.
A final consideration is the placement of the temporary file. If you open a file in /tmp then you have to consider a few problems:
Probably more importantly, when you try to mv the file across a device mount you will transparently get converted to cp behavior. The old file will be opened, the old file's inode will be preserved and reopened and the file contents will be copied. This is most likely not what you want, and you may run into "text file busy" errors if you try to edit the contents of a running file. This also defeats the purpose of using the filesystem mv commands and you may run the destination filesystem out of space with only a partially written file.
This also has nothing to do with Ruby's implementation. The system mv and cp commands behave similarly.
What is more preferable is to open a Tempfile in the same directory as the old file. This ensures that there will be no cross-device move issues. The mv itself should never fail, and you should always get a complete and untruncated file. Any failures, such as device out of space, permission errors, etc., should be encountered during writing the Tempfile out.
The only downsides to the approach of creating the Tempfile in the destination directory are:
Here's some code that implements the full algorithm (Windows code is untested and unfinished):
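The code block for this step did not survive in this copy. What follows is a sketch of the four steps listed earlier (plus the optional fdatasync), written under stated assumptions rather than a reproduction of the original: the name atomic_file_edit and the sync: keyword are made up for illustration, and only Unix-like systems are considered, so no Windows handling is attempted.

```ruby
require 'fileutils'
require 'tempfile'

# Hypothetical helper; Unix-like systems only.
def atomic_file_edit(filename, regexp, replacement, sync: true)
  # Create the temp file next to the target so the final move stays on one device.
  temp = Tempfile.new(".#{File.basename(filename)}", File.dirname(filename))

  # Step 1: stream the old file into the new one.
  File.foreach(filename) { |line| temp.write(line.gsub(regexp, replacement)) }

  # "Step 3.5": push the data to disk; it has to happen while the handle is open.
  temp.fdatasync if sync

  # Step 2: explicit close; Ruby raises here if buffered data cannot be written.
  temp.close

  # Step 3: copy ownership and mode from the old file.
  stat = File.stat(filename)
  begin
    File.chown(stat.uid, stat.gid, temp.path)
  rescue Errno::EPERM
    # Not privileged enough to change ownership; keep the current owner.
  end
  File.chmod(stat.mode & 0o7777, temp.path)

  # Step 4: rename into place.
  FileUtils.mv(temp.path, filename)
ensure
  # Removes the temp file if anything failed; harmless after a successful move.
  temp&.close!
end
```

Any exception (for example Errno::ENOSPC) propagates to the caller after the temporary file has been cleaned up, leaving the original file untouched.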
And here is a slightly tighter version that doesn't worry about every possible edge case (if you are on Unix and don't care about writing to /proc):
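Again the original snippet is missing; a tighter variant might look like the sketch below. It assumes a Unix-like system and a caller allowed to set the ownership being copied (otherwise the chown raises), and the name file_edit_unix is hypothetical.

```ruby
require 'fileutils'
require 'tempfile'

# Hypothetical helper; no error recovery beyond what Tempfile itself provides.
def file_edit_unix(filename, regexp, replacement)
  Tempfile.open(".#{File.basename(filename)}", File.dirname(filename)) do |tempfile|
    File.foreach(filename) { |line| tempfile.write(line.gsub(regexp, replacement)) }
    tempfile.fdatasync
    tempfile.close
    stat = File.stat(filename)
    FileUtils.chown(stat.uid, stat.gid, tempfile.path)
    FileUtils.chmod(stat.mode & 0o7777, tempfile.path)
    FileUtils.mv(tempfile.path, filename)
  end
end
```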
The really simple use case, for when you don't care about file system permissions (either you're not running as root, or you're running as root and the file is root owned):
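A sketch of that simplest case, with the same caveats as before (hypothetical name file_edit_simple, not the original author's code). Because nothing copies the old file's mode, the result keeps the restrictive permissions that Tempfile assigns.

```ruby
require 'fileutils'
require 'tempfile'

# Hypothetical helper: write a sibling temp file, close it, move it into place.
def file_edit_simple(filename, regexp, replacement)
  Tempfile.open(".#{File.basename(filename)}", File.dirname(filename)) do |tempfile|
    File.foreach(filename) { |line| tempfile.write(line.gsub(regexp, replacement)) }
    tempfile.close # raises if the data could not be written out
    FileUtils.mv(tempfile.path, filename)
  end
end
```

A call would look like file_edit_simple('/tmp/foo', /foo/, 'baz'), where the path and pattern are placeholders.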
TL;DR: That should be used instead of the accepted answer at a minimum, in all cases, in order to ensure the update is atomic and concurrent readers will not see truncated files. As I mentioned above, creating the Tempfile in the same directory as the edited file is important here to avoid cross device mv operations being translated into cp operations if /tmp is mounted on a different device. Calling fdatasync is an added layer of paranoia, but it will incur a performance hit, so I omitted it from this example since it is not commonly practiced.
Here is an alternative to the one-liner from jim, this time in a script:
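The script body is missing from this copy, so the following is only a guess at its general shape. It assumes the arguments are a list of files followed by the search string and the replacement, treats the search string literally rather than as a regex, and rewrites each file directly with File.write, so it has the truncation risk described earlier in the thread. The helper name replace_in_files is hypothetical.

```ruby
# replace.rb (sketch)
# Assumed usage: ruby replace.rb FILE... SEARCH REPLACEMENT

# Hypothetical helper: literal (non-regex) replacement in each file.
def replace_in_files(files, search, replacement)
  files.each do |f|
    File.write(f, File.read(f).gsub(search, replacement))
  end
end

# Only act when run as a script with at least one file plus the two strings.
if __FILE__ == $PROGRAM_NAME && ARGV.size >= 3
  *files, search, replacement = ARGV
  replace_in_files(files, search, replacement)
end
```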
Save it in a script, e.g. replace.rb
You start it on the command line with
*.txt can be replaced with another selection or with some filenames or paths.
Broken down so that I can explain what's happening, but still executable: