Question:
Can someone give some hints on how to delete the last n lines from a file in Perl? I have a very large file, around 400 MB, and I want to delete the last 125,000 or so lines from it.
Answer 1:
You can use Tie::File to handle the file as an array.
use Tie::File;
tie (@File, 'Tie::File', $Filename);
splice (@File, -125000, 125000);
untie @File;
An alternative is to use head and wc -l in the shell.
edit: grepsedawk reminds us of the -n option to head, so no wc is necessary:
head -n -125000 FILE > NEWFILE
(Note that negative line counts for head -n are a GNU coreutils extension.)
Answer 2:
As folks have already suggested Tie::File, which does the job well, I'll lay out the basic algorithm should you want to do it by hand. There are sloppy, slow ways to do it that work well for small files; here's the efficient way to do it for large files.
- Find the position in the file just before the Nth line from the end.
- Truncate everything after that point (using truncate()).
Step 1 is the tricky part. We don't know how many lines there are in the file or where they are. One way is to count all the lines up and then seek back to the Nth from the end, but that means scanning the whole file every time. More efficient is to read backwards from the end of the file. You can do this with read(), but it's easier to use File::ReadBackwards, which can go backwards line by line (while still using efficient buffered reads).
This means you read just 125,000 lines rather than the whole file. truncate() should be O(1) and atomic, and it costs almost nothing no matter how large the file is: it simply resets the size of the file.
#!/usr/bin/perl

use strict;
use warnings;

use File::ReadBackwards;

my $LINES = 10;     # Change to 125_000 or whatever
my $File  = shift;  # file passed in as argument

my $rbw = File::ReadBackwards->new($File) or die $!;

# Count backwards $LINES lines, or stop if the beginning of the file is hit.
my $line_count = 0;
until( $rbw->eof || $line_count == $LINES ) {
    $rbw->readline;
    $line_count++;
}

# Chop off everything from that point on.
truncate($File, $rbw->tell) or die "Could not truncate! $!";
Answer 3:
Do you know how many lines there are, or is there any other clue about this file? Do you have to do this over and over again, or is it just a one-time job?
If I had to do this once, I'd load the file in vim, look at the last line number, then delete from the last line I want until the end:
:1234567,$d
The general programming way is to do it in two passes: one to determine the number of lines, and then one to get rid of the lines.
The simple way is to print the right number of lines to a new file. It's only inefficient in terms of cycles and maybe a bit of disk thrashing, but most people have plenty of those. Some of the stuff in perlfaq5 should help. You get the job done and you get on with life.
while( <$in> ) {
    print {$out} $_;
    last if $. == $last_line_I_want;
}
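For completeness, here is a minimal sketch of the full two-pass version; the filename, the handles, and the up-front line count are assumptions added for illustration, not part of the original answer:

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'huge.txt';   # hypothetical filename
my $n    = 125_000;      # lines to drop from the end

# Pass 1: count the lines.
open my $in, '<', $file or die "Can't open $file: $!";
my $total = 0;
$total++ while <$in>;
close $in;   # an explicit close resets $. for the second pass

die "File has fewer than $n lines\n" if $total <= $n;
my $last_line_I_want = $total - $n;

# Pass 2: copy everything up to that line into a new file.
open $in, '<', $file or die "Can't open $file: $!";
open my $out, '>', "$file.new" or die "Can't open $file.new: $!";
while( <$in> ) {
    print {$out} $_;
    last if $. == $last_line_I_want;
}
close $in;
close $out;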
If this is something you have to do a lot or the data size is too large to rewrite it, you can create an index of lines and byte offsets and truncate() the file to the right size. As you keep the index, you only have to discover the new line endings because you already know where you left off. Some file-handling modules can handle all of that for you.
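A minimal sketch of that index-and-truncate idea, assuming the index is just an in-memory array of line-start byte offsets (persisting the index between runs is left out for brevity):

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'huge.txt';   # hypothetical filename
my $n    = 125_000;      # lines to drop from the end

open my $fh, '+<', $file or die "Can't open $file: $!";

# Build the index: the byte offset at which each line starts.
my @offset = (0);
push @offset, tell $fh while <$fh>;
pop @offset;   # the last entry is end-of-file, not the start of a line

die "File has fewer than $n lines\n" if @offset < $n;

# The start of the Nth-from-last line is exactly where the file should end.
truncate $fh, $offset[-$n] or die "Could not truncate: $!";
close $fh;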
Answer 4:
I would just use a shell script for this problem:
tac file | sed '1,125000d' | tac > newfile
(tac is like cat, but prints lines in reverse order; it is by Jay Lepreau and David MacKenzie and is part of GNU coreutils. Note that the output must go to a new file: the pipeline cannot write back onto the file it is still reading.)
Answer 5:
- Go to the end of the file: fseek
- Count backwards that many lines
- Find out the file position: ftell
- Truncate the file to that position as its length: ftruncate (a sketch of these steps follows below)
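Those C-style calls map onto Perl's built-in seek, tell, and truncate. A minimal sketch under that assumption, reading one byte at a time (simple but slow; the filename is hypothetical, and the file is assumed to end with a newline):

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'huge.txt';   # hypothetical filename
my $n    = 125_000;      # lines to drop from the end

open my $fh, '+<', $file or die "Can't open $file: $!";

# Go to the end of the file (fseek) and note the position (ftell).
seek $fh, 0, 2 or die "Can't seek: $!";
my $pos = tell $fh;

# Count backwards that many lines: the (n+1)th newline from the end
# closes the last line we want to keep. If there are fewer than n
# lines, $pos reaches 0 and the whole file is emptied.
my $newlines = 0;
while( $pos > 0 ) {
    seek $fh, $pos - 1, 0 or die "Can't seek: $!";
    read $fh, my $byte, 1;
    if( $byte eq "\n" ) {
        $newlines++;
        last if $newlines == $n + 1;
    }
    $pos--;
}

# Truncate the file to that position as its length (ftruncate).
truncate $fh, $pos or die "Could not truncate: $!";
close $fh;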
Answer 6:
Schwern: are the use Fcntl and $rbw->get_handle lines in your script necessary? Also, I'd recommend reporting truncate errors in case it doesn't return true.
-- Douglas Hunter (who would have commented on that post if he could have)
Answer 7:
Try this code (where $n is the number of lines to delete; the backquotes around the sed command are required, though the original post was unable to display them):
my $i = 0;
`sed -i '\$d' filename` while $i++ < $n;
Note that this reruns sed, rewriting the whole file, once per deleted line, so it will be very slow on a large file.
Answer 8:
My suggestion, using ed:
printf '$-124999,$d\nw\nq\n' | ed -s myHugeFile
(The address $-124999,$ covers exactly the last 125,000 lines.)
Answer 9:
Try this:
:|dd of=urfile seek=1 bs=$(($(stat -c%s urfile)-$(tail -1 urfile|wc -c)))
(With empty input, dd truncates urfile at bs bytes, i.e. the file's size minus the byte length of its last line. As written this chops off only the last line; change tail -1 to tail -n 125000 to drop the last 125,000 lines.)
Answer 10:
This example code keeps the byte offsets of the most recent lines as it scans the file, then uses the earliest offset in its buffer to truncate the file. This of course will only work if truncate works on your system.
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;

open my $file, '+<', 'test.in'; # rw

# Keep a sliding window of the byte offsets following each of the
# last eleven lines read; the earliest of them is where the
# tenth-from-last line begins.
my @list;
while(<$file>){
    if( @list <= 10 ){
        push @list, tell $file;
    }else{
        (undef, @list) = (@list, tell $file);
    }
}

seek $file, 0, 0;

truncate $file, $list[0] if @list;

close $file;
This has the added benefit that it only uses enough memory for those few byte offsets and the current line.
Answer 11:
The most efficient way is to seek to the end of the file, then incrementally read segments backwards while counting the newlines in each, and finally use truncate (see perldoc -f truncate) to trim it down. There are also a module or two on CPAN for reading a file backwards, such as the File::ReadBackwards used above.
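A minimal sketch of that segment-wise backward scan; the block size and filename are assumptions, and the file is assumed to end with a newline:

#!/usr/bin/perl
use strict;
use warnings;

my $file  = 'huge.txt';   # hypothetical filename
my $n     = 125_000;      # lines to drop from the end
my $block = 64 * 1024;    # segment size for the backward reads

open my $fh, '+<', $file or die "Can't open $file: $!";

# Seek to the end of the file and note its size.
seek $fh, 0, 2 or die "Can't seek: $!";
my $pos = tell $fh;

# Read segments backwards, counting newlines. The (n+1)th newline
# from the end terminates the last line we keep; if we never find
# it, the file has at most n lines and is left untouched.
my $newlines = 0;
SEGMENT: while( $pos > 0 ) {
    my $len = $pos < $block ? $pos : $block;
    $pos -= $len;
    seek $fh, $pos, 0 or die "Can't seek: $!";
    read $fh, my $buf, $len or die "Can't read: $!";

    # Scan this segment from its end toward its start.
    for my $i ( reverse 0 .. $len - 1 ) {
        if( substr($buf, $i, 1) eq "\n" && ++$newlines == $n + 1 ) {
            truncate $fh, $pos + $i + 1 or die "Could not truncate: $!";
            last SEGMENT;
        }
    }
}
close $fh;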