In Perl, I read in files from a directory, and I want to open them all simultaneously (but read them line by line) so that I can perform a function that uses all of their nth lines together (e.g. concatenation).
    my $text = `ls | grep ".txt"`;
    my @temps = split(/\n/, $text);
    my @files;
    for my $i (0..$#temps) {
        my $file;
        open($file, "<", $temps[$i]);
        push(@files, $file);
    }
    my $concat;
    for my $i (0..$#files) {
        my @blah = <$files[$i]>;
        $concat .= $blah;
    }
    print $concat;
I just get a bunch of errors: "use of uninitialized value" warnings and GLOB(...) errors. So how can I make this work?
A lot of issues, starting with the call to "ls | grep" :)
Let's start with some code. First, let's get a list of files:
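    # every name in the current directory that ends in ".txt"
    my @files = glob('*.txt');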
But it would be better to test whether a given name refers to a plain file or a directory:
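    # -f keeps only plain files, skipping directories and other oddities
    my @files = grep { -f } glob('*.txt');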
Now, let's open these files to read them:
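    # open every file for reading; @fhs collects the lexical filehandles
    # (error handling comes in a moment)
    my @fhs = map {
        open my $fh, '<', $_;
        $fh;
    } @files;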
But we need a way to handle errors - in my opinion, the best way is to add:
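    use autodie;    # makes open() (and friends) die with a useful message on failure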
At the beginning of the script (and install autodie first, if you don't have it yet). Alternatively, you can check each open by hand:
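    # e.g. inside the map block above:
    open my $fh, '<', $_ or die "Cannot open '$_': $!";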
Now that we have that, let's get the first line (as you showed in your example) from each of the inputs and concatenate them:
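    my $concatenated = '';
    for my $fh (@fhs) {
        my $line = <$fh>;       # read a single line from this handle
        $concatenated .= $line;
    }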
This is perfectly fine and readable, but it can still be shortened, while maintaining (in my opinion) readability, to:
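    # "scalar" makes <> read one line per handle instead of slurping everything
    my $concatenated = join '', map { scalar <$_> } @fhs;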
The effect is the same - $concatenated contains the first lines from all of the files.
So, the whole program would look like this:
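    #!/usr/bin/perl
    use strict;
    use warnings;
    use autodie;

    # list the *.txt entries that are plain files
    my @files = grep { -f } glob('*.txt');

    # open each one; autodie handles failures
    my @fhs = map {
        open my $fh, '<', $_;
        $fh;
    } @files;

    # concatenate the first line of every file
    my $concatenated = join '', map { scalar <$_> } @fhs;
    print $concatenated;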
Now, it might be that you want to concatenate not just the first lines, but all of them. In this situation, instead of the $concatenated = ... line, you'd need something like this:
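    # a sketch: take one line from each handle in turn, until all are exhausted
    my $concatenated = '';
    while ( my $fh = shift @fhs ) {
        my $line = <$fh>;
        if ( defined $line ) {
            $concatenated .= $line;
            push @fhs, $fh;    # this file still has lines - re-queue it
        } else {
            close $fh;         # this file is exhausted - drop it
        }
    }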
Here is your problem:
First, <$files[$i]> isn't a valid filehandle read. This is the source of your GLOB(...) errors. See mobrule's answer for why this is the case. So change it to this:
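    # readline() accepts an arbitrary expression, unlike <>
    my @blah = readline($files[$i]);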
Second problem: you're mixing @blah (an array named blah) and $blah (a scalar named blah). This is the source of your "uninitialized value" errors - $blah (the scalar) hasn't been initialized, but you're using it. If you want the $n-th line from @blah, use this:
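    $concat .= $blah[$n];    # element $n of @blah - Perl arrays are 0-based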
I don't want to keep beating a dead horse, but I do want to address a better way to do something: the ls | grep pipeline at the top of your script. It reads in a list of all files in the current directory that have a ".txt" extension in their names. This works, and is effective, but it can be rather slow - we have to call out to the shell, which has to fork off to run ls and grep, and that incurs a bit of overhead. Furthermore, ls and grep are simple and common programs, but they're not exactly portable. Surely there's a better way to do this:
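    # pure Perl: read the directory ourselves and filter it with a regex
    opendir my $dir, '.' or die "Can't open directory: $!";
    my @temps = grep { /\.txt/ } readdir $dir;
    closedir $dir;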
Simple, short, pure Perl, no forking, no non-portable shell commands, and we don't have to read in one big string and then split it - we only store the entries we really need. Plus, it becomes trivial to modify the conditions that files have to pass. Say we end up accidentally reading the file test.txt.gz because our regex matches it: we can easily change that line to:
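    my @temps = grep { /\.txt$/ } readdir $dir;    # anchored: ".txt" only at the very end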
We can do that one with grep (I believe), but why settle for grep's limited regular expressions when Perl has one of the most powerful regex libraries anywhere built in?
Use braces around $files[$i] inside the <> operator. Otherwise, Perl interprets <> as the file glob operator instead of the read-from-filehandle operator.
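For illustration, a minimal sketch - assigning the element to a plain scalar variable first likewise forces the read-from-filehandle interpretation:

    my $fh = $files[$i];    # a simple scalar variable...
    my @blah = <$fh>;       # ...so <> is now a readline, not a glob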
You've got some good answers already. Another way to tackle the problem is to create a list-of-lists containing all of the lines from the files (@content). Then use the each_arrayref function from List::MoreUtils, which will create an iterator that yields line 1 from all of the files, then line 2, and so on:
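A sketch of that approach (file names are taken from @ARGV here; the variable names are just illustrative):

    use strict;
    use warnings;
    use List::MoreUtils qw(each_arrayref);

    # A list-of-lists: one array of lines per file.
    my @content = map {
        open my $fh, '<', $_ or die "Can't open '$_': $!";
        [ <$fh> ];
    } @ARGV;

    my $iter = each_arrayref(@content);
    while ( my @nth_lines = $iter->() ) {
        # @nth_lines holds line N from every file (undef once a file runs out)
        print grep { defined } @nth_lines;
    }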