Perl: matching data in two files

Posted 2019-09-09 00:15

Question:

I would like to match and print data from two files (File1.txt and File2.txt). Currently, I'm trying to match the first letter of the second column in File1.txt to the first letter of the third column in File2.txt.

File1.txt
1  H  35
1  C  22
1  H  20

File2.txt
A  1 HB2 MET  1 
A  2 CA  MET  1
A  3 HA  MET  1

OUTPUT
1  MET  HB2  35
1  MET  CA   22
1  MET  HA   20 

Here is my script. I tried to follow this submission: In Perl, mapping between a reference file and a series of files

#!/usr/bin/perl

use strict;
use warnings;

my %data;

open (SHIFTS,"file1.txt") or die;
open (PDB, "file2.txt") or die;

while (my $line = <PDB>) {
    chomp $line;
    my @fields = split(/\t/,$line);
    $data{$fields[4]} = $fields[2];
 }

 close PDB;

 while (my $line = <SHIFTS>) {
    chomp($line);
    my @columns = split(/\t/,$line);
    my $value = ($columns[1] =~ m/^.*?([A-Za-z])/ );
 }
    print "$columns[0]\t$fields[3]\t$value\t$data{$value}\n";

 close SHIFTS;
 exit;

Answer 1:

Here's one way using split() hackery:

#!/usr/bin/perl

use strict;
use warnings;

my $f1 = 'file1.txt';
my $f2 = 'file2.txt';

my @pdb;

open my $pdb_file, '<', $f2
  or die "Can't open the PDB file $f2: $!";

while (my $line = <$pdb_file>){
    chomp $line;
    push @pdb, $line; 
}

close $pdb_file;

open my $shifts_file, '<', $f1
  or die "Can't open the SHIFTS file $f1: $!";

while (my $line = <$shifts_file>){

    chomp $line;

    my $pdb_line = shift @pdb;

    # stop if file2.txt has fewer lines than file1.txt
    last unless defined $pdb_line;

    # - inner split: get the third element from the $pdb_line
    # - outer split: get the first element (character) from the
    #   result of the inner split

    my $criteria = (split('', (split('\s+', $pdb_line))[2]))[0];

    # - compare the 2nd element of the file1.txt line against
    #   the above split() operations

    if ((split('\s+', $line))[1] eq $criteria){
        print "$pdb_line\n";
    }
    else {
        print "**** >$pdb_line< doesn't match >$line<\n";
    }
}

close $shifts_file;
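
To see what the nested split() does in isolation, here is a minimal sketch on a single sample line from file2.txt. The helper name first_char_of_field is hypothetical (it is not part of the script above), and substr() is shown only as an equivalent alternative for taking the first character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first character of the
# whitespace-separated field at index $idx.
sub first_char_of_field {
    my ($line, $idx) = @_;
    my @fields = split ' ', $line;          # inner split: fields on whitespace
    return (split //, $fields[$idx])[0];    # outer split: characters of one field
}

my $pdb_line = 'A  1 HB2 MET  1';

# Same result via split() twice, as in the script above
my $via_split = first_char_of_field($pdb_line, 2);

# Same result via one split() plus substr()
my $field      = (split ' ', $pdb_line)[2];
my $via_substr = substr($field, 0, 1);

print "$via_split $via_substr\n";
```

For the sample line, both approaches yield H, the first character of HB2.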

Files:

file1.txt (note I changed line two to ensure a non-match worked):

1  H  35
1  A  22
1  H  20

file2.txt:

A  1 HB2 MET  1 
A  2 CA  MET  1
A  3 HA  MET  1

Output:

./app.pl
A  1 HB2 MET  1 
**** >A  2 CA  MET  1< doesn't match >1  A  22<
A  3 HA  MET  1
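
The script above prints the whole file2.txt line on a match. Below is a minimal sketch of producing the column layout requested in the question instead, assuming the two files are paired line by line as in this answer. The function name format_pair is hypothetical, and the sample data is embedded in the script rather than read from disk so that the sketch is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: combine one file1.txt line with the file2.txt line
# at the same position. Returns undef (in scalar context) when the first
# letters differ.
sub format_pair {
    my ($shift_line, $pdb_line) = @_;
    my ($id, $letter, $value)          = split ' ', $shift_line;
    my (undef, undef, $atom, $residue) = split ' ', $pdb_line;

    return unless substr($atom, 0, 1) eq $letter;
    return join "\t", $id, $residue, $atom, $value;
}

# Sample data copied from the question
my @shifts = ('1  H  35', '1  C  22', '1  H  20');
my @pdb    = ('A  1 HB2 MET  1', 'A  2 CA  MET  1', 'A  3 HA  MET  1');

for my $i (0 .. $#shifts) {
    my $row = format_pair($shifts[$i], $pdb[$i]);
    print defined($row) ? "$row\n" : "no match on line " . ($i + 1) . "\n";
}
```

With the question's data this prints one tab-separated row per line pair in the order id, residue, atom, value. The tab separator follows the print statement in the question's script; the OUTPUT block in the question shows spaces.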


Tags: perl match