Question:

I have a txt file like this:

#Genera columnA columnB columnC columnD columnN
x1       1       3       7      0.9      2
x2       5       3       13     7        5
x3       0.1     0.8     7      1        0.4

and I want to extract an arbitrary number of columns. Suppose, for example, that we want columnA, columnC and columnN (the file could be a matrix with 1, 2, 20, 100 or more columns). This is what I want to print out (this example has just 3 columns, but there could be more):

#Genera columnA columnC columnN
    x1   1       7       2
    x2   5       13      5
    x3   0.1     7       0.4

I have tried

#!/usr/bin/perl
use strict;
use warnings;


my @wanted_fields = qw/columnA columnC columnN/;

open DATA, '<', "columns.txt" or die "cant open file\n";


my @datain = <DATA>;
close DATA;

my (@unit_name, $names, @lines, @conteo, @match_names, @columnas);

foreach (@datain){
    if ($_=~ m/^$/g)            {   next;           }
    elsif ($_=~ m/#Genera/g)    {   $names= $_;     }
    else                        {   push @lines, $_ }
}


@unit_name = split (/\t/, $names);
shift @unit_name;
my $count =0;

foreach (@wanted_fields){
    my $unit_wanted =$_;
    chomp $unit_wanted;
    foreach (@unit_name){
        if ($_ =~ m/$unit_wanted/g){
            $count++;
            push (@conteo, $count);
            push (@match_names, $_);
        }
    }
}

foreach (@lines){
    chomp;
    @columnas = split (/\t/, $_);
    #push @xx, $columnas[0][3];
}

I used the count to determine which column to extract, but in this case the number 2 does not correspond to columnC and 3 does not correspond to columnN. Is there a simple way to select any given set of columns? In this case I only want 3, but depending on the case it could be 1, 2, 5, 10, 100 or more columns.

Thanks

Answer 1:

You can simplify this by using hash slices.

#!/usr/bin/env perl
use strict;
use warnings;

my @wanted = ( '#Genera', qw( columnA columnC columnN ) );

open my $input, '<', "file.txt" or die "Cannot open file.txt: $!";

# split ' ' also discards the trailing newline, so no chomp is needed
my @header = split ' ', <$input>;

# join only the fields; appending "\n" inside join would leave a trailing tab
print join("\t", @wanted), "\n";
while ( <$input> ) {
   my %row;
   @row{@header} = split;
   print join("\t", @row{@wanted}), "\n";
}

Which outputs:

#Genera columnA columnC columnN
x1  1   7   2
x2  5   13  5
x3  0.1 7   0.4
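To see the hash-slice idea from this answer in isolation, here is a minimal sketch that applies it to a single data line. The sub name `pick_columns` is hypothetical, chosen only for illustration, and the sample strings are taken from the question's file.

```perl
use strict;
use warnings;

# Hypothetical helper: given the header fields, the wanted column names and
# one whitespace-separated data line, return the wanted values in order.
sub pick_columns {
    my ($header, $wanted, $line) = @_;
    my %row;
    @row{@$header} = split ' ', $line;   # hash slice: header name => value
    return @row{@$wanted};               # hash slice: values in wanted order
}

my @header = split ' ', '#Genera columnA columnB columnC columnD columnN';
my @wanted = ('#Genera', qw(columnA columnC columnN));

print join("\t", pick_columns(\@header, \@wanted, 'x2 5 3 13 7 5')), "\n";
```

Because values are looked up by name, the order of `@wanted` decides the output order, regardless of where the columns sit in the header.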

If you want to match your indentation exactly, add sprintf to the mix, e.g.:

print join("\t", map { sprintf "%8s", $_ } @wanted), "\n";
while ( <$input> ) {
   my %row;
   @row{@header} = split;
   print join("\t", map { sprintf "%8s", $_ } @row{@wanted}), "\n";
}

Which then gives:

 #Genera     columnA     columnC     columnN
      x1           1           7           2
      x2           5          13           5
      x3         0.1           7         0.4
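An alternative to mapping sprintf over each field, sketched here under the assumption that every column should get the same width (8, as in the sprintf example above): build a single printf format with the list-repetition operator `x`, sized to however many columns are wanted. The sub name `row_format` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a format such as "%8s\t%8s\t%8s\n",
# with one "%<width>s" conversion per column.
sub row_format {
    my ($ncols, $width) = @_;
    return join("\t", ("%${width}s") x $ncols) . "\n";
}

my @wanted = ('#Genera', qw(columnA columnC columnN));
my $fmt    = row_format(scalar @wanted, 8);

printf $fmt, @wanted;          # header row
printf $fmt, qw(x1 1 7 2);     # one data row
```

Since the format is built from the number of wanted columns, the same code works unchanged for 3 or 100 columns.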

Answer 2:

This program does what you ask. It expects the path to the input file as a parameter on the command line, so the file can be read with the empty "diamond operator" <> without opening it explicitly.

Each non-blank line of the file is split into fields, and a header line is recognized by its first field starting with a hash symbol #.

A call to map converts the @wanted_fields array into the list of indexes in @fields where those column headers appear; that list, preceded by index 0 for the first column, is stored in the array @idx.

This array is then used to slice the wanted columns from @fields for every line of input, and the fields are printed separated by tabs.

use strict;
use warnings 'all';

use List::Util 'first';

my @wanted_fields = qw/ columnA columnC columnN /;

my @idx;

while ( <> ) {
    next unless /\S/;

    my @fields = split;

    if ( $fields[0] =~ /^#/ ) {

        @idx = ( 0, map {
            my $wanted = $_;
            first { $fields[$_] eq $wanted } 0 .. $#fields;
        } @wanted_fields );
    }

    print join( "\t", @fields[@idx] ), "\n" if @idx;
}

Output:

#Genera columnA columnC columnN
x1  1   7   2
x2  5   13  5
x3  0.1 7   0.4
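To see what the map/first step produces, here is a minimal sketch that runs the same construct on the header line from the question alone; the resulting list is the slice that gets applied to every later line.

```perl
use strict;
use warnings;

use List::Util 'first';

my @wanted_fields = qw/ columnA columnC columnN /;
my @fields = split ' ', '#Genera columnA columnB columnC columnD columnN';

# For each wanted name, find the position of the first header field equal
# to it; index 0 (the row label column) is always kept in front.
my @idx = ( 0, map {
    my $wanted = $_;
    first { $fields[$_] eq $wanted } 0 .. $#fields;
} @wanted_fields );

print "@idx\n";    # the positions used to slice each line
```

If a wanted name is missing from the header, `first` returns undef for that position, so checking `@idx` for undefined entries is a cheap safeguard.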

Answer 3:

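Perl has command-line switches designed for this kind of task:

perl -lnae 'print join "\t", @F[0,1,3,5]' file.txt

Switch -a automatically creates the array @F for each line by splitting it on whitespace, -n loops over the input lines, and -l removes each line's newline and adds one back to every print. So @F[0,1,3,5] is an array slice of elements 0, 1, 3 and 5: the first (label) column plus columnA, columnC and columnN.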

The downside of this, of course, is that you have to use the column numbers instead of the names.
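That downside can be avoided by resolving the names against the header line. A minimal sketch, written as a small script rather than a one-liner and not part of the original answer: it splits each line the way -a does, turns the wanted names into the positions @F would need, and slices every line with them. The sub name `extract_columns` and the in-memory input are hypothetical; the sample text mirrors the file from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated lines from $fh, resolve the
# wanted names against the first (header) line, and return the selected
# fields of every non-blank line as tab-joined strings.
sub extract_columns {
    my ($fh, @wanted) = @_;
    my (@idx, @out);
    while (my $line = <$fh>) {
        my @F = split ' ', $line;    # same splitting that -a performs
        next unless @F;              # skip blank lines
        unless (@idx) {              # first non-blank line is the header
            my %pos;
            @pos{@F} = 0 .. $#F;     # column name => position
            for my $name (@wanted) {
                die "No column named '$name'\n" unless exists $pos{$name};
            }
            @idx = (0, @pos{@wanted});
        }
        push @out, join "\t", @F[@idx];
    }
    return @out;
}

my $text = <<'END';
#Genera columnA columnB columnC columnD columnN
x1       1       3       7      0.9      2
x2       5       3       13     7        5
x3       0.1     0.8     7      1        0.4
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n" for extract_columns($fh, qw(columnA columnC columnN));
```

For a real file, the in-memory handle can be replaced by one opened on the file path.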

extract multiples columns from txt file perl
