
Perl - how to skip the lines already read in a previous run

Posted on 2019-07-10 03:29

Question:

I have a problem with the following Perl program, which reorganizes the access trace of an application.

I've implemented the following solution with a "jump rows" (row-skipping) feature, because in the future I could have 10 or more rotated files of 50 MB each.

I want to skip the lines already read in a previous run (if the inode of the file has not changed), so that I only process the deltas.

I hope this code can help other users.

#!/usr/bin/perl

use strict;
use warnings 'all';

use File::Path qw<mkpath>;
use File::Spec;
use File::Copy;
use POSIX qw<strftime>;
use English;

# Dynamic Variables
my %older_count;
my %older_inode;
my @newer_filelist;
my @events;

my $OLD_IN_FILE = "";

# Static Variables
# Directories
my $IN_DIR               = "/tmp/appo/log";    # Input Directories
my $OUTPUT_LOG_DIRECTORY = "/tmp/appo/A14";    # Output directory

# Files
my $SPLITTED_OUTFILE = "parse_log.csv";           # Splitted by month output file
my $R_STATS          = ".rotation_statistics";    # Rotation Statistic file

## MAIN

# Loading old statistics
if ( -e $R_STATS ) {
    open( STAT_FILE, '<', $R_STATS ) or die $!;

    while ( <STAT_FILE> ) {
        my @lines = split /\n/;
        my ( $file, $inode, $nrows ) = $lines[0] =~ /\A(.\w.*);(\d.*);(\d.*)/;    # Encapsulate values

        push @{ $older_count{$file} }, $nrows;
        push @{ $older_inode{$file} }, $inode;
    }

    close( STAT_FILE );
}

# Loading new events from log
foreach my $INPUT ( glob( "$IN_DIR/logrotate_*.log" ) ) {

    my $inode        = ( stat( $INPUT ) )[1];
    my $currentinode = $older_inode{$INPUT}[0];

    my $jumprow = 0;
    $jumprow = $older_count{$INPUT}[0] if defined $currentinode && $currentinode == $inode;    # no saved inode on first run

    # Get current file statistics
    if ( $INPUT ne $OLD_IN_FILE ) {
        my $count = ( split /\s+/, `wc -l $INPUT` )[0];
        push @newer_filelist, {
            filename => $INPUT,
            inode    => $inode,
            count    => $count
        };
    }

    # Log opening
    open my $fh, '<', $INPUT or die "can't read open '$INPUT': $OS_ERROR";

    $/ = "\n\n";    # record separator

    while ( <$fh> ) {

        # next unless $. > $jumprow; # This instruction doesn't work

        # Log processing
        my @lines = split /\n/;
        my $i     = 0;

        foreach my $lines ( @lines ) {

            # Take only Authentication rows and skip others
            if ( $lines[$i] =~ m/\A#\d.\d.+#\d{4}\s\d{2}\s\d{2}\s\d{2}:\d{2}:\d{2}:\d{3}#\+\d+#\w+#\/\w+\/\w+\/Authentication/ ) {

                # Shows only LOGIN/LOGOUT access type and exclude GUEST users
                if ( ( $lines[ $i + 2 ] =~ m/Login/ || $lines[ $i + 2 ] =~ m/Logout/ ) && $lines[ $i + 3 ] !~ m/Guest/ ) {

                    my ( $y, $m, $d, $time ) = $lines[$i] =~ /\A#\d.\d.+#(\d{4})\s(\d{2})\s(\d{2})\s(\d{2}:\d{2}:\d{2}:\d{3})/;

                    my ( $action ) = $lines[ $i + 2 ] =~ /(\w+)/;
                    my ( $user )   = $lines[ $i + 3 ] =~ /\w+:\s(.+)/;

                    push @events, {
                        date   => "$y/$m/$d",
                        time   => $time,
                        action => $action,
                        user   => $user
                    };  # Array loader
                }
            }
            else {
                next;
            }

            $i++;
        }

        $OLD_IN_FILE = $INPUT;
    }
    close( $fh );
}

# Print log statistics for further processing
open( STAT_FILE, '>', $R_STATS ) or die $!;

foreach my $my_filelist ( @newer_filelist ) {
    print STAT_FILE join ';', $my_filelist->{filename}, $my_filelist->{inode}, "$my_filelist->{count}\n";
}

close( STAT_FILE );

my @by_user = sort { $a->{user} cmp $b->{user} } @events;    # Sorting by users

foreach my $my_list ( @by_user ) {

    my ( $y, $m ) = $my_list->{date} =~ /(\d{4})\/(\d{2})/;

    # Generate directory MM-YYYY, e.g. 05-2018
    my $directory = File::Spec->catfile( $OUTPUT_LOG_DIRECTORY, "$m-$y" );

    unless ( -e $directory ) {
        mkpath( $directory, { verbose => 1 } );
    }

    my $log_file_path = File::Spec->catfile( $directory, $SPLITTED_OUTFILE );

    open( OUTPUT, '>>', $log_file_path ) or die $!;
    print OUTPUT join ';', $my_list->{date}, $my_list->{time}, $my_list->{action}, "$my_list->{user}\n";
}

close( OUTPUT );

My log files are:

logrotate_1.0.log

#2.0^H#2018 05 29 10:09:45:969#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9E50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER1#5##C47731E44D00000bae##0#Thread[HTTP Worker [@1473726842],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 29 11:51:06:541#+0200#Info#/Sy/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER1
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 11:54:03:906#+0200#Info#/Sy/Sec/Informtion#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 11:59:59:156#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA0C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER3#7##9ACF7Ec0bae##0#Thread[HTTP Worker [@124054179],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER3
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 08:32:11:348#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 11:09:54:978#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 01 08:11:30:008#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 01 11:11:29:658#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER1
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 02 12:00:00:254#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: Guest
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 02 12:05:00:465#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER9
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 02 12:50:00:065#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER9
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 10:43:38:683#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9E50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER1#5##C47731E44D00000bae##0#Thread[HTTP Worker [@1473726842],5,Dedicated_Application_Thread]#Plain##
Login
User: USER1
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

logrotate_0.0.log

#2.0^H#2018 05 24 11:05:04:011#+0200#Info#/Sy/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:04:59:410#+0200#Info#/Sy/Sec/Informtion#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:05:07:100#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA0C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER3#7##9ACF7Ec0bae##0#Thread[HTTP Worker [@124054179],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER3
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:07:21:314#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:07:21:314#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 26 10:48:02:458#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 28 10:00:25:000#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER0
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:
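Each entry above is one blank-line-separated record: a header line, a component line, the action (Login/Logout) and a `User:` line. As a minimal sketch (the `parse_record` helper is hypothetical, not part of the original script), one record can be turned into an event hash with the same header pattern the script uses, applying the Guest filter to both Login and Logout:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one record into an event hashref, or return
# nothing if it is not a non-Guest Login/Logout entry from .../Authentication.
sub parse_record {
    my ($record) = @_;
    my @lines = split /\n/, $record;
    return unless @lines >= 4;    # header, component line, action, user

    # Header, e.g. "#2.0^H#2018 05 29 10:09:45:969#+0200#Info#/Sys/Sec/Authentication#"
    my ( $y, $m, $d, $time ) = $lines[0] =~
      m{\A#\d\.\d.+?#(\d{4})\s(\d{2})\s(\d{2})\s(\d{2}:\d{2}:\d{2}:\d{3})#\+\d+#\w+#/\w+/\w+/Authentication#}
      or return;

    my ($action) = $lines[2] =~ /\A(Login|Logout)\b/ or return;
    my ($user)   = $lines[3] =~ /\AUser:\s*(.+)/     or return;
    return if $user =~ /Guest/;    # exclude Guest for both actions

    return +{ date => "$y/$m/$d", time => $time, action => $action, user => $user };
}
```

In a read loop with `$/ = "\n\n"`, each record would go through `my $ev = parse_record($rec) or next; push @events, $ev;`. The `/\w+/\w+/Authentication#` part also accepts the `/Sy/Sec/Authentication` headers in the sample, just as the script's regex does.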

I have a problem with the statement at line 54:

#next unless $. > $jumprow;

I think it doesn't work because of the record separator I use, but I don't understand which separator I should use to solve this problem:

$/ = "\n\n";  # record separator

To debug the code, I've inserted the following statement:

print "next unless $. > $jumprow\n";

As far as I can see, the value of $. is not the same as the line number in the file (the cause is the double-newline record separator, $/ = "\n\n";).
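That observation is consistent with how Perl works: `$.` counts records read from the handle, and with `$/` set to `"\n\n"` each record is a whole log entry, whereas `wc -l` counts physical lines. A minimal sketch on made-up sample text (the helper name `count_lines_and_records` is hypothetical) shows the two numbers diverging:

```perl
use strict;
use warnings;

# Two short records separated by a blank line (made-up text, same shape as the log).
my $sample = "#header 1#\nLogin\nUser: USER1\n\n#header 2#\nLogout\nUser: USER2\n";

# Return ( physical lines, records read with $/ = "\n\n" ).
sub count_lines_and_records {
    my ($text) = @_;
    my $lines = () = $text =~ /\n/g;    # like wc -l: number of newlines

    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\n\n";                  # same separator as the script
    1 while <$fh>;                      # read every record
    my $records = $.;                   # $. = number of records read so far
    close $fh;                          # close resets $., so read it first

    return ( $lines, $records );
}

my ( $lines, $records ) = count_lines_and_records($sample);
print "lines=$lines records=$records\n";    # prints "lines=7 records=2"
```

So `next unless $. > $jumprow` can only work if `$jumprow` is stored in the same unit as `$.`, for example the record count reached at the end of the previous run rather than the `wc -l` line count.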

If I remove the double newline, the script doesn't work.
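Keeping the separator makes sense, since the parsing relies on each record being one complete entry. A different technique, not used in the original script, is to remember a byte offset instead of a row count: on the next run, if the inode is unchanged, `seek` straight to that offset and read only what was appended, then save the new position from `tell`. A minimal sketch, assuming a hypothetical `read_new_records` helper and a `{ inode, offset }` state per file:

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: return only the records appended to $path since the
# previous run. $state is { inode => ..., offset => ... } as saved last time
# (an empty hashref on the first run). Returns ( \@records, \%new_state ).
sub read_new_records {
    my ( $path, $state ) = @_;

    my @st = stat $path or die "cannot stat '$path': $!";
    my ( $inode, $size ) = @st[ 1, 7 ];

    # Resume from the saved offset only when the inode is unchanged and the
    # file has not shrunk (e.g. truncated in place); otherwise start at 0.
    my $offset = 0;
    if (   defined $state->{inode}
        && defined $state->{offset}
        && $state->{inode} == $inode
        && $state->{offset} <= $size )
    {
        $offset = $state->{offset};
    }

    open my $fh, '<', $path or die "cannot open '$path': $!";
    seek( $fh, $offset, SEEK_SET ) or die "cannot seek in '$path': $!";

    local $/ = "\n\n";             # one record per blank-line-separated entry
    my @records    = <$fh>;        # everything after $offset
    my $new_offset = tell $fh;     # byte position after the last record read
    close $fh;

    return ( \@records, { inode => $inode, offset => $new_offset } );
}
```

With this approach the statistics file would store `inode;offset` instead of the `wc -l` count, and neither `$.` nor `$jumprow` is needed. One caveat: if the application is still writing the last record when the script runs, that partial record is consumed now and its remainder shows up as a separate record next time.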

Details of my script:

(1) First step: read STAT_FILE to see how many rows were read in the last run.

(2) Second step: store Date, Time, Action (login or logout) and User (if it isn't Guest) in an array (@events), then sort the array by user (instead of by date, which is the default order).

(3) Third step: write information about the log files just read to STAT_FILE.

(4) Fourth step: write the sorted @events array to a file parse_log.csv in a directory named MM-YYYY (which depends on the date of each event).
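Steps (1) and (3) are a round trip through the statistics file. A minimal sketch with hypothetical helpers `save_stats` and `load_stats`, using the same `filename;inode;count` layout the script writes to `.rotation_statistics`; the `local $/ = "\n"` guards against the record separator having been changed elsewhere in the script:

```perl
use strict;
use warnings;

# %$stats maps filename => [ inode, count ].
sub save_stats {
    my ( $path, $stats ) = @_;
    open my $out, '>', $path or die "cannot write '$path': $!";
    for my $file ( sort keys %$stats ) {
        my ( $inode, $count ) = @{ $stats->{$file} };
        print {$out} join( ';', $file, $inode, $count ), "\n";
    }
    close $out or die "cannot close '$path': $!";
}

# A missing file (first run) yields an empty hashref.
sub load_stats {
    my ($path) = @_;
    my %stats;
    return \%stats unless -e $path;

    open my $in, '<', $path or die "cannot read '$path': $!";
    local $/ = "\n";    # plain lines, whatever $/ is elsewhere
    while ( my $line = <$in> ) {
        chomp $line;
        # The greedy (.+) leaves the last two ;-separated fields for inode and count.
        my ( $file, $inode, $count ) = $line =~ /\A(.+);(\d+);(\d+)\z/ or next;
        $stats{$file} = [ $inode, $count ];
    }
    close $in;
    return \%stats;
}
```

Whatever the third field holds, it has to be in the same unit as the skip test later on: records if it is compared with `$.` under `$/ = "\n\n"`, lines if it comes from `wc -l`.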

Could you please help me find a solution for my script?

Answer 1:

I thought we covered this yesterday.

if ( $currentinode == $inode ) {
    # Get rows to jump for this $INPUT
    my $jumprow = $older_count{$INPUT}[0];
}
else {
    # If file has been changed
    my $jumprow = 0;
}

Each of these blocks declares a new $jumprow variable. And each of those variables ceases to exist when you exit the block that they were declared in (i.e. on the very next line).

If you want to access these variables outside of the if/else blocks, then you need to declare them at a higher level.

my $jumprow;
if ( $currentinode == $inode ) {
    # Get rows to jump for this $INPUT
    $jumprow = $older_count{$INPUT}[0];
}
else {
    # If file has been changed
    $jumprow = 0;
}

Or (more simply):

my $jumprow = 0;
$jumprow = $older_count{$INPUT}[0] if $currentinode == $inode;

Or

my $jumprow = $currentinode == $inode ? $older_count{$INPUT}[0] : 0;
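One further detail, not part of the original answer: on the first run, or for a newly rotated file, `$older_inode{$INPUT}[0]` is undef, so `$currentinode == $inode` emits a "Use of uninitialized value" warning under `use warnings`. A minimal sketch with a hypothetical `rows_to_skip` helper that checks for a saved entry first:

```perl
use strict;
use warnings;

# $older_count and $older_inode are refs to the script's hashes of array refs
# (filename => [ value ]). A file with no saved entry yet gets 0 without any
# "uninitialized" warning.
sub rows_to_skip {
    my ( $input, $inode, $older_count, $older_inode ) = @_;

    # Check exists first so that reading the entry does not autovivify it.
    return 0 unless exists $older_inode->{$input};
    my $saved_inode = $older_inode->{$input}[0];

    return 0 unless defined $saved_inode && $saved_inode == $inode;
    return 0 unless exists $older_count->{$input};
    return $older_count->{$input}[0] // 0;
}
```

In the script this would read `my $jumprow = rows_to_skip( $INPUT, $inode, \%older_count, \%older_inode );`.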