Ok, so I'm obviously having some issues understanding how to work with hashes in Perl. Long story short, I'm parsing an FTP log to find the relevant flows for a specific search criterion. Given an IP address or a user name, the script first runs a simple grep to cut out data I don't need and writes the result to an external file. For example, if I'm searching for user name testing1, it greps for testing1 and writes the output to a file called output.txt:
Dec 2 00:14:09 ftp1 ftpd[743]: USER testing1
Dec 2 00:14:09 ftp1 ftpd[743]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1
Dec 2 00:30:08 ftp1 ftpd[1261]: USER testing1
Dec 2 00:30:09 ftp1 ftpd[1261]: FTP LOGIN FROM 192.168.0.4 [192.168.0.4], testing1
Dec 2 01:12:33 ftp1 ftpd[11804]: USER testing1
Dec 2 01:12:33 ftp1 ftpd[11804]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1
And below is an example of the originating log data:
Dec 1 23:59:03 ftp1 ftpd[4152]: USER testing1
Dec 1 23:59:03 ftp1 ftpd[4152]: PASS password
Dec 1 23:59:03 ftp1 ftpd[4152]: FTP LOGIN FROM 192.168.0.02 [192.168.0.2], testing1
Dec 1 23:59:03 ftp1 ftpd[4152]: PWD
Dec 1 23:59:03 ftp1 ftpd[4152]: CWD /test/data/
Dec 1 23:59:03 ftp1 ftpd[4152]: TYPE Image
I then collect every process ID I find, along with the time it was logged, and put them into a hash, which is what you see below:
$VAR1 = {
  '743' => [
    '00:1'
  ],
  '20687' => [
    '01:3'
  ],
  '27186' => [
    '15:3'
  ],
  '6929' => [
    '12:0'
  ],
  '24771' => [
    '09:0'
  ],
  '11804' => [
    '01:1'
  ],
  '27683' => [
    '08:3'
  ],
  '14976' => [
    '04:3'
  ],
};
It looks as if the time is being put into the hash as an array. I was unable to figure out why this is happening, so I decided to work with it as an array. The following is how the hash of arrays is created:
# -------------------------------------------------------
# Extract PIDs and Time from lines, take out doubles
# -------------------------------------------------------
my $infile3 = 'output.txt';
my %pids;
my $found;
my $var;
open (INPUT2, $infile3) or die "Couldn't read $infile3.\n";
while (my $line = <INPUT2>) {
    if($line =~ /(\d{2})\:(\d)/ ) {
        my $hhmm = $1 . ":" . $2;
        if ($line =~ /ftpd\[(.*?)\]/) {
            $found = 0;
            foreach $var(keys %pids){
                if(grep $1 =~ $var, keys %pids){
                    $found = 1;
                }
            }
            if ($found == 0){
                push @{$pids{$1}}, $hhmm;
            }
        }
    }
}
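A stripped-down test suggests the array comes from the push line itself: as far as I can tell, `push @{ $pids{$pid} }, ...` treats the hash value as an array reference and creates one when the key is new. The sketch below is only an illustration with made-up sample lines and a hypothetical helper, `collect_pids`; it also uses `exists` for the duplicate check instead of looping over the keys.

```perl
use strict;
use warnings;

# Hypothetical helper: map each PID to the first hh:m it was seen with.
sub collect_pids {
    my @lines = @_;
    my %pids;
    for my $line (@lines) {
        # Capture the hh:m prefix of the timestamp and the PID in brackets.
        next unless $line =~ /(\d{2}:\d)\d:\d{2}\s.*?ftpd\[(\d+)\]/;
        my ($hhmm, $pid) = ($1, $2);
        # exists is an exact key lookup, so each PID is stored only once.
        next if exists $pids{$pid};
        push @{ $pids{$pid} }, $hhmm;   # creates an array ref on first use
    }
    return %pids;
}

my @sample = (
    'Dec 2 00:14:09 ftp1 ftpd[743]: USER testing1',
    'Dec 2 00:14:09 ftp1 ftpd[743]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1',
    'Dec 2 00:30:08 ftp1 ftpd[1261]: USER testing1',
);
my %pids = collect_pids(@sample);

print ref($pids{743}), "\n";   # the value is an array reference
print $pids{743}[0], "\n";     # the stored time is element 0
```

If that is right, the time is element 0 of each PID's array, which matches the dump above.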
To speed things up I have decided to read all the lines that have the matching PIDs, whether they fit the flow or not, into an array so I don't have to keep reading in the originating file.
##-------------------------------------------------------
## read each line from file into an array
##-------------------------------------------------------
open (INPUT, $infile2) or die "Couldn't read $infile2.\n";
my @messages;
while (my $line = <INPUT>){
    # if there is a match to the PID then put the line in the array
    if ($line =~ /ftpd\[(.*?)\]/){
        my $mPID = $1;
        foreach my $key (keys %pids){
            if ($key =~ $mPID){
                push @messages, $line;
            }
        }
    }
}
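One thing I noticed while testing: `$key =~ $mPID` is a regex match, so it succeeds whenever the PID appears anywhere inside a key, and it loops over every key for every line. Below is a minimal sketch of an exact lookup with `exists`, assuming the hash is keyed by PID as above. The sample data and the helper name `lines_for_pids` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines whose PID is a key in %$pids.
sub lines_for_pids {
    my ($pids, @lines) = @_;
    my @messages;
    for my $line (@lines) {
        # \d+ captures the PID only; exists does an exact hash lookup.
        if ($line =~ /ftpd\[(\d+)\]/ && exists $pids->{$1}) {
            push @messages, $line;
        }
    }
    return @messages;
}

my %wanted = (4152 => '23:5');
my @log = (
    'Dec 1 23:59:03 ftp1 ftpd[4152]: USER testing1',
    'Dec 1 23:59:03 ftp1 ftpd[415]: USER someone',    # substring of 4152
    'Dec 1 23:59:03 ftp1 ftpd[4152]: PWD',
);
my @messages = lines_for_pids(\%wanted, @log);
print scalar(@messages), " lines kept\n";
```

With the regex comparison, the line for PID 415 would also be kept because 415 appears inside 4152; with `exists` it is skipped.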
I'm now trying to match each line against the PID and the time to get the flow. I'm only matching the hh:m part of the time so there is a better chance of capturing the entire flow, and because the chance of another flow with the same PID falling in the same timeframe is pretty slim. Eventually all these results will be sent to an internal web page.
# -------------------------------------------------------
# find flow based on PID that was found from criteria
# -------------------------------------------------------
foreach my $line(@messages){
    if(my($pid) = $line =~ m{ \[ \s*(\d+) \]: }x) {
        if($line =~ /(\d{2})\:(\d)/){
            my $time = $1 . ":" . $2;
            if ($pids{$pid}[0] =~ /$time/){
                push $pids{$pid}[0], $line;
            }
        }
    }
}
Right now the above code for some reason is actually deleting the time from the hash once it is matched. I am unsure why this is happening.
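My best guess at the cause: `$pids{$pid}[0]` holds the time string, so pushing onto it does not append to the PID's array, and depending on the Perl version that line may not even compile. Here is a sketch of what I am aiming for instead, assuming a second hash (`%flows`, a name I made up) keyed by PID and then by hh:m, so the time stored in `%pids` is never touched. The sample lines are made up as well.

```perl
use strict;
use warnings;

# PID => hh:m, one time string per PID.
my %pids = (4771 => '23:5');

my @messages = (
    'Dec 1 23:59:23 ftp1 ftpd[4771]: USER test',
    'Dec 1 23:59:23 ftp1 ftpd[4771]: PASS password',
    'Dec 2 04:10:00 ftp1 ftpd[4771]: USER other',   # same PID, other time
);

my %flows;   # PID => { hh:m => [ lines ] }
for my $line (@messages) {
    next unless $line =~ /(\d{2}:\d)\d:\d{2}\s.*?\[\s*(\d+)\]:/;
    my ($time, $pid) = ($1, $2);
    # Skip PIDs we are not tracking and lines from another time window.
    next unless exists $pids{$pid} && $pids{$pid} eq $time;
    push @{ $flows{$pid}{$time} }, $line;   # append; %pids stays intact
}

print scalar @{ $flows{4771}{'23:5'} }, " lines in flow\n";
print "time still stored: $pids{4771}\n";
```

Keeping the lookup table and the collected lines in separate hashes means nothing ever overwrites the time.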
I was able to get it working with a bash script, but it took forever to complete. Thanks to suggestions from people here, I decided to tackle it with Perl, so I am basically taking a crash course. I've read everything I can and have basic programming skills in C++, but I obviously still need a lot of work. I also got it working using arrays, but once again it was incredibly slow, and I was getting a lot of flows that matched the process ID but were not the flows I was looking for. So, after further suggestions, I decided to work with hashes: the process ID is the key, a specific time is associated with that key, and the log lines that have both that key and that time make up the flow.
I have asked multiple questions on this already, but (A) I have not explained myself clearly and (B) I have been trying different things as I learn. For the record, everyone here has helped me tremendously, and I hope that one day I can do the same for others on this list. For some reason I just can't get this stuff through my thick skull.
Anyways, hopefully I covered everything, I'm sure I'm starting to get on people's nerves with all these questions so I apologize.
UPDATE:
Well, I think I figured out how to make it all hashes, but it doesn't look right. I changed push @{$pids{$1}}, $hhmm;
to $pids{$1}{$x} = $hhmm;
which creates the following:
$VAR1 = {
  '743' => {
    '' => '00:1'
  },
  '20687' => {
    '' => '01:3'
  },
But it doesn't look like it's referencing correctly so when I do print $pids{743};
all it prints is HASH(0x4caf10)
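As far as I can tell, `HASH(0x4caf10)` is just what a hash reference looks like when printed, so the inner value has to be reached with a second key. The empty-string key in the dump suggests `$x` was undefined or empty when the assignment ran. A small sketch with made-up values, assuming the inner key is meant to be a label such as `time` (a hypothetical name):

```perl
use strict;
use warnings;

my %pids;
$pids{743}{time} = '00:1';      # 'time' is a hypothetical inner key

print ref($pids{743}), "\n";    # the outer value is a hash reference
print $pids{743}{time}, "\n";   # the inner value needs the second key

# Walk every inner key/value pair of one PID.
for my $key (sort keys %{ $pids{743} }) {
    print "$key => $pids{743}{$key}\n";
}
```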
UPDATE:
Ok, I was able to put all the values into hashes by changing push @{$pids{$1}}, $hhmm;
to $pids{$1} = $hhmm;
which seems to be working:
$VAR1 = {
  '743' => '00:1',
  '20687' => '01:3',
};
But now how do I check whether the value '00:1' matches another variable? This is what I currently have, and it is not working:
if($pids{$pid} == qr/$time/){
    $pids{$pid}{$time}[$y] = $line;
    $y++;
};
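From what I have read, `==` compares numbers, so both sides get converted to numbers and the text itself is never compared. Strings should be compared with `eq`, or with `=~` when a pattern is wanted. A minimal sketch with made-up values:

```perl
use strict;
use warnings;

my %pids = (743 => '00:1', 20687 => '01:3');
my ($pid, $time) = (743, '00:1');

# eq compares the two strings exactly.
my $same = (exists $pids{$pid} && $pids{$pid} eq $time) ? 1 : 0;
print "exact match: $same\n";

# =~ is the operator for patterns; \Q...\E escapes any regex characters.
my $matches = ($pids{$pid} =~ /^\Q$time\E$/) ? 1 : 0;
print "pattern match: $matches\n";

# A different time for the same PID does not match.
my $other = ($pids{$pid} eq '01:3') ? 1 : 0;
print "other time: $other\n";
```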
This is how it should look after the match is made:
$VAR1 = {
'743' => '00:1',
'4771' => {
'23:5' => [
'Dec 1 23:59:23 ftp1 ftpd[4771]: USER test
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: PASS password
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], test
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: CWD /home/test/
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: TYPE Image
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: PASV
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: RETR test
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: QUIT
',
'Dec 1 23:59:23 ftp1 ftpd[4771]: FTP session closed
'
]
},