I'm trying to make a log analyser using perl. The analyser would run 24/7 in the background on an AIX server and read from pipes that syslog directs logs to (from the entire network). Basically:
logs from network ----> named pipe A --------> | perl daemon
----> named pipe B --------> | * reads pipes
----> named pipe c --------> | * decides what to do based on which pipe
So, for example, I want my daemon to be able to be configured to mail root@domain.com
all logs that are written to named pipe C
. For this, I'm assuming the daemon needs to have a hash (new to perl, but this seems like an appropriate data structure) that would be able to be changed on the fly and would tell it what to do with each pipe.
Is this possible? Or should I create a .conf
file in /etc
to hold the information. Something like this:
namedpipeA:'mail root@domain.com'
namedpipeB:save:'mail user@domain.com'
So getting anything from A
will be mailed to root@domain.com
and everything from B
will be saved to a log file (like it is usually) AND it will be sent to user@domain.com
Seeing as this is my first time using Perl and my first time creating a daemon, is there anyway for me to make this while adhering to the KISS principal? Also, are there any conventions that I should stick to? If you could take into consideration my lack of knowledge when replying it would be most helpful.
I'll cover part of your question: how to write a long-running Perl program that deals with IO.
The most efficient way to write a Perl program that handles many simultaneous IO operations is to use an event loop. This will allow us to write handlers for events, like "a line appeared on the named pipe" or "the email was sent successfully" or "we received SIGINT". Crucially, it will allow us to compose an arbitrary number of these event handlers in one program. This means that you can "multitask" but still easily share state between the tasks.
We'll use the AnyEvent framework. It lets us write event handlers, called watchers, that will work with any event loop that Perl supports. You probably don't care which event loop you use, so this abstraction probably doesn't matter to your application. But it will let us reuse pre-written event handlers available on CPAN; AnyEvent::SMTP to handle email, AnyEvent::Subprocess to interact with child processes, AnyEvent::Handle to deal with the pipes, and so on.
The basic structure of an AnyEvent-based daemon is very simple. You create some watchers, enter the event loop, and ... that's it; the event system does everything else. To get started, let's write a program that will print "Hello" every five seconds.
We start by loading modules:
Then, we'll create a time watcher, or a "timer":
Note that we assign the timer to a variable. This keeps the timer alive as long as
$t
is in scope. If we saidundef $t
, then the timer would be cancelled and the callback would never be called.About callbacks, that's the
sub { ... }
aftercb =>
, and that's how we handle events. When an event happens, the callback is invoked. We do our thing, return, and the event loop continues calling other callbacks as necessary. You can do anything you want in callbacks, including cancelling and creating other watchers. Just don't make a blocking call, likesystem("/bin/sh long running process")
ormy $line = <$fh>
orsleep 10
. Anything that blocks must be done by a watcher; otherwise, the event loop won't be able to run other handlers while waiting for that task to complete.Now that we have a timer, we just need to enter the event loop. Typically, you'll choose an event loop that you want to use, and enter it in the specific way that the event loop's documentation describes. EV is a good one, and you enter it by calling
EV::loop()
. But, we'll let AnyEvent make the decision about what event loop to use, by writingAnyEvent->condvar->recv
. Don't worry what this does; it's an idiom that means "enter the event loop and never return". (You'll see a lot about condition variables, or condvars, as you read about AnyEvent. They are nice for examples in the documentation and in unit tests, but you really don't want to ever use them in your program. If you're using them inside a.pm
file, you're doing something very wrong. So just pretend they don't exist for now, and you'll write extremely clean code right from the start. And that'll put you ahead of many CPAN authors!)So, just for completeness:
If you run that program, it will print "Hello" every five seconds until the universe ends, or, more likely, you kill it with control c. What's neat about this is that you can do other things in those five seconds between printing "Hello", and you do it just by adding more watchers.
So, now onto reading from pipes. AnyEvent makes this very easy with its AnyEvent::Handle module. AnyEvent::Handle can connect to sockets or pipes and will call a callback whenever data is available to read from them. (It can also do non-blocking writes, TLS, and other stuff. But we don't care about that right now.)
First, we need to open a pipe:
Then, we wrap it with an AnyEvent::Handle. After creating the Handle object, we'll use it for all operations on this pipe. You can completely forget about
$fh
, AnyEvent::Handle will handle touching it directly.Now we can use
$h
to read lines from the pipe when they become available:This will call the callback that prints "Got a line" when the next line becomes available. If you want to continue reading lines, then you need to make the function push itself back onto the read queue, like:
This will read lines and call
$handle_line->()
for each line until the file is closed. If you want to stop reading early, that's easy... just don'tpush_read
again in that case. (You don't have to read at the line level; you can ask that your callback be called whenever any bytes become available. But that's more complicated and left as an exercise to the reader.)So now we can tie this all together into a daemon that handles reading the pipes. What we want to do is: create a handler for lines, open the pipes and handle the lines, and finally set up a signal handler to cleanly exit the program. I recommend taking an OO approach to this problem; make each action ("handle lines from the access log file") a class with a
start
andstop
method, instantiate a bunch of actions, setup a signal handler to cleanly stop the actions, start all the actions, and then enter the event loop. That's a lot of code that's not really related to this problem, so we'll do something simpler. But keep that in mind as you design your program.Now you have a program that reads a line from an arbitrary number of pipes, prints each line received on any pipe (prefixed with the path to the pipe), and exits cleanly when you press Control-C!
First simplification - handle each named pipe in a separate process. That means you will run one perl process for each named pipe, but then you won't have to use event-based I/O or threading.
Given that, how about just passing the configuration data (path to the named pipe, email address to use, etc.) on the command line, e.g.:
Does that work for you?
To make sure the daemons stay up, have a look at a package like D. J. Bernstein's daemontools or supervisord (gasp! a python package).
Each of these packages tells you how to configure your rc-scripts so that they start at machine boot time.