Say you were writing a daemon that services a job queue. Various other software writes jobs for the daemon into the queue. The daemon polls the queue every few seconds for pending jobs. Assume the queue is implemented as a table in a MySQL database and that the daemon is a simple loop:
- get all due jobs from the queue
- do the jobs
- sleep for N seconds
- goto 1
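In PHP terms, a minimal sketch of that loop might look like this (the table, columns, and run_job() helper here are hypothetical, just to make the shape concrete):

```php
$db = new mysqli('localhost', 'user', 'pass', 'queue');  // placeholders

while (true) {
    // 1. get all due jobs from the queue
    $result = $db->query('SELECT * FROM jobs WHERE due_at <= NOW() AND done = 0');
    while ($row = $result->fetch_assoc()) {
        // 2. do the jobs
        run_job($row);                                    // hypothetical handler
        $db->query('UPDATE jobs SET done = 1 WHERE id = ' . (int) $row['id']);
    }
    // 3. sleep for N seconds, then 4. goto 1
    sleep(5);
}
```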
The daemon must survive interrupted service from the MySQL DB server and disruption to DB connections.
Would you design the daemon to connect to the DB server once per cycle, i.e. connect before step 1 and disconnect between steps 2 and 3?
Or would you have the daemon keep a connection open? In that case it also needs to a) detect when the server or connection is not working, b) disconnect and reconnect, and c) do so without accumulating DB connections, dud connection descriptors, or other dead resources.
If you have a preference, why?
Pros and cons?
Factors that enter into the design?
Any other approaches?
The answer here (mysql connection from daemon written in php) doesn't say why it's better to keep the connection open. I've read elsewhere that the per-connection overhead in MySQL is very light, so it isn't obvious why permanently consuming one server connection is better than connecting and disconnecting every few seconds.
In my case the daemon is written in PHP.
I'm actually working on something very close to what you described, but in my case the daemon doesn't poll for events; it gets them asynchronously via XMPP (but that's beside the point).
Cut out the Middle Man
I think that instead of storing the events in the database and polling them with MySQL, you could probably use Gearman to send them from the client asynchronously (example).
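As a rough sketch of what that looks like with the pecl/gearman extension (the process_job function name and the payload are made up for illustration):

```php
// Producer side: submit a job asynchronously instead of inserting a queue row.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);       // default gearmand port
$client->doBackground('process_job', json_encode(array('id' => 42)));

// Daemon side: block waiting for jobs instead of polling MySQL.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('process_job', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // ... do the job ...
});
while ($worker->work()) {
    // work() blocks until a job arrives and handles one job per call.
}
```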
Garbage Collection
PHP isn't really designed to run as a daemon, and it wasn't until PHP 5.3, which introduced circular-reference garbage collection, that this became a viable option. It's very important that you use PHP 5.3 if you want any chance at long-term running without memory leaks.
Another thing to keep in mind about GC is that memory is only freed once it's no longer referenced (anywhere). So if you assign a variable to the global scope, it'll be there until the daemon exits. It's important that any code you create or use doesn't build up variables anywhere (e.g., a static log that never removes old data).
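For example, a long-running loop can force a collection pass and watch memory explicitly (do_work() is a stand-in for your actual job processing):

```php
// PHP 5.3+: make sure cycle collection is enabled, then periodically force
// a pass; gc_collect_cycles() returns how many cycles it freed, which is
// useful to log while hunting for leaks.
gc_enable();

while (true) {
    do_work();                                   // hypothetical job processing
    $freed = gc_collect_cycles();
    error_log("gc freed $freed cycles, memory: " . memory_get_usage());
    sleep(5);
}
```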
Stat Cache
Another thing is that it's important to run clearstatcache every so often. Since your PHP process isn't restarted, you need to make this call manually to avoid acting on stale stat data (which may or may not affect you). According to the documentation, these functions are cached:
Affected functions include stat(), lstat(), file_exists(), is_writable(), is_readable(), is_executable(), is_file(), is_dir(), is_link(), filectime(), fileatime(), filemtime(), fileinode(), filegroup(), fileowner(), filesize(), filetype(), and fileperms().
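In practice this just means a call somewhere in the main loop; the targeted form is available from PHP 5.3 (the path below is only an example):

```php
// Drop all cached stat results once per cycle.
clearstatcache();

// Or, from PHP 5.3, also clear the realpath cache and/or target one file.
clearstatcache(true, '/var/run/mydaemon.pid');   // hypothetical path
```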
Resource management
If you're going to be using something like MySQL for the lifetime of your process, I'd suggest making one connection at startup and keeping it alive. Even though it might use more RAM on the MySQL side, you'll cut out some latency and CPU overhead by not having to reconnect on every cycle.
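A sketch of the connect-once approach, using mysqli::ping() to detect a dead connection and reconnect lazily (host and credentials are placeholders):

```php
$db = new mysqli('localhost', 'user', 'pass', 'queue');  // once, at startup

while (true) {
    // Before each cycle, check the connection is still usable; ping()
    // returns false if the server has gone away.
    if (!$db->ping()) {
        @$db->close();
        $db = new mysqli('localhost', 'user', 'pass', 'queue');
    }
    // ... run this cycle's queries on $db ...
    sleep(5);
}
```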
No URL request
This may seem obvious, but with CLI PHP there is no URL request info. Some libraries aren't written with this in mind, and this can cause some problems.
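For example, under the CLI SAPI the usual request keys simply aren't there, which is worth checking for if a library assumes them:

```php
var_dump(PHP_SAPI);                       // "cli"
var_dump(isset($_SERVER['REQUEST_URI'])); // false -- no URL was requested
var_dump(isset($_SERVER['HTTP_HOST']));   // false -- no HTTP request at all
```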
LooPHP
I'm going to pop a shameless plug in here for a framework I wrote to help with the management of PHP daemons. LooPHP is a run-loop framework that lets you schedule events or create listeners for abstract sources (socket, stream, etc.). In my case the daemon does more than one thing, so it's very helpful to have the system keep track of all the timers for me so that I can effectively poll stream_select for the XMPP connection.
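The core idea, independent of LooPHP, is a select loop whose timeout doubles as the timer tick: stream_select() returns either when the XMPP stream is readable or when the timeout for scheduled work expires. A rough sketch, with the connection setup and the handlers left as hypothetical:

```php
// $xmpp is assumed to be an already-open stream, e.g. from stream_socket_client().
while (true) {
    $read = array($xmpp);
    $write = $except = array();
    // Wake up on socket activity, or after 5 seconds for timer-based work.
    $ready = stream_select($read, $write, $except, 5);
    if ($ready === false) {
        break;                                // select error
    }
    if ($ready > 0) {
        handle_incoming(fread($xmpp, 8192));  // hypothetical XMPP handler
    } else {
        run_due_timers();                     // hypothetical scheduled work
    }
}
```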
If you want to make a reliable daemon, you will need to catch database errors/disconnects and reconnect in either design (connect-per-cycle or keep-alive). Since you need to do this anyway, you might as well reuse one connection.
In other words, just because you have a freshly opened connection does not mean that a query can't fail, requiring the connection to be re-opened and the query tried again.
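A sketch of that reconnect-and-retry logic with mysqli, assuming exceptions are enabled via mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); the query is retried once on a fresh connection before a failure is allowed to bubble up:

```php
function query_with_retry(mysqli &$db, $sql)
{
    try {
        return $db->query($sql);
    } catch (mysqli_sql_exception $e) {
        // The connection may have died: drop it, reconnect, retry once.
        @$db->close();
        $db = new mysqli('localhost', 'user', 'pass', 'queue');  // placeholders
        return $db->query($sql);  // a second failure propagates to the caller
    }
}
```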
So, I believe that the cleanest way would be to preserve the connection. But only barely.
I think that the best thing you could do is measure the time it takes to connect to and disconnect from the database. Then try to estimate the likelihood of the database server becoming unavailable. Determine the cost of a permanent server connection. And finally, determine the cost (in hours, annoyance, or whatever) of adding code that deals with database connectivity issues. If you can successfully determine those numbers and compare them, you have an answer. It is difficult for me (and, I assume, anyone) to come up with good guesstimates for those values, but the conclusion will probably be that it is a choice between performance and feasibility (in terms of lower cost).