-->

alternative to polling database?

Posted 2020-05-29 16:31

Question:

I have an application that works as follows: Linux machines generate 28 different types of letter to customers. The letters must be sent in .docx (Microsoft Word format). A secretary maintains MS Word templates, which are automatically used as necessary. Changing from using MS Word is not an option.

To coordinate all this, document jobs are placed into a database table, and a Python program running on each of the Windows machines polls the database frequently, locking out jobs and running them as necessary.

We use a central database table for the job information to coordinate the different states ("new", "processing", "finished", "printed") as well as to give accurate status information.

Anyway, I don't like the clients polling the database frequently, seeing as they aren't working most of the time. Clients poll every 5 seconds.

To avoid polling, I kind of want a broadcast "there's some work to do" or "check your database for some work to do" message sent to all the client machines.

I think some kind of publish/subscribe message queue would be up to the job, but I don't want any massive extra complexity.

Is there a zero or near zero config/maintenance piece of software that would achieve this? What are the options?

Answer 1:

Is there any objective evidence that any significant load is being put on the server? If it works, I'd make sure there's really a problem to solve here.

It must be nice to have everything running so smoothly that you're looking at things that might only possibly be improved!



Answer 2:

Is there a zero or near zero config/maintenance piece of software that would achieve this? What are the options?

Possibly, but what you would save in configuration and implementation time would likely hurt performance more than your polling service ever could. SQL Server isn't made to do a push really (not easily anyway). There are things that you could use to push data out (replication service, log shipping - icky stuff), but they would be more complex and require more resources than your simple polling service. Some options would be:

  1. some kind of trigger which runs your executable using command-line calls (xp_cmdshell)

  2. using a COM object which SQL Server could open and run

  3. using a SQL Agent job to run a VBScript (which would again be considered "polling")

These options are a bit ridiculous considering what you have already done is much simpler.

If you are worried about the polling service using too many cycles or something - you can always throttle it back - polling every minute, every 10 minutes, or even just once a day might be more appropriate - this would be a business decision, so go ask someone in the business how fast it needs to be.
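As a rough illustration of throttling, here is a minimal sketch of a polling loop that backs off while idle and snaps back to the fast interval as soon as work shows up. The names `next_interval`, `poll_forever`, `fetch_jobs` and `handle_job`, and the 5 second / 10 minute bounds, are made up for this example; they are not from the original application.

```python
import time


def next_interval(current, found_work, minimum=5.0, maximum=600.0, factor=2.0):
    """Return the delay in seconds before the next poll.

    Resets to `minimum` when the last poll found work; otherwise
    multiplies the current delay by `factor`, capped at `maximum`.
    """
    if found_work:
        return minimum
    return min(current * factor, maximum)


def poll_forever(fetch_jobs, handle_job, sleep=time.sleep):
    """Poll with an adaptive delay.

    `fetch_jobs` and `handle_job` are hypothetical callables supplied
    by the caller: the first returns a list of pending jobs, the
    second processes one job.
    """
    interval = 5.0
    while True:
        jobs = fetch_jobs()
        for job in jobs:
            handle_job(job)
        interval = next_interval(interval, bool(jobs))
        sleep(interval)
```

The trade-off is latency: after a long quiet period the first new job may wait up to `maximum` seconds, so the cap is the number to agree with the business.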

Simple polling services are fairly common, because they are, well... simple. In addition, they are low overhead, remotely stable, and error-tolerant. The downside is that they can hammer the database into dust if not carefully controlled.



Answer 3:

A message queue might work well, as they're usually set up to be able to block for a while without wasting resources. But with MySQL, I don't think that's an option.

If you just want to reduce load on the DB, you could create a table with a single row: the latest job ID. Then clients just need to compare that to their last ID to see if they need to run a full poll against the real table. This way the overhead should be greatly reduced, if it's an issue now.



Answer 4:

Unlike Postgres and SQL Server (or object stores like CouchDB), MySQL does not emit database events. However, there are some coding patterns you can use to simulate this.

If you have one or more tables that you wish to monitor, you can create triggers on these tables that add a row to a "changes" table that records a queue of events to process. Your triggers filter the subset of data changes that you care about and create records in your changes table for each event you wish to perform. Because this pattern queues and persists events it works well even when the workers that process these events have outages.
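Here is a small runnable sketch of the trigger-plus-changes-table pattern. It uses SQLite through Python's `sqlite3` module only so that it runs without a server; MySQL's trigger syntax differs (it needs `FOR EACH ROW` and has no `WHEN` clause, so the filter would go in an `IF ... END IF` inside the trigger body). The table, trigger and event names are invented for the example.

```python
import sqlite3

# Schema for the sketch: a trigger filters inserts on `jobs` and
# queues one row in `changes` per event that workers care about.
SCHEMA = """
CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE changes (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL,
    event TEXT NOT NULL
);
CREATE TRIGGER jobs_after_insert AFTER INSERT ON jobs
WHEN NEW.state = 'new'
BEGIN
    INSERT INTO changes (job_id, event) VALUES (NEW.id, 'job_created');
END;
"""


def drain_changes(conn):
    """Read all queued events in order, then delete the ones that were read."""
    rows = conn.execute(
        "SELECT id, job_id, event FROM changes ORDER BY id"
    ).fetchall()
    if rows:
        conn.execute("DELETE FROM changes WHERE id <= ?", (rows[-1][0],))
        conn.commit()
    return [(job_id, event) for _, job_id, event in rows]
```

Because events stay in the `changes` table until a worker drains them, a worker that was down simply picks up the backlog when it comes back. With several workers draining concurrently you would also need a claim step (for example, an update that marks a row as taken) so that no event is processed twice; that is left out here.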

You might think that MyISAM is the best choice for the changes table since it's mostly performing writes (or even MEMORY if you don't need to persist the events between database server outages). However, keep in mind that both MEMORY and MyISAM have only full-table locks, so your trigger on an InnoDB table might hit a bottleneck when performing an insert into a MEMORY or MyISAM table. You may also require InnoDB for the changes table if you're using an ON DELETE CASCADE with another InnoDB table (this requires both tables to use the same engine).

You might also use SHOW TABLE STATUS to check the last update time of your changes table to see if there's something to process. This feature won't work for InnoDB tables.

These articles describe in more depth some alternative ways to implement queues in MySQL and even avoid polling!

  • How to notify event listeners in MySQL
  • How to implement a queue in SQL
  • 5 subtle ways you're using MySQL as a queue, and why it'll bite you


Tags: polling