Implementing a “rules engine” in Python

2019-01-31 01:46发布

I'm writing a log collection / analysis application in Python and I need to write a "rules engine" to match and act on log messages.

It needs to feature:

  • Regular expression matching for the message itself
  • Arithmetic comparisons for message severity/priority
  • Boolean operators

I envision An example rule would probably be something like:

(message ~ "program\\[\d+\\]: message" and severity >= high) or (severity >= critical)

I'm thinking about using PyParsing or similar to actually parse the rules and construct the parse tree.

The current (not yet implemented) design I have in mind is to have classes for each rule type, and construct and chain them together according to the parse tree. Then each rule would have a "matches" method that could take a message object return whether or not it matches the rule.

Very quickly, something like:

class RegexRule(Rule):
    def __init__(self, regex):
         self.regex = regex

    def match(self, message):
         return self.regex.match(message.contents)

class SeverityRule(Rule):
    def __init__(self, operator, severity):
         self.operator = operator

    def match(self, message):
         if operator == ">=":
             return message.severity >= severity
         # more conditions here...

class BooleanAndRule(Rule):
    def __init__(self, rule1, rule2):
         self.rule1 = rule1
         self.rule2 = rule2

    def match(self, message):
          return self.rule1.match(message) and self.rule2.match(message)

These rule classes would then be chained together according to the parse tree of the message, and the match() method called on the top rule, which would cascade down until all the rules were evaluated.

I'm just wondering if this is a reasonable approach, or if my design and ideas are way totally out of whack? Unfortunately I never had the chance to take a compiler design course or anything like that in Unviersity so I'm pretty much coming up with this stuff of my own accord.

Could someone with some experience in these kinds of things please chime in and evaluate the idea?

EDIT: Some good answers so far, here's a bit of clarification.

The aim of the program is to collect log messages from servers on the network and store them in the database. Apart from the collection of log messages, the collector will define a set of rules that will either match or ignore messages depending on the conditions and flag an alert if necessary.

I can't see the rules being of more than a moderate complexity, and they will be applied in a chain (list) until either a matching alert or ignore rule is hit. However, this part isn't quite as relevant to the question.

As far the syntax being close to Python syntax, yes that is true, however I think it would be difficult to filter the Python down to the point where the user couldn't inadvertently do some crazy stuff with the rules that was not intended.

5条回答
The star\"
2楼-- · 2019-01-31 02:36

You might also want to look at PyKE.

查看更多
霸刀☆藐视天下
3楼-- · 2019-01-31 02:41

Have you considered NebriOS? We just released this tool after building it for our own purposes. It's a pure Python/Django rules engine. We actually use it for workflow tasks, but it's generic enough to help your in your situation. You would need to connect to the remote DB and run the rules against it.

The Python you wrote would end up being more than one rule in the system. Here's one for example:

class SeverityRule(NebriOS):
    # add listener or timer here

check(self):
    if operator == ">="
    return message.severity >= severity
    # any number of checks here


action(self):
    csv.append(message.severity, datetime.now)
    send_email(kamil@example.com, """Severity level alert: {{message.severity}}""")

Check it out at http://nebrios.com. I didn't realize how much there was to building a rules engine application like Nebri until after my team started. It was a HUGE task that goes much deeper that it seems. ACL, queues, forms, KVPS's, efficient checks, useful errors, email parsing, dynamic forms, and the list goes on.

查看更多
乱世女痞
4楼-- · 2019-01-31 02:45

The only place that's different from Python syntax itself is the message ~ "program\\[\d+\\]: message" part, so I wonder if you really need a new syntax.

Update: OK, you have either usability or safety concerns -- that's reasonable. A couple suggestions:

  • Take a hint from Awk and streamline the pattern-matching syntax, e.g. /program\[\d+\]: message/ instead of message ~ "program\\[\d+\\]: message".

  • I'd implement it by translating to a Python expression as the input is parsed, instead of building a tree of objects to evaluate, unless you expect to be doing more operations on these things than evaluation. This should need less code and run faster. The top level might go something like:

    def compile(input):
        return eval('lambda message, severity: %s' % parse(input))
    

Another idea, further afield: write your app in Lua. It's designed for nonprogrammers to extend programs reasonably safely without needing to learn a lot. (It's been used that way successfully, and you can sandbox evaluation so user's code can't get at any capabilities you don't pass to it explicitly, I'm told.)

I'll shut up now. :-) Good luck!

查看更多
【Aperson】
5楼-- · 2019-01-31 02:51

It's a little hard to answer the question without knowing what the scope of the application is.

  • What are you trying to reason about?
  • What level of analysis are you talking about?
  • How complicated do you see the rules becoming?
  • How complicated is the interplay between different rules?

On one end of the spectrum is a simple one-off approach like you have proposed. This is fine if the rules are few, relatively simple, and you're not doing anything more complicated than aggregating log messages that match the specified rules.

On the other side of the spectrum is a heavy-weight reasoning system, something like CLIPS that has a Python interface. This is a real rules engine with inferencing and offers the ability to do sophisticated reasoning. If you're building something like a diagnostic engine that operates off of a program log this could be more appropriate.

EDIT:

I would say that the current implementation idea is fine for what you're doing. Anything much more and I think you probably run the risk of over-engineering the solution. It seems to capture what you want, matching on the log messages just based on a few different criteria.

查看更多
来,给爷笑一个
6楼-- · 2019-01-31 02:52

Do not invent yet another rules language.

Either use Python or use some other existing, already debugged and working language like BPEL.

Just write your rules in Python, import them and execute them. Life is simpler, far easier to debug, and you've actually solved the actual log-reading problem without creating another problem.

Imagine this scenario. Your program breaks. It's now either the rule parsing, the rule execution, or the rule itself. You must debug all three. If you wrote the rule in Python, it would be the rule, and that would be that.

"I think it would be difficult to filter the Python down to the point where the user couldn't inadvertently do some crazy stuff with the rules that was not intended."

This is largely the "I want to write a compiler" argument.

1) You're the primary user. You'll write, debug and maintain the rules. Are there really armies of crazy programmers who will be doing crazy things? Really? If there is any potential crazy user, talk to them. Teach Them. Don't fight against them by inventing a new language (which you will then have to maintain and debug forever.)

2) It's just log processing. There's no real cost to the craziness. No one is going to subvert the world economic system with faulty log handling. Don't make a small task with a few dozen lines of Python onto a 1000 line interpreter to interpret a few dozen lines of some rule language. Just write the few dozen lines of Python.

Just write it in Python as quickly and clearly as you can and move on to the next project.

查看更多
登录 后发表回答