Log normalization and the leading space
Log normalization is simple, but it has its quirks. A common pitfall is the syslog message format defined by RFC 3164. Let's look at a common case: a log message has been sent to rsyslog. The message itself contains no irregular characters, yet the message handed to mmnormalize for parsing now has a leading space character. The message that should be parsed looks like this:
This is a test
One would think that a simple rule is enough here. That is correct, but there is a small caveat. The rulebase entry we currently have looks something like this:
rule=:%word1:word% %word2:word% %word3:word% %word4:word%
But strangely, rsyslog responds with the following:
mmnormalize generated: {"originalmsg": " This is a test", "unparsed-data": " This is a test"}
Why can't rsyslog parse the message, and why is there a leading space character in front of it? The answer is that messages are processed according to RFC 3164. This RFC defines that everything after the ":" of the syslog header is to be considered the message. The space that follows the colon is therefore part of the message, which now starts with a leading space.
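The split described above can be illustrated with a small Python sketch. It is not rsyslog's parser: the function name split_tag_and_msg and the tag "myapp" are hypothetical, and the only behavior modeled is the one stated here, namely that the message is everything after the first ":".

```python
def split_tag_and_msg(rest: str) -> tuple[str, str]:
    """Split 'tag: message' so that msg is everything after the first ':'.

    Simplified illustration only; real syslog header parsing handles
    priority, timestamp, and hostname as well.
    """
    tag, sep, msg = rest.partition(":")
    if not sep:
        # No colon found: treat the whole string as the message.
        return "", rest
    return tag, msg


tag, msg = split_tag_and_msg("myapp: This is a test")
print(repr(tag))  # 'myapp'
print(repr(msg))  # ' This is a test'
```

The space after the colon stays with the message, which is exactly the leading space seen in the mmnormalize output.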
How can this be solved? Simply add the space to the rules in your rulebase. This leads to a rule like this:
rule=: %word1:word% %word2:word% %word3:word% %word4:word%
Please note that only the space character has been added. Also, this is only an example: the rule matches every message that is exactly four words long, so it is not suitable for copying into your configuration as is.
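To see why the single space decides between a match and no match, the two rules can be approximated with regular expressions in Python. This is a rough stand-in under the assumption that a "word" field behaves like a run of non-space characters. It does not use liblognorm, and the names WORD, RULE_WITHOUT_SPACE, and RULE_WITH_SPACE are made up for this sketch.

```python
import re

# Assumption: a "word" field is modeled as one or more non-space characters.
WORD = r"(\S+)"

# Regex stand-ins for the two rulebase entries (hypothetical emulation).
RULE_WITHOUT_SPACE = re.compile(rf"{WORD} {WORD} {WORD} {WORD}")
RULE_WITH_SPACE = re.compile(rf" {WORD} {WORD} {WORD} {WORD}")

# The message as it arrives at the normalizer, including the leading space.
msg = " This is a test"

no_match = RULE_WITHOUT_SPACE.fullmatch(msg)
match = RULE_WITH_SPACE.fullmatch(msg)

print(no_match)        # None
print(match.groups())  # ('This', 'is', 'a', 'test')
```

The first pattern fails at the very first character, because a word cannot start with a space. The second pattern consumes the space literally and then captures the four words.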