Current logging status is rather painful to parse: we have a date, a source field, a logging level and after that, the data is quite arbitrary and difficult to parse (need to have several different regular expressions to capture the values, and then more logic in order to classify them).
Adding to this, the context is often lacking and requires looking in prior logs to understand what happens, e.g. a c2s connection will first be logged under "socket", then have a proper ID; but we can only assume that those two logs are correlated, because there is no way to be sure (this could probably be part of another issue).
Same thing with received c2s stanzas: they often do not contain a from= and require backwards source field lookup in order to see what they are about.
If we could have *simple* structured and mostly context-free logging, it would tremendously help real-time server monitoring by allowing to write filters for popular tools, to have visualization of the server activity first (by finding out abusers and pinpoint external attacks), and then having a programmable interface which we can use to act on this data. (I am thinking of behemoths like fluentd/logstash coupled with elasticsearch, but it would also help a lot if one were to write custom tooling.)
By structured logging, I mean either a flat line of key=value items with specific event names and, or the same in json (without nested objects). That could still accommodate full messages in a "description" field or something (doing string interpolation on it with a few "macro" helpers would essentially keep the messages just as human-readable, afaik). Of course, this does not come for free and the code would certainly be more verbose, and probably a little slower (but I doubt logging would ever become the bottleneck anyway, unless it gets really complicated). It could log at the same time both to the default log file with human-readable logs, and to another files with the "easy to parse" logs if needed.
Here is an example of one such logging call:
slog("info", "s2sin5424", {event="received", event_class="c2s",
stanza_from="blah@jid", stanza_to="bloh@jid/blih", stanza_tag="iq", stanza_type="get",
description="Received[%(event_type)]: %(stanza_expand)"})
I don’t have a proposal for backwards compatibility, as doing that will require passing a table to the logging call since lua doesn't have named parameters; a new function name is probably the best compromise. Obviously this proposal probably misses a lot of potential issues because I am not very familiar with the prosody codebase or lua, so please tell me if I missed anything.
I am not aware of any library in lua allowing that kind of thing, so it would probably have to be homegrown (I believe a full-blown thing like [1] in python or [2] in go would be overkill).
[1]: https://structlog.readthedocs.io/en/stable/
[2]: https://github.com/sirupsen/logrus
MattJ
on
I'm assigning to waqas as he has done some work in this area.
Current logging status is rather painful to parse: we have a date, a source field, a logging level and after that, the data is quite arbitrary and difficult to parse (need to have several different regular expressions to capture the values, and then more logic in order to classify them). Adding to this, the context is often lacking and requires looking in prior logs to understand what happens, e.g. a c2s connection will first be logged under "socket", then have a proper ID; but we can only assume that those two logs are correlated, because there is no way to be sure (this could probably be part of another issue). Same thing with received c2s stanzas: they often do not contain a from= and require backwards source field lookup in order to see what they are about. If we could have *simple* structured and mostly context-free logging, it would tremendously help real-time server monitoring by allowing to write filters for popular tools, to have visualization of the server activity first (by finding out abusers and pinpoint external attacks), and then having a programmable interface which we can use to act on this data. (I am thinking of behemoths like fluentd/logstash coupled with elasticsearch, but it would also help a lot if one were to write custom tooling.) By structured logging, I mean either a flat line of key=value items with specific event names and, or the same in json (without nested objects). That could still accommodate full messages in a "description" field or something (doing string interpolation on it with a few "macro" helpers would essentially keep the messages just as human-readable, afaik). Of course, this does not come for free and the code would certainly be more verbose, and probably a little slower (but I doubt logging would ever become the bottleneck anyway, unless it gets really complicated). It could log at the same time both to the default log file with human-readable logs, and to another files with the "easy to parse" logs if needed. Here is an example of one such logging call: slog("info", "s2sin5424", {event="received", event_class="c2s", stanza_from="blah@jid", stanza_to="bloh@jid/blih", stanza_tag="iq", stanza_type="get", description="Received[%(event_type)]: %(stanza_expand)"}) I don’t have a proposal for backwards compatibility, as doing that will require passing a table to the logging call since lua doesn't have named parameters; a new function name is probably the best compromise. Obviously this proposal probably misses a lot of potential issues because I am not very familiar with the prosody codebase or lua, so please tell me if I missed anything. I am not aware of any library in lua allowing that kind of thing, so it would probably have to be homegrown (I believe a full-blown thing like [1] in python or [2] in go would be overkill). [1]: https://structlog.readthedocs.io/en/stable/ [2]: https://github.com/sirupsen/logrus
I'm assigning to waqas as he has done some work in this area.
Changes